From brian_osborne at cognia.com Sat Jan 1 11:18:08 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Sat Jan 1 11:15:04 2005
Subject: [Bioperl-l] Bioperl in 2005
In-Reply-To: <001201c4ee95$8f739130$6400a8c0@GOLHARMOBILE1>
Message-ID:

Ryan,

You could post it to bioperl-l; someone will commit it to CVS.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Ryan Golhar
Sent: Thursday, December 30, 2004 12:33 PM
To: 'Jason Stajich'; 'Bioperl List'
Subject: RE: [Bioperl-l] Bioperl in 2005

Hi all,

I'd like to contribute a parser module to parse Spidey results. I took the sim4 parser and modified it a little bit to properly read in Spidey results. Everything else about it works the same as the sim4 parser as far as I can tell.

How can I contribute this module?

-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam@umdnj.edu

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason Stajich
Sent: Wednesday, December 29, 2004 5:46 PM
To: Bioperl List; bioperl-announce-l@bioperl.org
Subject: [Bioperl-l] Bioperl in 2005

I just wanted to use the end of the year as a chance to reflect on what we've accomplished in 2004 and think about what 2005 holds for Bioperl.

What happened in 2004?

First of all, this year has really been productive at a level perhaps only appreciated by the folks who read the bioperl-guts-l list, which lists the CVS commits. New modules, bugfixes and code improvements have been steadily making their way into the codebase. Not only has there been lots of traffic, but more people are contributing code and fixes.
We have also seen increased contributions to the HOWTOs, which we hope will be an effective place to explain how to use sets of modules to complete a particular task.

We are continually working to improve the documentation. This is a balance between a developer trying to get something accomplished for their own research and wanting other people to use their code (and not wanting to field lots of emails about a particular module). Open source software written solely by volunteers suffers from a reward system which values code over documentation and writing tutorials. We welcome ideas on changes which would help this and are currently thinking about ways to reward the productive documenters as well as coders.

We had a chance to have a 5 day Bootcamp in June thanks to Sylvain Foisy, the University of Montreal and the Quebec Bioinformatics Network (BioneQ). We hope to do another one of these in 2006. If there is a general interest in more widespread Bioperl tutorials, please forward your ideas to me or the bioperl list and we can consider how something like this could be organized in conjunction with a conference or meeting.

How popular is Bioperl?

The 2002 paper has 60+ citations according to Web of Science and we're seeing use in a broader context than just sequence analysis. At least one published paper about modules which were already part of the codebase has appeared, suggesting software availability and collaboration can happen prior to publication. The website consistently gets around 300,000 hits per month, which isn't bad considering that the content doesn't change very much and this is just a site for one toolkit for a specific aspect of science. The bioperl-l mailing list has seen an average of 341 mails per month (not correcting for spam), which has seen a lot of questions answered and ideas hashed out.

How can you help out?

I want to use this chance to also appeal to those who use Bioperl and have been sitting on your hands waiting to jump in.
It is a collaborative project that only works if new people jump in and contribute ideas and manpower. We've had many examples of people who have just jumped on board the project, fixed some bugs, contributed a module and gone on their merry way. We've also had other people who have jumped in, contributed code, and found themselves fully engaged in the project and its internal workings almost immediately.

Not to wax poetic, but it was about 5 years ago that, fresh out of college, I started reading the mailing list, read Steve Chervitz's email plea for people to "ask not what Bioperl can do for you, ask what you can do for Bioperl" (http://bioperl.org/pipermail/bioperl-l/1999-December/003354.html) and just jumped right in. I can only hope to influence some more folks who might have wanted to contribute but were waiting for the invitation. Well come on over, we'd love to have you taking part.

As for some specifics.

- Parsing of Species information out from the ORGANISM lines in SwissProt, GenBank, and EMBL is pretty spotty and could take some work.
- Some more parsers for formats that people have asked for
  - a Spidey parser (NCBI's mRNA -> genomic alignment tool)
- Work on the Structure modules for dealing with protein structure data
- Integrate new applications into bioperl-run and further clean up the existing modules so they are more consistent
- Volunteer to be the next release master.

What does the future hold for Bioperl?

We expect to have a 1.5 release of bioperl in the 1st quarter of 2005 - this is the domain of Aaron Mackey, who agreed to be the release master (he has his hands full right now, but I'm sure will ask for help when he needs it). This should incorporate many new modules and bug fixes but be compatible with the 1.4 API as well. Details on the schedule for 1.5 will follow sometime after the holidays.

The future depends entirely on who steps up to work on the project next year.
In 2005, I am resolving to step back from the front guard of mailing list question answering. This is in part to finish my PhD research and focus on building more specific tools to support my research questions, but also it is time for other people to contribute and share the spotlight and be a know-it-all. Bioperl is very much a labor of love and it is an integral part of the tools I use in my own work, so I expect to focus more directly on those things I need in the coming year and help out where I can. My hope is that some of the new folks who have stepped up to contribute will help by continuing the course we have set to have high quality releases, a full test suite, POD documentation for every module, and overall documentation for using modules in HOWTOs and tutorials. If there are new or unexplored areas the project should consider, I hope that you will speak up and suggest them.

There is discussion afoot that a new Bioperl object model may be born. This has been called Bioperl2 and Bioperl-NG. The idea is it would try to create a leaner and cleaner code base which does things like event-based parsing and autogenerated code for things like getters/setters, and could do things faster and more easily than we do currently. Generally there is a lot of legacy code and legacy design in Bioperl and it would be beneficial to have a project that was free of these constraints. At the same time there is an expectation that a project like this would also need to achieve something more than what the current Bioperl API can do, so it is incumbent on the new project to have goals that are higher than what Bioperl can do.

Thank you

I'd like to finally thank some people who have done a lot this year. Of course I'm not going to remember to name everyone, but I just wanted to highlight some folks who have endeavored not only to get the toolkit to do what they want, but also to help other people get started with it.

The people who have kept the project going.
These are the usual suspects who have labored to do the dirty grunt work: cleaning up boring bugs, adding documentation, preparing a release, keeping the servers going, etc. They code too, but I wanted to highlight that they have really been critical to keeping the project going by doing the things that most people don't want to bother with.

Brian Osborne
Aaron Mackey
Chris Dagdigian
Kyle Jenson (mailing list and site searching at http://search.open-bio.org)

Some usual suspects who have been helping maintain their modules and generally being Bioperl knowledgeable on the list:

Scott Cain
Steve Chervitz
Allen Day
Donald Jackson
Stefan Kirov
Hilmar Lapp
Josh Lauricha
Heikki Lehvaslaiho
Chris Mungall
Jurgen Plentinckx
Lincoln Stein

There are several new people who have taken up the slack as those before them have drifted onto other commitments (metaphoric slack of course, not trying to accuse anyone of being a 'slacker'). Thanks for jumping in, fixing bugs, running tests, giving feedback, and just getting involved. It is really encouraging when the project can be a 2-way street and not just a one-way flow of information going out from a few people who post answers to the list.

Richard Adams
Sean Davis
Rob Edwards
Nathan Haigh
Marc Logghe
Barry Moore
Remo Sanges
James Thompson
Koen van der Drift (Bioperl available via fink on OS X)

Thanks also to Peter van Heusden and Electric Genetics, which are undertaking a code audit of Bioperl and should have many helpful feedback points for us.

I've probably forgotten some people; please post a followup if I have neglected someone, as I would like you to be recognized for your work since we don't give out a whole lot else right now.

A safe and prosperous New Year to you all.

Jason Stajich on behalf of the Bioperl core developers.
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From tex at biocompute.net Sat Jan 1 02:32:37 2005
From: tex at biocompute.net (James Thompson)
Date: Sat Jan 1 17:40:43 2005
Subject: [Bioperl-l] Re: Questions about Bio::AlignIO::maf
In-Reply-To: <003001c4ee13$b28a8bb0$7347d90a@imcb.astar.edu.sg>
Message-ID:

Alison,

You're right on this; I just committed the fix to maf.pm. This also fixed a range problem in the AlignIO.t test script and the associated humor.maf test data file, and I just committed fixes for those as well.

Thanks for the fix. :)

Cheers,

James Thompson

On Thu, 30 Dec 2004, Lee Ping Alison wrote:

> Hi Mr Thompson,
>
> Thanks for the reply. I understand the need for the one-based inclusive coordinate system now; also partly because the major genome browsers use that. However, since you're using inclusive coords, then shouldn't you add 1 to $start first before calculating $end, since $start is zero-based?
>
> Alison.
>
> ----- Original Message -----
> From: "James Thompson"
> To: "Lee Ping Alison"
> Cc: "Allen Day" ; "Bioperl"
> Sent: Wednesday, December 29, 2004 3:30 PM
> Subject: Re: [Bioperl-l] Re: Questions about Bio::AlignIO::maf
>
> > Alison (and Allen),
> >
> > I was the aforementioned bug fixer. :)
> >
> > Sorry if there's any confusion on this, but AFAIK Bioperl uses a one-based inclusive coordinate system. While maf may have its own opinions on the best way to do coordinates, maf is only one of the formats that are supported by Bio::AlignIO.
> > The consensus in Bioperl appears to be that it makes more sense to use one consistent coordinate system within all of the modules rather than catering to the opinions and idiosyncrasies of all of the possible file formats. If we did not fix the off-by-one bug in maf.pm, then there would be consistency issues with Bio::Align::AlignI objects created from different file formats.
> >
> > Here's a link to a message from the mailing list that seems relevant to the topic at hand:
> >
> > http://bioperl.org/pipermail/bioperl-l/2002-June/008309.html
> >
> > Cheers,
> >
> > James Thompson
> >
> > On Wed, 29 Dec 2004, Lee Ping Alison wrote:
> >
> > > Hi,
> > >
> > > Mr Day, thanks a lot for helping me with my queries.
> > >
> > > I've just obtained the most recent bioperl-live code via cvs with the bug fixes you've mentioned. I'm wondering why the off-by-one bug fix (end = start+size-1) was necessary. I'm thinking that "end = start+size" is correct, because the MAF file format by UCSC states that coordinates are half-open, zero-based. And I have understood it as the coordinates in the "maf" module should be (start, end] (start exclusive, end inclusive). I've also tried several coordinates that agree with UCSC Genome Browser, which uses [start, end]. Hence, in my opinion the bug fix was not necessary.
> > >
> > > Will someone please enlighten me on this?
> > >
> > > Thank you very much!
> > >
> > > Alison.
> > >
> > > ----- Original Message -----
> > > From: Allen Day
> > > To: Lee Ping Alison
> > > Cc: Bioperl
> > > Sent: 29 December, 2004 3:34 PM
> > > Subject: Re: Questions about Bio::AlignIO::maf
> > >
> > > Hi Alison,
> > >
> > > I did not add strand information as I didn't need it at the time of writing. However, I believe this has come up on list recently and someone has already patched in strand support, as well as an off-by-one bug in my code.
> > > Can whoever did these patches recently pipe in? Thanks.
> > >
> > > Alison, please keep the bioperl list CCed in your reply.
> > >
> > > -Allen
> > >
> > > On Wed, 29 Dec 2004, Lee Ping Alison wrote:
> > >
> > > > Dear Mr Day,
> > > >
> > > > While reading the Bioperl 1.4 documentation for the "Bio::AlignIO::maf" module, I found your email address and I have some questions about how to use "maf."
> > > >
> > > > Am I right to say that the strand information of each sequence in an "maf" file is not recorded when the LocatableSeq object is created in the next_aln() method? I observed that $strand was not one of the arguments in the call to the constructor.
> > > >
> > > > If yes, what is the reason for not using the strand information? And subsequently, if I need to retrieve the strand information, how should I go about it?
> > > >
> > > > Thank you very much for answering my queries.
> > > >
> > > > Best Regards,
> > > > Alison
> > > > (Institute of Molecular and Cell Biology, Singapore)

From rj144331 at bcm.tmc.edu Sun Jan 2 17:56:06 2005
From: rj144331 at bcm.tmc.edu (rj144331)
Date: Sun Jan 2 17:52:51 2005
Subject: [Bioperl-l] Extract sequences from .msf
Message-ID: <41D671E1@webmail.bcm.tmc.edu>

Hi,

I am a second year graduate student at Baylor College of Medicine, Houston, TX, majoring in bioinformatics. I would like to know how to use Perl to extract protein sequences from an HTML page containing a multiple sequence alignment and store them in fasta format. Any help would be appreciated.
Thanks, regards,
Rupashree Jayashankar

From hlapp at gnf.org Sun Jan 2 19:28:52 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Sun Jan 2 19:25:37 2005
Subject: [Bioperl-l] score in seqfeature
Message-ID: <71B3DC9D-5D1E-11D9-827C-000A959EB4C4@gnf.org>

Allen et al, what are the (GFF3-driven?) plans for storing the score property introduced by SeqFeature::Generic?

The reason I'm asking is that it doesn't get (de-)serialized in bioperl-db because it's neither defined on SeqFeatureI nor has it been internally stored as a tag/value pair. I'd like to fix this issue, either by pulling it into the annotation bundle in SeqFeature::AnnotationAdapter, or by some other means that maybe is friendlier or more useful to GFF3 minds.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From rasa at obj.hopto.org Mon Jan 3 09:41:10 2005
From: rasa at obj.hopto.org (Rasa Gulbinaite)
Date: Mon Jan 3 09:38:56 2005
Subject: [Bioperl-l] Fasta headers
Message-ID: <1267.81.7.113.128.1104763270.squirrel@81.7.113.128>

Hello,

I'm new to bioperl and a bit confused with fasta file headers. I'm working with SNPs and I would like to get only the fasta headers from the fasta file, not the sequences. What would be the best way to do this? Thank you.

Rasa

From venkat at calmail.berkeley.edu Mon Jan 3 09:23:18 2005
From: venkat at calmail.berkeley.edu (Venky Nandagopal)
Date: Mon Jan 3 09:53:49 2005
Subject: [Bioperl-l] Bio::DB::Fasta errors
Message-ID:

I've been noticing some errors with Bio::DB::Fasta indices. Working with different assemblies of a genome, I've been creating symlinks: latest/ to the latest assembly directory, and genome.fasta to the latest assembly fasta file. When I pass latest/genome.fasta to Bio::DB::Fasta, I get a genome.fasta.index file, and retrieval works in my script.
But then when I run a different analysis on it, or access the same file after a while, I get undef for sequences I know for sure to be in the database. Reindexing will fix the problem. I'm not certain if this is simply due to the symlinks, or a more general issue with Bio::DB::Fasta. Does anyone have suggestions?

Venky

--
___
Venky Nandagopal
Graduate Student
Eisen Lab
UC Berkeley

From birney at ebi.ac.uk Mon Jan 3 09:59:13 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Mon Jan 3 09:55:53 2005
Subject: [Bioperl-l] Fasta headers
In-Reply-To: <1267.81.7.113.128.1104763270.squirrel@81.7.113.128>
Message-ID:

On Mon, 3 Jan 2005, Rasa Gulbinaite wrote:

> Hello,
>
> i'm new to bioperl and a bit confused with fasta file headers. I'm working with SNPs and i would like to get only the fasta headers from the fasta file, not the sequences. What would be the best way to do this? Thank you.

The desc() method on a sequence object has this - eg:

    use Bio::SeqIO;

    my $seqin = Bio::SeqIO->new( -file => 'my_filename', -format => 'fasta' );

    while ( my $seq = $seqin->next_seq() ) {
        # desc() holds the header text after the sequence identifier
        my $header = $seq->desc();
    }

> Rasa

From sdavis2 at mail.nih.gov Mon Jan 3 10:27:05 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon Jan 3 10:24:15 2005
Subject: [Bioperl-l] Fasta headers
References: <1267.81.7.113.128.1104763270.squirrel@81.7.113.128>
Message-ID: <000901c4f1a8$ae4ba7d0$7d75f345@WATSON>

Rasa,

You can parse the fasta file with seqio, but if you only want the headers "as-is", something like this from the command line might do:

    cat fastafile.fa | perl -e 'while (<>) {print $_ if ($_ =~ /^>/)}'

Sorry, I didn't test this....

Sean

----- Original Message -----
From: "Rasa Gulbinaite"
To:
Sent: Monday, January 03, 2005 9:41 AM
Subject: [Bioperl-l] Fasta headers

> Hello,
>
> i'm new to bioperl and a bit confused with fasta file headers.
> I'm working with SNPs and i would like to get only the fasta headers from the fasta file, not the sequences. What would be the best way to do this? Thank you.
>
> Rasa

From rousse at ccr.jussieu.fr Mon Jan 3 10:53:38 2005
From: rousse at ccr.jussieu.fr (Guillaume Rousse)
Date: Mon Jan 3 10:50:14 2005
Subject: [Bioperl-l] Need help for implementing a new TreeIO module
Message-ID: <41D96A82.8070202@ccr.jussieu.fr>

I'm trying to implement a Bio::TreeIO module for parsing Algorithm::Cluster::treecluster output, and I need some help with the tree event builder module. Reading the code, I understand I can add elements of type 'tree', 'branch-length', 'id', 'node', and 'leaf', and also add characters. However, I don't really understand how it works...

Basically, I know all leaves right from the given parameters. Then I parse a result table, line by line, each new line being an internal node whose length is given. So I guess the code should be similar to:

    $self->_eventHandler->start_document;
    $self->_eventHandler->start_element( {'Name' => 'tree'} );

    # leaves
    foreach my $label (@{$self->{_labels}}) {
        $self->_eventHandler->start_element( {'Name' => 'leaf'} );
        $self->_eventHandler->characters($label);
        $self->_eventHandler->end_element( {'Name' => 'leaf'} );
    }

    # nodes
    foreach my $line (@{$self->{_result}}) {
        $self->_eventHandler->start_element( {'Name' => 'node'} );
        # this node results from the merge of two already existing
        # leaves or nodes with a known distance
        $self->_eventHandler->end_element( {'Name' => 'node'} );
    }

    $self->_eventHandler->end_element( {'Name' => 'tree'} );
    my $tree = $self->_eventHandler->end_document;

Any help appreciated.
--
Any circuit design must contain at least one part which is obsolete, two parts which are unobtainable and three parts which are still under development
    -- Murphy's Laws on Technology n°23

From rousse at ccr.jussieu.fr Mon Jan 3 10:59:43 2005
From: rousse at ccr.jussieu.fr (Guillaume Rousse)
Date: Mon Jan 3 10:56:22 2005
Subject: [Bioperl-l] Installing bioperl-ext-1.4
In-Reply-To: <1102911185.41bd16d165145@webmail2.ec.auckland.ac.nz>
References: <200412102127.iBALPiKu021926@portal.open-bio.org> <1102911185.41bd16d165145@webmail2.ec.auckland.ac.nz>
Message-ID: <41D96BEF.9020601@ccr.jussieu.fr>

bcur001@ec.auckland.ac.nz wrote:

> I am wanting to run code to do smith-waterman alignment. From what I can see, I need the EMBOSS suite, which appears to come as part of bioperl-ext-1.4.
>
> I have installed bioperl-1.4 fine. When I attempt to install bioperl-ext-1.4, however, I encounter problems. I've worked my way through a few initial errors, finding and installing the staden library and the Inline pm (both of which appear to have installed fine). I have, however, finally been stumped. Upon attempting to run `perl Makefile.PL` from the bioperl-ext-1.4/ directory, I get the following:
>
> Writing Makefile for Bio::Ext::Align
> Found Staden io_lib "libread" in /usr/local/lib ...
> Automatically using the Read.h found in /usr/local/include/io_lib ...
> Writing Makefile for Bio::SeqIO::staden::read
> Writing Makefile for Bio
> One or more DATA sections were not processed by Inline.

Sorry, I missed this post.

Unless you have really good reasons to install manually, you'd better use the official contrib packages for EMBOSS, io_lib and bioperl (I'm the maintainer) instead of attempting manual installs. EMBOSS is installed with better defaults than by the default installation script, io_lib is patched for wrong headers, and Bioperl has every needed dependency packaged. Only bioperl-ext is missing, because I never succeeded in building it due to a problem in Makemaker::Inline.
Just try:

    urpmi emboss perl-Bioperl-Run libio_lib1-devel

to have everything installed with the needed dependencies.

(forget my initial private mail, it was sent too early)

--
A bad dinner with your wife is worth more than a good one in the company of your mother-in-law.
    -- A law for married men

From jason.stajich at duke.edu Mon Jan 3 11:13:34 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 3 11:10:53 2005
Subject: [Bioperl-l] Need help for implementing a new TreeIO module
In-Reply-To: <41D96A82.8070202@ccr.jussieu.fr>
References: <41D96A82.8070202@ccr.jussieu.fr>
Message-ID: <6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu>

Guillaume -

Ironic - I was just starting to download a bunch of jpackage stuff and seeing your name everywhere.... Trying to get FOP working so I can try to build our docbook HOWTOs on linux.

The thing is you need to build the tree by connecting the nodes, so the order they are created in is very important. You can't just build the leaves first and then the (internal) nodes later. You need to build from the top down - if you read a newick format from left to right, that is exactly how we are building the tree up using the EventListener.

In a way the builder basically assumes you already have the tree built, just encoded. So you start with a root node, you add children. For each child you add more children where appropriate until you get to a leaf node and you are done with that recursion.

-jason

On Jan 3, 2005, at 10:53 AM, Guillaume Rousse wrote:

> I'm trying to implement a Bio::TreeIO module for parsing Algorithm::Cluster::treecluster output, and I need some help with the tree event builder module. Reading the code, I understand I can add elements of type 'tree', 'branch-length', 'id', 'node', and 'leaf', and also add characters. However, I don't really understand how it works...
>
> Basically, I know all leaves right from the given parameters.
> Then I parse a result table, line by line, each new line being an internal node whose length is given. So I guess the code should be similar to:
>
>     $self->_eventHandler->start_document;
>     $self->_eventHandler->start_element( {'Name' => 'tree'} );
>
>     # leaves
>     foreach my $label (@{$self->{_labels}}) {
>         $self->_eventHandler->start_element( {'Name' => 'leaf'} );
>         $self->_eventHandler->characters($label);
>         $self->_eventHandler->end_element( {'Name' => 'leaf'} );
>     }
>
>     # nodes
>     foreach my $line (@{$self->{_result}}) {
>         $self->_eventHandler->start_element( {'Name' => 'node'} );
>         # this node results from the merge of two already existing
>         # leaves or nodes with a known distance
>         $self->_eventHandler->end_element( {'Name' => 'node'} );
>     }
>
>     $self->_eventHandler->end_element( {'Name' => 'tree'} );
>     my $tree = $self->_eventHandler->end_document;
>
> Any help appreciated.
> --
> Any circuit design must contain at least one part which is obsolete, two parts which are unobtainable and three parts which are still under development
>     -- Murphy's Laws on Technology n°23

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From cain at cshl.org Mon Jan 3 11:34:17 2005
From: cain at cshl.org (Scott Cain)
Date: Mon Jan 3 11:31:14 2005
Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 21, Issue 1
In-Reply-To: <200501031551.j03FpdKs019038@portal.open-bio.org>
References: <200501031551.j03FpdKs019038@portal.open-bio.org>
Message-ID: <1104770057.3258.27.camel@localhost.localdomain>

Hi Hilmar,

SeqFeature::Annotated (which is what FeatureIO::gff is using) has a score method that stores the score as an Annotation::SimpleValue, which means it is used like this:

    my $score = $feature->score->value;

which seems to work well for GFF3 (since scores are by definition a single value in GFF3).
Scott

On Mon, 2005-01-03 at 10:51 -0500, bioperl-l-request@portal.open-bio.org wrote:

> Date: Sun, 2 Jan 2005 16:28:52 -0800
> From: Hilmar Lapp
> Subject: [Bioperl-l] score in seqfeature
> To: Allen Day , Bioperl
> Message-ID: <71B3DC9D-5D1E-11D9-827C-000A959EB4C4@gnf.org>
> Content-Type: text/plain; charset=US-ASCII; format=flowed
>
> Allen et al, what are the (GFF3-driven?) plans for storing the score property introduced by SeqFeature::Generic?
>
> The reason I'm asking is that it doesn't get (de-)serialized in bioperl-db because it's neither defined on SeqFeatureI nor has it been internally stored as a tag/value pair. I'd like to fix this issue, either by pulling it into the annotation bundle in SeqFeature::AnnotationAdapter, or by some other means that maybe is friendlier or more useful to GFF3 minds.
>
> -hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From rousse at ccr.jussieu.fr Mon Jan 3 11:38:28 2005
From: rousse at ccr.jussieu.fr (Guillaume Rousse)
Date: Mon Jan 3 11:35:05 2005
Subject: [Bioperl-l] Need help for implementing a new TreeIO module
In-Reply-To: <6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu>
References: <41D96A82.8070202@ccr.jussieu.fr> <6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu>
Message-ID: <41D97504.7020808@ccr.jussieu.fr>

Jason Stajich wrote:

> Guillaume -
>
> Ironic - I was just starting to download a bunch of jpackage stuff and seeing your name everywhere.... Trying to get FOP working so I can try to build our docbook HOWTOs on linux.

Funny :) Actually, I'm not involved anymore in the jpackage project; I left Java when I discovered perl two years ago.
If you need help, you'd better ask on the mailing lists, even if they are currently down due to migration problems on the boxes hosting the project.

> The thing is you need to build the tree by connecting the nodes, so the order they are created in is very important. You can't just build the leaves first and then the (internal) nodes later. You need to build from the top down - if you read a newick format from left to right, that is exactly how we are building the tree up using the EventListener.
>
> In a way the builder basically assumes you already have the tree built, just encoded. So you start with a root node, you add children. For each child you add more children where appropriate until you get to a leaf node and you are done with that recursion.

OK, thanks for the explanations. However, I don't understand how to add branch length information. I guess leaf labels are just introduced using the characters() method, right?

--
You aren't Superman
    -- Murphy's Bush Fire Brigade Laws n°22

From brian_osborne at cognia.com Mon Jan 3 11:44:39 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Jan 3 11:41:26 2005
Subject: [Bioperl-l] Fasta headers
In-Reply-To: <1267.81.7.113.128.1104763270.squirrel@81.7.113.128>
Message-ID:

Rasa,

On Unix:

    >grep '>' fasta-file

Anywhere, with Perl:

    >perl -ne 'print if /^>/' fasta-file

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Rasa Gulbinaite
Sent: Monday, January 03, 2005 9:41 AM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] Fasta headers

Hello,

i'm new to bioperl and a bit confused with fasta file headers. I'm working with SNPs and i would like to get only the fasta headers from the fasta file, not the sequences. What would be the best way to do this? Thank you.
Rasa

From davidg at lsi.upc.edu Mon Jan 3 12:00:21 2005
From: davidg at lsi.upc.edu (David García Cortés)
Date: Mon Jan 3 11:57:09 2005
Subject: [Bioperl-l] Problems parsing Accesion number in FASTA format.
Message-ID: <001201c4f1b5$b9b46b40$cf1e5393@Davidg>

Hello.

I have the "nr" database in FASTA format (downloaded from the NCBI website), and I want to retrieve the accession number of each sequence in that database, so I do the following:

    my $seqsfich = Bio::SeqIO->new(-file=>"nr.fa", '-format' => 'Fasta');

    while (my $seq = $seqsfich->next_seq()) {
        print STDOUT "Sequence accession number: ", $seq->accession, "\n";
    }

But the results I get are:

    Sequence accession number: unknown
    Sequence accession number: unknown
    Sequence accession number: unknown
    Sequence accession number: unknown
    etc...

Here you can see a fragment of the "nr.fa" file:

    >gi|2695847|emb|CAA73704.1| immunoglobulin heavy chain [Acipenser baerii]
    MGILTALCIIMTALSSVRSDVVLTESGPAVIKPGESHKLSCKASGFTFSSAYMSWVRQAPGKGLEWVAYIYSGGSSTYYA
    QSVQGRFAISRDDSNSMLYLQMNSLKTEDTAVYYCARGGLGWSLDYWGKGTMITVTSATPSPPTVFPLMESCCLSDISGP
    VATGCLATGFCLPPRPSRGLINLEKL
    >gi|2695851|emb|CAA73709.1| immunoglobulin heavy chain [Acipenser baerii]
    MGILTALCIIMTALSSVRSDVVLTESGPAVVKPGESHKLSCKAAGFTFSSYWMGWVRQTPGKGLEWVSIISAGGSTYYAP
    SVEGRFTISRDNSNSMLYLQMNSLKTEDTAMYYCARKPETGSYGNISFEHWGKGTMITVTSATPSPPTVFPLMQACCSVD
    VTGPSATGCLATEF
    >gi|2695853|emb|CAA73712.1| immunoglobulin heavy chain [Acipenser baerii]
    MGILTALCIIMTALSSVRSDVVLTESGPAVIKPGESHKLSCKASGFTFSSNNMGWVRQAPGKGLEWVSTISYSVNAYYAQ
    SVQGRFTISRDDSNSMLYLQMNSLKTEDSAVYYCARESNFNRFDYWGSGTMVTVTNATPSPPTVFPLMQACCSVDVTGPS
    ATGCLATEF

I suppose the accession numbers are: CAA73704.1, CAA73709.1, CAA73712.1, etc... (??)

The thing is, how can I get Bioperl to parse and recognize them?

Thanks in advance.
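For headers laid out like the three records above (gi|number|database|accession.version| followed by a description), the accession can be split out of the header line with a regular expression. The following is only a minimal sketch in plain Perl, assuming every record follows that layout; parse_ncbi_header is a hypothetical helper written for this illustration, not a Bioperl method.

```perl
use strict;
use warnings;

# Hypothetical helper: split an NCBI-style FASTA header of the form
#   >gi|<number>|<db>|<accession.version>| description
# into its parts. Returns undef when the header does not match that layout.
sub parse_ncbi_header {
    my ($header) = @_;
    return unless $header =~ /^>?gi\|(\d+)\|(\w+)\|([^|\s]*)\|?\s*(.*)$/;
    return {
        gi          => $1,
        db          => $2,
        accession   => $3,
        description => $4,
    };
}

# First record from the nr.fa fragment quoted above
my $line   = '>gi|2695847|emb|CAA73704.1| immunoglobulin heavy chain [Acipenser baerii]';
my $parsed = parse_ncbi_header($line);

print "Sequence accession number: $parsed->{accession}\n";
```

Because the leading ">" is optional in the pattern, the same helper could presumably be applied to the identifier string a Bio::SeqIO sequence object reports through display_id(), which for these records should be the "gi|...|" token before the first space.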
-- 
David García Cortés
Instituto Nacional de Bioinformática (INB)
Nodo Computacional GNHC-2 UPC-CIRI
c/. Jordi Girona 1-3
Modul C6-E201 Tel. : 934 011 650
E-08034 Barcelona Fax : 934 017 014
Catalunya (Spain) e-mail: davidg@lsi.upc.edu

From jason.stajich at duke.edu Mon Jan 3 12:01:21 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 3 11:57:58 2005
Subject: [Bioperl-l] Need help for implementing a new TreeIO module
In-Reply-To: <41D97504.7020808@ccr.jussieu.fr>
References: <41D96A82.8070202@ccr.jussieu.fr> <6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu> <41D97504.7020808@ccr.jussieu.fr>
Message-ID: <17F1ACAA-5DA9-11D9-B9B4-000393C44276@duke.edu>

On Jan 3, 2005, at 11:38 AM, Guillaume Rousse wrote:

>> The thing is you need to build the tree by connecting the nodes, so
>> the order they are created in is very important. You can't just
>> build the leaves first and then the (internal) nodes later. You need
>> to build from the top down - if you read a newick format from left to
>> right, that is exactly how we are building the tree up using the
>> EventListener.
>> In a way the builder basically assumes you already have the tree
>> built, just encoded. So you start with a root node, you add
>> children. For each child you add more children where appropriate
>> until you get to a leaf node and you are done with that recursion.
> OK, thanks for the explanations. However, I don't understand how to
> add branch length information. I guess leaf labels are just
> introduced using the characters() method, right?

This would set a branch length for a node. The 'leaf' event is sort of a hack - I can't remember why I had to introduce it - I think to deal with the labeled internal nodes.
So to build a leaf node with branch_length $branch_length and name $idstring you want to do:

    # leaf node
    $self->_eventHandler->start_element({'Name' => 'node'});
    $self->_eventHandler->start_element({'Name' => 'branch_length'});
    $self->_eventHandler->characters($branch_length);
    $self->_eventHandler->end_element({'Name' => 'branch_length'});
    $self->_eventHandler->start_element({'Name' => 'id'});
    $self->_eventHandler->characters($idstring);
    $self->_eventHandler->end_element({'Name' => 'id'});
    $self->_eventHandler->start_element({'Name' => 'leaf'});
    $self->_eventHandler->characters(1);
    $self->_eventHandler->end_element({'Name' => 'leaf'});
    $self->_eventHandler->end_element({'Name' => 'node'});

To build an internal node which has a branch length but no label, for example:

    # Internal Node
    $self->_eventHandler->start_element({'Name' => 'node'});
    $self->_eventHandler->start_element({'Name' => 'branch_length'});
    $self->_eventHandler->characters($branch_length);
    $self->_eventHandler->end_element({'Name' => 'branch_length'});
    $self->_eventHandler->start_element({'Name' => 'leaf'});
    $self->_eventHandler->characters(0);
    $self->_eventHandler->end_element({'Name' => 'leaf'});
    $self->_eventHandler->end_element({'Name' => 'node'});

See the 'characters' function in Bio::TreeIO::TreeEventBuilder for the different field names and event labels that can be used.

If you want to build a node with two leaves, first you have to start with a 'tree' section to tell the handler that this is nested data. Start a 'tree' event, build the node (like the section just above), then build two leaf nodes (like the leaf node section above), then end the 'tree' event. 'tree' is an unfortunate name for the event but I don't feel like changing it - a throwback from when I thought I'd only need an initial 'tree' and just 'node' events.
    $self->_eventHandler->start_document;
    $self->_eventHandler->start_element({'Name' => 'tree'});
    # do internal node
    # do leaf node
    # do leaf node
    $self->_eventHandler->end_element({'Name' => 'tree'});
    return $self->_eventHandler->end_document;

Hmm - I guess I need to go back and document the event system here and in SearchIO if people are going to develop with it.

> --
> You aren't Superman
> -- Murphy's Bush Fire Brigade Laws n°22
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From hlapp at gmx.net Mon Jan 3 12:18:31 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Jan 3 12:15:36 2005
Subject: [Bioperl-l] Problems parsing Accesion number in FASTA format.
In-Reply-To: <001201c4f1b5$b9b46b40$cf1e5393@Davidg>
Message-ID: <7D7AC00B-5DAB-11D9-8820-000A959EB4C4@gmx.net>

The FASTA parser only sets display_id. It doesn't set the accession number, and it doesn't set primary_id either. IMO, this is the correct behaviour, because the identifier in FASTA headers can come in all sorts of formats.

If what you want is to print the identifier part of the description line, print $seq->display_id(). If what you want is to extract the accession number, then parse it out from what display_id returns, using the format you expect it to be in.

	-hilmar

(BTW technically, CAA73704.1 is not the accession - CAA73704 is and 1 is the version; just to illustrate)

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From jason.stajich at duke.edu Mon Jan 3 12:18:45 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 3 12:16:19 2005
Subject: [Bioperl-l] Problems parsing Accesion number in FASTA format.
In-Reply-To: <001201c4f1b5$b9b46b40$cf1e5393@Davidg>
References: <001201c4f1b5$b9b46b40$cf1e5393@Davidg>
Message-ID: <861D8C94-5DAB-11D9-B9B4-000393C44276@duke.edu>

I thought someone was going to centralize this function at some point. Right now there is a _get_accession_version function in Bio::SearchIO::blast. Perhaps someone would care to make a utility module which can export a bunch of useful functions like this?

    my $seqsfich = Bio::SeqIO->new(-file=>"nr.fa", '-format' => 'Fasta');

    while (my $seq = $seqsfich->next_seq()) {
        my ($acc,$ver) = &_get_accession_version($seq->display_id);
        $seq->accession_number($acc);
        $seq->version($ver);
        print STDOUT "Sequence accession number: ", $seq->accession_number, "\n";
    }

    sub _get_accession_version {
        my $id = shift;

        # handle case when this is accidentally called as a class method
        if( ref($id) && $id->isa('Bio::SearchIO') ) {
            $id = shift;
        }
        return undef unless defined $id;
        my ($acc, $version);
        if ($id =~ /(gb|emb|dbj|sp|pdb|bbs|ref|lcl)\|(.*)\|(.*)/) {
            ($acc, $version) = split /\./, $2;
        } elsif ($id =~ /(pir|prf|pat|gnl)\|(.*)\|(.*)/) {
            ($acc, $version) = split /\./, $3;
        } else {
            #punt, not matching the db's at ftp://ftp.ncbi.nih.gov/blast/db/README
            #Database Name                     Identifier Syntax
            #============================      ========================
            #GenBank                           gb|accession|locus
            #EMBL Data Library                 emb|accession|locus
            #DDBJ, DNA Database of Japan       dbj|accession|locus
            #NBRF PIR                          pir||entry
            #Protein Research Foundation       prf||name
            #SWISS-PROT                        sp|accession|entry name
            #Brookhaven Protein Data Bank      pdb|entry|chain
            #Patents                           pat|country|number
            #GenInfo Backbone Id               bbs|number
            #General database identifier       gnl|database|identifier
            #NCBI Reference Sequence           ref|accession|locus
            #Local Sequence identifier         lcl|identifier
            $acc=$id;
        }
        return ($acc,$version);
    }

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From hlapp at gmx.net Mon Jan 3 12:20:33 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Jan 3 12:17:18 2005
Subject: [Bioperl-l] Re: score in seqfeature
In-Reply-To: <1104770057.3258.27.camel@localhost.localdomain>
Message-ID: 

So, what does this mean for SeqFeature::Generic's future? Any input or comments, e.g. from the people who've had opinions on that before?

	-hilmar

BTW the digest title as email subject sucks. Sorry.

On Monday, January 3, 2005, at 08:34 AM, Scott Cain wrote:

> Hi Hilmar,
>
> SeqFeature::Annotated (which is what FeatureIO::gff is using) has a
> score method that stores the score as an Annotation::SimpleValue, which
> means it is used like this:
>
>   my $score = $feature->score->value;
>
> which seems to work well for GFF3 (since scores are by definition a
> single value in GFF3).
>
> Scott
>
>
> On Mon, 2005-01-03 at 10:51 -0500, bioperl-l-request@portal.open-bio.org
> wrote:
>> Date: Sun, 2 Jan 2005 16:28:52 -0800
>> From: Hilmar Lapp
>> Subject: [Bioperl-l] score in seqfeature
>> To: Allen Day , Bioperl
>> Message-ID: <71B3DC9D-5D1E-11D9-827C-000A959EB4C4@gnf.org>
>> Content-Type: text/plain; charset=US-ASCII; format=flowed
>>
>> Allen et al, what are the (GFF3-driven?) plans for storing the score
>> property introduced by SeqFeature::Generic?
>>
>> The reason I'm asking is that it doesn't get (de-)serialized in
>> bioperl-db because it's neither defined on SeqFeatureI nor has it been
>> internally stored as a tag/value pair. I'd like to fix this issue,
>> either by pulling it into the annotation bundle in
>> SeqFeature::AnnotationAdapter, or by some other means that maybe is
>> friendlier or more useful to GFF3 minds.
>>
>> -hilmar
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain@cshl.org
> GMOD Coordinator (http://www.gmod.org/)              216-392-3087
> Cold Spring Harbor Laboratory
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From davidg at lsi.upc.edu Mon Jan 3 12:59:14 2005
From: davidg at lsi.upc.edu (David García Cortés)
Date: Mon Jan 3 12:55:54 2005
Subject: [Bioperl-l] Problems parsing Accesion number in FASTA format.
References: <001201c4f1b5$b9b46b40$cf1e5393@Davidg> <861D8C94-5DAB-11D9-B9B4-000393C44276@duke.edu>
Message-ID: <003401c4f1bd$f2e603d0$cf1e5393@Davidg>

Thank you very much. Now it works!!!! :-)

From qfdong at iastate.edu Mon Jan 3 13:08:26 2005
From: qfdong at iastate.edu (Qunfeng)
Date: Mon Jan 3 13:08:27 2005
Subject: [Bioperl-l] parse long organism name
In-Reply-To: 
References: <6.1.2.0.2.20041216165914.039e5758@qfdong.mail.iastate.edu>
Message-ID: <6.1.2.0.2.20050103113653.03ab1020@qfdong.mail.iastate.edu>

I didn't get any error msg.

When I parse the organism name with the following methods:

    my $organism = $seq_object->species->binomial();
    my $species = $seq_object->species->species();
    my $genus = $seq_object->species->genus();
    my $common_name = $seq_object->species->common_name();

I got the following values:

    $organism as Paphiopedilum 'Dark
    $species as Paphiopedilum
    $genus as 'Dark
    $common_name as Paphiopedilum 'Dark Roller' x Paphiopedilum rothschildianum

So, the common_name is correct, while binomial(), species(), and genus() all assume that the name is in CORRECT species, genus form.

Qunfeng

At 10:14 AM 12/17/2004, Hilmar Lapp wrote:
>What's the error that you get, if any?
>
>        -hilmar
>
>On Thursday, December 16, 2004, at 03:00 PM, Qunfeng wrote:
>
>>For example,
>>http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=47776109
>>
>>It has a LONG name:
>>>wwwtax.cgi?id=232838>Paphiopedilum 'Dark Roller' x Paphiopedilum
>>rothschildianum
>>
>>Is there any way in Bioperl to parse out that long name from a GenBank
>>format file?
>>
>>Thanks!
>>
>>Qunfeng

From brian_osborne at cognia.com Mon Jan 3 13:11:05 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Jan 3 13:08:29 2005
Subject: [Bioperl-l] Problems parsing Accesion number in FASTA format.
In-Reply-To: <001201c4f1b5$b9b46b40$cf1e5393@Davidg>
Message-ID: 

David,

The information you need is returned by the display_id() and desc() methods. display_id() will return >(\S+), and desc() returns >\S+\s+(.+).

Brian O.

From hlapp at gmx.net Mon Jan 3 13:41:35 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Jan 3 13:38:22 2005
Subject: [Bioperl-l] parse long organism name
In-Reply-To: <6.1.2.0.2.20050103113653.03ab1020@qfdong.mail.iastate.edu>
Message-ID: <186AB058-5DB7-11D9-8BB0-000A959EB4C4@gmx.net>

To be honest I'm not even sure what binomial is supposed to return here.

The problem originates from the fact that binomial, species, and genus won't store their values redundantly but rather access the classification array (kingdom->order->blah->foo etc) at the expected locations. common_name on the contrary does store its value itself.

I don't feel I'm suited to take this on. If anybody else does, please don't hesitate to come forward.

My gut reaction would be to push more towards using the taxonomy classes by Jason et al over the Bio::Species class. I'd hope that model would be able to handle such weirdnesses better.

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From golharam at umdnj.edu Mon Jan 3 10:09:28 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon Jan 3 14:14:29 2005
Subject: [Bioperl-l] Bioperl in 2005
In-Reply-To: 
Message-ID: <000901c4f1a6$38dcb4f0$3400a8c0@GOLHARMOBILE1>

Good idea...

I've attached two files to this message: Results.pm and Exon.pm. They belong in Bio::Tools::Spidey. If the attachments don't come through, I'll paste their contents in a message then...

They are used to parse the output of Spidey and work essentially in the same manner that Bio::Tools::Sim4 works.

Ryan

-----Original Message-----
From: Brian Osborne [mailto:brian_osborne@cognia.com]
Sent: Saturday, January 01, 2005 11:18 AM
To: golharam@umdnj.edu; 'Jason Stajich'; 'Bioperl List'
Subject: RE: [Bioperl-l] Bioperl in 2005

Ryan,

You could post it to bioperl-l, someone will commit it to CVS.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Ryan Golhar
Sent: Thursday, December 30, 2004 12:33 PM
To: 'Jason Stajich'; 'Bioperl List'
Subject: RE: [Bioperl-l] Bioperl in 2005

Hi all,

I'd like to contribute a parser module to parse Spidey results. I took the sim4 parser and modified it a little bit to properly read in Spidey results. Everything else about it works the same as the sim4 parser as far as I can tell.

How can I contribute this module?

-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam@umdnj.edu

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason Stajich
Sent: Wednesday, December 29, 2004 5:46 PM
To: Bioperl List; bioperl-announce-l@bioperl.org
Subject: [Bioperl-l] Bioperl in 2005

I just wanted to use the end of the year as a chance to reflect on what we've accomplished in 2004 and think about what 2005 holds for Bioperl.

What happened in 2004?

First of all, this year has been productive at a level perhaps only appreciated by the folks who read the bioperl-guts-l list, which lists the CVS commits. New modules, bugfixes and code improvements have been steadily making their way into the codebase. Not only has there been lots of traffic, but more people are contributing code and fixes. We have also seen increased contributions to the HOWTOs, which we hope will be an effective place to explain how to use sets of modules to complete a particular task.

We are continually working to improve the documentation. This is a balance between a developer trying to get something accomplished for their own research and wanting other people to use their code (and not wanting to field lots of emails about a particular module).
Open source software written solely by volunteers suffers from a reward system which values code over documentation and writing tutorials. We welcome ideas on changes which would help this and are currently thinking about ways to reward the productive documenters as well as coders.

We had a chance to have a 5 day Bootcamp in June thanks to Sylvain Foisy, the University of Montreal and the Quebec Bioinformatics Network (BioneQ). We hope to do another one of these in 2006. If there is a general interest in more widespread Bioperl tutorials, please forward ideas to me or the bioperl list and we can consider how something like this could be organized in conjunction with a conference or meeting.

How popular is Bioperl?

The 2002 paper has 60+ citations according to Web of Science and we're seeing use in a broader context than just sequence analysis. At least one published paper about modules which were already part of the codebase has appeared, suggesting software availability and collaboration can happen prior to publication. The website consistently gets around 300,000 hits per month, which isn't bad considering that the content doesn't change very much and this is just a site for one toolkit for a specific aspect of science. The bioperl-l mailing list has seen an average of 341 mails per month (not correcting for spam), which has seen a lot of questions answered and ideas hashed out.

How can you help out?

I want to use this chance to also appeal to those who use Bioperl and have been sitting on your hands waiting to jump in. It is a collaborative project that only works if new people jump in and contribute ideas and manpower. We've had many examples of people who have just jumped on board the project, fixed some bugs, contributed a module and went on their merry way. We've also had other people who have jumped in, contributed code, and found themselves fully engaged in the project and its internal workings almost immediately.
Not to wax poetic, but it was about 5 years ago that, fresh out of college, I started reading the mailing list, read Steve Chervitz's email plea for people to "ask not what Bioperl can do for you, ask what you can do for Bioperl" (http://bioperl.org/pipermail/bioperl-l/1999-December/003354.html) and just jumped right in. I can only hope to influence some more folks who might have wanted to contribute but were waiting for the invitation. Well come on over, we'd love to have you taking part.

As for some specifics.
- Parsing of Species information out from the ORGANISM lines in SwissProt, GenBank, and EMBL is pretty spotty and could take some work.
- Some more parsers for formats that people have asked for
  - a Spidey parser (NCBI's mRNA -> genomic alignment tool)
- Work on the Structure modules for dealing with protein structure data
- Integrate new applications into bioperl-run and further cleanup the existing modules so they are more consistent
- Volunteer to be the next release master.

What does the future hold for Bioperl?

We expect to have a 1.5 release of bioperl in 1st quarter of 2005 - this is the domain of Aaron Mackey who agreed to be the release master (who has his hands full right now, but I'm sure will ask for help when he needs it). This should incorporate many new modules and bug fixes but be compatible with the 1.4 API as well. Details on the schedule for 1.5 sometime after the holidays.

The future depends entirely on who steps up to work on the project next year. In 2005, I am resolving to limit myself from the front guard of mailing list question answering. This is in part to finish my PhD research and focus on building more specific tools to support my research questions, but also it is time for other people to contribute and share the spotlight and be a know-it-all.
Bioperl is very much a labor of love and it is an integral part of the tools I use in my own work so I expect to focus more directly on those things I need in the coming year and help out where I can. My hope is that some of the new folks who have stepped up to contribute will help by continuing the course we have set to have high quality releases, a full test suite, POD documentation for every module, and overall documentation for using modules in HOWTOs and tutorials. If there are new or unexplored areas the project should consider I hope that you will speak up and suggest them.

There is discussion afoot that a new Bioperl object model may be born. This has been called Bioperl2 and Bioperl-NG. The idea is it would try to create a leaner and cleaner code base which does things like event-based parsing, autogenerated code for things like getters/setters, and could do things faster and easier than we are currently. Generally there is a lot of legacy code and legacy design in Bioperl and it would be beneficial to have a project that was free of these constraints. At the same time there is an expectation that a project like this would also need to achieve something more than what the current Bioperl API can do, so it is incumbent on the new project to have goals that are higher than what Bioperl can do.

Thank you

I'd like to finally thank some people who have done a lot this year. Of course I'm not going to remember to name everyone, but I just wanted to highlight some folks who have endeavored not only to get the toolkit to do what they want, but also to help other people get started with it.

The people who have kept the project going. These are the usual suspects who have labored to do the dirty grunt work of cleaning up boring bugs, adding documentation, preparing a release, keeping the servers going, etc.
They also code, but I wanted to highlight that they have really been critical to keeping the project going by doing the things that most people don't want to bother with.

Brian Osborne
Aaron Mackey
Chris Dagdigian
Kyle Jenson (mailing list and site searching at http://search.open-bio.org)

Some usual suspects who have been helping maintain their modules and generally being Bioperl knowledgeable on the list:

Scott Cain
Steve Chervitz
Allen Day
Donald Jackson
Stefan Kirov
Hilmar Lapp
Josh Lauricha
Heikki Lehvaslaiho
Chris Mungall
Jurgen Plentinckx
Lincoln Stein

There are several new people who have taken up the slack as those before them have drifted onto other commitments (metaphoric slack of course, not trying to accuse anyone of being a 'slacker'). Thanks for jumping in, fixing bugs, running tests, giving feedback, and just getting involved. It is really encouraging when the project can be a 2-way street and not just a one-way flow of information going out from a few people who post answers to the list.

Richard Adams
Sean Davis
Rob Edwards
Nathan Haigh
Marc Logghe
Barry Moore
Remo Sanges
James Thompson
Koen van der Drift (Bioperl available via fink on OS X)

Thanks also to Peter van Heusden and Electric Genetics, who are undertaking a code audit of Bioperl and should have many helpful feedback points for us.

I've probably forgotten some people; please post a followup if I have neglected someone, as I would like you to be recognized for your work since we don't give out a whole lot else right now.

A safe and prosperous New Year to you all.

Jason Stajich on behalf of the Bioperl core developers.
-- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: Exon.pm Type: application/octet-stream Size: 5201 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050103/65408f91/Exon-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: Results.pm Type: application/octet-stream Size: 12607 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050103/65408f91/Results-0001.obj From Peter.Robinson at t-online.de Mon Jan 3 17:40:01 2005 From: Peter.Robinson at t-online.de (Peter Robinson) Date: Mon Jan 3 17:36:25 2005 Subject: [Bioperl-l] Entrez Gene and bioperl-db In-Reply-To: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net> References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net> Message-ID: <1104792001.3186.17.camel@localhost.localdomain> Hi Bioperlers, hi Hilmar, after some thinking I have embarked on a lex/yacc parser for the Entrez Gene ASN.1 format as the way of least resistance, although I am not sure how that would fit in to BioPerl. If anyone is interested in this (or has a better idea of how to go about it..), please drop me a line. In the meantime I have been looking at writing code to parse some of the "easy" Entrez gene documents, starting off with gene_info. This file includes the NCBI taxon id for each entry. 
I would like to convert this to a Bio::Species object to pass to the following my $seq = $self->sequence_factory->create( -verbose => $self->verbose(), -accession_number => $geneID, -desc => $description, -display_id => $symbol, -species => ??? -annotation => $ann); and saw the Bio::Taxonomy::FactoryI code, which appears to want to do this sort of thing. However, the code for that is pretty preliminary. Is anyone working on this at the moment? Or is there a better way of doing this (it seems a shame not to provide the actual species name if one has the taxid...) best Peter On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote: > Great to hear that someone is giving this a shot. Yes at this point is > appears that NCBI is only offering the ASN.1, not a conversion to XML. > Their asn2xml tool will not work with this ASN.1 format either, just > checked it to be sure. They do seem to be mulling the option of XML > though on the Gene FAQ. Maybe if enough people get in their ears they > will spend some effort towards that. After all, the entrez gene web > interface can display XML on demand - even though it looks fairly > hideous. > > There is no ASN.1 support in bioperl at all. Also, ASN.1 support in > perl is actually thin - there is Convert::ASN1 at version 0.18 two > years ago that I could find ... doesn't make me feel warm and fuzzy. > > In the absence of any XML available from NCBI, gene_info might be the > best start. An option could be to check for the presence of the other > tab-delimited files and use those that are present. These are > tab-delimited and hence the format itself is trivial so you can focus > entirely on setting up a Bio::Seq plus annotation that's > comparable/compatible to what the current SeqIO::locuslink does. > > My $0.02 (worth less and less almost every day). 
> > -hilmar > > On Thursday, December 23, 2004, at 10:51 AM, Peter Robinson wrote: > > > Hi, > > > > I have been thinking about given a BioPerl EntrezGene parser a try > > since > > I have been a heavy user of locus link to date. One issue is that the > > files that correspond to LL_tmpl (which was a flat file) are now in asn > > format > > http://www.ncbi.nlm.nih.gov/entrez/query/static/help/ > > genehelp.html#query > > Although I saw some mention of ASN support in Bioperl by googling, I > > can't seem to find any module that does this in the present > > distribution. What is the status on that? In any case, I will be > > working > > on this in the next month or two and if anything nice comes of it I > > will > > send it to you / BioPerpl. > > > > best wishes & happy holidays > > > > Peter > > > > On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote: > >> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for parsing > >> any input file, what you're asking is whether or not there is a SeqIO > >> parser for NCBI Gene. > >> > >> The answer to that question is no, not yet. Anybody who feels > >> motivated > >> is welcome to give it a try ... Since I'll need it, I'll write the > >> parser if nobody else does within the next 3 months, but I'm not going > >> to promise when exactly this will happen. > >> > >> -hilmar > >> > >> On Monday, December 13, 2004, at 08:03 AM, Law, Annie wrote: > >> > >>> Hi, > >>> > >>> I was wondering with regards to bioperl-db the scripts and schema and > >>> load_seqdatabase.pl has there been preparation for integration of > >>> Entrez > >>> gene information when locuslink is phased out? Or if it has already > >>> been > >>> changed could somebody point > >>> me to the documentation or changed code? > >>> > >>> Thanks, > >>> Annie. > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l@portal.open-bio.org > >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > > -- > > Peter N. 
Robinson > > peter.robinson@t-online.de > > peter.robinson@charite.de > > http://www.charite.de/ch/medgen/robinson/ > > > > -- Peter N. Robinson peter.robinson@t-online.de peter.robinson@charite.de http://www.charite.de/ch/medgen/robinson/ From jason.stajich at duke.edu Tue Jan 4 09:33:30 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Jan 4 09:34:55 2005 Subject: [Bioperl-l] Re: bug 1727 In-Reply-To: <1104835270.13556.9.camel@sb289.gbf-braunschweig.de> References: <41D32343.4070007@gbf.de> <348C2892-59E3-11D9-B264-000393C44276@gmail.com> <1104835270.13556.9.camel@sb289.gbf-braunschweig.de> Message-ID: <9ADB6CF6-5E5D-11D9-9C0C-000393C44276@duke.edu> There is some guessing done if you do not supply a -format => $formatstring when you initialize the Bio::SeqIO object. Please direct the questions to the mailing list. -jason On Jan 4, 2005, at 5:41 AM, Guido Dieterich wrote: > Hi Jason, > > does the? Bio::SeqIO checks if a file or filehandle is in the > appropiate format, eg. a fasta format? > It seems that not, or? > > Guido > > > > > > > Bio::SeqIO::swiss I believe. > > -jason > On Dec 29, 2004, at 4:36 PM, gdi wrote: > > > Hi jason, > > > > in which module is the bug (was)? > > > > Best regards Guido > > > > > -- > Jason Stajich > jason.stajich-at-gmail.com or jason-at-bioperl.org > http://jason.open-bio.org > > -- > > > Dr. 
Guido Dieterich > Dipl.-Biologe > > BioComputing > SB - Strukturbiologie \==-| > GBF - Gesellschaft fuer Biotechnologische Forschung \=/ > 0010010010100101110010 > German Research Centre for Biotechnology /-\ > /-==| > 0010100100111101010010 > WWW: http://www.gbf.de _/_/_/ _/_/_/ _/_/_/ |==-/ > EMAIL: gdi@gbf.de _/ _/ _/ _/ _/ \=/ > 0100100100010010010101 > _/ _/ _/ _/ /\ > Mascheroder Weg 1 _/ _/ _/_/_/ _/_/_/ /=-\ > 1101001010100101010101 > D-38124 Braunschweig _/ _/ _/ _/ _/ > Tel: +(49) 531 6181 745 _/ _/ _/ _/ _/ > FAX: +(49) 531 2612 388 _/_/_/ _/_/_/ _/ > > http://www.gbf.de/sb > > > Es ist nicht genug, zu wissen, man muss auch anwenden. > Es ist nicht genug, zu wollen, man muss auch tun. > JOHANN WOLFGANG VON GOETHE > Deutscher Dichter > (1749 - 1832) > > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 2492 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050104/dd97b16d/attachment.bin From rousse at ccr.jussieu.fr Tue Jan 4 08:11:36 2005 From: rousse at ccr.jussieu.fr (Guillaume Rousse) Date: Tue Jan 4 09:34:57 2005 Subject: [Bioperl-l] Need help for implementing a new TreeIO module In-Reply-To: <17F1ACAA-5DA9-11D9-B9B4-000393C44276@duke.edu> References: <41D96A82.8070202@ccr.jussieu.fr> <6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu> <41D97504.7020808@ccr.jussieu.fr> <17F1ACAA-5DA9-11D9-B9B4-000393C44276@duke.edu> Message-ID: <41DA9608.9080900@ccr.jussieu.fr> Jason Stajich wrote: > If you want to build a node with two leaves, first you have to start > with a 'tree' section to tell the handler that this is nested data. > Start a 'tree' event, build the node (like the section just above), then > build two leaf nodes (like the leaf node section above), then end the > 'tree' event. 
'tree' is an unfortunate name for the event but don't
> feel like changing it - a throwback from when I thought I'd only need an
> initial 'tree' and just 'node' events.
>
> $self->_eventHandler->start_document;
> $self->_eventHandler->start_element({'Name' => 'tree'});
> # do internal node
> # do leaf node
> # do leaf node
> $self->_eventHandler->end_element({'Name' => 'tree'});
> return $self->_eventHandler->end_document;

OK, done, but I still have an issue with each internal node connecting two leaves, producing a third intermediate leaf. I don't know if the problem comes from me or from bioperl.

Here is my code, along with a test script. If you don't want to install Algorithm::Cluster to test, the input data is something like:

 -1:   5   4   0.000
 -2:   7   6   0.000
 -3:  10  11   0.010
 -4:   2   0   0.090
 -5:  -3  12   0.095
 -6:   1  -4   0.115
 -7:  -5   9   0.143
 -8:  -1   3   0.250
 -9:  -2  -7   0.618
-10:  -8  -6   0.639
-11:   8 -10   5.805
-12:  -9 -11  28.056

Where the first column is the internal node id, the second and third ones the children ids for each node, and the fourth one the distance between the children.

I also patched svggraph to use parameters instead of hard-coded values, and also to allow some normalisation of the branch lengths, in such a way that it would be easy to add new normalisation functions, including arbitrary code. Patch attached too.
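
To make the layout of that table concrete, here is a minimal sketch in plain Perl (no Bioperl needed) that walks such a merge table from the root and prints a Newick string. The names newick, @rows and @labels are made up for illustration, and the three-leaf table is a toy, not the data above. It assumes, as the posted module does, that row k (0-based) describes internal node -(k+1), that the last row is the root, and that the row's distance is used as the branch length of both children.

```perl
use strict;
use warnings;

# Toy merge table (hypothetical data): [left child, right child, distance].
# Row k (0-based) describes internal node -(k+1); ids >= 0 are leaves.
my @rows = (
    [ 1,  0, 0.5 ],    # node -1 joins leaves 1 and 0
    [ 2, -1, 1.5 ],    # node -2 joins leaf 2 and node -1 (root)
);
my @labels = qw(a b c);

# Return the Newick text for node $id, recursing into internal nodes.
sub newick {
    my ($id, $rows, $labels) = @_;
    return $labels->[$id] if $id >= 0;    # leaf: just its label
    my ($left, $right, $dist) = @{ $rows->[ -($id + 1) ] };
    # Assumption: both children get the join distance as branch length.
    return sprintf '(%s:%s,%s:%s)',
        newick($left,  $rows, $labels), $dist,
        newick($right, $rows, $labels), $dist;
}

my $root = -scalar(@rows);                # the last row is the root
print newick($root, \@rows, \@labels), ";\n";
```

Comparing a string like this against what the TreeIO driver produces for the same table may help to spot where an extra intermediate node appears.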
--
No flight ever leaves on time unless you are running late and need the delay to make the flight
    -- Murphy's Laws for Frequent Flyers n°1
-------------- next part --------------
#!/usr/bin/perl

use Algorithm::Cluster;
use Bio::TreeIO;
use strict;

my $weight = [ 1,1 ];

my $data = [
    [ 1.1, 1.2 ],
    [ 1.4, 1.3 ],
    [ 1.1, 1.5 ],
    [ 2.0, 1.5 ],
    [ 1.7, 1.9 ],
    [ 1.7, 1.9 ],
    [ 5.7, 5.9 ],
    [ 5.7, 5.9 ],
    [ 3.1, 3.3 ],
    [ 5.4, 5.3 ],
    [ 5.1, 5.5 ],
    [ 5.0, 5.5 ],
    [ 5.1, 5.2 ],
];

my $mask = [
    [ 1, 1 ],
    [ 1, 1 ],
    [ 1, 1 ],
    [ 1, 1 ],
    [ 1, 1 ],
    [ 1, 1 ],
    [ 1, 1 ],
    [ 1, 1 ],
    [ 1, 1 ],
    [ 1, 1 ],
    [ 1, 1 ],
    [ 1, 1 ],
    [ 1, 1 ],
];

my $labels = [ qw/a b c d e f g h i j k l m/ ];

my %params = (
    applyscale => 0,
    transpose  => 0,
    method     => 'a',
    dist       => 'e',
    data       => $data,
    mask       => $mask,
    weight     => $weight,
);

my ($result, $linkdist);
my ($i,$j);

($result, $linkdist) = Algorithm::Cluster::treecluster(%params);

$i=0;
foreach(@{$result}) {
    printf("%3d: %3d %3d %7.3f\n",-1-$i,$_->[0],$_->[1],$linkdist->[$i]);
    ++$i;
}

my $in = new Bio::TreeIO(
    -format   => 'cluster',
    -result   => $result,
    -linkdist => $linkdist,
    -labels   => $labels,
);
my $out = new Bio::TreeIO(
    -format => 'svggraph',
    -file   => '>output.svg'
);
$out->write_tree($in->next_tree());
-------------- next part --------------
# $Id: nexus.pm,v 1.2 2003/12/06 18:10:26 jason Exp $
#
# BioPerl module for Bio::TreeIO::cluster
#
# Contributed by Guillaume Rousse
#
# Copyright INRIA
#
# You may distribute this module under the same terms as perl itself

# POD documentation - main docs before the code

=head1 NAME

Bio::TreeIO::cluster - A TreeIO driver module for parsing Algorithm::Cluster::treecluster output

=head1 SYNOPSIS

  # do not use this module directly
  use Bio::TreeIO;
  use Algorithm::Cluster;
  my ($result, $linkdist) = Algorithm::Cluster::treecluster(
    distances => $matrix
  );
  my $treeio = new Bio::TreeIO(
    -format   => 'cluster',
    -result   => $result,
    -linkdist => $linkdist,
    -labels   => $labels
  );
  my $tree = $treeio->next_tree;

=head1
DESCRIPTION This is a driver module for parsing Algorithm::Cluster::treecluster output. =head1 FEEDBACK =head2 Mailing Lists User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing list. Your participation is much appreciated. bioperl-l@bioperl.org - General discussion http://bioperl.org/MailList.shtml - About the mailing lists =head2 Reporting Bugs Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web: http://bugzilla.bioperl.org/ =head1 AUTHOR - Guillaume Rousse Email Guillaume-dot-Rousse-at-inria-dot-fr Describe contact details here =head1 CONTRIBUTORS Additional contributors names and emails here =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _ =cut # Let the code begin... package Bio::TreeIO::cluster; use vars qw(@ISA); use strict; use Bio::TreeIO; use Bio::Event::EventGeneratorI; use IO::String; @ISA = qw(Bio::TreeIO); sub _initialize { my ($self, %args) = @_; $self->{_result} = $args{'-result'}; $self->{_linkdist} = $args{'-linkdist'}; $self->{_labels} = $args{'-labels'}; $self->SUPER::_initialize(%args); } =head2 next_tree Title : next_tree Usage : my $tree = $treeio->next_tree Function: Gets the next tree in the stream Returns : Bio::Tree::TreeI Args : none =cut sub next_tree { my ($self) = @_; $self->_eventHandler->start_document(); # build tree from the root $self->_eventHandler->start_element({Name => 'tree'}); $self->_recurse(-1, 0); $self->_recurse(-1, 1); $self->_eventHandler->end_element({Name => 'tree'}); return $self->_eventHandler->end_document; } sub _recurse { my ($self, $line, $column) = @_; my $id = $self->{_result}->[$line]->[$column]; if ($id >= 0) { # leaf $self->debug("leaf $id\n"); $self->debug("distance $self->{_linkdist}->[$line]\n"); $self->debug("label 
$self->{_labels}->[$id]\n"); $self->_eventHandler->start_element({Name => 'node'}); $self->_eventHandler->start_element({Name => 'branch_length'}); $self->_eventHandler->characters($self->{_linkdist}->[$line]); $self->_eventHandler->end_element({Name => 'branch_length'}); $self->_eventHandler->start_element({Name => 'id'}); $self->_eventHandler->characters($self->{_labels}->[$id]); $self->_eventHandler->end_element({Name => 'id'}); $self->_eventHandler->start_element({Name => 'leaf'}); $self->_eventHandler->characters(1); $self->_eventHandler->end_element({Name => 'leaf'}); $self->_eventHandler->end_element({Name => 'node'}); } else { # internal node $self->debug("internal node $id\n"); $self->debug("distance $self->{_linkdist}->[$line]\n"); $self->_eventHandler->start_element({Name => 'node'}); $self->_eventHandler->start_element({Name => 'branch_length'}); $self->_eventHandler->characters($self->{_linkdist}->[$line]); $self->_eventHandler->end_element({Name => 'branch_length'}); $self->_eventHandler->start_element({Name => 'leaf'}); $self->_eventHandler->characters(0); $self->_eventHandler->end_element({Name => 'leaf'}); $self->_eventHandler->start_element({Name => 'tree'}); my $child_id = - ($id + 1); $self->_recurse($child_id, 0); $self->_recurse($child_id, 1); $self->_eventHandler->end_element({Name => 'tree'}); $self->_eventHandler->end_element({Name => 'node'}); } } =head2 write_tree Title : write_tree Usage : Function: Sorry not possible with this format Returns : none Args : none =cut sub write_tree{ $_[0]->throw("Sorry the format 'cluster' can only be used as an input format"); } 1; -------------- next part -------------- --- /usr/lib/perl5/vendor_perl/5.8.6/Bio/TreeIO/svggraph.pm 2003-11-28 07:27:16.000000000 +0100 +++ Bio/TreeIO/svggraph.pm 2005-01-04 13:57:14.265334869 +0100 @@ -86,22 +86,16 @@ @ISA = qw(Bio::TreeIO ); -=head2 new - - Title : new - Usage : my $obj = new Bio::TreeIO::svggraph(); - Function: Builds a new Bio::TreeIO::svggraph object - 
Returns : Bio::TreeIO::svggraph - Args : - - -=cut - -sub new { - my($class,@args) = @_; - - my $self = $class->SUPER::new(@args); - +sub _initialize { + my ($self, %args) = @_; + $self->{_width} = $args{'-width'} || 1600; + $self->{_height} = $args{'-height'} || 1000; + $self->{_margin} = $args{'-margin'} || 30; + $self->{_stroke} = $args{'-stroke'} || 'black'; + $self->{_stroke_width} = $args{'-stroke_width'} || 2; + $self->{_font_size} = $args{'-font_size'} || '10px'; + $self->{_normalize} = $args{'-normalize'}; + $self->SUPER::_initialize(%args); } =head2 write_tree @@ -116,28 +110,35 @@ sub write_tree{ my ($self,$tree) = @_; - my $line = _write_tree_Helper($tree->get_root_node); + my $line = $self->_write_tree_Helper($tree->get_root_node); $self->_print($line. "\n"); $self->flush if $self->_flush_on_write && defined $self->_fh; return; } sub _write_tree_Helper { - my ($node) = @_; + my ($self,$node) = @_; - #this needs to be parameterized - my $graph = SVG::Graph->new(width=>1600,height=>1000,margin=>30); + my $graph = SVG::Graph->new( + width => $self->{_width}, + height => $self->{_height}, + margin => $self->{_margin} + ); my $group0 = $graph->add_frame; my $tree = SVG::Graph::Data::Tree->new; my $root = SVG::Graph::Data::Node->new; $root->name($node->id); - _decorateRoot($root, $node->each_Descendent()); + $self->_decorateRoot($root, $node->each_Descendent()); $tree->root($root); $group0->add_data($tree); - #this needs to be parameterized - $group0->add_glyph('tree', stroke=>'black','stroke-width'=>2,'font-size'=>'10px'); + $group0->add_glyph( + 'tree', + 'stroke' => $self->{_stroke}, + 'stroke-width' => $self->{_stroke_width}, + 'font-size' => $self->{_font_size} + ); return($graph->draw); } @@ -156,16 +157,21 @@ =cut sub _decorateRoot{ - my $previousNode = shift; - my @children = @_; - foreach my $child (@children) - { - my $currNode = SVG::Graph::Data::Node->new; - $currNode->branch_label($child->id); - $currNode->branch_length($child->branch_length); - 
$previousNode->add_daughter($currNode); - _decorateRoot($currNode, $child->each_Descendent()); - } + my ($self,$previousNode,@children) = @_; + foreach my $child (@children) { + my $currNode = SVG::Graph::Data::Node->new; + $currNode->branch_label($child->id); + my $length = $child->branch_length; + CASE: { + if ($self->{_normalize} eq 'log') { + $length = log($length + 1); + last CASE; + } + } + $currNode->branch_length($length); + $previousNode->add_daughter($currNode); + $self->_decorateRoot($currNode, $child->each_Descendent()); + } } =head2 next_tree From jason.stajich at duke.edu Tue Jan 4 10:45:53 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Jan 4 10:42:27 2005 Subject: [Bioperl-l] Need help for implementing a new TreeIO module In-Reply-To: <41DA9608.9080900@ccr.jussieu.fr> References: <41D96A82.8070202@ccr.jussieu.fr> <6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu> <41D97504.7020808@ccr.jussieu.fr> <17F1ACAA-5DA9-11D9-B9B4-000393C44276@duke.edu> <41DA9608.9080900@ccr.jussieu.fr> Message-ID: Okay. It is committed in CVS. Can see about getting you a CVS account if you want to adapt this more. I made the argument initialization a little more bioperl-like (using _rearrange). Your example code produces a sensible tree I believe, can you confirm that it works fine on your end too? (it may take up to 15 minutes for the anon CVS repository to sync from the read-write one). -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From rasa at obj.hopto.org Tue Jan 4 11:31:07 2005 From: rasa at obj.hopto.org (Rasa Gulbinaite) Date: Tue Jan 4 11:28:52 2005 Subject: [Bioperl-l] Fasta headers In-Reply-To: References: <1267.81.7.113.128.1104763270.squirrel@81.7.113.128> Message-ID: <1221.81.7.113.128.1104856267.squirrel@81.7.113.128> Thank you all very much. It was really helpful. Rasa > Rasa, > > On Unix: > >>grep '>' fasta-file > > Anywhere, with Perl: > >>perl -ne 'print if /^>/' fasta-file > > > Brian O. 
> > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Rasa > Gulbinaite > Sent: Monday, January 03, 2005 9:41 AM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] Fasta headers > > > Hello, > > i'm new to bioperl and a bit confused with fasta file headers. I'm working > with SNPs and i would like to get only the fasta headers form the fasta > file, not the sequences. What would be the best way to do this? Thank you. > > Rasa > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > From dcn208 at nyu.edu Tue Jan 4 12:06:50 2005 From: dcn208 at nyu.edu (Damion Colin Nero) Date: Tue Jan 4 14:50:55 2005 Subject: [Bioperl-l] Script Request Message-ID: <4c46b64c3694.4c36944c46b6@nyu.edu> An HTML attachment was scrubbed... URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050104/5764a4c2/attachment.htm From Guido.Dieterich at gbf.de Tue Jan 4 09:44:58 2005 From: Guido.Dieterich at gbf.de (Guido Dieterich) Date: Tue Jan 4 14:51:03 2005 Subject: [Bioperl-l] format checking Message-ID: <1104849898.13555.13.camel@sb289.gbf-braunschweig.de> Hi bioperlaner, $seqIO = Bio::SeqIO->new(-fh => \*FH, -format=>'fasta'); I tried this as a fasta file gene_name> PPPPGGGAAAAA # any Sequence this does not create an error does the Bio::SeqIO checks if a file or filehandle is in the appr opiate format,eg. a fasta format? It seems that not, or? Guido -- Dr. 
Guido Dieterich Dipl.-Biologe BioComputing SB - Strukturbiologie \==-| GBF - Gesellschaft fuer Biotechnologische Forschung \=/ 0010010010100101110010 German Research Centre for Biotechnology /-\ /-==| 0010100100111101010010 WWW: http://www.gbf.de _/_/_/ _/_/_/ _/_/_/ |==-/ EMAIL: gdi@gbf.de _/ _/ _/ _/ _/ \=/ 0100100100010010010101 _/ _/ _/ _/ /\ Mascheroder Weg 1 _/ _/ _/_/_/ _/_/_/ /=-\ 1101001010100101010101 D-38124 Braunschweig _/ _/ _/ _/ _/ Tel: +(49) 531 6181 745 _/ _/ _/ _/ _/ FAX: +(49) 531 2612 388 _/_/_/ _/_/_/ _/ http://www.gbf.de/sb Es ist nicht genug, zu wissen, man muss auch anwenden. Es ist nicht genug, zu wollen, man muss auch tun. JOHANN WOLFGANG VON GOETHE Deutscher Dichter (1749 - 1832) From rousse at ccr.jussieu.fr Tue Jan 4 11:55:55 2005 From: rousse at ccr.jussieu.fr (Guillaume Rousse) Date: Tue Jan 4 14:51:12 2005 Subject: [Bioperl-l] Need help for implementing a new TreeIO module In-Reply-To: References: <41D96A82.8070202@ccr.jussieu.fr> <6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu> <41D97504.7020808@ccr.jussieu.fr> <17F1ACAA-5DA9-11D9-B9B4-000393C44276@duke.edu> <41DA9608.9080900@ccr.jussieu.fr> Message-ID: <41DACA9B.9090108@ccr.jussieu.fr> Jason Stajich wrote: > Okay. It is committed in CVS. Can see about getting you a CVS account > if you want to adapt this more. > > I made the argument initialization a little more bioperl-like (using > _rearrange). Your example code produces a sensible tree I believe, can > you confirm that it works fine on your end too? (it may take up to 15 > minutes for the anon CVS repository to sync from the read-write one). It's OK, apart the issue reported earlier about final internal nodes. I'm joining the graphic output, and a dump of the tree. -- Undetectable errors are infinite in variety, in contrast to detectable errors, which by definition are limited. -- Murphy's Laws of Computer Programming n?14 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test.svg.bz2 Type: application/octet-stream Size: 1343 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050104/2ce0ada9/test.svg.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: test.dump.bz2 Type: application/octet-stream Size: 1087 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050104/2ce0ada9/test.dump.obj From sdavis2 at mail.nih.gov Tue Jan 4 15:21:42 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue Jan 4 15:20:01 2005 Subject: [Bioperl-l] Script Request In-Reply-To: <4c46b64c3694.4c36944c46b6@nyu.edu> References: <4c46b64c3694.4c36944c46b6@nyu.edu> Message-ID: <3F7EC9F4-5E8E-11D9-B000-000D933565E8@mail.nih.gov> Damion, While not a direct answer, you might look at the PDL library. PDL has numeric functions for working with matrices. As for averaging, you can simply add the numbers that you are interested in and divide by the number of elements. There are many ways to do this in perl. Check out: http://www.bu.edu/linguistics/UG/course/lx865/lab-perl.html Sean On Jan 4, 2005, at 12:06 PM, Damion Colin Nero wrote: > I am looking for a perl script that can average small groups of > numbers down columns (i.e. 50 out of 500 numbers).? I know this is a > simple thing to do but I am a new user and have been having problems > with it so if you have a script that I could use or that might be?a > good reference please let me know.? thanks. 
>
> Damion Nero
> Coruzzi Lab
> Department of Biology
> New York University
> 766 Waverly building
> New York, NY 10003-6688
> Tel: (212) 998-3963
> email: dcn208@nyu.edu

From razi at genet.sickkids.on.ca Tue Jan 4 15:44:15 2005
From: razi at genet.sickkids.on.ca (Razi Khaja)
Date: Tue Jan 4 15:40:52 2005
Subject: [Bioperl-l] Script Request
In-Reply-To: <4c46b64c3694.4c36944c46b6@nyu.edu>
Message-ID: <20050104204415.31989.qmail@web51607.mail.yahoo.com>

Not sure if this is what you want but try the script below. You should read O'Reilly Learning Perl http://www.oreilly.com/catalog/lperl3/

#!/usr/bin/perl
use strict;

# usage: script.pl FILE COLUMN   (COLUMN is 0-based)
my( $file, $col ) = @ARGV;
my $total=0;
my $n=0;

open( FILE, $file ) or die "Cannot open $file: $!\n";
while( <FILE> ) {
    # split on whitespace, ignoring any leading blanks
    my @field = split(' ', $_);
    $total += $field[$col];
    $n++;
}
close( FILE );

die "No lines read from $file\n" unless $n;
my $avg = $total / $n;
print "$avg\n";

Razi

Damion Colin Nero wrote:

I am looking for a perl script that can average small groups of numbers down columns (i.e. 50 out of 500 numbers). I know this is a simple thing to do but I am a new user and have been having problems with it so if you have a script that I could use or that might be a good reference please let me know. thanks.
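
For the grouped case in the question (for example 50 values at a time out of 500), here is a minimal sketch, assuming the column has already been read into a Perl array. The function name group_averages and the sample numbers are made up for illustration; a short final group is simply averaged over however many values it holds.

```perl
use strict;
use warnings;

# Average consecutive groups of $size values taken from @values.
# A short final group is averaged over the values it actually contains.
sub group_averages {
    my ($size, @values) = @_;
    die "group size must be positive\n" unless $size > 0;
    my @averages;
    # splice removes up to $size values from the front on each pass
    while (my @group = splice(@values, 0, $size)) {
        my $sum = 0;
        $sum += $_ for @group;
        push @averages, $sum / @group;
    }
    return @averages;
}

my @column = (1, 2, 3, 4, 5, 6, 7);
print join(' ', group_averages(3, @column)), "\n";
```

With a group size of 50 and a 500-value column this would return ten averages, one per block of 50 rows.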
Damion Nero Coruzzi Lab Department of Biology New York University 766 Waverly building New York, NY 10003-6688 Tel: (212) 998-3963 email: dcn208@nyu.edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l /** * Razi Khaja, Bioinformatics Analyst * The Hospital for Sick Children, Toronto * The Centre for Applied Genomics, www.tcag.ca * Tel 416-813-7032, Fax 416-813-8319 */ From brian_osborne at cognia.com Tue Jan 4 15:46:03 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Jan 4 15:43:23 2005 Subject: [Bioperl-l] format checking In-Reply-To: <1104849898.13555.13.camel@sb289.gbf-braunschweig.de> Message-ID: Guido, The answer is sometimes yes, sometimes no, Bioperl doesn't appear to be consistent. This is Bug 1508: http://bugzilla.bioperl.org/show_bug.cgi?id=1508 Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Guido Dieterich Sent: Tuesday, January 04, 2005 9:45 AM To: Bioperl List Subject: [Bioperl-l] format checking Hi bioperlaner, $seqIO = Bio::SeqIO->new(-fh => \*FH, -format=>'fasta'); I tried this as a fasta file gene_name> PPPPGGGAAAAA # any Sequence this does not create an error does the Bio::SeqIO checks if a file or filehandle is in the appr opiate format,eg. a fasta format? It seems that not, or? Guido -- Dr. 
Guido Dieterich Dipl.-Biologe BioComputing SB - Strukturbiologie \==-| GBF - Gesellschaft fuer Biotechnologische Forschung \=/ 0010010010100101110010 German Research Centre for Biotechnology /-\ /-==| 0010100100111101010010 WWW: http://www.gbf.de _/_/_/ _/_/_/ _/_/_/ |==-/ EMAIL: gdi@gbf.de _/ _/ _/ _/ _/ \=/ 0100100100010010010101 _/ _/ _/ _/ /\ Mascheroder Weg 1 _/ _/ _/_/_/ _/_/_/ /=-\ 1101001010100101010101 D-38124 Braunschweig _/ _/ _/ _/ _/ Tel: +(49) 531 6181 745 _/ _/ _/ _/ _/ FAX: +(49) 531 2612 388 _/_/_/ _/_/_/ _/ http://www.gbf.de/sb Es ist nicht genug, zu wissen, man muss auch anwenden. Es ist nicht genug, zu wollen, man muss auch tun. JOHANN WOLFGANG VON GOETHE Deutscher Dichter (1749 - 1832) _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Tue Jan 4 16:03:42 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Jan 4 16:00:21 2005 Subject: [Bioperl-l] Entrez Gene and bioperl-db In-Reply-To: <1104871954.3102.24.camel@localhost.localdomain> References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net> <1104792001.3186.17.camel@localhost.localdomain> <0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu> <1104871954.3102.24.camel@localhost.localdomain> Message-ID: <1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu> On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote: > Hi Jason, > > thanks for the advice. It seems as if the documentation of > Bio::DB::Taxonomy is a bit out of sync. > my $db = new Bio::DB::Taxonomy(-source => 'flatfile' > -nodesfile => $nodesfile, > -namesfile => $namefile); > What does 'flatfile' refer to here? It is not apparent upon looking at > the code for new. > See Bio::DB::Taxonomy::flatfile for more information. As I mentioned in the mail I sent, flatfile is for downloading the taxonomy DB from NCBI. This lets you run it locally using an indexed (BerkelyDB via DB_File) version of the file. 
You need the most up-to-date version of the modules - works fine for me for both the entrez and flatfile code, but you may have to upgrade off of the 1.4.0 release. Code from CVS or the bioperl-1.5 RC1 code should work fine.

> I had somewhat better luck using the entrez version, but I got a
> pretty amusing error
> message:
>
> MSG: can't create a species object for Homo sapiens (human) because it
> isn't a species but is a '' instead
>
> ###
> Full error and a dump of the script follow:
>
> my $db = new Bio::DB::Taxonomy(-source => 'entrez'); #
> my $taxaid = $db->get_taxonid('Homo sapiens');
> my $species = $db->get_Taxonomy_Node(-taxonid => '9606');
> print Dumper($species);
>
> ###
>
> Use of uninitialized value in string eq at
> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
> Use of uninitialized value in sprintf at
> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
> > -------------------- WARNING --------------------- > MSG: can't create a species object for Homo sapiens (human) because it > isn't a species but is a '' instead > --------------------------------------------------- > $VAR1 = { > 'TaxId' => '9606', > 'Division' => 'mammals', > 'GeneNumber' => '32775', > 'Rank' => 'species', > 'ProtNumber' => '247791', > 'ScientificName' => 'Homo sapiens', > 'CommonName' => 'human', > 'NucNumber' => '9025800', > 'GenNumber' => '25', > 'StructNumber' => '5638' > }; > peter@anna:~/programs/bioperlTest$ > > > --best, peter > > On Mon, 2005-01-03 at 23:51, Jason Stajich wrote: >> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a >> species object (or equivalent) using this code. But you cannot (or >> could not when I wrote this, not sure of the current status) get the >> full classification from the NCBI taxonomy retrieval via cgi. i.e. >> you >> can only get genus and species for a taxon id and I don't know how to >> walk up the hierarchy using the web API. Earlier emails to NCBI >> seemed >> to indicate this is all they intended to provide, but not sure what >> the >> current status is. >> >> my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI >> Entrez >> over HTTP >> my $taxaid = $db->get_taxonid('Homo sapiens'); >> my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606'); >> >> You can get the full classification if you use the >> Bio::DB::Taxonomy::flatfile factory which requires you to have >> downloaded the taxonomy db flatfile from NCBI. Since this is more >> reliable (and faster) it is what I have tended to use for grouping >> sets >> of seqDB search results, etc. >> >> -jason >> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote: >> >>> Hi Bioperlers, hi Hilmar, >>> >>> after some thinking I have embarked on a lex/yacc parser for the >>> Entrez >>> Gene ASN.1 format as the way of least resistance, although I am not >>> sure >>> how that would fit in to BioPerl. 
If anyone is interested in this (or >>> has a better idea of how to go about it..), please drop me a line. >>> >>> In the meantime I have been looking at writing code to parse some of >>> the >>> "easy" Entrez gene documents, starting off with gene_info. This file >>> includes the NCBI taxon id for each entry. I would like to convert >>> this >>> to a Bio::Species object to pass to the following >>> my $seq = $self->sequence_factory->create( >>> -verbose => $self->verbose(), >>> -accession_number => $geneID, >>> -desc => $description, >>> -display_id => $symbol, >>> -species => ??? >>> -annotation => $ann); >>> >>> and saw the Bio::Taxonomy::FactoryI code, which appears to want to do >>> this sort of thing. However, the code for that is pretty preliminary. >>> Is >>> anyone working on this at the moment? Or is there a better way of >>> doing >>> this (it seems a shame not to provide the actual species name if one >>> has >>> the taxid...) >>> >>> best >>> >>> Peter >>> >>> >>> >>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote: >>>> Great to hear that someone is giving this a shot. Yes at this point >>>> is >>>> appears that NCBI is only offering the ASN.1, not a conversion to >>>> XML. >>>> Their asn2xml tool will not work with this ASN.1 format either, just >>>> checked it to be sure. They do seem to be mulling the option of XML >>>> though on the Gene FAQ. Maybe if enough people get in their ears >>>> they >>>> will spend some effort towards that. After all, the entrez gene web >>>> interface can display XML on demand - even though it looks fairly >>>> hideous. >>>> >>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in >>>> perl is actually thin - there is Convert::ASN1 at version 0.18 two >>>> years ago that I could find ... doesn't make me feel warm and fuzzy. >>>> >>>> In the absence of any XML available from NCBI, gene_info might be >>>> the >>>> best start. 
An option could be to check for the presence of the >>>> other >>>> tab-delimited files and use those that are present. These are >>>> tab-delimited and hence the format itself is trivial so you can >>>> focus >>>> entirely on setting up a Bio::Seq plus annotation that's >>>> comparable/compatible to what the current SeqIO::locuslink does. >>>> >>>> My $0.02 (worth less and less almost every day). >>>> >>>> -hilmar >>>> >>>> On Thursday, December 23, 2004, at 10:51 AM, Peter Robinson wrote: >>>> >>>>> Hi, >>>>> >>>>> I have been thinking about given a BioPerl EntrezGene parser a try >>>>> since >>>>> I have been a heavy user of locus link to date. One issue is that >>>>> the >>>>> files that correspond to LL_tmpl (which was a flat file) are now in >>>>> asn >>>>> format >>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/ >>>>> genehelp.html#query >>>>> Although I saw some mention of ASN support in Bioperl by googling, >>>>> I >>>>> can't seem to find any module that does this in the present >>>>> distribution. What is the status on that? In any case, I will be >>>>> working >>>>> on this in the next month or two and if anything nice comes of it I >>>>> will >>>>> send it to you / BioPerpl. >>>>> >>>>> best wishes & happy holidays >>>>> >>>>> Peter >>>>> >>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote: >>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for >>>>>> parsing >>>>>> any input file, what you're asking is whether or not there is a >>>>>> SeqIO >>>>>> parser for NCBI Gene. >>>>>> >>>>>> The answer to that question is no, not yet. Anybody who feels >>>>>> motivated >>>>>> is welcome to give it a try ... Since I'll need it, I'll write the >>>>>> parser if nobody else does within the next 3 months, but I'm not >>>>>> going >>>>>> to promise when exactly this will happen. 
>>>>>> >>>>>> -hilmar >>>>>> >>>>>> On Monday, December 13, 2004, at 08:03 AM, Law, Annie wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I was wondering with regards to bioperl-db the scripts and schema >>>>>>> and >>>>>>> load_seqdatabase.pl has there been preparation for integration of >>>>>>> Entrez >>>>>>> gene information when locuslink is phased out? Or if it has >>>>>>> already >>>>>>> been >>>>>>> changed could somebody point >>>>>>> me to the documentation or changed code? >>>>>>> >>>>>>> Thanks, >>>>>>> Annie. >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l@portal.open-bio.org >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>> -- >>>>> Peter N. Robinson >>>>> peter.robinson@t-online.de >>>>> peter.robinson@charite.de >>>>> http://www.charite.de/ch/medgen/robinson/ >>>>> >>>>> >>> -- >>> Peter N. Robinson >>> peter.robinson@t-online.de >>> peter.robinson@charite.de >>> http://www.charite.de/ch/medgen/robinson/ >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ > -- > Peter N. Robinson > peter.robinson@t-online.de > peter.robinson@charite.de > http://www.charite.de/ch/medgen/robinson/ > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jateu001 at uni-duesseldorf.de Wed Jan 5 07:10:31 2005 From: jateu001 at uni-duesseldorf.de (Jan Teune) Date: Wed Jan 5 07:07:31 2005 Subject: [Bioperl-l] Bio::Biblio Message-ID: <41DBD937.3090506@uni-duesseldorf.de> Hello @ all, I'm writing a small script to fetch PubMed-Articles. Since two weeks before Christmas, I have a problem to fetch Articles. 
Below is some code and the error message: #!/usr/bin/perl -w use Bio::Biblio; my $pmid = "15542139"; my $biblio = new Bio::Biblio( -access => 'soap', -location => 'http://industry.ebi.ac.uk/soap/openBQS', -destroy_on_exit => '0', ); my $citation = $biblio->get_by_id($pmid); print $citation; The error message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: --- TRANSPORT ERROR --- 502 Proxy Error STACK: Error::throw STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328 STACK: try{} block /usr/share/perl5/Bio/DB/Biblio/soap.pm:119 STACK: SOAP::Lite::call /usr/share/perl5/SOAP/Lite.pm:3006 STACK: try{} block /usr/share/perl5/SOAP/Lite.pm:2950 STACK: Bio::DB::Biblio::soap::get_by_id /usr/share/perl5/Bio/DB/Biblio/soap.pm:368 STACK: ./bibliotest.pl:9 ----------------------------------------------------------- I'd be grateful for any kind of help, Jan :-) From khufaz83 at yahoo.com Wed Jan 5 09:34:53 2005 From: khufaz83 at yahoo.com (hafiz hafiz) Date: Wed Jan 5 09:31:40 2005 Subject: [Bioperl-l] Change format in CGI. Message-ID: <20050105143453.93548.qmail@web52509.mail.yahoo.com> Hi, I have built a sequence search using SeqIO that runs from a URL, but I still can't change the format using SeqIO from the URL. Why? Please refer to this URL: http://tiara.cs.usm.my/Bioprotein.html First select searching by sequence, then select the swissprot library and enter a sequence. After that you can see a menu with XML, swissprot, fasta and PIR. That is the change-format function; it still does not work from the URL, but it works well in our terminal.
From suzuki at cbl.umces.edu Wed Jan 5 12:51:18 2005 From: suzuki at cbl.umces.edu (Marcelino Suzuki) Date: Wed Jan 5 12:48:05 2005 Subject: [Bioperl-l] OS X bioperl, staden/read, install problems Message-ID: <67117554-5F42-11D9-AC27-0003939E064E@cbl.umces.edu> Well, I tried all I could, but I keep getting the same error as srikanth patury: http://bioperl.org/pipermail/bioperl-guts-l/2004-June/016855.html I got to the point where I get no errors in the make step of bioperl-ext. I installed io_lib (1.8.11) I copied os.h and config.h to /usr/local/include/io_lib I changed os.h to remove the "<:" and ">" around config.h I ranlib /usr/local/lib/libread.a after an error message I also ranlib /usr/local/bioperl-ext-1.4/Bio/Ext/Align/libs/libsw.a Is there anything else that needs to be done? I am using perl 5.8.1 Thanks Marcelino suzuki suzuki at cbl dot umces dot edu From Marc.Logghe at devgen.com Wed Jan 5 14:10:12 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Wed Jan 5 14:07:17 2005 Subject: [Bioperl-l] Bio::Biblio Message-ID: Happy New Year to you all! Jan, there is nothing to worry about concerning your code. It is just that there are some electricity problems at the EBI which caused the soap server to be down for a while. So, most of the soap services are currently unavailable. You can find the original message on the Taverna mailing list: http://sourceforge.net/mailarchive/forum.php?thread_id=6274312&forum_id=35847 So, I guess there is nothing that you can do besides waiting ;-) HTH, Marc -----Original message----- From: bioperl-l-bounces@portal.open-bio.org on behalf of Jan Teune Sent: Wed 5-1-2005 13:10 To: bioperl-l@bioperl.org Subject: [Bioperl-l] Bio::Biblio Hello @ all, I'm writing a small script to fetch PubMed-Articles. Since two weeks before Christmas, I have a problem to fetch Articles.
Below is some Code and the Error-Message: #!/usr/bin/perl -w use Bio::Biblio; my $pmid = "15542139"; my $biblio = new Bio::Biblio( -access => 'soap', -location => 'http://industry.ebi.ac.uk/soap/openBQS', -destroy_on_exit => '0', ); my $citation = $biblio->get_by_id($pmid); print $citation; The Error-Message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: --- TRANSPORT ERROR --- 502 Proxy Error STACK: Error::throw STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328 STACK: try{} block /usr/share/perl5/Bio/DB/Biblio/soap.pm:119 STACK: SOAP::Lite::call /usr/share/perl5/SOAP/Lite.pm:3006 STACK: try{} block /usr/share/perl5/SOAP/Lite.pm:2950 STACK: Bio::DB::Biblio::soap::get_by_id /usr/share/perl5/Bio/DB/Biblio/soap.pm:368 STACK: ./bibliotest.pl:9 ----------------------------------------------------------- I'm happy for any kind of help, Jan :-) _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Wed Jan 5 15:41:33 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Wed Jan 5 15:33:03 2005 Subject: [Bioperl-l] Error parsing Genbank file Message-ID: <007501c4f366$f1a14bb0$a6028a0a@GOLHARMOBILE1> Hi all, I have a Genbank file that Bio::SeqIO:genbank.pm is choking on. The entry is just a WGS entry referencing a bunch of other entries. It does on line 492 with the error "Unexpected error in feature table for Skipping feature, attempting to recover". 
I'm using the following code: #!/usr/bin/perl use strict; use Bio::SeqIO; my $usage = "$0 \n"; my $file = shift or die $usage; my $outfilename = shift or die $usage; my $infile = Bio::SeqIO->new('-file' => "<$file", '-format' => "genbank"); my $outfile = Bio::SeqIO->new(-'file' => ">$outfilename", '-format' => "fasta"); while (my $seq = $infile->next_seq) { # print STDERR $seq->accession_number,"\n"; $outfile->write_seq($seq); } Here is the contents of the genbank entry: LOCUS CAAB01000000 12381 rc DNA linear VRT 22-AUG-2002 DEFINITION Takifugu rubripes whole genome shotgun sequencing project. ACCESSION CAAB00000000 VERSION CAAB00000000.1 GI:22418063 KEYWORDS WGS. SOURCE Takifugu rubripes (Fugu rubripes) ORGANISM Takifugu rubripes Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei; Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes; Tetradontoidea; Tetraodontidae; Takifugu. REFERENCE 1 (bases 1 to 12381) AUTHORS The Fugu Genome Sequencing Consortium. TITLE Direct Submission JOURNAL Submitted (01-JUL-2002) The Fugu Genome Sequencing Consortium, http://www.fugubase.org/ http://www.jgi.doe.gov/fugu COMMENT The Takifugu rubripes whole genome shotgun (WGS) project has the project accession CAAB00000000. This version of the project (01) has the accession number CAAB01000000, and consists of sequences CAAB01000001-CAAB01012381. 
FEATURES Location/Qualifiers source 1..12381 /organism="Takifugu rubripes" /mol_type="genomic DNA" /db_xref="taxon:31033" WGS CAAB01000001-CAAB01012381 // ----- Ryan Golhar Computational Biologist The Informatics Institute at The University of Medicine & Dentistry of NJ Phone: 973-972-5034 Fax: 973-972-7412 Email: golharam@umdnj.edu From kmdaily at indiana.edu Wed Jan 5 15:48:57 2005 From: kmdaily at indiana.edu (Daily, Kenneth Michael) Date: Wed Jan 5 15:46:08 2005 Subject: [Bioperl-l] reading multiple swissprot records from a single file Message-ID: I'm having trouble using bioperl to parse a file with multiple (thousands of) swissprot records in it. Is there a way to do this with SeqIO and such? The way I understand it, if I use a filehandle to read in the data, it still is expecting only one record in the file. Can I use a FH to read in a record, which ends with //, then put this variable into a SeqIO object to manipulate it? I need to look at each record and decide if I want to keep it based on the features it has. I have a program using standard parsing techniques but want to do this with bioperl if possible. Thanks for any help. Kenny Daily IU School of Informatics kmdaily at indiana dot edu From brian_osborne at cognia.com Wed Jan 5 16:07:28 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Jan 5 16:06:14 2005 Subject: [Bioperl-l] reading multiple swissprot records from a single file In-Reply-To: Message-ID: Kenny, It would be something like: use strict; use Bio::SeqIO; my $seqio = Bio::SeqIO->new(-file => "sprot42.dat", -format => "swiss"); while (my $seqobj = $seqio->next_seq) { # you now have a Sequence object, you can check its features } This would be the "standard" way. Yes, SeqIO understands a file handle as well but there's no need to do it that way, I don't think. Brian O.
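For comparison, the hand-rolled approach Kenny asks about (reading one '//'-terminated record at a time from a filehandle) can be sketched in plain Perl by setting the input record separator $/. This is only an illustration under assumptions: the two toy records and the filter_records helper are made up here and are not part of Bioperl or of anyone's actual script. For real Swiss-Prot files the SeqIO loop is likely the more robust route, since it hands back parsed sequence objects rather than raw text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two toy records in Swiss-Prot flat-file layout (made-up data, not real entries).
my $data = <<'END';
ID   TEST1_HUMAN    STANDARD;      PRT;    10 AA.
FT   DOMAIN        1      5       Example.
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
ID   TEST2_HUMAN    STANDARD;      PRT;     8 AA.
SQ   SEQUENCE   8 AA;
     MKTAYIAK
//
END

# Hypothetical helper: read '//'-terminated records from a filehandle and
# return the raw text of every record for which the callback returns true.
sub filter_records {
    my ($fh, $wanted) = @_;
    local $/ = "//\n";    # each read now returns one whole record
    my @keep;
    while (my $record = <$fh>) {
        push @keep, $record if $wanted->($record);
    }
    return @keep;
}

# Open the string as an in-memory file so the sketch needs no input file.
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

# Keep only records that carry an FT DOMAIN feature line.
my @kept = filter_records($fh, sub { $_[0] =~ /^FT\s+DOMAIN\b/m });
close($fh);

my ($id) = $kept[0] =~ /^ID\s+(\S+)/m;
print scalar(@kept), " record(s) kept, first ID: $id\n";
```

The callback only sees raw text, so any feature test has to be a pattern match on the FT lines; that limitation is the main reason to prefer next_seq when the decision depends on parsed features.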
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Daily, Kenneth Michael Sent: Wednesday, January 05, 2005 3:49 PM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] reading multiple swissprot records from a single file I'm having trouble using bioperl to parse a file with multiple (thousands) of swissprot records in them. Is there a way to do this with SeqIO and such? The way I understand it, if I use a filehandle to read in the data, it still is expecting only one record in the file. Can I use a FH to read in a record, which ends with //, then put this variable into a SeqIO object to manpulate it? I need to look at each record and decide if I want to keep it based on the features it has. I have a program using standard parsing techniques but want to do this with bioperl if possible. Thanks for any help. Kenny Daily IU School of Informatics kmdaily at indiana dot edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Wed Jan 5 16:36:55 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jan 5 16:33:27 2005 Subject: [Bioperl-l] Error parsing Genbank file In-Reply-To: <007501c4f366$f1a14bb0$a6028a0a@GOLHARMOBILE1> References: <007501c4f366$f1a14bb0$a6028a0a@GOLHARMOBILE1> Message-ID: We can't parse WGS files. The fix it needs is very similar to how we handle CONTIG entries if you want to have a go at fixing it. On Jan 5, 2005, at 3:41 PM, Ryan Golhar wrote: > Hi all, > > I have a Genbank file that Bio::SeqIO:genbank.pm is choking on. The > entry is just a WGS entry referencing a bunch of other entries. It > does > on line 492 with the error "Unexpected error in feature table for > Skipping feature, attempting to recover". 
> > I'm using the following code: > > #!/usr/bin/perl > > use strict; > use Bio::SeqIO; > > my $usage = "$0 \n"; > my $file = shift or die $usage; > my $outfilename = shift or die $usage; > > my $infile = Bio::SeqIO->new('-file' => "<$file", > '-format' => "genbank"); > > my $outfile = Bio::SeqIO->new(-'file' => ">$outfilename", > '-format' => "fasta"); > > while (my $seq = $infile->next_seq) { > # print STDERR $seq->accession_number,"\n"; > > $outfile->write_seq($seq); > } > > Here is the contents of the genbank entry: > > LOCUS CAAB01000000 12381 rc DNA linear VRT > 22-AUG-2002 > DEFINITION Takifugu rubripes whole genome shotgun sequencing project. > ACCESSION CAAB00000000 > VERSION CAAB00000000.1 GI:22418063 > KEYWORDS WGS. > SOURCE Takifugu rubripes (Fugu rubripes) > ORGANISM Takifugu rubripes > Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; > Euteleostomi; > Actinopterygii; Neopterygii; Teleostei; Euteleostei; > Neoteleostei; > Acanthomorpha; Acanthopterygii; Percomorpha; > Tetraodontiformes; > Tetradontoidea; Tetraodontidae; Takifugu. > REFERENCE 1 (bases 1 to 12381) > AUTHORS The Fugu Genome Sequencing Consortium. > TITLE Direct Submission > JOURNAL Submitted (01-JUL-2002) The Fugu Genome Sequencing > Consortium, > http://www.fugubase.org/ http://www.jgi.doe.gov/fugu > COMMENT The Takifugu rubripes whole genome shotgun (WGS) project > has > the > project accession CAAB00000000. This version of the > project > (01) > has the accession number CAAB01000000, and consists of > sequences > CAAB01000001-CAAB01012381. 
> FEATURES Location/Qualifiers > source 1..12381 > /organism="Takifugu rubripes" > /mol_type="genomic DNA" > /db_xref="taxon:31033" > WGS CAAB01000001-CAAB01012381 > // > > > > ----- > Ryan Golhar > Computational Biologist > The Informatics Institute at > The University of Medicine & Dentistry of NJ > > Phone: 973-972-5034 > Fax: 973-972-7412 > Email: golharam@umdnj.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Wed Jan 5 16:44:57 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jan 5 16:41:27 2005 Subject: [Bioperl-l] reading multiple swissprot records from a single file In-Reply-To: References: Message-ID: <0ABA4437-5F63-11D9-AC45-000393C44276@duke.edu> It reads a stream of data which is delimited by the '//'. It only processes one at a time. You just keep calling next_seq until it gets to the end of the file or filehandle. That is why we typically construct the usage with a while loop. For example if you wanted to make a new file which only had your keepers in it. my $in = Bio::SeqIO->new(-format => 'swiss', -file => 'sprot42.dat'); my $out = Bio::SeqIO->new(-format=> 'swiss', -file =>'>keepers.swiss'); while( my $seq =$in->next_seq ) { my $keep = 0; for my $feature ($seq->get_SeqFeatures ) { # figure out if feature criteria is met, if so, set $keep =1; } if($keep) { $out->write_seq($seq); } } If you wanted to use a filehandle instead of a file just use the -fh parameter instead of -file. See Bio::Root::IO for more information. This might be useful if you were streaming in zcat [zcat reads gzipped files and produces a stream of the unzipped data]. 
open(FH, "zcat sprot42.dat.gz |") || die("could not open file with zcat"); # the trailing '|' is necessary to tell perl to pipe the output my $in = Bio::SeqIO->new(-fh => \*FH, -format=> 'swiss'); OR save the handle in a variable my $fh; open($fh, "zcat sprot42.dat.gz |") || die("could not open file with zcat"); # the trailing '|' is necessary to tell perl to pipe the output my $in = Bio::SeqIO->new(-fh => $fh, -format=> 'swiss'); -jason On Jan 5, 2005, at 3:48 PM, Daily, Kenneth Michael wrote: > I'm having trouble using bioperl to parse a file with multiple > (thousands) of swissprot records in them. Is there a way to do this > with SeqIO and such? The way I understand it, if I use a filehandle to > read in the data, it still is expecting only one record in the file. > Can I use a FH to read in a record, which ends with //, then put this > variable into a SeqIO object to manpulate it? I need to look at each > record and decide if I want to keep it based on the features it has. I > have a program using standard parsing techniques but want to do this > with bioperl if possible. Thanks for any help. > > Kenny Daily > IU School of Informatics > kmdaily at indiana dot edu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From allenday at ucla.edu Wed Jan 5 18:41:43 2005 From: allenday at ucla.edu (Allen Day) Date: Wed Jan 5 17:39:42 2005 Subject: [Bioperl-l] Bio::Biblio In-Reply-To: References: Message-ID: Jan, You might consider using the NCBI database via their 'eutils' service instead. Have a look at Bio::DB::Biblio::eutils. I find it's more reliable and that PubMed is more up-to-date and complete than that EBI server. -Allen On Wed, 5 Jan 2005, Marc Logghe wrote: > Happy Newyear to you all ! > > Jan, there is nothing to worry about concerning your code. 
> It is just that there are some electricity problems at the EBI which caused the soap server to be down for while. > So, most of the soap services are currently unavailable. > The original message you find on the Taverna mailing list: > http://sourceforge.net/mailarchive/forum.php?thread_id=6274312&forum_id=35847 > > So, I guess there is nothing that you can do besides waiting ;-) > > HTH, > Marc > > > -----Oorspronkelijk bericht----- > Van: bioperl-l-bounces@portal.open-bio.org namens Jan Teune > Verzonden: wo 5-1-2005 13:10 > Aan: bioperl-l@bioperl.org > Onderwerp: [Bioperl-l] Bio::Biblio > > Hello @ all, > I'm writing a small script to fetch PubMed-Articles. Since two weeks > before Christmas, I have a problem to fetch Articles. Below is some Code > and the Error-Message: > > #!/usr/bin/perl -w > use Bio::Biblio; > my $pmid = "15542139"; > my $biblio = new Bio::Biblio( > -access => 'soap', > -location => 'http://industry.ebi.ac.uk/soap/openBQS', > -destroy_on_exit => '0', > ); > my $citation = $biblio->get_by_id($pmid); > print $citation; > The Error-Message: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: --- TRANSPORT ERROR --- > 502 Proxy Error > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328 > STACK: try{} block /usr/share/perl5/Bio/DB/Biblio/soap.pm:119 > STACK: SOAP::Lite::call /usr/share/perl5/SOAP/Lite.pm:3006 > STACK: try{} block /usr/share/perl5/SOAP/Lite.pm:2950 > STACK: Bio::DB::Biblio::soap::get_by_id > /usr/share/perl5/Bio/DB/Biblio/soap.pm:368 > STACK: ./bibliotest.pl:9 > ----------------------------------------------------------- > > I'm happy for any kind of help, > > Jan :-) > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > 
http://portal.open-bio.org/mailman/listinfo/bioperl-l > From wes.barris at csiro.au Wed Jan 5 17:56:08 2005 From: wes.barris at csiro.au (Wes Barris) Date: Wed Jan 5 17:52:40 2005 Subject: [Bioperl-l] SeqIO fails on masked sequences In-Reply-To: References: Message-ID: <41DC7088.7010101@csiro.au> Nathan Haigh wrote: > Ok, the "bug" seems to have been introduced in the last update to Bio::PrimarySeq.pm (v1.83) where X was added to the list of > ambiguous characters in the _guess_alphabet subroutine. > > Brian - do you remember why/what this was for? > > Nathan Hi Nathan, I was just curious if you have found anything out regarding this? > > > >>-----Original Message----- >>From: Marc Logghe [mailto:Marc.Logghe@devgen.com] >>Sent: 16 December 2004 10:44 >>To: nathanhaigh@ukonline.co.uk; Wes Barris >>Cc: Bioperl Mailing List >>Subject: RE: [Bioperl-l] SeqIO fails on masked sequences >> >> >>>When I use the script you supplied, I get the exception shown below. >>> >>>I'll try to get to the bottom of this. >>> >>>In the meantime, what OS are you both using and what version >>>of Bioperl? >>> >> >>Ah, yes that explains it. Too much fiddling with PERL5LIB is not good ;-) >>I did not realize I was actually using bioperl 1.4.0. There it worked. >>It fails indeed when using bioperl-release-1-5-0-rc1. >>Apologies for confusing you people. >>Cheers, >>Marc >> > > -- Wes Barris E-Mail: Wes.Barris@csiro.au From amackey at pcbi.upenn.edu Wed Jan 5 18:28:24 2005 From: amackey at pcbi.upenn.edu (Aaron J.
Mackey) Date: Wed Jan 5 18:25:16 2005 Subject: [Bioperl-l] OS X bioperl, staden/read, install problems In-Reply-To: <67117554-5F42-11D9-AC27-0003939E064E@cbl.umces.edu> References: <67117554-5F42-11D9-AC27-0003939E064E@cbl.umces.edu> Message-ID: <41DC7818.8060104@pcbi.upenn.edu> Try editing Bio/SeqIO/staden/read.pm to include "-lz" in LIBS -Aaron Marcelino Suzuki wrote: > Well, I tried all I could, but I keep getting the same error as > srikanth patury: > > http://bioperl.org/pipermail/bioperl-guts-l/2004-June/016855.html > > Got to the point I got no errors in the make step of bioperl-ext > > I installed io_lib (1.8.11) > I copied os.h and config.h to /usr/local/include/io_lib > I changed os.h to remove the "<:" and ">" around config.h > I ranlib /usr/local/lib/libread.a > after an error message I also > ranlib /usr/local/bioperl-ext-1.4/Bio/Ext/Align/libs/libsw.a > > Is there any other thing that needs to be done. > > I am using perl 5.8.1 > > Thanks > > Marcelino suzuki > > suzuki at cbl dot umces dot edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From rfsouza at cecm.usp.br Wed Jan 5 16:08:45 2005 From: rfsouza at cecm.usp.br (rfsouza@cecm.usp.br) Date: Wed Jan 5 18:37:00 2005 Subject: [Bioperl-l] Bug in SeqIO/swiss.pm Message-ID: <34156.143.107.52.69.1104959325.squirrel@webmail.cecm.usp.br> Hi, I have found what might be a bug in the SeqIO parser for Swissprot flat files (swiss.pm). The error message printed is Invalid [] range "a-S" before HERE mark in regex m/^Cotton leaf curl Gezira virus - [Okra-S << HERE hambat]$/ at /home/users/rfsouza/projects/geral/lib/perl5/site_perl/5.8.1/Bio/SeqIO/swiss.pm line 985, line 10. and the Swissprot entry is pasted below. The problem is a match operator at line 985: 984 #if the organism belongs to taxid 32644 then no Bio::Species object. 
985 return if grep { /^$binomial$/ } @Unknown_names; I managed to fix this and have swiss.pm to parse the entire Uniprot release 2.1 by adding this line $binomial =~ s/(\[|\])/\\$1/g; just before line 985. Would anybody like to add this fix to the CVS version of swiss.pm? Since this is the only entry which swiss.pm was not able to parse, out of 1520915 entries in Uniprot, I was considering if it is not an annotation error in Uniprot, violating their own standard... Greeting and happy new year :). Robson #============== ID Q8UYF6 STANDARD; PRT; 258 AA. AC Q8UYF6; DT 01-MAR-2002 (TrEMBLrel. 20, Created) DT 01-MAR-2002 (TrEMBLrel. 20, Last sequence update) DT 01-MAR-2004 (TrEMBLrel. 26, Last annotation update) DE Coat protein. OS Cotton leaf curl Gezira virus - [Okra-Shambat]. OC Viruses; ssDNA viruses; Geminiviridae; Begomovirus. OX NCBI_TaxID=268964; RN [1] RP SEQUENCE FROM N.A. RA Idris A.M., Brown J.K.; RT "Molecular analysis of cotton leaf curl virus-Sudan reveals an RT evolutionary history of recombination."; RL Virus Genes 0:0-0(2002). DR EMBL; AY036008; AAK64541.1; -. DR GO; GO:0019028; C:viral capsid; IEA. DR GO; GO:0005198; F:structural molecule activity; IEA. DR InterPro; IPR000650; Gem_coat_AR1. DR InterPro; IPR000263; GV_A/BR1_coat. DR Pfam; PF00844; Gemini_coat; 1. DR PRINTS; PR00224; GEMCOATAR1. DR PRINTS; PR00223; GEMCOATARBR1. DR ProDom; PD000901; Gem_coat_AR1; 1. KW Coat protein. 
SQ SEQUENCE 258 AA; 29778 MW; 6FB1960A9D8763DD CRC64; MSKRPADIII STPASKVRRR LNFDSPGLSS ARAPTVLVTN KRRSWTNRPT YRKPRMYRMY RSPDVPKGCE GPCKVQSYEQ RDDIKHTGIV RCVSDVTKGV GITHRTGKRF TIKSIYILGK VWMDDNIKKQ NHTNNVMFFL VRDRRPYGNS PLDFGQVFNM FDNEPSTATV KNDLRDHFQV LRKFTATVIG GPSGMKEQAL VRRFYRINSQ IVYNHQEAGK FENHTENAIL LYMACTHASN PVYATLKIRI YFYDSVSN // From jason.stajich at duke.edu Wed Jan 5 20:14:51 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jan 5 20:12:01 2005 Subject: [Bioperl-l] Bug in SeqIO/swiss.pm In-Reply-To: <34156.143.107.52.69.1104959325.squirrel@webmail.cecm.usp.br> References: <34156.143.107.52.69.1104959325.squirrel@webmail.cecm.usp.br> Message-ID: <5DA94780-5F80-11D9-A383-000393C44276@duke.edu> Thanks for the report. I believe I fixed this in my Nov 22 commit- revision 1.84- of Bio/SeqIO/swiss.pm so it will be in bioperl 1.5 or it is currently available from the code in CVS. -jason On Jan 5, 2005, at 4:08 PM, rfsouza@cecm.usp.br wrote: > Hi, > > I have found what might be a bug in the SeqIO parser for Swissprot > flat files > (swiss.pm). The error message printed is > > Invalid [] range "a-S" before HERE mark in regex m/^Cotton leaf curl > Gezira virus - [Okra-S << HERE hambat]$/ at > /home/users/rfsouza/projects/geral/lib/perl5/site_perl/5.8.1/Bio/ > SeqIO/swiss.pm > line 985, line 10. > > and the Swissprot entry is pasted below. The problem is a match > operator > at line 985: > > 984 #if the organism belongs to taxid 32644 then no Bio::Species > object. > 985 return if grep { /^$binomial$/ } @Unknown_names; > > I managed to fix this and have swiss.pm to parse the entire Uniprot > release > 2.1 by adding this line > > $binomial =~ s/(\[|\])/\\$1/g; > > just before line 985. Would anybody like to add this fix to the CVS > version of swiss.pm? 
Since this is the only entry which swiss.pm was > not > able to > parse, out of 1520915 entries in Uniprot, I was considering if it is > not an > annotation error in Uniprot, violating their own standard... > > Greeting and happy new year :). > Robson > > #============== > > ID Q8UYF6 STANDARD; PRT; 258 AA. > AC Q8UYF6; > DT 01-MAR-2002 (TrEMBLrel. 20, Created) > DT 01-MAR-2002 (TrEMBLrel. 20, Last sequence update) > DT 01-MAR-2004 (TrEMBLrel. 26, Last annotation update) > DE Coat protein. > OS Cotton leaf curl Gezira virus - [Okra-Shambat]. > OC Viruses; ssDNA viruses; Geminiviridae; Begomovirus. > OX NCBI_TaxID=268964; > RN [1] > RP SEQUENCE FROM N.A. > RA Idris A.M., Brown J.K.; > RT "Molecular analysis of cotton leaf curl virus-Sudan reveals an > RT evolutionary history of recombination."; > RL Virus Genes 0:0-0(2002). > DR EMBL; AY036008; AAK64541.1; -. > DR GO; GO:0019028; C:viral capsid; IEA. > DR GO; GO:0005198; F:structural molecule activity; IEA. > DR InterPro; IPR000650; Gem_coat_AR1. > DR InterPro; IPR000263; GV_A/BR1_coat. > DR Pfam; PF00844; Gemini_coat; 1. > DR PRINTS; PR00224; GEMCOATAR1. > DR PRINTS; PR00223; GEMCOATARBR1. > DR ProDom; PD000901; Gem_coat_AR1; 1. > KW Coat protein. 
> SQ SEQUENCE 258 AA; 29778 MW; 6FB1960A9D8763DD CRC64;
> MSKRPADIII STPASKVRRR LNFDSPGLSS ARAPTVLVTN KRRSWTNRPT YRKPRMYRMY
> RSPDVPKGCE GPCKVQSYEQ RDDIKHTGIV RCVSDVTKGV GITHRTGKRF TIKSIYILGK
> VWMDDNIKKQ NHTNNVMFFL VRDRRPYGNS PLDFGQVFNM FDNEPSTATV KNDLRDHFQV
> LRKFTATVIG GPSGMKEQAL VRRFYRINSQ IVYNHQEAGK FENHTENAIL LYMACTHASN
> PVYATLKIRI YFYDSVSN
> //
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From suzuki at cbl.umces.edu Wed Jan 5 21:20:57 2005
From: suzuki at cbl.umces.edu (Marcelino Suzuki)
Date: Wed Jan 5 21:17:37 2005
Subject: [Bioperl-l] OS X bioperl, staden/read, install problems
In-Reply-To: <41DC7818.8060104@pcbi.upenn.edu>
References: <67117554-5F42-11D9-AC27-0003939E064E@cbl.umces.edu> <41DC7818.8060104@pcbi.upenn.edu>
Message-ID: <99CA1AE8-5F89-11D9-AC27-0003939E064E@cbl.umces.edu>

Aaron.

Your suggestion worked, but I had to do some tweaking. Editing the file
Bio/SeqIO/staden/read.pm in the directory that I untarred from
current_ext_stable.tar resulted in no errors for 'make test' and
'make install', but I still got errors running a test file. I then made
the same edit to /Library/Perl/5.8.1/Bio/SeqIO/staden/read.pm and ran
make in that directory. I have no errors, so I assume it is finally
installed.

This was really quite a puzzle, so I hope the following will help someone
else who is trying to install bioperl-ext under OSX (I guess the same is
true for other systems).

For anyone else trying to install bioperl-ext:

I did not have Inline, so:

  sudo perl -MCPAN -e 'install Inline'

Then I installed the staden io_lib 1.8.11 that is at
ftp://ftp.mrc-lmb.cam.ac.uk/pub/staden/io_lib/
After untarring in /usr/local:

  cd /usr/local/io_lib-1.8.11
  ./configure
  sudo make
  sudo make install

No problems at all under OSX Panther.

After numerous suggestions from the web I moved both os.h and config.h
from /usr/local/io_lib-1.8.11 to /usr/local/include/io_lib.
I edited os.h to remove the "<:" and ">" around config.h.

Then I untarred the current_ext_stable.tar from the bioperl distribution
in /usr/local, and:

  cd /usr/local/bioperl-ext-1.4

++++

I edited (following Aaron Mackey's suggestion)
/usr/local/bioperl-ext-1.4/Bio/SeqIO/staden/read.pm and added the -lz
option to LIBS in line 81, then:

  ranlib /usr/local/lib/libread.a
  perl Makefile.pl IOLIB_LIB=/usr/local/lib IOLIB_INC=/usr/local/include/io_lib
  make

I got an error message and did:

  ranlib /usr/local/bioperl-ext-1.4/Bio/Ext/Align/libs/libsw.a
  make
  make test
  make install

No errors, so I ran a test and still got the same old error:

  The extension 'Bio::SeqIO::staden::read' is not properly installed in
  path: '/Library/Perl/5.8.1'

I repeated everything after ++++ above in the directory /Library/Perl.
The 'make test' did not work, but now I don't get the error when running
my perl test script.

Interestingly, I tried to install the whole thing the exact same way on my
powerbook, and I am getting yet a different type of error, like in
http://bioperl.org/pipermail/bioperl-l/2004-January/014481.htm

Marcelino

On Jan 5, 2005, at 6:28 PM, Aaron J.
Mackey wrote: > > Try editing Bio/SeqIO/staden/read.pm to include "-lz" in LIBS > > -Aaron > > Marcelino Suzuki wrote: > >> Well, I tried all I could, but I keep getting the same error as >> srikanth patury: >> http://bioperl.org/pipermail/bioperl-guts-l/2004-June/016855.html >> Got to the point I got no errors in the make step of bioperl-ext >> I installed io_lib (1.8.11) >> I copied os.h and config.h to /usr/local/include/io_lib >> I changed os.h to remove the "<:" and ">" around config.h >> I ranlib /usr/local/lib/libread.a >> after an error message I also >> ranlib /usr/local/bioperl-ext-1.4/Bio/Ext/Align/libs/libsw.a >> Is there any other thing that needs to be done. >> I am using perl 5.8.1 >> Thanks >> Marcelino suzuki >> suzuki at cbl dot umces dot edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> ======================================================================== ==== oOOOOo Marcelino Suzuki, Assistant Professor oOOO Chesapeake Biological Lab - Univ of Maryland Center Environm Science oOOOOOo. PO Box 38, One Williams St Solomons, MD 20688 .oOOOOOOOOOo. suzuki@cbl.umces.edu - http://cbl.umces.edu .oOOOOOOOOOOOOOOooo.. Ph 410-326-7291 FAX 410-326-7341 000000000000000000000000000000000000000000000000000000000000000000000000 0000 From kishua2000 at hotmail.com Thu Jan 6 04:26:30 2005 From: kishua2000 at hotmail.com (kishua2000 kishua) Date: Thu Jan 6 11:31:20 2005 Subject: [Bioperl-l] Can't get length Message-ID: An HTML attachment was scrubbed... URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050106/4fa704d4/attachment.htm From tex at biocompute.net Wed Jan 5 20:56:53 2005 From: tex at biocompute.net (James Thompson) Date: Thu Jan 6 12:05:21 2005 Subject: [Bioperl-l] Can't get length In-Reply-To: Message-ID: Try using $seq->length() instead of $seq.length(). 
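
For anyone wondering why the dot version misbehaves: in Perl the arrow (->) calls a method on an object, while the dot (.) is string concatenation, so $seq.length() never reaches the object's length method. A minimal sketch of the difference follows; Demo::Seq is a made-up stand-in class used only so the example runs without BioPerl installed, not the real Bio::Seq.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object (an assumption for
# illustration only; this is NOT the real Bio::Seq implementation).
package Demo::Seq;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{-seq} }, $class;
}

# Method that reports the sequence length; CORE::length is the builtin.
sub length {
    my ($self) = @_;
    return CORE::length($self->{seq});
}

package main;

my $seq = Demo::Seq->new(-seq => 'ACGTACGT');

# Arrow: a method call on the object.
my $len = $seq->length();

# Dot: string concatenation. This joins the stringified object reference
# with the result of the builtin length() applied to an empty string.
my $not_a_length = $seq . length('');

print "with ->: $len\n";
print "with . : $not_a_length\n";
```

The second value is a string such as a stringified hash reference followed by a digit, which is why the dot form never yields the real sequence length.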
If you still have problems, mail the list again and be sure to include
your input file. Also, unless you're using a very large sequence file,
use the 'fasta' format rather than the 'largefasta' format.

Cheers,

James Thompson

On Thu, 6 Jan 2005, kishua2000 kishua wrote:

> Hello,
>
> I'm using seqIO object to load a DNA sequence in a fasta-like format (">"
> + some info, in the header). The header doesn't contain any info about
> the length of the sequence.
>
> $in = Bio::SeqIO->new(-file =>$fastaFile , '-format' => 'largefasta');
> Then I load my sequence to a seq object
>
> my $seq = $in->next_seq()
>
> but when I try to get the length of the sequence I get 0
>
> $len=$seq.length(); #----> 0
>
> so how to get simple length ?
>

From paulo.david at netvisao.pt Thu Jan 6 12:10:40 2005
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Thu Jan 6 12:06:43 2005
Subject: [Bioperl-l] Can't get length
In-Reply-To:
References:
Message-ID: <41DD7110.3040005@netvisao.pt>

Hi,

Did you try $seq->length ?

-Paulo

kishua2000 kishua wrote:

> Hello,
>
> I'm using seqIO object to load a DNA sequence in a fasta-like format
> (">" + some info, in the header). The header doesn't contain any info
> about the length of the sequence.
>
> $in = Bio::SeqIO->new(-file =>$fastaFile , '-format' => 'largefasta');
> Then I load my sequence to a seq object
>
> my $seq = $in->next_seq()
>
> but when I try to get the length of the sequence I get 0
>
> $len=$seq.length(); #----> 0
>
> so how to get simple length ?

From golharam at umdnj.edu Thu Jan 6 16:21:18 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu Jan 6 16:14:07 2005
Subject: [Bioperl-l] Error parsing Genbank file
In-Reply-To:
Message-ID: <001101c4f435$a9722290$a6028a0a@GOLHARMOBILE1>

What is the fix for CONTIG entries....

BTW- I'm new to bioperl...

Ryan -----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: Wednesday, January 05, 2005 4:37 PM To: golharam@umdnj.edu Cc: 'Bioperl List' Subject: Re: [Bioperl-l] Error parsing Genbank file We can't parse WGS files. The fix it needs is very similar to how we handle CONTIG entries if you want to have a go at fixing it. On Jan 5, 2005, at 3:41 PM, Ryan Golhar wrote: > Hi all, > > I have a Genbank file that Bio::SeqIO:genbank.pm is choking on. The > entry is just a WGS entry referencing a bunch of other entries. It > does on line 492 with the error "Unexpected error in feature table for > Skipping feature, attempting to recover". > > I'm using the following code: > > #!/usr/bin/perl > > use strict; > use Bio::SeqIO; > > my $usage = "$0 \n"; > my $file = shift or die $usage; > my $outfilename = shift or die $usage; > > my $infile = Bio::SeqIO->new('-file' => "<$file", > '-format' => "genbank"); > > my $outfile = Bio::SeqIO->new(-'file' => ">$outfilename", > '-format' => "fasta"); > > while (my $seq = $infile->next_seq) { > # print STDERR $seq->accession_number,"\n"; > > $outfile->write_seq($seq); > } > > Here is the contents of the genbank entry: > > LOCUS CAAB01000000 12381 rc DNA linear VRT > 22-AUG-2002 > DEFINITION Takifugu rubripes whole genome shotgun sequencing project. > ACCESSION CAAB00000000 > VERSION CAAB00000000.1 GI:22418063 > KEYWORDS WGS. > SOURCE Takifugu rubripes (Fugu rubripes) > ORGANISM Takifugu rubripes > Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; > Euteleostomi; > Actinopterygii; Neopterygii; Teleostei; Euteleostei; > Neoteleostei; > Acanthomorpha; Acanthopterygii; Percomorpha; > Tetraodontiformes; > Tetradontoidea; Tetraodontidae; Takifugu. > REFERENCE 1 (bases 1 to 12381) > AUTHORS The Fugu Genome Sequencing Consortium. 
> TITLE Direct Submission > JOURNAL Submitted (01-JUL-2002) The Fugu Genome Sequencing > Consortium, > http://www.fugubase.org/ http://www.jgi.doe.gov/fugu > COMMENT The Takifugu rubripes whole genome shotgun (WGS) project > has > the > project accession CAAB00000000. This version of the > project > (01) > has the accession number CAAB01000000, and consists of > sequences > CAAB01000001-CAAB01012381. > FEATURES Location/Qualifiers > source 1..12381 > /organism="Takifugu rubripes" > /mol_type="genomic DNA" > /db_xref="taxon:31033" > WGS CAAB01000001-CAAB01012381 > // > > > > ----- > Ryan Golhar > Computational Biologist > The Informatics Institute at > The University of Medicine & Dentistry of NJ > > Phone: 973-972-5034 > Fax: 973-972-7412 > Email: golharam@umdnj.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Thu Jan 6 17:14:04 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Jan 6 17:10:55 2005 Subject: [Bioperl-l] Error parsing Genbank file In-Reply-To: <001101c4f435$a9722290$a6028a0a@GOLHARMOBILE1> References: <001101c4f435$a9722290$a6028a0a@GOLHARMOBILE1> Message-ID: <468B79D3-6030-11D9-BE08-000393C44276@duke.edu> Fixed in CVS. 
You can grab the changes from http://cvs.open-bio.org/

Index: Bio/SeqIO/genbank.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqIO/genbank.pm,v
retrieving revision 1.116
diff -r1.116 genbank.pm
71a72
> wgs - Should contain a Bio::Annotation::SimpleValue object
465,466c466
< last if(($buffer =~ /^BASE/o) || ($buffer =~ /^ORIGIN/o) ||
< ($buffer =~ /^CONTIG/o) );
---
> last if( $buffer =~ /^BASE|ORIGIN|CONTIG|WGS/o);
517a518,522
> } elsif( s/^WGS\s+// ) {
> chomp;
> $annotation->add_Annotation(
> 'wgs',
> Bio::Annotation::SimpleValue->new(-value => $_));
522c527,528
< }
---
>
> } else { warn($_); }
775a782,788
> # deal with WGS
> foreach my $wgs ( $seq->annotation->get_Annotations('wgs') ) {
> $self->_print(sprintf ("%-11s %s\n",'WGS',
> $wgs->value));
> $self->_show_dna(0);
> }
>

On Jan 6, 2005, at 4:21 PM, Ryan Golhar wrote:

> What is the fix for CONTIG entries....
>
> BTW- I'm new to bioperl...
>
> Ryan
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: Wednesday, January 05, 2005 4:37 PM
> To: golharam@umdnj.edu
> Cc: 'Bioperl List'
> Subject: Re: [Bioperl-l] Error parsing Genbank file
>
>
> We can't parse WGS files. The fix it needs is very similar to how we
> handle CONTIG entries if you want to have a go at fixing it.
>
> On Jan 5, 2005, at 3:41 PM, Ryan Golhar wrote:
>
>> Hi all,
>>
>> I have a Genbank file that Bio::SeqIO:genbank.pm is choking on. The
>> entry is just a WGS entry referencing a bunch of other entries. It
>> does on line 492 with the error "Unexpected error in feature table for
>> Skipping feature, attempting to recover".
>> >> I'm using the following code: >> >> #!/usr/bin/perl >> >> use strict; >> use Bio::SeqIO; >> >> my $usage = "$0 \n"; >> my $file = shift or die $usage; >> my $outfilename = shift or die $usage; >> >> my $infile = Bio::SeqIO->new('-file' => "<$file", >> '-format' => "genbank"); >> >> my $outfile = Bio::SeqIO->new(-'file' => ">$outfilename", >> '-format' => "fasta"); >> >> while (my $seq = $infile->next_seq) { >> # print STDERR $seq->accession_number,"\n"; >> >> $outfile->write_seq($seq); >> } >> >> Here is the contents of the genbank entry: >> >> LOCUS CAAB01000000 12381 rc DNA linear VRT >> 22-AUG-2002 >> DEFINITION Takifugu rubripes whole genome shotgun sequencing project. >> ACCESSION CAAB00000000 >> VERSION CAAB00000000.1 GI:22418063 >> KEYWORDS WGS. >> SOURCE Takifugu rubripes (Fugu rubripes) >> ORGANISM Takifugu rubripes >> Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >> Euteleostomi; >> Actinopterygii; Neopterygii; Teleostei; Euteleostei; >> Neoteleostei; >> Acanthomorpha; Acanthopterygii; Percomorpha; >> Tetraodontiformes; >> Tetradontoidea; Tetraodontidae; Takifugu. >> REFERENCE 1 (bases 1 to 12381) >> AUTHORS The Fugu Genome Sequencing Consortium. >> TITLE Direct Submission >> JOURNAL Submitted (01-JUL-2002) The Fugu Genome Sequencing >> Consortium, >> http://www.fugubase.org/ http://www.jgi.doe.gov/fugu >> COMMENT The Takifugu rubripes whole genome shotgun (WGS) project >> has >> the >> project accession CAAB00000000. This version of the >> project >> (01) >> has the accession number CAAB01000000, and consists of >> sequences >> CAAB01000001-CAAB01012381. 
>> FEATURES Location/Qualifiers >> source 1..12381 >> /organism="Takifugu rubripes" >> /mol_type="genomic DNA" >> /db_xref="taxon:31033" >> WGS CAAB01000001-CAAB01012381 >> // >> >> >> >> ----- >> Ryan Golhar >> Computational Biologist >> The Informatics Institute at >> The University of Medicine & Dentistry of NJ >> >> Phone: 973-972-5034 >> Fax: 973-972-7412 >> Email: golharam@umdnj.edu >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From Peter.Robinson at t-online.de Thu Jan 6 15:44:27 2005 From: Peter.Robinson at t-online.de (Peter Robinson) Date: Thu Jan 6 20:02:46 2005 Subject: [Bioperl-l] Entrez Gene and bioperl-db In-Reply-To: <1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu> References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net> <1104792001.3186.17.camel@localhost.localdomain> <0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu> <1104871954.3102.24.camel@localhost.localdomain> <1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu> Message-ID: <1105044266.3084.27.camel@localhost.localdomain> Dear Bioperlers, I have started looking at writing some modules to parse the new Entrez gene, which is kind of an expanded LocusLink. The really interesting files are species specific and are in the ASN.1 format, and I am still experimenting around with the best way of parsing them. To get started, I am looking at the tab-delimited flat files. It seems to me that it would be interesting to be able to parse gene_info and gene2accession using the Bio::SeqIO system, the other files such as gene2unigene seem less suited for this (the latter has just two entries which could be parsed ad hoc easily enough). 
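
The ad hoc parsing of a two-column file mentioned above might look roughly like the following. This is only a minimal sketch, assuming gene2unigene is tab-delimited with a GeneID column followed by a UniGene cluster column and '#' comment lines; the sample rows and cluster ids are made up for illustration, and the real column order should be checked against the NCBI file.

```perl
use strict;
use warnings;

# Hypothetical excerpt standing in for the real gene2unigene file.
# The ids below are invented for this example.
my $data = join '',
    "#GeneID\tUniGene_cluster\n",
    "1\tHs.0001\n",
    "2\tHs.0002\n",
    "2\tHs.0003\n";

# GeneID => array reference of UniGene cluster ids
my %unigene_for;

# Read the sample through an in-memory filehandle; with a real file,
# open the path instead of \$data.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^#/ or $line !~ /\S/;   # skip comments and blank lines
    my ($gene_id, $cluster) = split /\t/, $line;
    push @{ $unigene_for{$gene_id} }, $cluster;
}
close $fh;

# Report every gene with all of its clusters on one line.
for my $gene_id (sort { $a <=> $b } keys %unigene_for) {
    print "$gene_id\t", join(',', @{ $unigene_for{$gene_id} }), "\n";
}
```

A hash of array references is used because one gene may map to more than one cluster in this kind of table.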
In any case, I am sending a proposed module Bio::SeqIO::geneinfo.pm as well as a test script (which contains a small excerpt of gene_info in the data section) for comments and criticism to the list. I am presently working on another module for Bio::SeqIO::gene2accession and plan to write a demo script using both modules to convert NCBI accession numbers to MGI accession numbers (which is something one might want to do in order to use Gene Ontology for affymetrix data, although one needs additional work for probesets which are only related to ESTs). For the moment it seemed better to just parse in the NCBI taxon id into the Bio::Species object (only this info is supplied by gene_info), and expect users who need the information to use the taxonomy support of other Bioperl modules in their scripts. I will continue to work on parsing the species specific ASN.1 files, but I will be trying a combination of lex/yacc/C to do this. If that works I will look into trying perl support for lex/yacc for potential use in Bioperl, but since I am not sure how long this will take me, I do not want to scare off anyone else who would like to give this a shot. best, peter On Tue, 2005-01-04 at 22:03, Jason Stajich wrote: > On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote: > > > Hi Jason, > > > > thanks for the advice. It seems as if the documentation of > > Bio::DB::Taxonomy is a bit out of sync. > > my $db = new Bio::DB::Taxonomy(-source => 'flatfile' > > -nodesfile => $nodesfile, > > -namesfile => $namefile); > > What does 'flatfile' refer to here? It is not apparent upon looking at > > the code for new. > > > See Bio::DB::Taxonomy::flatfile for more information. As I mentioned > in the mail I sent, flatfile is for downloading the taxonomy DB from > NCBI. This lets you run it locally using an indexed (BerkelyDB via > DB_File) version of the file. 
> > You must need the most up-to-date verion of the modules - works fine > for me for both the entrez and flatfile code, but you may have to > upgrade off of the 1.4.0 release. Code from CVS or the bioperl-1.5 RC1 > code should work fine. > > > > > I had somewhat better luck using the entrez version, but I got a > > pretty amusing error > > message: > > > > MSG: can't create a species object for Homo sapiens (human) because it > > isn't a species but is a '' instead > > > > ### > > Full error and a dump of the script follow: > > > > my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # > > my $taxaid = $db->get_taxonid('Homo sapiens'); > > my $species = $db->get_Taxonomy_Node(-taxonid => '9606'); > > print Dumper($species); > > > > ### > > > > Use of uninitialized value in string eq at > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192. > > Use of uninitialized value in sprintf at > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201. > > > > -------------------- WARNING --------------------- > > MSG: can't create a species object for Homo sapiens (human) because it > > isn't a species but is a '' instead > > --------------------------------------------------- > > Use of uninitialized value in string eq at > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192. > > Use of uninitialized value in sprintf at > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201. 
> > > > -------------------- WARNING --------------------- > > MSG: can't create a species object for Homo sapiens (human) because it > > isn't a species but is a '' instead > > --------------------------------------------------- > > $VAR1 = { > > 'TaxId' => '9606', > > 'Division' => 'mammals', > > 'GeneNumber' => '32775', > > 'Rank' => 'species', > > 'ProtNumber' => '247791', > > 'ScientificName' => 'Homo sapiens', > > 'CommonName' => 'human', > > 'NucNumber' => '9025800', > > 'GenNumber' => '25', > > 'StructNumber' => '5638' > > }; > > peter@anna:~/programs/bioperlTest$ > > > > > > --best, peter > > > > On Mon, 2005-01-03 at 23:51, Jason Stajich wrote: > >> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a > >> species object (or equivalent) using this code. But you cannot (or > >> could not when I wrote this, not sure of the current status) get the > >> full classification from the NCBI taxonomy retrieval via cgi. i.e. > >> you > >> can only get genus and species for a taxon id and I don't know how to > >> walk up the hierarchy using the web API. Earlier emails to NCBI > >> seemed > >> to indicate this is all they intended to provide, but not sure what > >> the > >> current status is. > >> > >> my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI > >> Entrez > >> over HTTP > >> my $taxaid = $db->get_taxonid('Homo sapiens'); > >> my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606'); > >> > >> You can get the full classification if you use the > >> Bio::DB::Taxonomy::flatfile factory which requires you to have > >> downloaded the taxonomy db flatfile from NCBI. Since this is more > >> reliable (and faster) it is what I have tended to use for grouping > >> sets > >> of seqDB search results, etc. 
> >> > >> -jason > >> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote: > >> > >>> Hi Bioperlers, hi Hilmar, > >>> > >>> after some thinking I have embarked on a lex/yacc parser for the > >>> Entrez > >>> Gene ASN.1 format as the way of least resistance, although I am not > >>> sure > >>> how that would fit in to BioPerl. If anyone is interested in this (or > >>> has a better idea of how to go about it..), please drop me a line. > >>> > >>> In the meantime I have been looking at writing code to parse some of > >>> the > >>> "easy" Entrez gene documents, starting off with gene_info. This file > >>> includes the NCBI taxon id for each entry. I would like to convert > >>> this > >>> to a Bio::Species object to pass to the following > >>> my $seq = $self->sequence_factory->create( > >>> -verbose => $self->verbose(), > >>> -accession_number => $geneID, > >>> -desc => $description, > >>> -display_id => $symbol, > >>> -species => ??? > >>> -annotation => $ann); > >>> > >>> and saw the Bio::Taxonomy::FactoryI code, which appears to want to do > >>> this sort of thing. However, the code for that is pretty preliminary. > >>> Is > >>> anyone working on this at the moment? Or is there a better way of > >>> doing > >>> this (it seems a shame not to provide the actual species name if one > >>> has > >>> the taxid...) > >>> > >>> best > >>> > >>> Peter > >>> > >>> > >>> > >>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote: > >>>> Great to hear that someone is giving this a shot. Yes at this point > >>>> is > >>>> appears that NCBI is only offering the ASN.1, not a conversion to > >>>> XML. > >>>> Their asn2xml tool will not work with this ASN.1 format either, just > >>>> checked it to be sure. They do seem to be mulling the option of XML > >>>> though on the Gene FAQ. Maybe if enough people get in their ears > >>>> they > >>>> will spend some effort towards that. 
After all, the entrez gene web > >>>> interface can display XML on demand - even though it looks fairly > >>>> hideous. > >>>> > >>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in > >>>> perl is actually thin - there is Convert::ASN1 at version 0.18 two > >>>> years ago that I could find ... doesn't make me feel warm and fuzzy. > >>>> > >>>> In the absence of any XML available from NCBI, gene_info might be > >>>> the > >>>> best start. An option could be to check for the presence of the > >>>> other > >>>> tab-delimited files and use those that are present. These are > >>>> tab-delimited and hence the format itself is trivial so you can > >>>> focus > >>>> entirely on setting up a Bio::Seq plus annotation that's > >>>> comparable/compatible to what the current SeqIO::locuslink does. > >>>> > >>>> My $0.02 (worth less and less almost every day). > >>>> > >>>> -hilmar > >>>> > >>>> On Thursday, December 23, 2004, at 10:51 AM, Peter Robinson wrote: > >>>> > >>>>> Hi, > >>>>> > >>>>> I have been thinking about given a BioPerl EntrezGene parser a try > >>>>> since > >>>>> I have been a heavy user of locus link to date. One issue is that > >>>>> the > >>>>> files that correspond to LL_tmpl (which was a flat file) are now in > >>>>> asn > >>>>> format > >>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/ > >>>>> genehelp.html#query > >>>>> Although I saw some mention of ASN support in Bioperl by googling, > >>>>> I > >>>>> can't seem to find any module that does this in the present > >>>>> distribution. What is the status on that? In any case, I will be > >>>>> working > >>>>> on this in the next month or two and if anything nice comes of it I > >>>>> will > >>>>> send it to you / BioPerpl. 
> >>>>> > >>>>> best wishes & happy holidays > >>>>> > >>>>> Peter > >>>>> > >>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote: > >>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for > >>>>>> parsing > >>>>>> any input file, what you're asking is whether or not there is a > >>>>>> SeqIO > >>>>>> parser for NCBI Gene. > >>>>>> > >>>>>> The answer to that question is no, not yet. Anybody who feels > >>>>>> motivated > >>>>>> is welcome to give it a try ... Since I'll need it, I'll write the > >>>>>> parser if nobody else does within the next 3 months, but I'm not > >>>>>> going > >>>>>> to promise when exactly this will happen. > >>>>>> > >>>>>> -hilmar > >>>>>> > >>>>>> On Monday, December 13, 2004, at 08:03 AM, Law, Annie wrote: > >>>>>> > >>>>>>> Hi, > >>>>>>> > >>>>>>> I was wondering with regards to bioperl-db the scripts and schema > >>>>>>> and > >>>>>>> load_seqdatabase.pl has there been preparation for integration of > >>>>>>> Entrez > >>>>>>> gene information when locuslink is phased out? Or if it has > >>>>>>> already > >>>>>>> been > >>>>>>> changed could somebody point > >>>>>>> me to the documentation or changed code? > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Annie. > >>>>>>> _______________________________________________ > >>>>>>> Bioperl-l mailing list > >>>>>>> Bioperl-l@portal.open-bio.org > >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>>> > >>>>> -- > >>>>> Peter N. Robinson > >>>>> peter.robinson@t-online.de > >>>>> peter.robinson@charite.de > >>>>> http://www.charite.de/ch/medgen/robinson/ > >>>>> > >>>>> > >>> -- > >>> Peter N. 
Robinson > >>> peter.robinson@t-online.de > >>> peter.robinson@charite.de > >>> http://www.charite.de/ch/medgen/robinson/ > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l@portal.open-bio.org > >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> -- > >> Jason Stajich > >> jason.stajich at duke.edu > >> http://www.duke.edu/~jes12/ > > -- > > Peter N. Robinson > > peter.robinson@t-online.de > > peter.robinson@charite.de > > http://www.charite.de/ch/medgen/robinson/ > > > > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Peter N. Robinson peter.robinson@t-online.de peter.robinson@charite.de http://www.charite.de/ch/medgen/robinson/ -------------- next part -------------- A non-text attachment was scrubbed... Name: geneinfo.pm Type: application/x-perl Size: 10931 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050106/6754c375/geneinfo.bin -------------- next part -------------- A non-text attachment was scrubbed... 
Name: geneinfotest.pl
Type: application/x-perl
Size: 11184 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050106/6754c375/geneinfotest.bin

From skirov at utk.edu Thu Jan 6 21:33:05 2005
From: skirov at utk.edu (Stefan A Kirov)
Date: Thu Jan 6 21:29:57 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <1105044266.3084.27.camel@localhost.localdomain>
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net> <1104792001.3186.17.camel@localhost.localdomain> <0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu> <1104871954.3102.24.camel@localhost.localdomain> <1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu> <1105044266.3084.27.camel@localhost.localdomain>
Message-ID:

Peter,
Why can't unigene be added as a Bio::Annotation object, for example?
Peter, would you mind if I give you a hand, as I am also doing some Entrez
Gene DB parsing?

Hilmar,
Getting back to your post, I have some concerns about automatic parsing of
multiple files (if I got this right...). Say one downloads the whole
Entrez Gene stuff and all is OK; then I don't see why this can't be done.
But if something goes wrong (and occasionally it will), it will be really
hard for the user to understand that he is missing parts of the data. Of
course this could be done through warnings, but what about people who
intentionally parse part of the DB? I guess one can add something like
-suppress_warning=>1/0.
Another issue that comes to mind: the stream approach is fine for people
with the whole DB on their minds. But if you need a particular record, I
guess you could index the files, but this is a totally different game.
Any volunteers?

On Thu, 6 Jan 2005, Peter Robinson wrote:

>Dear Bioperlers,
>
>I have started looking at writing some modules to parse the new Entrez
>gene, which is kind of an expanded LocusLink. The really interesting
>files are species specific and are in the ASN.1 format, and I am still
>experimenting around with the best way of parsing them.
To get started, >I am looking at the tab-delimited flat files. It seems to me that it >would be interesting to be able to parse gene_info and gene2accession >using the Bio::SeqIO system, the other files such as gene2unigene seem >less suited for this (the latter has just two entries which could be >parsed ad hoc easily enough). > >In any case, I am sending a proposed module Bio::SeqIO::geneinfo.pm as >well as a test script (which contains a small excerpt of gene_info in >the data section) for comments and criticism to the list. I am presently >working on another module for Bio::SeqIO::gene2accession and plan to >write a demo script using both modules to convert NCBI accession numbers >to MGI accession numbers (which is something one might want to do in >order to use Gene Ontology for affymetrix data, although one needs >additional work for probesets which are only related to ESTs). > >For the moment it seemed better to just parse in the NCBI taxon id into >the Bio::Species object (only this info is supplied by gene_info), and >expect users who need the information to use the taxonomy support of >other Bioperl modules in their scripts. > >I will continue to work on parsing the species specific ASN.1 files, but >I will be trying a combination of lex/yacc/C to do this. If that works I >will look into trying perl support for lex/yacc for potential use in >Bioperl, but since I am not sure how long this will take me, I do not >want to scare off anyone else who would like to give this a shot. > >best, >peter > > >On Tue, 2005-01-04 at 22:03, Jason Stajich wrote: >> On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote: >> >> > Hi Jason, >> > >> > thanks for the advice. It seems as if the documentation of >> > Bio::DB::Taxonomy is a bit out of sync. >> > my $db = new Bio::DB::Taxonomy(-source => 'flatfile' >> > -nodesfile => $nodesfile, >> > -namesfile => $namefile); >> > What does 'flatfile' refer to here? It is not apparent upon looking at >> > the code for new. 
>> >
>> See Bio::DB::Taxonomy::flatfile for more information. As I mentioned
>> in the mail I sent, flatfile is for use with the taxonomy DB downloaded
>> from NCBI. This lets you run it locally using an indexed (BerkeleyDB via
>> DB_File) version of the file.
>>
>> You probably need the most up-to-date version of the modules - works fine
>> for me for both the entrez and flatfile code, but you may have to
>> upgrade off of the 1.4.0 release. Code from CVS or the bioperl-1.5 RC1
>> code should work fine.
>>
>>
>>
>> > I had somewhat better luck using the entrez version, but I got a
>> > pretty amusing error message:
>> >
>> > MSG: can't create a species object for Homo sapiens (human) because it
>> > isn't a species but is a '' instead
>> >
>> > ###
>> > Full error and a dump of the script follow:
>> >
>> > my $db = new Bio::DB::Taxonomy(-source => 'entrez'); #
>> > my $taxaid = $db->get_taxonid('Homo sapiens');
>> > my $species = $db->get_Taxonomy_Node(-taxonid => '9606');
>> > print Dumper($species);
>> >
>> > ###
>> >
>> > Use of uninitialized value in string eq at
>> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
>> > Use of uninitialized value in sprintf at
>> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>> >
>> > -------------------- WARNING ---------------------
>> > MSG: can't create a species object for Homo sapiens (human) because it
>> > isn't a species but is a '' instead
>> > ---------------------------------------------------
>> > Use of uninitialized value in string eq at
>> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
>> > Use of uninitialized value in sprintf at
>> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>> >
>> > -------------------- WARNING ---------------------
>> > MSG: can't create a species object for Homo sapiens (human) because it
>> > isn't a species but is a '' instead
>> > ---------------------------------------------------
>> > $VAR1 = {
>> >     'TaxId' => '9606',
>> >     'Division' => 'mammals',
>> >     'GeneNumber' => '32775',
>> >     'Rank' => 'species',
>> >     'ProtNumber' => '247791',
>> >     'ScientificName' => 'Homo sapiens',
>> >     'CommonName' => 'human',
>> >     'NucNumber' => '9025800',
>> >     'GenNumber' => '25',
>> >     'StructNumber' => '5638'
>> > };
>> > peter@anna:~/programs/bioperlTest$
>> >
>> >
>> > --best, peter
>> >
>> > On Mon, 2005-01-03 at 23:51, Jason Stajich wrote:
>> >> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a
>> >> species object (or equivalent) using this code. But you cannot (or
>> >> could not when I wrote this, not sure of the current status) get the
>> >> full classification from the NCBI taxonomy retrieval via cgi, i.e. you
>> >> can only get genus and species for a taxon id and I don't know how to
>> >> walk up the hierarchy using the web API. Earlier emails to NCBI seemed
>> >> to indicate this is all they intended to provide, but not sure what the
>> >> current status is.
>> >>
>> >> my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI Entrez over HTTP
>> >> my $taxaid = $db->get_taxonid('Homo sapiens');
>> >> my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606');
>> >>
>> >> You can get the full classification if you use the
>> >> Bio::DB::Taxonomy::flatfile factory which requires you to have
>> >> downloaded the taxonomy db flatfile from NCBI. Since this is more
>> >> reliable (and faster) it is what I have tended to use for grouping sets
>> >> of seqDB search results, etc.
>> >>
>> >> -jason
>> >> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote:
>> >>
>> >>> Hi Bioperlers, hi Hilmar,
>> >>>
>> >>> after some thinking I have embarked on a lex/yacc parser for the Entrez
>> >>> Gene ASN.1 format as the path of least resistance, although I am not sure
>> >>> how that would fit in to BioPerl. If anyone is interested in this (or
>> >>> has a better idea of how to go about it..), please drop me a line.
>> >>>
>> >>> In the meantime I have been looking at writing code to parse some of the
>> >>> "easy" Entrez gene documents, starting off with gene_info. This file
>> >>> includes the NCBI taxon id for each entry. I would like to convert this
>> >>> to a Bio::Species object to pass to the following
>> >>> my $seq = $self->sequence_factory->create(
>> >>>     -verbose => $self->verbose(),
>> >>>     -accession_number => $geneID,
>> >>>     -desc => $description,
>> >>>     -display_id => $symbol,
>> >>>     -species => ???,
>> >>>     -annotation => $ann);
>> >>>
>> >>> and saw the Bio::Taxonomy::FactoryI code, which appears to want to do
>> >>> this sort of thing. However, the code for that is pretty preliminary. Is
>> >>> anyone working on this at the moment? Or is there a better way of doing
>> >>> this (it seems a shame not to provide the actual species name if one has
>> >>> the taxid...)
>> >>>
>> >>> best
>> >>>
>> >>> Peter
>> >>>
>> >>>
>> >>>
>> >>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
>> >>>> Great to hear that someone is giving this a shot. Yes, at this point it
>> >>>> appears that NCBI is only offering the ASN.1, not a conversion to XML.
>> >>>> Their asn2xml tool will not work with this ASN.1 format either, just
>> >>>> checked it to be sure. They do seem to be mulling the option of XML
>> >>>> though on the Gene FAQ. Maybe if enough people get in their ears they
>> >>>> will spend some effort towards that.
>> >>>> After all, the entrez gene web
>> >>>> interface can display XML on demand - even though it looks fairly
>> >>>> hideous.
>> >>>>
>> >>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in
>> >>>> perl is actually thin - there is Convert::ASN1, at version 0.18 from two
>> >>>> years ago, that I could find ... doesn't make me feel warm and fuzzy.
>> >>>>
>> >>>> In the absence of any XML available from NCBI, gene_info might be the
>> >>>> best start. An option could be to check for the presence of the other
>> >>>> tab-delimited files and use those that are present. These are
>> >>>> tab-delimited and hence the format itself is trivial so you can focus
>> >>>> entirely on setting up a Bio::Seq plus annotation that's
>> >>>> comparable/compatible to what the current SeqIO::locuslink does.
>> >>>>
>> >>>> My $0.02 (worth less and less almost every day).
>> >>>>
>> >>>> -hilmar
>> >>>>
>> >>>> On Thursday, December 23, 2004, at 10:51 AM, Peter Robinson wrote:
>> >>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I have been thinking about giving a BioPerl EntrezGene parser a try since
>> >>>>> I have been a heavy user of locus link to date. One issue is that the
>> >>>>> files that correspond to LL_tmpl (which was a flat file) are now in asn
>> >>>>> format
>> >>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genehelp.html#query
>> >>>>> Although I saw some mention of ASN support in Bioperl by googling, I
>> >>>>> can't seem to find any module that does this in the present
>> >>>>> distribution. What is the status on that? In any case, I will be working
>> >>>>> on this in the next month or two and if anything nice comes of it I will
>> >>>>> send it to you / BioPerl.
>> >>>>>
>> >>>>> best wishes & happy holidays
>> >>>>>
>> >>>>> Peter
>> >>>>>
>> >>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
>> >>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for parsing
>> >>>>>> any input file, what you're asking is whether or not there is a SeqIO
>> >>>>>> parser for NCBI Gene.
>> >>>>>>
>> >>>>>> The answer to that question is no, not yet. Anybody who feels motivated
>> >>>>>> is welcome to give it a try ... Since I'll need it, I'll write the
>> >>>>>> parser if nobody else does within the next 3 months, but I'm not going
>> >>>>>> to promise when exactly this will happen.
>> >>>>>>
>> >>>>>> -hilmar
>> >>>>>>
>> >>>>>> On Monday, December 13, 2004, at 08:03 AM, Law, Annie wrote:
>> >>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I was wondering, with regards to bioperl-db, the scripts and schema and
>> >>>>>>> load_seqdatabase.pl: has there been preparation for integration of Entrez
>> >>>>>>> gene information when locuslink is phased out? Or if it has already been
>> >>>>>>> changed could somebody point
>> >>>>>>> me to the documentation or changed code?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Annie.
>> >>>>>>> _______________________________________________
>> >>>>>>> Bioperl-l mailing list
>> >>>>>>> Bioperl-l@portal.open-bio.org
>> >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>> >>>>>>>
>> >>>>>>>
>> >>>>> --
>> >>>>> Peter N. Robinson
>> >>>>> peter.robinson@t-online.de
>> >>>>> peter.robinson@charite.de
>> >>>>> http://www.charite.de/ch/medgen/robinson/
>> >>>>>
>> >>>>>
>> >>> --
>> >>> Peter N. Robinson
>> >>> peter.robinson@t-online.de
>> >>> peter.robinson@charite.de
>> >>> http://www.charite.de/ch/medgen/robinson/
>> >>>
>> >>>
>> >> --
>> >> Jason Stajich
>> >> jason.stajich at duke.edu
>> >> http://www.duke.edu/~jes12/
>> > --
>> > Peter N. Robinson
>> > peter.robinson@t-online.de
>> > peter.robinson@charite.de
>> > http://www.charite.de/ch/medgen/robinson/
>> >
>> >
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>--
>Peter N. Robinson
>peter.robinson@t-online.de
>peter.robinson@charite.de
>http://www.charite.de/ch/medgen/robinson/
>

From Peter.Robinson at t-online.de Fri Jan 7 01:51:03 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Fri Jan 7 01:47:20 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To:
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net> <1104792001.3186.17.camel@localhost.localdomain> <0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu> <1104871954.3102.24.camel@localhost.localdomain> <1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu> <1105044266.3084.27.camel@localhost.localdomain>
Message-ID: <1105080663.3142.16.camel@localhost.localdomain>

Hi Stefan,
happy to team up with you for Entrez Gene parsing. Since gene2unigene
has entries of the form "geneid\tunigeneid", it didn't seem worth the
trouble putting this information into a Bio::Annotation object in
isolation. On the other hand, parsing multiple Entrez Gene files at once
in order to synthesize various forms of information about an Entrez Gene
id seemed to depart from the style of the rest of the Bio::SeqIO code.

Suggestions/thoughts, anyone?
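
[A minimal sketch of the "ad hoc" route for gene2unigene, assuming only
what the message states, namely that each record has the form
"geneid\tunigeneid". The function name parse_gene2unigene, the treatment
of lines starting with '#' as headers, and the sample identifiers are all
invented for illustration; none of this is Bioperl code. Malformed lines
are reported with warn instead of being dropped silently, which speaks to
the concern raised earlier in the thread about users not noticing that
data is missing.]

```perl
use strict;
use warnings;

# Build a lookup of gene id -> list of UniGene ids from two-column,
# tab-delimited records of the form "geneid\tunigeneid".
sub parse_gene2unigene {
    my ($fh) = @_;
    my %unigene_for;
    while (my $line = <$fh>) {
        chomp $line;
        # Skip blank lines; treating '#' lines as headers is an assumption.
        next if $line =~ /^\s*$/ || $line =~ /^#/;
        my ($geneid, $unigene) = split /\t/, $line;
        unless (defined $geneid && length $geneid
                && defined $unigene && length $unigene) {
            warn "line $.: expected two tab-separated fields, skipping\n";
            next;
        }
        push @{ $unigene_for{$geneid} }, $unigene;
    }
    return \%unigene_for;
}

# Made-up sample records, for illustration only.
my $sample = "#GeneID\tUniGene\n1\tHs.1111\n2\tHs.2222\n2\tHs.3333\n";
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $map = parse_gene2unigene($fh);
close $fh;

for my $geneid (sort { $a <=> $b } keys %$map) {
    print "$geneid\t", join(',', @{ $map->{$geneid} }), "\n";
}
```

[A lookup like this could later be attached to sequence objects built
from gene_info, but how that should be done is exactly the open question
in this thread.]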
-peter

--
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/

From michael.watson at bbsrc.ac.uk Fri Jan 7 05:11:13 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri Jan 7 05:11:05 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89AA5@iahce2knas1.iah.bbsrc.reserved>

Hi

I have looked through the archives and this problem did come up once
before, but without resolution (as far as I can see).

I'm using bioperl-1.4 and NCBI blast with the -m option, using SearchIO
and the blasttable format. What I see is this:

------------- EXCEPTION -------------
MSG: Undefined sub-sequence (3264,3268).
Valid range = 3252 - 3268
STACK Bio::Search::HSP::HSPI::matches /usr/local/bioperl-1.4/Bio/Search/HSP/HSPI.pm:711
STACK (eval) /usr/local/bioperl-1.4/Bio/Search/SearchUtils.pm:365
STACK Bio::Search::SearchUtils::_adjust_contigs /usr/local/bioperl-1.4/Bio/Search/SearchUtils.pm:364
STACK Bio::Search::SearchUtils::tile_hsps /usr/local/bioperl-1.4/Bio/Search/SearchUtils.pm:170
STACK Bio::Search::Hit::GenericHit::start /usr/local/bioperl-1.4/Bio/Search/Hit/GenericHit.pm:899
STACK main::parse_blast ../split_and_blast.pl:65
STACK toplevel ../split_and_blast.pl:32
--------------------------------------

The code is as you would expect:

    while (my $result = $searchio->next_result) {
        while (my $hit = $result->next_hit) {
            my $start = $hit->start;

And it is that call to $hit->start that sets off the whole trace.

Any ideas?

Thanks
Mick

Michael Watson
Head of Informatics
Institute for Animal Health,
Compton Laboratory,
Compton,
Newbury,
Berkshire RG20 7NN
UK

Phone : +44 (0)1635 578411 ext. 2535
Mobile: +44 (0)7990 827831
E-mail: michael.watson@bbsrc.ac.uk

From Marc.Logghe at devgen.com Fri Jan 7 05:41:12 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Fri Jan 7 05:38:02 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
Message-ID:

> while (my $result = $searchio->next_result) {
>     while (my $hit = $result->next_hit) {
>         my $start = $hit->start;
>
> And it is that call to $hit->start that sets off the whole trace.
>
> Any ideas?

Hi Mick,
Have you tried one of these?

    my $start = $hit->start('sbjct'); # or 'query' or 'hit'. Latter is same as 'sbjct'

or

    my $start = $hit->hsp->start('sbjct');

I think in all cases it defaults to 'query'. So it should not crash but
give you the start position of the query.
I am afraid I can't explain the crash, sorry.
Marc

From michael.watson at bbsrc.ac.uk Fri Jan 7 05:50:08 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri Jan 7 05:49:00 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950121B91C@iahce2knas1.iah.bbsrc.reserved>

Hi

Having done some more tests with this:

    $hit->start()

actually returns a string which is the concatenation of query start and
subject end! (btw I'm using the "-m 8" option) - surely this isn't the
desired behaviour????

If I change it to:

    $hit->start('query')

then I get the correct start back, but I still get the stack trace
error.

The two co-ordinate sets which cause the problem (3264-3268 and
3252-3268) are on adjacent lines in the file (3252-3268 is the next line
after 3264-3268) and are to the SAME subject, i.e. they are two HSPs of the
same hit (in theory), but they are to two VERY different parts of the
query. I'm guessing the way blasttable handles multiple HSPs is causing
the trouble.

Mick

-----Original Message-----
From: Marc Logghe [mailto:Marc.Logghe@devgen.com]
Sent: 07 January 2005 10:41
To: michael watson (IAH-C); Bioperl List
Subject: RE: [Bioperl-l] Error parsing blast results with blasttable

> while (my $result = $searchio->next_result) {
>     while (my $hit = $result->next_hit) {
>         my $start = $hit->start;
>
> And it is that call to $hit->start that sets off the whole trace.
>
> Any ideas?

Hi Mick,
Have you tried one of these?

    my $start = $hit->start('sbjct'); # or 'query' or 'hit'. Latter is same as 'sbjct'

or

    my $start = $hit->hsp->start('sbjct');

I think in all cases it defaults to 'query'. So it should not crash but
give you the start position of the query.
I am afraid I can't explain the crash, sorry.
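
[A minimal sketch, under stated assumptions, of reading "-m 8" output
directly and taking the query start per hit without any HSP tiling. It
bypasses Bio::SearchIO altogether, so it is a workaround and not a fix
for the tile_hsps code path shown in the trace. The column order is the
standard one for NCBI BLAST tabular output; the helper names read_m8 and
hit_query_start and the sample rows are invented for illustration.]

```perl
use strict;
use warnings;

# Column order of NCBI BLAST "-m 8" tabular output.
my @COLS = qw(query subject pct_identity aln_length mismatches gap_opens
              q_start q_end s_start s_end evalue bit_score);

# Return { query => { subject => [ hsp, ... ] } }, one hash per input line.
sub read_m8 {
    my ($fh) = @_;
    my %hsps;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/ || $line =~ /^#/;
        my @f = split /\t/, $line;
        unless (@f == @COLS) {
            warn "line $.: expected " . scalar(@COLS) . " fields, skipping\n";
            next;
        }
        my %hsp;
        @hsp{@COLS} = @f;
        push @{ $hsps{ $hsp{query} }{ $hsp{subject} } }, \%hsp;
    }
    return \%hsps;
}

# Smallest query coordinate over all HSPs of one hit; no tiling involved.
sub hit_query_start {
    my ($hsp_list) = @_;
    my $min;
    for my $hsp (@$hsp_list) {
        my ($lo) = sort { $a <=> $b } @{$hsp}{qw(q_start q_end)};
        $min = $lo if !defined $min || $lo < $min;
    }
    return $min;
}

# Invented rows: two HSPs of one hit on distant parts of the query.
my $sample = join "\n",
    join("\t", qw(q1 s1 100.00 17 0 0 900 916 40 56 0.5 34.2)),
    join("\t", qw(q1 s1 100.00 5 0 0 120 124 52 56 9.1 10.4)),
    '';

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $hits = read_m8($fh);
close $fh;

for my $query (sort keys %$hits) {
    for my $subject (sort keys %{ $hits->{$query} }) {
        my $hsp_list = $hits->{$query}{$subject};
        printf "%s vs %s: %d HSP(s), query start %d\n",
            $query, $subject, scalar @$hsp_list, hit_query_start($hsp_list);
    }
}
```

[Grouping by query and subject keeps every HSP line as its own record,
so HSPs that land on very different parts of the query never have to be
merged into one contig.]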
Marc From nathanhaigh at ukonline.co.uk Fri Jan 7 06:39:16 2005 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Fri Jan 7 06:37:42 2005 Subject: [Bioperl-l] RE: SeqIO fails on masked sequences In-Reply-To: Message-ID: There appears to be an anomaly with Bio::Seq::fasta. If the SeqIO object's alphabet is set, next_seq() results in this being undef and then proceeds to guess the alphabet again, therefore this like the following do not work: my $seq_in = Bio::SeqIO->new(-format=>$format, -fh => \*DATA); $seq_in->alphabet('protein'); Should setting the SeqIO object's alphabet be honoured even if it is set to the wrong type or the sequences are not of that alphabet? I have a bug fix, that allows you to set the alphabet through the SeqIO object, but it doesn't do any sort of checking to see if all the seqs in the object are of the correct type. Essentially, the alphabet is set in one of the following ways: 1) if the SeqIO object is set using e.g. $seq_in->alphabet('dna'); all the seqs that belong to the $seq_in object obtain their alphabet from the SeqIO object, dna in this case, irrespective of whether or not it is actually protein. 2) If alphabet has not been set in this way, the first sequence is used to guess the alphabet of the SeqIO object, from which all the sequences obtain their alphabet. Possible limitations: 1) all seqs in the SeqIO object can only be of the same type - no testing done to see if this is not the case. Does this sound ok and reasonable? Nathan -----Original Message----- From: Brian Osborne [mailto:brian_osborne@cognia.com] Sent: 06 January 2005 12:25 To: nathanhaigh@ukonline.co.uk Subject: RE: SeqIO fails on masked sequences Nathan, The idea is that a sequence with a high proportion of X is more likely to be DNA than protein. The examples I had in mind are unfinished genomic sequence, and there are countless entries in Genbank/EMBL like this. 
So, someone wrote in and said that their genomic sequence was being characterized as protein since the fraction [gatc] was less than 85%, it was mostly X. By contrast, there are no protein sequences with X in them in these public databases, if I'm not mistaken. So I maintain that in the world of public databases this is the way to go. Now if you venture into the world of sequence analysis it's going to be a different story, since you'll likely mask protein with X, not N, obviously. May I ask, if this person knows his/her sequence is protein then why doesn't s/he set its alphabet to "protein"? Or why don't they mask with A or Z or O or something? They'll be problems either way. What is one's reference? Public sequence or the less well-defined set of possible sequences? Brian O. -----Original Message----- From: Nathan Haigh [mailto:nathanhaigh@ukonline.co.uk] Sent: Wednesday, January 05, 2005 7:38 PM To: 'Brian Osborne' Subject: FW: SeqIO fails on masked sequences You committed a change to Bio::PrimarySeq where 'X' was added to the class of characters that are stripped out of sequences in the _guess_alphabet subroutine. Do you know why sequences containing X were causing a problem, and why X was added to the class of chars? It's causing a problem for someone who has a sequence that containes all masked chars (i.e. all X's), which should still be "guessable" as protein. Cheers Nathan --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0501-0, 04/01/2005 Tested on: 06/01/2005 00:36:20 avast! is copyright (c) 2000-2003 ALWIL Software. http://www.avast.com --- avast! Antivirus: Inbound message clean. Virus Database (VPS): 0501-0, 04/01/2005 Tested on: 07/01/2005 00:35:30 avast! is copyright (c) 2000-2003 ALWIL Software. http://www.avast.com --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0501-0, 04/01/2005 Tested on: 07/01/2005 11:39:14 avast! is copyright (c) 2000-2003 ALWIL Software. 

From sdavis2 at mail.nih.gov Fri Jan 7 06:41:49 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri Jan 7 06:38:47 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <1105080663.3142.16.camel@localhost.localdomain>
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net> <1104792001.3186.17.camel@localhost.localdomain> <0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu> <1104871954.3102.24.camel@localhost.localdomain> <1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu> <1105044266.3084.27.camel@localhost.localdomain> <1105080663.3142.16.camel@localhost.localdomain>
Message-ID: <1E370C03-60A1-11D9-AD91-000D933565E8@mail.nih.gov>

I think the power of bioperl is in dealing with entire Entrez Gene objects. For dealing with gene_info, gene2unigene, or generifs files in isolation, I'm not sure that an object model is necessary or efficient. However, as many of us do want to deal with Gene objects, I think that having a parser that constructs these rich objects is important. That said, I think there may be a NEED for two parsers, one for species-specific ASN.1 files and one for the tab-delimited files. The ASN.1 parser fits the SeqIO model rather well, I would suppose, but is limited by the fact that each species must be downloaded and parsed separately. However, for the vast majority of folks dealing with only one or two species, the ease of downloading a single, self-contained file for a species or two of interest and passing the file through an ASN.1 Gene parser is quite appealing. Then, for the comparative genomicists or those with a need for more than a few species, the tab-delimited option could be made available for parsing the text files. Despite my second sentence above, I agree with Stefan that having a parser that deals with each text file in isolation (with the only required file being gene_info) is quite appealing, allowing the user to have a way to choose what files to parse and add to the object.
(This is only important because of the number of Gene records and needing to complete the parse/object construction in a reasonable amount of time.) I know that having two parsers is not ideal (and that suggesting this is a bit of a cop-out), but NCBI has chosen a path that may necessitate both solutions to meet the needs of all users.

I would also certainly be willing to help out.

Sean

On Jan 7, 2005, at 1:51 AM, Peter Robinson wrote:

> Hi Stefan,
> happy to team up with you for Entrez Gene parsing. Since gene2unigene
> has entries of the form "geneid\tunigeneid", it didn't seem worth the
> trouble putting this information into a Bio::Annotation object in
> isolation. On the other hand, parsing multiple Entrez Gene files at once
> in order to synthesize various forms of information about an Entrez Gene
> id seemed to depart from the style of the rest of Bio::SeqIO code.
>
> Suggestions/thoughts, anyone?
>
> -peter
>
> On Fri, 2005-01-07 at 03:33, Stefan A Kirov wrote:
>> Peter,
>> Why can't unigene be added as a Bio::Annotation object, for example?
>> Peter,
>> would you mind if I give you a hand, as I am also doing some Entrez Gene
>> DB parsing.
>> Hilmar,
>> Getting back to your post, I have some concern about automatic
>> parsing of multiple files (if I got this right...). Say if one downloads
>> the whole Entrez Gene stuff and all is OK I don't see why this can't be
>> done. But if something goes wrong (and occasionally it will), it will be
>> really hard for the user to understand he misses parts of the data. Of
>> course this could be done through warnings, but what about people who
>> intentionally parse part of the DB? I guess one can add something like
>> -suppress_warning=>1/0.
>> Another issue that comes to mind is that the approach of a stream is fine for
>> people with the whole DB on their minds. But if you need a particular
>> record, I guess you could index the files, but this is a totally different
>> game. Any volunteers?
>> >> >> On Thu, 6 Jan 2005, Peter Robinson wrote: >> >>> Dear Bioperlers, >>> >>> I have started looking at writing some modules to parse the new >>> Entrez >>> gene, which is kind of an expanded LocusLink. The really interesting >>> files are species specific and are in the ASN.1 format, and I am >>> still >>> experimenting around with the best way of parsing them. To get >>> started, >>> I am looking at the tab-delimited flat files. It seems to me that it >>> would be interesting to be able to parse gene_info and gene2accession >>> using the Bio::SeqIO system, the other files such as gene2unigene >>> seem >>> less suited for this (the latter has just two entries which could be >>> parsed ad hoc easily enough). >>> >>> In any case, I am sending a proposed module Bio::SeqIO::geneinfo.pm >>> as >>> well as a test script (which contains a small excerpt of gene_info in >>> the data section) for comments and criticism to the list. I am >>> presently >>> working on another module for Bio::SeqIO::gene2accession and plan to >>> write a demo script using both modules to convert NCBI accession >>> numbers >>> to MGI accession numbers (which is something one might want to do in >>> order to use Gene Ontology for affymetrix data, although one needs >>> additional work for probesets which are only related to ESTs). >>> >>> For the moment it seemed better to just parse in the NCBI taxon id >>> into >>> the Bio::Species object (only this info is supplied by gene_info), >>> and >>> expect users who need the information to use the taxonomy support of >>> other Bioperl modules in their scripts. >>> >>> I will continue to work on parsing the species specific ASN.1 files, >>> but >>> I will be trying a combination of lex/yacc/C to do this. If that >>> works I >>> will look into trying perl support for lex/yacc for potential use in >>> Bioperl, but since I am not sure how long this will take me, I do not >>> want to scare off anyone else who would like to give this a shot. 
>>> >>> best, >>> peter >>> >>> >>> On Tue, 2005-01-04 at 22:03, Jason Stajich wrote: >>>> On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote: >>>> >>>>> Hi Jason, >>>>> >>>>> thanks for the advice. It seems as if the documentation of >>>>> Bio::DB::Taxonomy is a bit out of sync. >>>>> my $db = new Bio::DB::Taxonomy(-source => 'flatfile' >>>>> -nodesfile => $nodesfile, >>>>> -namesfile => $namefile); >>>>> What does 'flatfile' refer to here? It is not apparent upon >>>>> looking at >>>>> the code for new. >>>>> >>>> See Bio::DB::Taxonomy::flatfile for more information. As I >>>> mentioned >>>> in the mail I sent, flatfile is for downloading the taxonomy DB from >>>> NCBI. This lets you run it locally using an indexed (BerkelyDB via >>>> DB_File) version of the file. >>>> >>>> You must need the most up-to-date verion of the modules - works fine >>>> for me for both the entrez and flatfile code, but you may have to >>>> upgrade off of the 1.4.0 release. Code from CVS or the bioperl-1.5 >>>> RC1 >>>> code should work fine. >>>> >>>> >>>> >>>>> I had somewhat better luck using the entrez version, but I got a >>>>> pretty amusing error >>>>> message: >>>>> >>>>> MSG: can't create a species object for Homo sapiens (human) >>>>> because it >>>>> isn't a species but is a '' instead >>>>> >>>>> ### >>>>> Full error and a dump of the script follow: >>>>> >>>>> my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # >>>>> my $taxaid = $db->get_taxonid('Homo sapiens'); >>>>> my $species = $db->get_Taxonomy_Node(-taxonid => '9606'); >>>>> print Dumper($species); >>>>> >>>>> ### >>>>> >>>>> Use of uninitialized value in string eq at >>>>> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192. >>>>> Use of uninitialized value in sprintf at >>>>> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201. 
>>>>> >>>>> -------------------- WARNING --------------------- >>>>> MSG: can't create a species object for Homo sapiens (human) >>>>> because it >>>>> isn't a species but is a '' instead >>>>> --------------------------------------------------- >>>>> Use of uninitialized value in string eq at >>>>> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192. >>>>> Use of uninitialized value in sprintf at >>>>> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201. >>>>> >>>>> -------------------- WARNING --------------------- >>>>> MSG: can't create a species object for Homo sapiens (human) >>>>> because it >>>>> isn't a species but is a '' instead >>>>> --------------------------------------------------- >>>>> $VAR1 = { >>>>> 'TaxId' => '9606', >>>>> 'Division' => 'mammals', >>>>> 'GeneNumber' => '32775', >>>>> 'Rank' => 'species', >>>>> 'ProtNumber' => '247791', >>>>> 'ScientificName' => 'Homo sapiens', >>>>> 'CommonName' => 'human', >>>>> 'NucNumber' => '9025800', >>>>> 'GenNumber' => '25', >>>>> 'StructNumber' => '5638' >>>>> }; >>>>> peter@anna:~/programs/bioperlTest$ >>>>> >>>>> >>>>> --best, peter >>>>> >>>>> On Mon, 2005-01-03 at 23:51, Jason Stajich wrote: >>>>>> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a >>>>>> species object (or equivalent) using this code. But you cannot >>>>>> (or >>>>>> could not when I wrote this, not sure of the current status) get >>>>>> the >>>>>> full classification from the NCBI taxonomy retrieval via cgi. >>>>>> i.e. >>>>>> you >>>>>> can only get genus and species for a taxon id and I don't know >>>>>> how to >>>>>> walk up the hierarchy using the web API. Earlier emails to NCBI >>>>>> seemed >>>>>> to indicate this is all they intended to provide, but not sure >>>>>> what >>>>>> the >>>>>> current status is. 
>>>>>> >>>>>> my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI >>>>>> Entrez >>>>>> over HTTP >>>>>> my $taxaid = $db->get_taxonid('Homo sapiens'); >>>>>> my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606'); >>>>>> >>>>>> You can get the full classification if you use the >>>>>> Bio::DB::Taxonomy::flatfile factory which requires you to have >>>>>> downloaded the taxonomy db flatfile from NCBI. Since this is more >>>>>> reliable (and faster) it is what I have tended to use for grouping >>>>>> sets >>>>>> of seqDB search results, etc. >>>>>> >>>>>> -jason >>>>>> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote: >>>>>> >>>>>>> Hi Bioperlers, hi Hilmar, >>>>>>> >>>>>>> after some thinking I have embarked on a lex/yacc parser for the >>>>>>> Entrez >>>>>>> Gene ASN.1 format as the way of least resistance, although I am >>>>>>> not >>>>>>> sure >>>>>>> how that would fit in to BioPerl. If anyone is interested in >>>>>>> this (or >>>>>>> has a better idea of how to go about it..), please drop me a >>>>>>> line. >>>>>>> >>>>>>> In the meantime I have been looking at writing code to parse >>>>>>> some of >>>>>>> the >>>>>>> "easy" Entrez gene documents, starting off with gene_info. This >>>>>>> file >>>>>>> includes the NCBI taxon id for each entry. I would like to >>>>>>> convert >>>>>>> this >>>>>>> to a Bio::Species object to pass to the following >>>>>>> my $seq = $self->sequence_factory->create( >>>>>>> -verbose => $self->verbose(), >>>>>>> -accession_number => $geneID, >>>>>>> -desc => $description, >>>>>>> -display_id => $symbol, >>>>>>> -species => ??? >>>>>>> -annotation => $ann); >>>>>>> >>>>>>> and saw the Bio::Taxonomy::FactoryI code, which appears to want >>>>>>> to do >>>>>>> this sort of thing. However, the code for that is pretty >>>>>>> preliminary. >>>>>>> Is >>>>>>> anyone working on this at the moment? 
Or is there a better way of >>>>>>> doing >>>>>>> this (it seems a shame not to provide the actual species name if >>>>>>> one >>>>>>> has >>>>>>> the taxid...) >>>>>>> >>>>>>> best >>>>>>> >>>>>>> Peter >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote: >>>>>>>> Great to hear that someone is giving this a shot. Yes at this >>>>>>>> point >>>>>>>> is >>>>>>>> appears that NCBI is only offering the ASN.1, not a conversion >>>>>>>> to >>>>>>>> XML. >>>>>>>> Their asn2xml tool will not work with this ASN.1 format either, >>>>>>>> just >>>>>>>> checked it to be sure. They do seem to be mulling the option of >>>>>>>> XML >>>>>>>> though on the Gene FAQ. Maybe if enough people get in their ears >>>>>>>> they >>>>>>>> will spend some effort towards that. After all, the entrez gene >>>>>>>> web >>>>>>>> interface can display XML on demand - even though it looks >>>>>>>> fairly >>>>>>>> hideous. >>>>>>>> >>>>>>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 >>>>>>>> support in >>>>>>>> perl is actually thin - there is Convert::ASN1 at version 0.18 >>>>>>>> two >>>>>>>> years ago that I could find ... doesn't make me feel warm and >>>>>>>> fuzzy. >>>>>>>> >>>>>>>> In the absence of any XML available from NCBI, gene_info might >>>>>>>> be >>>>>>>> the >>>>>>>> best start. An option could be to check for the presence of the >>>>>>>> other >>>>>>>> tab-delimited files and use those that are present. These are >>>>>>>> tab-delimited and hence the format itself is trivial so you can >>>>>>>> focus >>>>>>>> entirely on setting up a Bio::Seq plus annotation that's >>>>>>>> comparable/compatible to what the current SeqIO::locuslink does. >>>>>>>> >>>>>>>> My $0.02 (worth less and less almost every day). 
>>>>>>>> >>>>>>>> -hilmar >>>>>>>> >>>>>>>> On Thursday, December 23, 2004, at 10:51 AM, Peter Robinson >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I have been thinking about given a BioPerl EntrezGene parser a >>>>>>>>> try >>>>>>>>> since >>>>>>>>> I have been a heavy user of locus link to date. One issue is >>>>>>>>> that >>>>>>>>> the >>>>>>>>> files that correspond to LL_tmpl (which was a flat file) are >>>>>>>>> now in >>>>>>>>> asn >>>>>>>>> format >>>>>>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/ >>>>>>>>> genehelp.html#query >>>>>>>>> Although I saw some mention of ASN support in Bioperl by >>>>>>>>> googling, >>>>>>>>> I >>>>>>>>> can't seem to find any module that does this in the present >>>>>>>>> distribution. What is the status on that? In any case, I will >>>>>>>>> be >>>>>>>>> working >>>>>>>>> on this in the next month or two and if anything nice comes of >>>>>>>>> it I >>>>>>>>> will >>>>>>>>> send it to you / BioPerpl. >>>>>>>>> >>>>>>>>> best wishes & happy holidays >>>>>>>>> >>>>>>>>> Peter >>>>>>>>> >>>>>>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote: >>>>>>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for >>>>>>>>>> parsing >>>>>>>>>> any input file, what you're asking is whether or not there is >>>>>>>>>> a >>>>>>>>>> SeqIO >>>>>>>>>> parser for NCBI Gene. >>>>>>>>>> >>>>>>>>>> The answer to that question is no, not yet. Anybody who feels >>>>>>>>>> motivated >>>>>>>>>> is welcome to give it a try ... Since I'll need it, I'll >>>>>>>>>> write the >>>>>>>>>> parser if nobody else does within the next 3 months, but I'm >>>>>>>>>> not >>>>>>>>>> going >>>>>>>>>> to promise when exactly this will happen. 
>>>>>>>>>> >>>>>>>>>> -hilmar >>>>>>>>>> >>>>>>>>>> On Monday, December 13, 2004, at 08:03 AM, Law, Annie wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I was wondering with regards to bioperl-db the scripts and >>>>>>>>>>> schema >>>>>>>>>>> and >>>>>>>>>>> load_seqdatabase.pl has there been preparation for >>>>>>>>>>> integration of >>>>>>>>>>> Entrez >>>>>>>>>>> gene information when locuslink is phased out? Or if it has >>>>>>>>>>> already >>>>>>>>>>> been >>>>>>>>>>> changed could somebody point >>>>>>>>>>> me to the documentation or changed code? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Annie. >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>> Bioperl-l@portal.open-bio.org >>>>>>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> -- >>>>>>>>> Peter N. Robinson >>>>>>>>> peter.robinson@t-online.de >>>>>>>>> peter.robinson@charite.de >>>>>>>>> http://www.charite.de/ch/medgen/robinson/ >>>>>>>>> >>>>>>>>> >>>>>>> -- >>>>>>> Peter N. Robinson >>>>>>> peter.robinson@t-online.de >>>>>>> peter.robinson@charite.de >>>>>>> http://www.charite.de/ch/medgen/robinson/ >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l@portal.open-bio.org >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>> -- >>>>>> Jason Stajich >>>>>> jason.stajich at duke.edu >>>>>> http://www.duke.edu/~jes12/ >>>>> -- >>>>> Peter N. Robinson >>>>> peter.robinson@t-online.de >>>>> peter.robinson@charite.de >>>>> http://www.charite.de/ch/medgen/robinson/ >>>>> >>>>> >>>> -- >>>> Jason Stajich >>>> jason.stajich at duke.edu >>>> http://www.duke.edu/~jes12/ >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> -- >>> Peter N. 
Robinson
>>> peter.robinson@t-online.de
>>> peter.robinson@charite.de
>>> http://www.charite.de/ch/medgen/robinson/
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> --
> Peter N. Robinson
> peter.robinson@t-online.de
> peter.robinson@charite.de
> http://www.charite.de/ch/medgen/robinson/
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From jason.stajich at duke.edu Fri Jan 7 08:03:47 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jan 7 08:00:43 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950121B91C@iahce2knas1.iah.bbsrc.reserved>
References: <8975119BCD0AC5419D61A9CF1A923E950121B91C@iahce2knas1.iah.bbsrc.reserved>
Message-ID: <91955B6C-60AC-11D9-ACAB-000393C44276@duke.edu>

$hit->start is a convenience function which first tiles the HSPs and then gives you the smallest start.

If you look at the documentation for the method you see that not giving it a type will give you the start in both the query and the hit.

 Usage     : $sbjct->start( [seq_type] );
 Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
           : in the BlastHit object. If there is more than one HSP, the lowest start
           : value of all HSPs is returned.
 Example   : $qbeg = $sbjct->start('query');
           : $sbeg = $sbjct->start('hit');
           : ($qbeg, $sbeg) = $sbjct->start();
 Returns   : scalar context: integer
           : array context without args: list of two integers (queryStart, sbjctStart)
           : Array context can be "induced" by providing an argument of 'list' or 'array'.
 Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
             ('sbjct' is synonymous with 'hit')
 Throws    : n/a
 Comments  : This method requires that all HSPs be tiled. If there is more than one
           : HSP and they have not already been tiled, they will be tiled first automatically.
           : Remember that the start and end coordinates of all HSPs are
           : normalized so that start < end. Strand information can be
           : obtained by calling $hit->strand().

I don't know why you are seeing concatenated positions unless you are somehow getting it in array context and then turning it into a string.

I really don't use this; if I want tiled HSPs I use WU-BLAST with the -links option and build compatible HSP groups.

What are you trying to get - the smallest hit or query start? Just the start/end for HSPs?

If this is somehow a blasttable-specific problem I will try to see if I can figure out why.

-jason

On Jan 7, 2005, at 5:50 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> Having done some more tests with this:
>
> $hit->start()
>
> Actually returns a string which is the concatenation of query start and
> subject end! (btw I'm using the "-m 8" option) - surely this isn't the
> desired option????
>
> If I change it to:
>
> $hit->start('query')
>
> Then I get the correct start back, but I still get the stack trace
> error.
>
> The two co-ordinate sets which cause the problem (3264-3268 and
> 3252-3268) are on adjacent lines in the file (3252-3268 is the next line
> after 3264-3268) and are to the SAME subject, ie they are two HSPs of
> the same hit (in theory) but they are to two VERY different parts of the
> query.
>
> I'm guessing the way blasttable handles multiple HSPs is causing the
> trouble.
>
> Mick
>
> -----Original Message-----
> From: Marc Logghe [mailto:Marc.Logghe@devgen.com]
> Sent: 07 January 2005 10:41
> To: michael watson (IAH-C); Bioperl List
> Subject: RE: [Bioperl-l] Error parsing blast results with blasttable
>
>
>> while (my $result = $searchio->next_result) {
>>     while(my $hit = $result->next_hit) {
>>         my $start = $hit->start;
>>
>> And it is that call to $hit->start that sets off the whole trace.
>>
>> Any ideas?
> > > Hi Mick, > Have you tried one of these ?: > > my $start = $hit->start('sbjct'); # or 'query' or 'hit'. Latter is > same > as 'sbjct' > > or > > my $start = $hit->hsp->start('sbjct'); > > > I think in all cases it defaults to 'query'. So it should not crash but > give you the start position of the query. I am afraid I can't explain > the crash, sorry. > > Marc > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From michael.watson at bbsrc.ac.uk Fri Jan 7 08:21:25 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Fri Jan 7 08:20:40 2005 Subject: [Bioperl-l] Error parsing blast results with blasttable Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89AB3@iahce2knas1.iah.bbsrc.reserved> Hi I submitted a bug which contains some blasttable output, example code, and the error produced. Mick -----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: 07 January 2005 13:04 To: michael watson (IAH-C) Cc: Bioperl List; Marc Logghe Subject: Re: [Bioperl-l] Error parsing blast results with blasttable $hit->start is a convience function which first tiles the HSPs and then gives you the smallest start. If you look at the documentation for the method you see that not giving it a type will give you start in the query and hit. Usage : $sbjct->start( [seq_type] ); Purpose : Gets the start coordinate for the query, sbjct, or both sequences : in the BlastHit object. If there is more than one HSP, the lowest start : value of all HSPs is returned. Example : $qbeg = $sbjct->start('query'); : $sbeg = $sbjct->start('hit'); : ($qbeg, $sbeg) = $sbjct->start(); Returns : scalar context: integer : array context without args: list of two integers (queryStart, sbjctStart) : Array context can be "induced" by providing an argument of 'list' or 'array'. 
Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query') ('sbjct' is synonymous with 'hit') Throws : n/a Comments : This method requires that all HSPs be tiled. If there is more than one : HSP and they have not already been tiled, they will be tiled first automatically.. : Remember that the start and end coordinates of all HSPs are : normalized so that start < end. Strand information can be : obtained by calling $hit->strand(). I don't know why you are seeing concatenated positions unless you are somehow getting it in array context and then turning it into a string. I really don't use this, if I want tiled HSPs I use WU-BLAST with the -links and build compatible HSP groups. What are you trying to get - the smallest hit or query start? Just the start/end for HSPs? If this is somehow a blasttable specific problem will try and see if can figure out why. -jason On Jan 7, 2005, at 5:50 AM, michael watson ((IAH-C)) wrote: > Hi > > Having done some more tests with this: > > $hit->start() > > Actually returns a string which is the concatenation of query start > and subject end! (btw I'm using the "-m 8" option) - surely this > isn't the desired option???? > > If I change it to: > > $hit->start('query') > > Then I get the correct start back, but I still get the stack trace > error. > > The two co-ordinate sets which cause the problem (3264-3268 and > 3252-3268) are on adjacent lines in the file (3252-3268 is the next > line > after 3264-3268) and are to the SAME subject, ie they are two HSPs of > the same hit (in theory) but they are to two VERY different parts of > the > query. > > I'm guessing the way blasttable handles multiple HSPs is causing the > trouble. 
> > Mick > > -----Original Message----- > From: Marc Logghe [mailto:Marc.Logghe@devgen.com] > Sent: 07 January 2005 10:41 > To: michael watson (IAH-C); Bioperl List > Subject: RE: [Bioperl-l] Error parsing blast results with blasttable > > >> while (my $result = $searchio->next_result) { >> while(my $hit = $result->next_hit) { >> my $start = $hit->start; >> >> And it is that call to $hit->start that sets off the whole trace. >> >> Any ideas? > > > Hi Mick, > Have you tried one of these ?: > > my $start = $hit->start('sbjct'); # or 'query' or 'hit'. Latter is > same > as 'sbjct' > > or > > my $start = $hit->hsp->start('sbjct'); > > > I think in all cases it defaults to 'query'. So it should not crash > but give you the start position of the query. I am afraid I can't > explain the crash, sorry. > > Marc > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Fri Jan 7 08:29:51 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Jan 7 08:26:22 2005 Subject: [Bioperl-l] Error parsing blast results with blasttable In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95E89AB3@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E95E89AB3@iahce2knas1.iah.bbsrc.reserved> Message-ID: <358EC678-60B0-11D9-ACAB-000393C44276@duke.edu> Right - but back to my question - what do you want to be getting out? Do you want the smallest HSP start position if you are calling $hit->start('query')? Are you hoping for fancy HSP tiling? I'm pretty sure the problem you are showing has to do with being unable to build a single compatible tiling path for a set of HSPs. I just think the code for doing this is just too brittle. 
There may also be a bug, since the blasttable parser has less data available to it and that may be the cause as well, so it will have to be investigated nonetheless.

-jason

On Jan 7, 2005, at 8:21 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> I submitted a bug which contains some blasttable output, example code,
> and the error produced.
>
> Mick
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 07 January 2005 13:04
> To: michael watson (IAH-C)
> Cc: Bioperl List; Marc Logghe
> Subject: Re: [Bioperl-l] Error parsing blast results with blasttable
>
>
> $hit->start is a convenience function which first tiles the HSPs and then
> gives you the smallest start.
>
> If you look at the documentation for the method you see that not giving
> it a type will give you the start in both the query and the hit.
>
> Usage     : $sbjct->start( [seq_type] );
> Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
>           : in the BlastHit object. If there is more than one HSP, the lowest start
>           : value of all HSPs is returned.
> Example   : $qbeg = $sbjct->start('query');
>           : $sbeg = $sbjct->start('hit');
>           : ($qbeg, $sbeg) = $sbjct->start();
> Returns   : scalar context: integer
>           : array context without args: list of two integers (queryStart, sbjctStart)
>           : Array context can be "induced" by providing an argument of 'list' or 'array'.
> Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
>             ('sbjct' is synonymous with 'hit')
> Throws    : n/a
> Comments  : This method requires that all HSPs be tiled. If there is more than one
>           : HSP and they have not already been tiled, they will be tiled first automatically.
>           : Remember that the start and end coordinates of all HSPs are
>           : normalized so that start < end. Strand information can be
>           : obtained by calling $hit->strand().
> > I don't know why you are seeing concatenated positions unless you are > somehow getting it in array context and then turning it into a string. > > > I really don't use this, if I want tiled HSPs I use WU-BLAST with the > -links and build compatible HSP groups. > > What are you trying to get - the smallest hit or query start? Just the > start/end for HSPs? > > If this is somehow a blasttable specific problem will try and see if > can figure out why. > > -jason > > On Jan 7, 2005, at 5:50 AM, michael watson ((IAH-C)) wrote: > >> Hi >> >> Having done some more tests with this: >> >> $hit->start() >> >> Actually returns a string which is the concatenation of query start >> and subject end! (btw I'm using the "-m 8" option) - surely this >> isn't the desired option???? >> >> If I change it to: >> >> $hit->start('query') >> >> Then I get the correct start back, but I still get the stack trace >> error. >> >> The two co-ordinate sets which cause the problem (3264-3268 and >> 3252-3268) are on adjacent lines in the file (3252-3268 is the next >> line >> after 3264-3268) and are to the SAME subject, ie they are two HSPs of >> the same hit (in theory) but they are to two VERY different parts of >> the >> query. >> >> I'm guessing the way blasttable handles multiple HSPs is causing the >> trouble. >> >> Mick >> >> -----Original Message----- >> From: Marc Logghe [mailto:Marc.Logghe@devgen.com] >> Sent: 07 January 2005 10:41 >> To: michael watson (IAH-C); Bioperl List >> Subject: RE: [Bioperl-l] Error parsing blast results with blasttable >> >> >>> while (my $result = $searchio->next_result) { >>> while(my $hit = $result->next_hit) { >>> my $start = $hit->start; >>> >>> And it is that call to $hit->start that sets off the whole trace. >>> >>> Any ideas? >> >> >> Hi Mick, >> Have you tried one of these ?: >> >> my $start = $hit->start('sbjct'); # or 'query' or 'hit'. 
Latter is >> same >> as 'sbjct' >> >> or >> >> my $start = $hit->hsp->start('sbjct'); >> >> >> I think in all cases it defaults to 'query'. So it should not crash >> but give you the start position of the query. I am afraid I can't >> explain the crash, sorry. >> >> Marc >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From michael.watson at bbsrc.ac.uk Fri Jan 7 08:34:21 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Fri Jan 7 08:31:57 2005 Subject: [Bioperl-l] Error parsing blast results with blasttable Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89AB4@iahce2knas1.iah.bbsrc.reserved> Actually, the very first answer from Marc solved a bug in my script - I'm simply trying to get the query start and end of the HSPs, I don't need them to be linked together into a coherent hit object with multiple HSPs, I'd be happy with them separate. I'm not trying to do anything fancy, just mark up the HSPs as features on the query sequence. When running the exact same script using "blast" format instead of "blasttable" I get exactly what I need - I was trying to use blasttable for efficiency sake though. Thanks Mick -----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: 07 January 2005 13:30 To: michael watson (IAH-C) Cc: Bioperl List; Marc Logghe Subject: Re: [Bioperl-l] Error parsing blast results with blasttable Right - but back to my question - what do you want to be getting out? Do you want the smallest HSP start position if you are calling $hit->start('query')? Are you hoping for fancy HSP tiling? I'm pretty sure the problem you are showing has to do with being unable to build a single compatible tiling path for a set of HSPs. 
I just think the code for doing this is just too brittle. There may also be a bug since the blasttable parser has less data available too it and that may be the cause as well, so will have to be investigated nonetheless. -jason On Jan 7, 2005, at 8:21 AM, michael watson ((IAH-C)) wrote: > Hi > > I submitted a bug which contains some blasttable output, example code, > and the error produced. > > Mick > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: 07 January 2005 13:04 > To: michael watson (IAH-C) > Cc: Bioperl List; Marc Logghe > Subject: Re: [Bioperl-l] Error parsing blast results with blasttable > > > $hit->start is a convience function which first tiles the HSPs and > then gives you the smallest start. > > If you look at the documentation for the method you see that not > giving it a type will give you start in the query and hit. > > Usage : $sbjct->start( [seq_type] ); > Purpose : Gets the start coordinate for the query, sbjct, or both > sequences > : in the BlastHit object. If there is more than one HSP, > the > > lowest start > : value of all HSPs is returned. > Example : $qbeg = $sbjct->start('query'); > : $sbeg = $sbjct->start('hit'); > : ($qbeg, $sbeg) = $sbjct->start(); > Returns : scalar context: integer > : array context without args: list of two integers > (queryStart, sbjctStart) > : Array context can be "induced" by providing an argument > of > > 'list' or 'array'. > Argument : In scalar context: seq_type = 'query' or 'hit' or > 'sbjct' (default = 'query') > ('sbjct' is synonymous with 'hit') > Throws : n/a > Comments : This method requires that all HSPs be tiled. If there is > more than one > : HSP and they have not already been tiled, they will be > tiled first automatically.. > : Remember that the start and end coordinates of all HSPs > are > : normalized so that start < end. Strand information can be > : obtained by calling $hit->strand(). 
> > I don't know why you are seeing concatenated positions unless you are > somehow getting it in array context and then turning it into a string. > > > I really don't use this, if I want tiled HSPs I use WU-BLAST with the > -links and build compatible HSP groups. > > What are you trying to get - the smallest hit or query start? Just the > start/end for HSPs? > > If this is somehow a blasttable specific problem will try and see if > can figure out why. > > -jason > > On Jan 7, 2005, at 5:50 AM, michael watson ((IAH-C)) wrote: > >> Hi >> >> Having done some more tests with this: >> >> $hit->start() >> >> Actually returns a string which is the concatenation of query start >> and subject end! (btw I'm using the "-m 8" option) - surely this >> isn't the desired option???? >> >> If I change it to: >> >> $hit->start('query') >> >> Then I get the correct start back, but I still get the stack trace >> error. >> >> The two co-ordinate sets which cause the problem (3264-3268 and >> 3252-3268) are on adjacent lines in the file (3252-3268 is the next >> line after 3264-3268) and are to the SAME subject, ie they are two >> HSPs of the same hit (in theory) but they are to two VERY different >> parts of the >> query. >> >> I'm guessing the way blasttable handles multiple HSPs is causing the >> trouble. >> >> Mick >> >> -----Original Message----- >> From: Marc Logghe [mailto:Marc.Logghe@devgen.com] >> Sent: 07 January 2005 10:41 >> To: michael watson (IAH-C); Bioperl List >> Subject: RE: [Bioperl-l] Error parsing blast results with blasttable >> >> >>> while (my $result = $searchio->next_result) { >>> while(my $hit = $result->next_hit) { >>> my $start = $hit->start; >>> >>> And it is that call to $hit->start that sets off the whole trace. >>> >>> Any ideas? >> >> >> Hi Mick, >> Have you tried one of these ?: >> >> my $start = $hit->start('sbjct'); # or 'query' or 'hit'. 
Latter is >> same as 'sbjct' >> >> or >> >> my $start = $hit->hsp->start('sbjct'); >> >> >> I think in all cases it defaults to 'query'. So it should not crash >> but give you the start position of the query. I am afraid I can't >> explain the crash, sorry. >> >> Marc >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Fri Jan 7 08:48:30 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Jan 7 08:44:58 2005 Subject: [Bioperl-l] Error parsing blast results with blasttable In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95E89AB4@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E95E89AB4@iahce2knas1.iah.bbsrc.reserved> Message-ID: On Jan 7, 2005, at 8:34 AM, michael watson ((IAH-C)) wrote: > Actually, the very first answer from Marc solved a bug in my script - > I'm simply trying to get the query start and end of the HSPs, I don't > need them to be linked together into a coherent hit object with > multiple > HSPs, I'd be happy with them separate. I'm not trying to do anything > fancy, just mark up the HSPs as features on the query sequence. > > When running the exact same script using "blast" format instead of > "blasttable" I get exactly what I need - I was trying to use blasttable > for efficiency sake though. > Sure. that is the right thing to do. but calling $hit->start $hit->end is not really doing what you want so don't use it. If you want the start and end of HSPs you need to be calling that on the HSPs themselves. 
If you look at the SearchIO HOWTO you'll see this construct:

  while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
        print $hsp->query->start, " ", $hsp->query->end, "\n";
      }
    }
  }

That will work and get you the HSP start and end for the query. Use $hsp->hit->start and $hsp->hit->end to get the start/end of the hit sequence coordinates in the HSP.

-jason

> Thanks
> Mick
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 07 January 2005 13:30
> To: michael watson (IAH-C)
> Cc: Bioperl List; Marc Logghe
> Subject: Re: [Bioperl-l] Error parsing blast results with blasttable
>
> Right - but back to my question - what do you want to be getting out?
> Do you want the smallest HSP start position if you are calling
> $hit->start('query')? Are you hoping for fancy HSP tiling?
>
> I'm pretty sure the problem you are showing has to do with being unable
> to build a single compatible tiling path for a set of HSPs. I just
> think the code for doing this is just too brittle. There may also be a
> bug since the blasttable parser has less data available to it and that
> may be the cause as well, so will have to be investigated nonetheless.
>
> -jason
> On Jan 7, 2005, at 8:21 AM, michael watson ((IAH-C)) wrote:
>
>> Hi
>>
>> I submitted a bug which contains some blasttable output, example code,
>> and the error produced.
>>
>> Mick
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich@duke.edu]
>> Sent: 07 January 2005 13:04
>> To: michael watson (IAH-C)
>> Cc: Bioperl List; Marc Logghe
>> Subject: Re: [Bioperl-l] Error parsing blast results with blasttable
>>
>> $hit->start is a convenience function which first tiles the HSPs and
>> then gives you the smallest start.
>>
>> If you look at the documentation for the method you see that not
>> giving it a type will give you start in the query and hit.
>>> >>> Marc >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> >> > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From michael.watson at bbsrc.ac.uk Fri Jan 7 08:51:18 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Fri Jan 7 08:48:50 2005 Subject: [Bioperl-l] Error parsing blast results with blasttable Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89AB7@iahce2knas1.iah.bbsrc.reserved> OK, I was under the impression (mistakenly) that hits from blasttable output didn't have HSPs. -----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: 07 January 2005 13:49 To: michael watson (IAH-C) Cc: Bioperl List; Marc Logghe Subject: Re: [Bioperl-l] Error parsing blast results with blasttable On Jan 7, 2005, at 8:34 AM, michael watson ((IAH-C)) wrote: > Actually, the very first answer from Marc solved a bug in my script - > I'm simply trying to get the query start and end of the HSPs, I don't > need them to be linked together into a coherent hit object with > multiple HSPs, I'd be happy with them separate. I'm not trying to do > anything fancy, just mark up the HSPs as features on the query > sequence. > > When running the exact same script using "blast" format instead of > "blasttable" I get exactly what I need - I was trying to use > blasttable for efficiency sake though. > Sure. that is the right thing to do. but calling $hit->start $hit->end is not really doing what you want so don't use it. If you want the start and end of HSPs you need to be calling that on the HSPs themselves. 
If you look at the SearchIO HOWTO you'll see this construct:

  while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
        print $hsp->query->start, " ", $hsp->query->end, "\n";
      }
    }
  }

That will work and get you the HSP start and end for the query. Use $hsp->hit->start and $hsp->hit->end to get the start/end of the hit sequence coordinates in the HSP.

-jason

> Thanks
> Mick
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 07 January 2005 13:30
> To: michael watson (IAH-C)
> Cc: Bioperl List; Marc Logghe
> Subject: Re: [Bioperl-l] Error parsing blast results with blasttable
>
> Right - but back to my question - what do you want to be getting out?
> Do you want the smallest HSP start position if you are calling
> $hit->start('query')? Are you hoping for fancy HSP tiling?
>
> I'm pretty sure the problem you are showing has to do with being unable
> to build a single compatible tiling path for a set of HSPs. I just
> think the code for doing this is just too brittle. There may also be a
> bug since the blasttable parser has less data available to it and that
> may be the cause as well, so will have to be investigated nonetheless.
>
> -jason
> On Jan 7, 2005, at 8:21 AM, michael watson ((IAH-C)) wrote:
>
>> Hi
>>
>> I submitted a bug which contains some blasttable output, example code,
>> and the error produced.
>>
>> Mick
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich@duke.edu]
>> Sent: 07 January 2005 13:04
>> To: michael watson (IAH-C)
>> Cc: Bioperl List; Marc Logghe
>> Subject: Re: [Bioperl-l] Error parsing blast results with blasttable
>>
>> $hit->start is a convenience function which first tiles the HSPs and
>> then gives you the smallest start.
>>
>> If you look at the documentation for the method you see that not
>> giving it a type will give you start in the query and hit.
>>>
>>> Marc
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From golharam at umdnj.edu Fri Jan 7 14:36:35 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri Jan 7 14:28:01 2005
Subject: [Bioperl-l] libgd
Message-ID: <005d01c4f4f0$3349f5f0$a6028a0a@GOLHARMOBILE1>

Hi all,

Not sure where this should be posted, so forgive me if I'm posting it in the wrong place.

I'm trying to use Bioperl with RedHat Enterprise Linux v3. RedHat provides gd v1.8, however bioperl requires > 2.something.

So, I downloaded gd 2.0.33 and rebuilt it using a spec file I found from version 2.0.16 (I think).

When I tried upgrading the package using:

  [root@hydrogen i386]# rpm -Uvh gd-2.0.33-1.i386.rpm gd-devel-2.0.33-1.i386.rpm

I get the error:

  error: Failed dependencies:
          libgd.so.1.8 is needed by (installed) glibc-utils-2.3.2-95.30

If I do a 'ls -lp /usr/lib/libgd*', I get:

  -rw-r--r-- 1 root root 212978 Jun 17 2003 /usr/lib/libgd.a
  lrwxrwxrwx 1 root root 14 Jan 7 13:23 /usr/lib/libgd.so -> libgd.so.1.8.4
  lrwxrwxrwx 1 root root 14 Jan 7 13:23 /usr/lib/libgd.so.1 -> libgd.so.1.8.4
  lrwxrwxrwx 1 root root 14 Jan 7 13:23 /usr/lib/libgd.so.1.8 -> libgd.so.1.8.4
  -rwxr-xr-x 1 root root 183332 Jun 17 2003 /usr/lib/libgd.so.1.8.4

If I do a 'rpm -qpl gd-2.0.33-1.i386.rpm', I get:

  /usr/lib/libgd.so.1
  /usr/lib/libgd.so.1.8
  /usr/lib/libgd.so.2
  /usr/lib/libgd.so.2.0.0
  /usr/share/doc/gd-2.0.33
  /usr/share/doc/gd-2.0.33/README-JPEG.TXT
  /usr/share/doc/gd-2.0.33/README.TXT
  /usr/share/doc/gd-2.0.33/entities.html
  /usr/share/doc/gd-2.0.33/index.html

Here is the spec file I'm using (minus some unnecessary stuff). Any ideas?

Summary: A graphics library for quick creation of PNG, GIF or JPEG images.
Name: gd
Version: 2.0.33
Release: 1
URL: http://www.boutell.com/gd/
Source0: http://www.boutell.com/gd/http/gd-%{version}.tar.gz
License: BSD-style
Group: System Environment/Libraries
BuildRoot: %{_tmppath}/%{name}-root
Prereq: /sbin/ldconfig
BuildPrereq: freetype-devel, libjpeg-devel, libpng-devel, zlib-devel
%define shlibver %(echo %{version} | cut -f-2 -d.)

%prep
%setup -q

%build
./configure --prefix=$RPM_BUILD_ROOT/usr
make

%install
[ "$RPM_BUILD_ROOT" != "/" ] && rm -fr $RPM_BUILD_ROOT
make install
(cd $RPM_BUILD_ROOT/usr/lib && ln -s libgd.so.2.0.0 libgd.so.1)
(cd $RPM_BUILD_ROOT/usr/lib && ln -s libgd.so.2.0.0 libgd.so.1.8)
rm -rf $RPM_BUILD_ROOT%{_libdir}/libgd.la

%clean
[ "$RPM_BUILD_ROOT" != "/" ] && rm -fr $RPM_BUILD_ROOT

%post -p /sbin/ldconfig

%postun -p /sbin/ldconfig

%files
%defattr(-,root,root)
%doc README.TXT README-JPEG.TXT index.html entities.html
%{_libdir}/*.so.*

%files progs
%defattr(-,root,root)
%{_bindir}/*

%files devel
%defattr(-,root,root)
%{_includedir}/*
%{_libdir}/*.so
%{_libdir}/*.a

-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam@umdnj.edu

From golharam at umdnj.edu Fri Jan 7 16:39:56 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri Jan 7 16:31:23 2005
Subject: [Bioperl-l] (no subject)
Message-ID: <005e01c4f501$6f0be740$a6028a0a@GOLHARMOBILE1>

Hi all,

I have a bunch of protein ids, and I'm attempting to obtain the cds that corresponds to the id. I can locate the sequence feature in the genbank file, however, when I make a call to $feature->spliced_seq, I get an error 'cannot get remote location for ... without a valid Bio::DB::RandomAccessI database handle'. It's line 546 of Bio::SeqFeatureI.pm.

I suspect the problem is because in the genbank file, this particular entry reads:

  Join(AF072550.1:61..103,AF072550.1:5359..5524)

I'm wondering if the accession number is throwing the parser off. Does anyone have any experience with this?

Ryan

From jason.stajich at duke.edu Fri Jan 7 16:45:24 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jan 7 16:41:55 2005
Subject: [Bioperl-l] getting remote sequence features with spliced_seq
In-Reply-To: <005e01c4f501$6f0be740$a6028a0a@GOLHARMOBILE1>
References: <005e01c4f501$6f0be740$a6028a0a@GOLHARMOBILE1>
Message-ID: <7008278A-60F5-11D9-ACAB-000393C44276@duke.edu>

Pass in a Bio::DB::GenBank handle to achieve this magic.

  use Bio::DB::GenBank;
  my $dbh = Bio::DB::GenBank->new();
  $feature->spliced_seq($dbh);

From Bio::SeqFeatureI

spliced_seq

 Title   : spliced_seq
 Usage   : $seq = $feature->spliced_seq()
           $seq = $feature_with_remote_locations->spliced_seq($db_for_seqs)
 Function: Provides a sequence of the feature which is the most
           semantically "relevant" feature for this sequence. A
           default implementation is provided which for simple cases
           returns just the sequence, but for split cases, loops over
           the split location to return the sequence. In the case of
           split locations with remote locations, eg
           join(AB000123:5567-5589,80..1144)
           in the case when a database object is passed in, it will
           attempt to retrieve the sequence from the database object,
           and "Do the right thing", however if no database object is
           provided, it will generate the correct number of N's (DNA)
           or X's (protein, though this is unlikely).
           This function is deliberately "magical" attempting to
           second guess what a user wants as "the" sequence for this
           feature.
           Implementing classes are free to override this method with
           their own magic if they have a better idea what the user
           wants.
 Args    : [optional] A Bio::DB::RandomAccessI compliant object if
           one needs to retrieve remote seqs.
           [optional] boolean if the locations should not be sorted
           by start location.
 Returns : A Bio::PrimarySeqI object

On Jan 7, 2005, at 4:39 PM, Ryan Golhar wrote:

> Hi all,
>
> I have a bunch of protein ids, and I'm attempting to obtain the cds that
> corresponds to the id. I can locate the sequence feature in the genbank
> file, however, when I make a call to $feature->spliced_seq, I get an
> error 'cannot get remote location for ... without a valid
> Bio::DB::RandomAccessI database handle'. It's line 546 of
> Bio::SeqFeatureI.pm.
>
> I suspect the problem is because in the genbank file, this particular
> entry reads:
>
> Join(AF072550.1:61..103,AF072550.1:5359..5524)
>
> I'm wondering if the accession number is throwing the parser off. Does
> anyone have any experience with this?
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From tex at biocompute.net Fri Jan 7 03:23:47 2005
From: tex at biocompute.net (James Thompson)
Date: Fri Jan 7 18:31:41 2005
Subject: [Bioperl-l] libgd
In-Reply-To: <005d01c4f4f0$3349f5f0$a6028a0a@GOLHARMOBILE1>
Message-ID:

Ryan,

If you don't need gcc-utils, you can try uninstalling that package and then installing your gd-devel packages. That would be a quick and easy fix, but hoping for that may be a bit unlikely. I can't imagine why utilities for a compiler would need libgd, but that's just me.

Another option is to use the Bioperl RPM from http://biolinux.org/bioperl.html, IIRC it contains a 2.0.x version of the GD library in it. You may want to try this on a testing system before risking a production system, but for what it's worth I've successfully used the RPM on RedHat 9.0 and Fedora Core 2.

Best of luck solving your problem.

Cheers,

James Thompson

On Fri, 7 Jan 2005, Ryan Golhar wrote:

> Hi all,
>
> Not sure where this should be posted, so forgive me if I'm posting it in
> the wrong place.
> > I'm trying to use Bioperl with RedHat Enterprise Linux v3. RedHat > provides gd v1.8, however bioperl requires > 2.something. > > So, I downloaded gd 2.0.33 and rebuilt it using a spec file I found from > verion 2.0.16 (I think). > > When I tried upgrading the package using: > [root@hydrogen i386]# rpm -Uvh gd-2.0.33-1.i386.rpm > gd-devel-2.0.33-1.i386.rpm > > I get the error: > error: Failed dependencies: > libgd.so.1.8 is needed by (installed) glibc-utils-2.3.2-95.30 > > If I do a 'ls -lp /usr/lib/libgd*', I get: > > -rw-r--r-- 1 root root 212978 Jun 17 2003 /usr/lib/libgd.a > lrwxrwxrwx 1 root root 14 Jan 7 13:23 > /usr/lib/libgd.so -> libgd.so.1.8.4 > lrwxrwxrwx 1 root root 14 Jan 7 13:23 > /usr/lib/libgd.so.1 -> libgd.so.1.8.4 > lrwxrwxrwx 1 root root 14 Jan 7 13:23 > /usr/lib/libgd.so.1.8 -> libgd.so.1.8.4 > -rwxr-xr-x 1 root root 183332 Jun 17 2003 > /usr/lib/libgd.so.1.8.4 > > > If I do a 'rpm -qpl gd-2.0.33-1.i386.rpm', I get: > > /usr/lib/libgd.so.1 > /usr/lib/libgd.so.1.8 > /usr/lib/libgd.so.2 > /usr/lib/libgd.so.2.0.0 > /usr/share/doc/gd-2.0.33 > /usr/share/doc/gd-2.0.33/README-JPEG.TXT > /usr/share/doc/gd-2.0.33/README.TXT > /usr/share/doc/gd-2.0.33/entities.html > /usr/share/doc/gd-2.0.33/index.html > > Here is the spec file I'm using (minus some unnecessary stuff). Any > ideas? > > Summary: A graphics library for quick creation of PNG, GIF or JPEG > images. > Name: gd > Version: 2.0.33 > Release: 1 > URL: http://www.boutell.com/gd/ > Source0: http://www.boutell.com/gd/http/gd-%{version}.tar.gz > License: BSD-style > Group: System Environment/Libraries > BuildRoot: %{_tmppath}/%{name}-root > Prereq: /sbin/ldconfig > BuildPrereq: freetype-devel, libjpeg-devel, libpng-devel, zlib-devel > %define shlibver %(echo %{version} | cut -f-2 -d.) 
> > %prep > %setup -q > > %build > ./configure --prefix=$RPM_BUILD_ROOT/usr > make > > %install > [ "$RPM_BUILD_ROOT" != "/" ] && rm -fr $RPM_BUILD_ROOT > make install > (cd $RPM_BUILD_ROOT/usr/lib && ln -s libgd.so.2.0.0 libgd.so.1) > (cd $RPM_BUILD_ROOT/usr/lib && ln -s libgd.so.2.0.0 libgd.so.1.8) > rm -rf $RPM_BUILD_ROOT%{_libdir}/libgd.la > > %clean > [ "$RPM_BUILD_ROOT" != "/" ] && rm -fr $RPM_BUILD_ROOT > > %post -p /sbin/ldconfig > > %postun -p /sbin/ldconfig > > %files > %defattr(-,root,root) > %doc README.TXT README-JPEG.TXT index.html entities.html > %{_libdir}/*.so.* > > %files progs > %defattr(-,root,root) > %{_bindir}/* > > %files devel > %defattr(-,root,root) > %{_includedir}/* > %{_libdir}/*.so > %{_libdir}/*.a > > > ----- > Ryan Golhar > Computational Biologist > The Informatics Institute at > The University of Medicine & Dentistry of NJ > > Phone: 973-972-5034 > Fax: 973-972-7412 > Email: golharam@umdnj.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Sat Jan 8 02:37:36 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Jan 8 02:34:12 2005 Subject: [Bioperl-l] RE: SeqIO fails on masked sequences In-Reply-To: Message-ID: <2AA3B49A-6148-11D9-947F-000A959EB4C4@gmx.net> You should not require by default that all sequences in one file be of the same type (alphabet). We never have required this, nor documented that it is a (not enforced) requirement, and so there may be people out there relying on this 'feature'. -hilmar On Friday, January 7, 2005, at 03:39 AM, Nathan Haigh wrote: > There appears to be an anomaly with Bio::Seq::fasta. 
> If the SeqIO object's alphabet is set, next_seq() results in this being
> undef and then proceeds to guess the alphabet again, therefore things
> like the following do not work:
>
> my $seq_in = Bio::SeqIO->new(-format=>$format, -fh => \*DATA);
> $seq_in->alphabet('protein');
>
> Should setting the SeqIO object's alphabet be honoured even if it is
> set to the wrong type or the sequences are not of that alphabet?
>
> I have a bug fix that allows you to set the alphabet through the
> SeqIO object, but it doesn't do any sort of checking to see if all
> the seqs in the object are of the correct type. Essentially, the
> alphabet is set in one of the following ways:
>
> 1) if the SeqIO object is set using e.g. $seq_in->alphabet('dna'); all
> the seqs that belong to the $seq_in object obtain their alphabet from
> the SeqIO object, dna in this case, irrespective of whether or not it
> is actually protein.
>
> 2) If alphabet has not been set in this way, the first sequence is
> used to guess the alphabet of the SeqIO object, from which all the
> sequences obtain their alphabet.
>
> Possible limitations:
>
> 1) all seqs in the SeqIO object can only be of the same type - no
> testing done to see if this is not the case.
>
> Does this sound ok and reasonable?
>
> Nathan
>
> -----Original Message-----
> From: Brian Osborne [mailto:brian_osborne@cognia.com]
> Sent: 06 January 2005 12:25
> To: nathanhaigh@ukonline.co.uk
> Subject: RE: SeqIO fails on masked sequences
>
> Nathan,
>
> The idea is that a sequence with a high proportion of X is more likely
> to be DNA than protein. The examples I had in mind are unfinished
> genomic sequence, and there are countless entries in Genbank/EMBL like
> this. So, someone wrote in and said that their genomic sequence was
> being characterized as protein since the fraction [gatc] was less than
> 85%, it was mostly X. By contrast, there are no protein sequences with
> X in them in these public databases, if I'm not mistaken. So I maintain
> that in the world of public databases this is the way to go.
>
> Now if you venture into the world of sequence analysis it's going to
> be a different story, since you'll likely mask protein with X, not N,
> obviously. May I ask, if this person knows his/her sequence is protein
> then why doesn't s/he set its alphabet to "protein"? Or why don't they
> mask with A or Z or O or something?
>
> There'll be problems either way. What is one's reference? Public
> sequence or the less well-defined set of possible sequences?
>
> Brian O.
>
> -----Original Message-----
> From: Nathan Haigh [mailto:nathanhaigh@ukonline.co.uk]
> Sent: Wednesday, January 05, 2005 7:38 PM
> To: 'Brian Osborne'
> Subject: FW: SeqIO fails on masked sequences
>
> You committed a change to Bio::PrimarySeq where 'X' was added to the
> class of characters that are stripped out of sequences in the
> _guess_alphabet subroutine. Do you know why sequences containing X
> were causing a problem, and why X was added to the class of chars?
>
> It's causing a problem for someone who has a sequence that contains
> all masked chars (i.e. all X's), which should still be "guessable" as
> protein.
>
> Cheers
> Nathan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Sat Jan 8 02:45:28 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Jan 8 02:41:53 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To:
Message-ID: <43B77B65-6149-11D9-947F-000A959EB4C4@gmx.net>

On Thursday, January 6, 2005, at 06:33 PM, Stefan A Kirov wrote:

> Hilmar,
> Getting back to your post, I have some concern about automatic
> parsing of multiple files (if I got this right...). Say if one downloads
> the whole Entrez Gene stuff and all is OK I don't see why this can't be
> done. But if something goes wrong (and occasionally it will), it will be
> really hard for the user to understand he misses parts of the data.

By going wrong you mean partial downloads resulting from interrupted file transfer sessions? If so, then this is no different from parsing other (e.g. Genbank) downloaded and therefore possibly truncated files.

If by wrong you mean certain files are absent, then yes, I mean that their presence is optional, and certainly the parser could warn, unless warnings are suppressed.

> [...]
> Another issue that comes to mind is the approach of a stream is fine for
> people with the whole DB on their minds. But if you need a particular
> record, I guess you could index the files, but this is a totally
> different game.

Right. You'd write a Bio::Index:: module for this.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Sat Jan 8 13:09:56 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Jan 8 13:06:58 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <1105080663.3142.16.camel@localhost.localdomain>
Message-ID: <80BA8B98-61A0-11D9-947F-000A959EB4C4@gmx.net>

On Thursday, January 6, 2005, at 10:51 PM, Peter Robinson wrote:

> On the other hand, parsing multiple Entrez Gene files at once
> in order to synthesize various forms of information about an Entrez
> Gene id seemed to depart from the style of the rest of Bio::SeqIO code.

I don't think so at all. It only appears so because most other formats
happen to come in a single file. The OntologyIO GO parser, e.g., takes
any number of files.

-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From claratsm at hkusua.hku.hk Sun Jan 9 11:02:36 2005
From: claratsm at hkusua.hku.hk (claratsm@hkusua.hku.hk)
Date: Sun Jan 9 14:50:34 2005
Subject: [Bioperl-l] Problem about bioperl SeqIO
Message-ID: <1105286556.41e1559cb72f3@imp.webmail.hku.hk>

Hi,

I am a new user of bioperl and I have encountered some problems when
using it for programming.

As I am trying to deal with a very large file, as large as the contig
data for a large chromosome, I use

my $seqio_mfa = Bio::SeqIO->new('-file' => $seq_file, '-format' => 'largefasta');

However, whenever I get a subseq, it generates one temp file. The
number of temp files generated becomes so large (even though most are
empty) that the program stops with an exception, something like
"cannot create temp file any more".

Am I able to delete the temp files while my program is running?
How can I get the temp file name of each new sequence generated?
If I can get the temp file name, is it safe to delete the file using
the rmdir function?

Can anybody help me!!

Thank you

From Marc.Logghe at devgen.com Sun Jan 9 15:20:18 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Sun Jan 9 15:22:17 2005
Subject: [Bioperl-l] Problem about bioperl SeqIO
Message-ID:

Hi,
Not sure, but the temporary files should be deleted as soon as the
objects are destroyed. Maybe you keep a lot of
Bio::Seq::LargePrimarySeq references in scope, so that the temp files
for all of them are kept open?
You *can* get at the filename, but you are not meant to, because it is
a private method (_filename).
HTH,
Marc

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org on behalf of claratsm@hkusua.hku.hk
Sent: Sun 9-1-2005 17:02
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] Problem about bioperl SeqIO

Hi,

I am a new user of bioperl and I have encountered some problems when
using it for programming.

As I am trying to deal with a very large file, as large as the contig
data for a large chromosome, I use

my $seqio_mfa = Bio::SeqIO->new('-file' => $seq_file, '-format' => 'largefasta');

However, whenever I get a subseq, it generates one temp file. The
number of temp files generated becomes so large (even though most are
empty) that the program stops with an exception, something like
"cannot create temp file any more".

Am I able to delete the temp files while my program is running?
How can I get the temp file name of each new sequence generated?
If I can get the temp file name, is it safe to delete the file using
the rmdir function?

Can anybody help me!!

Thank you

From Peter.Robinson at t-online.de Sun Jan 9 17:01:02 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Sun Jan 9 16:57:08 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <80BA8B98-61A0-11D9-947F-000A959EB4C4@gmx.net>
References: <80BA8B98-61A0-11D9-947F-000A959EB4C4@gmx.net>
Message-ID: <1105308062.3757.14.camel@localhost.localdomain>

I meant that there is information about a single gene spread across
various Entrez Gene files, so if one were to parse them all at once,
one would have to keep a lot of info in memory, especially since the
order of the entries is not necessarily the same across files. For
instance, gene2unigene is ordered according to the UniGene identifiers,
and gene2accession is not; if one wanted to add the unigene info to all
entries in one fell swoop, this would seem to require keeping entries
either in memory or in some indexed file. In contrast, the ontology
files you mention are more independent of one another, so there is no
particular difficulty in combining flat files for the three
subontologies.

I am starting to think that it might make the most sense to concentrate
on the ASN.1 files. I think it should be reasonably simple to do this
with a kind of recursive descent strategy, either using some CPAN
modules or perhaps better self-rolled. At the moment I have not seen
any modules that appear to be great candidates for lexing the ASN.1
text (ideas anyone?).

-peter

On Sat, 2005-01-08 at 19:09, Hilmar Lapp wrote:
> On Thursday, January 6, 2005, at 10:51 PM, Peter Robinson wrote:
>
> > On the other hand, parsing multiple Entrez Gene files at once
> > in order to synthesize various forms of information about an Entrez
> > Gene id seemed to depart from the style of the rest of Bio::SeqIO
> > code.
>
> I don't think so at all.
> It only appears so because most other formats
> happen to come in a single file. The OntologyIO GO parser e.g. takes
> any number of files.
>
> -hilmar
-- 
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/

From wes.barris at csiro.au Sun Jan 9 18:42:33 2005
From: wes.barris at csiro.au (Wes Barris)
Date: Sun Jan 9 18:39:03 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <2AA3B49A-6148-11D9-947F-000A959EB4C4@gmx.net>
References: <2AA3B49A-6148-11D9-947F-000A959EB4C4@gmx.net>
Message-ID: <41E1C169.4010302@csiro.au>

Hilmar Lapp wrote:
> You should not require by default that all sequences in one file be of
> the same type (alphabet). We never have required this, nor documented
> that it is a (not enforced) requirement, and so there may be people out
> there relying on this 'feature'.

Mixing both DNA and protein sequences in one file and then attempting
to process it seems like kind of a bizarre thing to want to do. If
the alphabet is explicitly specified, isn't there a way to make that
take precedence?

> -hilmar
>
> On Friday, January 7, 2005, at 03:39 AM, Nathan Haigh wrote:
> [...]

-- 
Wes Barris
E-Mail: Wes.Barris@csiro.au

From nathanhaigh at ukonline.co.uk Sun Jan 9 19:35:08 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Sun Jan 9 19:31:55 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <41E1C169.4010302@csiro.au>
Message-ID:

> -----Original Message-----
> From: Wes Barris [mailto:wes.barris@csiro.au]
> Sent: 09 January 2005 23:43
> To: Hilmar Lapp
> Cc: nathanhaigh@ukonline.co.uk; 'Bioperl list'; 'Brian Osborne'
> Subject: Re: [Bioperl-l] RE: SeqIO fails on masked sequences
>
> Hilmar Lapp wrote:
> > You should not require by default that all sequences in one file be of
> > the same type (alphabet). We never have required this, nor documented
> > that it is a (not enforced) requirement, and so there may be people out
> > there relying on this 'feature'.
>
> Mixing both DNA and protein sequences in one file and then attempting
> to process it seems like kind of a bizarre thing to want to do. If
> the alphabet is explicitly specified, isn't there a way to make that
> take precedence?

Why are you then able to set the alphabet of a SeqIO object if whenever
you call next_seq() it tries to guess the alphabet of the sequence
anyway? It seems more logical to me that the user can specify the
alphabet without worrying about bioperl guessing it, and getting it
wrong, or not setting it at all.

> > -hilmar
> >
> > On Friday, January 7, 2005, at 03:39 AM, Nathan Haigh wrote:
> > [...]

From wes.barris at csiro.au Sun Jan 9 20:05:13 2005
From: wes.barris at csiro.au (Wes Barris)
Date: Sun Jan 9 20:03:10 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To:
References:
Message-ID: <41E1D4C9.1020806@csiro.au>

Nathan Haigh wrote:
>>-----Original Message-----
>>From: Wes Barris [mailto:wes.barris@csiro.au]
>>Sent: 09 January 2005 23:43
>>To: Hilmar Lapp
>>Cc: nathanhaigh@ukonline.co.uk; 'Bioperl list'; 'Brian Osborne'
>>Subject: Re: [Bioperl-l] RE: SeqIO fails on masked sequences
>>
>>Hilmar Lapp wrote:
>>
>>>You should not require by default that all sequences in one file be of
>>>the same type (alphabet). We never have required this, nor documented
>>>that it is a (not enforced) requirement, and so there may be people out
>>>there relying on this 'feature'.
>>
>>Mixing both DNA and protein sequences in one file and then attempting
>>to process it seems like kind of a bizarre thing to want to do.
>>If the alphabet is explicitly specified, isn't there a way to make that
>>take precedence?
>
> Why are you then able to set the alphabet of a SeqIO object if whenever
> you call next_seq() it tries to guess the alphabet of the sequence
> anyway? It seems more logical to me that the user can specify the
> alphabet without worrying about bioperl guessing it, and getting it
> wrong, or not setting it at all.

I am guessing that you meant to direct this question to Hilmar because
I agree with you. If one specifies the alphabet, bioperl should not
subsequently try to guess it.

>>> -hilmar
>>>
>>>On Friday, January 7, 2005, at 03:39 AM, Nathan Haigh wrote:
>>> [...]

-- 
Wes Barris
E-Mail: Wes.Barris@csiro.au

From hlapp at gmx.net Mon Jan 10 03:13:35 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Jan 10 03:11:55 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <41E1D4C9.1020806@csiro.au>
Message-ID: <866FB518-62DF-11D9-911B-000A959EB4C4@gmx.net>

On Sunday, January 9, 2005, at 05:05 PM, Wes Barris wrote:

>>> Hilmar Lapp wrote:
>>>
>>>> You should not require by default that all sequences in one file be
>>>> of the same type (alphabet). We never have required this, nor
>>>> documented that it is a (not enforced) requirement, and so there
>>>> may be people out there relying on this 'feature'.
>>>
>>> Mixing both DNA and protein sequences in one file and then attempting
>>> to process it seems like kind of a bizarre thing to want to do. If
>>> the alphabet is explicitly specified, isn't there a way to make that
>>> take precedence?
>> Why are you then able to set the alphabet of a SeqIO object if
>> whenever you call next_seq() it tries to guess the alphabet of the
>> sequence anyway? It seems more logical to me that the user can
>> specify the alphabet without worrying about bioperl guessing it, and
>> getting it wrong, or not setting it at all.
>
> I am guessing that you meant to direct this question to Hilmar because
> I agree with you. If one specifies the alphabet, bioperl should not
> subsequently try to guess it.

Right, that's what I agree with too. If an alphabet set for the stream
gets reset to undef after every sequence then I'd call that a bug.

My point was, if the user doesn't specify the alphabet, then don't make
assumptions that you don't absolutely have to make. You had suggested
guessing the alphabet from the first sequence in this case and then
assuming every subsequent sequence in that stream will have that same
alphabet. That's what I think is not a good idea and not necessary either.
If the user doesn't preset the alphabet, just keep on guessing
for every new sequence.

Mixing alphabets is indeed bizarre but people who do bizarre things are
everywhere.

-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From nathanhaigh at ukonline.co.uk Mon Jan 10 03:50:02 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Mon Jan 10 03:46:40 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <866FB518-62DF-11D9-911B-000A959EB4C4@gmx.net>
Message-ID:

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gmx.net]
> Sent: 10 January 2005 08:14
> To: Wes Barris
> Cc: nathanhaigh@ukonline.co.uk; 'Bioperl list'; 'Brian Osborne'
> Subject: Re: [Bioperl-l] RE: SeqIO fails on masked sequences
>
> On Sunday, January 9, 2005, at 05:05 PM, Wes Barris wrote:
> [...]
> > I am guessing that you meant to direct this question to Hilmar because
> > I agree with you. If one specifies the alphabet, bioperl should not
> > subsequently try to guess it.
>
> Right, that's what I agree with too. If an alphabet set for the stream
> gets reset to undef after every sequence then I'd call that a bug.

agreed :o)

> My point was, if the user doesn't specify the alphabet, then don't make
> assumptions that you don't absolutely have to make. You had suggested
> to guess the alphabet from the first sequence in this case and then
> assume every subsequent sequence in that stream will have that same
> alphabet. That's what I think is not a good idea and not necessary
> either. If the user doesn't preset the alphabet, just keep on guessing
> for every new sequence.

Hmm, yes, I think the former was what I had suggested, but I soon
realised this wasn't a good thing and forgot to correct myself later.

I'll get this fix ready today hopefully.

Nath

> Mixing alphabets is indeed bizarre but people who do bizarre things are
> everywhere.
>
> -hilmar

From taerwin at tpg.com.au Mon Jan 10 18:47:47 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Mon Jan 10 18:46:46 2005
Subject: [Bioperl-l] Storing Blast object in a local database
Message-ID: <1105400867.4274.4.camel@bacp4>

Hi all,

Is it possible to store a blast object
(Bio::Search::Result::BlastResult) in a mysql database?

Any pointers would be appreciated.

Regards,
Tim

From barry.moore at genetics.utah.edu Mon Jan 10 20:10:28 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon Jan 10 20:06:57 2005
Subject: [Bioperl-l] BioDB.pm
Message-ID: <41E32784.6060409@genetics.utah.edu>

I've just installed bioperl 1.4 (bioperl-core, bioperl-run and
bioperl-db) on a new system (Debian woody). I run a test script that
works fine on my old system and get an error that BioDB.pm can't be
found. Sure enough, BioDB.pm isn't on my new system, but it is on the
old (also bioperl 1.4 Debian woody). I look in cvs and BioDB.pm is
there, but I look in the distribution downloaded from bioperl.org and
it seems to be missing BioDB.pm and several other files. I can get
the files from cvs, but is this an error in the distribution file?

Barry

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From smarkel at scitegic.com Mon Jan 10 21:05:04 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Mon Jan 10 21:01:59 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <41E1C169.4010302@csiro.au>
References: <2AA3B49A-6148-11D9-947F-000A959EB4C4@gmx.net> <41E1C169.4010302@csiro.au>
Message-ID: <41E33450.5080809@scitegic.com>

PDB distributes a FASTA file of the sequences associated with the
structures in the database. The FASTA file contains both
nucleotides and proteins. See pdb_seqres.txt in
ftp://ftp.rcsb.org/pub/pdb/derived_data/.

Scott

Wes Barris wrote:

> Hilmar Lapp wrote:
>
>> You should not require by default that all sequences in one file be of
>> the same type (alphabet). We never have required this, nor documented
>> that it is a (not enforced) requirement, and so there may be people
>> out there relying on this 'feature'.
>
> Mixing both DNA and protein sequences in one file and then attempting
> to process it seems like kind of a bizarre thing to want to do. If
> the alphabet is explicitly specified, isn't there a way to make that
> take precedence?
>
>> -hilmar

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From hlapp at gnf.org Mon Jan 10 23:03:52 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jan 10 23:00:22 2005
Subject: [Bioperl-l] BioDB.pm
In-Reply-To: <41E32784.6060409@genetics.utah.edu>
References: <41E32784.6060409@genetics.utah.edu>
Message-ID:

Did the test script that you ran come with bioperl? Bio::DB::BioDB
comes with bioperl-db, and is not needed for anything else.

Also, bioperl-db is not included in the bioperl 1.4 distribution. If
you want it, you do need to obtain it from CVS at this point. Let me
know if you have problems with that.

Also, if you do want to use bioperl-db I do recommend you obtain
bioperl 1.4 from the CVS branch as well, or otherwise wait for the 1.5
release. The 1.4.0 release has problems in the interpro and GO ontology
parsers, and 1.4.1 was never released in anticipation of 1.5.

-hilmar

On Jan 10, 2005, at 5:10 PM, Barry Moore wrote:

> I've just installed bioperl 1.4 (bioperl-core, bioperl-run and
> bioperl-db) on a new system (Debian woody). I run a test script that
> works fine on my old system and get an error that BioDB.pm can't be
> found. Sure enough BioDB.pm isn't on my new system, but it is on the
> old (also bioperl 1.4 Debian woody). I look in cvs and BioDB.pm is
> there, but I look in the distribution downloaded from bioperl.org and
> it seems to be missing BioDB.pm and several other files? I can get
> the files from cvs, but is this an error in the distribution file?
>
> Barry
>
> --
> Barry Moore
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From Marc.Logghe at devgen.com Tue Jan 11 04:18:17 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Jan 11 04:18:12 2005
Subject: [Bioperl-l] Storing Blast object in a local database
Message-ID:

Hi,

> Is it possible to store a blast object
> (Bio::Search::Result::BlastResult) in a mysql database?
>
> Any pointers would be appreciated.

I only know that the biosql schema contains a SIMILARITY table that
should be suited to storing similarity results. However: a) no fields
are available to store the homology strings (query, hsp, consensus)
and b) no API code is available (yet) to load the objects. I guess
Hilmar can tell more about this.

It is possible, however, to store blast results in a GFF or Chado
database. Quite a while ago (before the gbrowse plugin Aligner.pm
existed) we turned blast results into GFF format. We used tags to
store the homology strings. Of course, this also required writing a
customized plugin in order to dump the alignments afterwards. BTW,
Bioperl contains a script to turn SearchIO results into GFF
(bp_search2gff.pl), but it needs some adaptations in case you also
want to have the homology strings.

As I already mentioned, gbrowse actually stores alignments (bla(s)t
results) in Chado and these can be dumped using the Aligner plugin.
See for yourself at
http://www.wormbase.org/db/seq/gbrowse/wormbase?name=I%3A12765180..12775179;source=wormbase;width=800;version=100;label=CG-OP-ESTB-ESTO
and dump the alignments.
I am not sure about how everything is stored in the database and how the alignments are regenerated. I asume both the hits and query sequences are in the database plus the search results as features. Based on the locations associated with the features and the sequences, the alignements are regenerated by the plugin. Of course all this runs in the framework of gbrowse and this is probably not what you need. BioSQL would be a better option in case you only want to store the results and you don't need a gbrowse environment. But then, you need to write the API ;-) Regards, Marc From nathanhaigh at ukonline.co.uk Tue Jan 11 04:35:46 2005 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Tue Jan 11 04:32:20 2005 Subject: [Bioperl-l] developer cvs login Message-ID: I have recently received a developer cvs login account, but I'm unsure how to login. I will mainly use a Windows box but I also use Linux. I have cvsnt installed under windows and have used it to checkout bioperl anonymously, but don't know how to login and commit to cvs, could one of the existing developers help me out? Thanks Nathan --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0502-0, 10/01/2005 Tested on: 11/01/2005 09:32:18 avast! is copyright (c) 2000-2003 ALWIL Software. http://www.avast.com From Marc.Logghe at devgen.com Tue Jan 11 04:47:36 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Tue Jan 11 04:44:19 2005 Subject: [Bioperl-l] developer cvs login Message-ID: Hi Nathan, by coincidence I had to find the very same out for myself, a split second ago. 
The stuff I needed to know was here: http://bioperl.org/UserInfo/CVShelp.shtml HTH, Marc > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of > Nathan Haigh > Sent: Tuesday, January 11, 2005 10:36 AM > To: 'Bioperl list' > Subject: [Bioperl-l] developer cvs login > > > I have recently received a developer cvs login account, but > I'm unsure how to login. I will mainly use a Windows box but > I also use > Linux. I have cvsnt installed under windows and have used it > to checkout bioperl anonymously, but don't know how to login > and commit > to cvs, could one of the existing developers help me out? > > > > Thanks > > Nathan > > --- > avast! Antivirus: Outbound message clean. > Virus Database (VPS): 0502-0, 10/01/2005 > Tested on: 11/01/2005 09:32:18 > avast! is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From nathanhaigh at ukonline.co.uk Tue Jan 11 06:54:07 2005 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Tue Jan 11 06:50:49 2005 Subject: [Bioperl-l] developer cvs login In-Reply-To: Message-ID: Hmm, no go as far as getting it to work from a windows box without cygwin. Does anyone know if/how to setup ssh for windows, should it be possible to get putty (or something else) as the ssh client? Thanks Nathan > -----Original Message----- > From: Marc Logghe [mailto:Marc.Logghe@devgen.com] > Sent: 11 January 2005 09:48 > To: nathanhaigh@ukonline.co.uk; Bioperl list > Subject: RE: [Bioperl-l] developer cvs login > > Hi Nathan, > by coincidence I had to find the very same out for myself, a split second ago. 
> The stuff I needed to know was here: > http://bioperl.org/UserInfo/CVShelp.shtml > > HTH, > Marc > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org > > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of > > Nathan Haigh > > Sent: Tuesday, January 11, 2005 10:36 AM > > To: 'Bioperl list' > > Subject: [Bioperl-l] developer cvs login > > > > > > I have recently received a developer cvs login account, but > > I'm unsure how to login. I will mainly use a Windows box but > > I also use > > Linux. I have cvsnt installed under windows and have used it > > to checkout bioperl anonymously, but don't know how to login > > and commit > > to cvs, could one of the existing developers help me out? > > > > > > > > Thanks > > > > Nathan > > > > --- > > avast! Antivirus: Outbound message clean. > > Virus Database (VPS): 0502-0, 10/01/2005 > > Tested on: 11/01/2005 09:32:18 > > avast! is copyright (c) 2000-2003 ALWIL Software. > > http://www.avast.com > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > --- > avast! Antivirus: Inbound message clean. > Virus Database (VPS): 0502-0, 10/01/2005 > Tested on: 11/01/2005 09:52:06 > avast! is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0502-0, 10/01/2005 Tested on: 11/01/2005 11:53:53 avast! is copyright (c) 2000-2003 ALWIL Software. http://www.avast.com --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0502-0, 10/01/2005 Tested on: 11/01/2005 11:54:05 avast! is copyright (c) 2000-2003 ALWIL Software. 
http://www.avast.com From sdavis2 at mail.nih.gov Tue Jan 11 06:22:48 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue Jan 11 07:30:26 2005 Subject: [Bioperl-l] Storing Blast object in a local database In-Reply-To: References: Message-ID: <1FB5849E-63C3-11D9-A2FE-000D933565E8@mail.nih.gov> I'm not positive how Wormbase does it, but in my version of Gbrowse, the sequences are stored and the aligner plugin aligns them (the ones in the current window), irrespective of the blat results, which are stored as any features are stored (something compatible with GFF). So, the realignment doesn't rely on the blat results. Sean On Jan 11, 2005, at 4:18 AM, Marc Logghe wrote: > Hi, > >> Is it possible to store a blast object >> (Bio::Search::Result::BlastResult) in a mysql database? >> >> Any pointers would be appreciated. > > I only know that the biosql schema contains a SIMILARITY table that > should be suited to store similarity results. However: > a) no fields are available to store the homology strings (query, hsp, > consensus) and b) no API code is available (yet) to load the objects. > Guess Hilmar can tell more about this. > It is possible however to store blast results in a GFF or Chado > database. > Quite a while ago (before the time that the gbrowse plugin Aligner.pm > existed) we turned blast results into GFF format. We used tags to > store the homology strings. Of course, this also needed to make a > customized plugin in order to dump the alignments afterwards. BTW, > Bioperl contains a script to turn SearchIO results into GFF > (bp_search2gff.pl) but needs some adaptations in case you also want to > have the homology strings. > Like I already mentioned, gbrowse actually stores alignments (bla(s)t > results) in Chado and these can be dumped using the Aligner plugin. > See for yourself at > http://www.wormbase.org/db/seq/gbrowse/wormbase? > name=I%3A12765180..12775179;source=wormbase;width=800;version=100; > label=CG-OP-ESTB-ESTO and dump the alignments. 
I am not sure about how > everything is stored in the database and how the alignments are > regenerated. I asume both the hits and query sequences are in the > database plus the search results as features. Based on the locations > associated with the features and the sequences, the alignements are > regenerated by the plugin. > Of course all this runs in the framework of gbrowse and this is > probably not what you need. > BioSQL would be a better option in case you only want to store the > results and you don't need a gbrowse environment. But then, you need > to write the API ;-) > > Regards, > Marc > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From Marc.Logghe at devgen.com Tue Jan 11 08:23:55 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Tue Jan 11 08:22:25 2005 Subject: [Bioperl-l] developer cvs login Message-ID: > -----Original Message----- > From: Nathan Haigh [mailto:nathanhaigh@ukonline.co.uk] > Sent: Tuesday, January 11, 2005 12:54 PM > To: Marc Logghe; 'Bioperl list' > Subject: RE: [Bioperl-l] developer cvs login > > > Hmm, no go as far as getting it to work from a windows box > without cygwin. Does anyone know if/how to setup ssh for > windows, should > it be possible to get putty (or something else) as the ssh client? Have you tried wincvs ? I only tested it with pserver connection, not with ssh, but I think it is supported. http://www.wincvs.org/ HTH, Marc From barry.moore at genetics.utah.edu Tue Jan 11 11:59:31 2005 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Tue Jan 11 11:55:55 2005 Subject: [Bioperl-l] developer cvs login In-Reply-To: References: Message-ID: <41E405F3.7010007@genetics.utah.edu> Nathan, Not sure if you are asking just if ssh works from windows or if ssh works to connect to bioperl cvs from windows. If you question is the first, then the answer is yes. 
You should be able to use putty, openSSH, or others. I use and like the free version from ssh.com. You can find it here: http://ftp.ssh.com/pub/ssh/SSHSecureShellClient-3.2.9.exe Barry Marc Logghe wrote: > > >>-----Original Message----- >>From: Nathan Haigh [mailto:nathanhaigh@ukonline.co.uk] >>Sent: Tuesday, January 11, 2005 12:54 PM >>To: Marc Logghe; 'Bioperl list' >>Subject: RE: [Bioperl-l] developer cvs login >> >> >>Hmm, no go as far as getting it to work from a windows box >>without cygwin. Does anyone know if/how to setup ssh for >>windows, should >>it be possible to get putty (or something else) as the ssh client? >> >> > >Have you tried wincvs ? I only tested it with pserver connection, not with ssh, but I think it is supported. >http://www.wincvs.org/ > >HTH, >Marc > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From nathanhaigh at ukonline.co.uk Tue Jan 11 12:03:07 2005 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Tue Jan 11 12:00:15 2005 Subject: [Bioperl-l] developer cvs login In-Reply-To: Message-ID: Thanks Jason The problem I was having was the lack of info available for a windows client. I have now managed to get things working by installing: ftp://ftp.ssh.com/pub/ssh/SSHSecureShellClient-3.2.9.exe and setting the windows env variable CVS_RSH = ssh2 Executing: cvs -d :ext:nathan@pub.open-bio.org:/home/repository/bioperl co bioperl-live using cvsnt (v 2.0.51d) now works fine! Nathan > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: 11 January 2005 16:42 > To: nathanhaigh@ukonline.co.uk > Cc: 'Marc Logghe'; Open-Bio Admins > Subject: Re: [Bioperl-l] developer cvs login > > You probably should have gotten a copy of the newuser info. I send it > out when I create new accounts - am attaching it now. --- avast! 
Antivirus: Outbound message clean. Virus Database (VPS): 0502-1, 11/01/2005 Tested on: 11/01/2005 17:02:58 avast! is copyright (c) 2000-2003 ALWIL Software. http://www.avast.com From barry.moore at genetics.utah.edu Tue Jan 11 17:04:38 2005 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Tue Jan 11 17:01:05 2005 Subject: [Bioperl-l] Using a bioperl cvs checkout In-Reply-To: References: <41E32784.6060409@genetics.utah.edu> Message-ID: <41E44D76.2090402@genetics.utah.edu> Hilmar (or others)- The bioperl cvs documentation was great, and I've managed to checked out bioperl-live, bioperl-db and bioperl-run from anonymous cvs into a directory off of my home. Now I've got a couple of questions about how to best utilize this code from cvs. I've got an existing installation of bioperl 1.4 which I probably don't need to duplicate, but I'm unsure of what is the best way to utilize the code from cvs. I see that the cvs checkout comes with Makefile.PL etc. Should I run make process on the cvs checkout and let it install everything into my standard perl library location, or should I keep my cvs checkouts seperate and tell perl where it is? I don't have a developer account on bioperl cvs, so I won't be commiting (or even changing my local copy) at this point, but I might as well do things the right way and from reading 'Open Source Development with CVS' it seems like I ought to be using the cvs checkout without 'installing' it or moving it anywhere. If I keep them seperate the bioperl cvs docs suggest to export PERL5LIB='$HOME/src/bioperl' . If I do that I think perl will see two copies of the bioperl modules when I run a script (the cvs copy and the installed 1.4 copy). How do I know which copy of the modules a script will be using? I don't want to completely do away with the system installation of bioperl 1.4 because another user is using that. Barry Hilmar Lapp wrote: > Did the test script that you ran come with bioperl? 
Bio::DB::BioDB > comes with bioperl-db, and is not needed for anything else. Also, > bioperl-db is not included in the bioperl 1.4 distribution. If you > want it, you do need to obtain from CVS at this point. Let me know if > you have problems with that. > > Also, if you do want to use bioperl-db I do recommend you obtain > bioperl 1.4 from the CVS branch as well, or otherwise wait for the 1.5 > release. The 1.4.0 release has problems in the interpro and GO > ontology parsers, and 1.4.1 was never released in anticipation of 1.5. > > -hilmar > > On Jan 10, 2005, at 5:10 PM, Barry Moore wrote: > >> I've just installed bioperl 1.4 (bioperl-core, bioperl-run and >> bioperl-db) on a new system (Debian woody). I run a test script that >> works fine on my old system and get an error that BioDB.pm can't be >> found. Sure enough BioDB.pm isn't on my new system, but it is on the >> old (also bioperl 1.4 Debian woody). I look in cvs and BioDB.pm is >> there, but I look in the distribution downloaded from bioperl.org and >> it seems to be missing BioDB.pm and several other files? I can get >> the files from cvs, but is this an error in the distribution file? >> >> Barry >> >> -- >> Barry Moore >> Dept. of Human Genetics >> University of Utah >> Salt Lake City, UT >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From allenday at ucla.edu Tue Jan 11 18:28:43 2005 From: allenday at ucla.edu (Allen Day) Date: Tue Jan 11 17:27:03 2005 Subject: [Bioperl-l] Using a bioperl cvs checkout In-Reply-To: <41E44D76.2090402@genetics.utah.edu> References: <41E32784.6060409@genetics.utah.edu> <41E44D76.2090402@genetics.utah.edu> Message-ID: in a calling perl script you can add a line like: use lib 'path/to/bioperl-live'; and it will take precedence over PERL5LIB / PERLLIB / @INC / etc. 
or you can do it like this (my preferred method for using an alternate lib for a one-off: perl -Ipath/to/bioperl-live path/to/myscript.pl this basically puts the '-I' argument into the 0th slot of @INC so it gets used first. you can give multiple '-I' args if needed. -Allen On Tue, 11 Jan 2005, Barry Moore wrote: > Hilmar (or others)- > > The bioperl cvs documentation was great, and I've managed to checked out > bioperl-live, bioperl-db and bioperl-run from anonymous cvs into a > directory off of my home. Now I've got a couple of questions about how > to best utilize this code from cvs. I've got an existing installation > of bioperl 1.4 which I probably don't need to duplicate, but I'm unsure > of what is the best way to utilize the code from cvs. I see that the > cvs checkout comes with Makefile.PL etc. Should I run make process on > the cvs checkout and let it install everything into my standard perl > library location, or should I keep my cvs checkouts seperate and tell > perl where it is? I don't have a developer account on bioperl cvs, so I > won't be commiting (or even changing my local copy) at this point, but I > might as well do things the right way and from reading 'Open Source > Development with CVS' it seems like I ought to be using the cvs checkout > without 'installing' it or moving it anywhere. If I keep them seperate > the bioperl cvs docs suggest to export PERL5LIB='$HOME/src/bioperl' . If > I do that I think perl will see two copies of the bioperl modules when I > run a script (the cvs copy and the installed 1.4 copy). How do I know > which copy of the modules a script will be using? I don't want to > completely do away with the system installation of bioperl 1.4 because > another user is using that. > > Barry > > Hilmar Lapp wrote: > > > Did the test script that you ran come with bioperl? Bio::DB::BioDB > > comes with bioperl-db, and is not needed for anything else. Also, > > bioperl-db is not included in the bioperl 1.4 distribution. 
If you > > want it, you do need to obtain from CVS at this point. Let me know if > > you have problems with that. > > > > Also, if you do want to use bioperl-db I do recommend you obtain > > bioperl 1.4 from the CVS branch as well, or otherwise wait for the 1.5 > > release. The 1.4.0 release has problems in the interpro and GO > > ontology parsers, and 1.4.1 was never released in anticipation of 1.5. > > > > -hilmar > > > > On Jan 10, 2005, at 5:10 PM, Barry Moore wrote: > > > >> I've just installed bioperl 1.4 (bioperl-core, bioperl-run and > >> bioperl-db) on a new system (Debian woody). I run a test script that > >> works fine on my old system and get an error that BioDB.pm can't be > >> found. Sure enough BioDB.pm isn't on my new system, but it is on the > >> old (also bioperl 1.4 Debian woody). I look in cvs and BioDB.pm is > >> there, but I look in the distribution downloaded from bioperl.org and > >> it seems to be missing BioDB.pm and several other files? I can get > >> the files from cvs, but is this an error in the distribution file? > >> > >> Barry > >> > >> -- > >> Barry Moore > >> Dept. of Human Genetics > >> University of Utah > >> Salt Lake City, UT > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > > From brian_osborne at cognia.com Tue Jan 11 21:41:47 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Jan 11 21:38:42 2005 Subject: [Bioperl-l] Using a bioperl cvs checkout In-Reply-To: <41E44D76.2090402@genetics.utah.edu> Message-ID: Barry, By setting PERL5LIB to some directory you're telling Perl to search that directory first when searching for the module or modules in question. So yes, Perl will have at least 2 directories in its @INC variable but it will use the modules it finds first, in PERL5LIB, and ignore the rest. This is analogous to how the OS treats the PATH variable. 
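To check which copy a script actually picks up, one option is to ask perl itself after loading the module. The snippet below is only a minimal sketch: `module_path` is a helper name made up for illustration, and it relies on nothing beyond the core `%INC` hash, which records the file each loaded module came from.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): return the file a module was
# loaded from, or undef if no copy can be loaded from the current @INC.
sub module_path {
    my ($module) = @_;
    (my $key = "$module.pm") =~ s{::}{/}g;   # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $key; 1 } ? $INC{$key} : undef;
}

# File::Spec is a core module, so a path is always reported for it;
# Bio::SeqIO is reported only if some copy of bioperl is reachable.
for my $module (qw(File::Spec Bio::SeqIO)) {
    my $path = module_path($module);
    printf "%-12s %s\n", $module, defined $path ? $path : '(not found)';
}
```

Running it once with PERL5LIB unset and once with PERL5LIB pointing at the cvs checkout should show the reported path for Bio::SeqIO switch from the installed 1.4 tree to the checkout.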
I commend you on your clever setup, you have the best of both worlds this way. Brian O. 105 ~>perl -e 'print @INC' /usr/lib/perl5/5.8.2/cygwin-thread-multi-64int/usr/lib/perl5/5.8.2/usr/lib/p erl5 /site_perl/5.8.2/cygwin-thread-multi-64int/usr/lib/perl5/site_perl/5.8.2/usr /lib /perl5/site_perl 106 ~>setenv PERL5LIB /fake 107 ~>perl -e 'print @INC' /fake/usr/lib/perl5/5.8.2/cygwin-thread-multi-64int/usr/lib/perl5/5.8.2/usr/ lib/ perl5/site_perl/5.8.2/cygwin-thread-multi-64int/usr/lib/perl5/site_perl/5.8. 2/us r/lib/perl5/site_perl -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Barry Moore Sent: Tuesday, January 11, 2005 5:05 PM To: Hilmar Lapp Cc: bioperl Subject: [Bioperl-l] Using a bioperl cvs checkout Hilmar (or others)- The bioperl cvs documentation was great, and I've managed to checked out bioperl-live, bioperl-db and bioperl-run from anonymous cvs into a directory off of my home. Now I've got a couple of questions about how to best utilize this code from cvs. I've got an existing installation of bioperl 1.4 which I probably don't need to duplicate, but I'm unsure of what is the best way to utilize the code from cvs. I see that the cvs checkout comes with Makefile.PL etc. Should I run make process on the cvs checkout and let it install everything into my standard perl library location, or should I keep my cvs checkouts seperate and tell perl where it is? I don't have a developer account on bioperl cvs, so I won't be commiting (or even changing my local copy) at this point, but I might as well do things the right way and from reading 'Open Source Development with CVS' it seems like I ought to be using the cvs checkout without 'installing' it or moving it anywhere. If I keep them seperate the bioperl cvs docs suggest to export PERL5LIB='$HOME/src/bioperl' . 
If I do that I think perl will see two copies of the bioperl modules when I run a script (the cvs copy and the installed 1.4 copy). How do I know which copy of the modules a script will be using? I don't want to completely do away with the system installation of bioperl 1.4 because another user is using that. Barry Hilmar Lapp wrote: > Did the test script that you ran come with bioperl? Bio::DB::BioDB > comes with bioperl-db, and is not needed for anything else. Also, > bioperl-db is not included in the bioperl 1.4 distribution. If you > want it, you do need to obtain from CVS at this point. Let me know if > you have problems with that. > > Also, if you do want to use bioperl-db I do recommend you obtain > bioperl 1.4 from the CVS branch as well, or otherwise wait for the 1.5 > release. The 1.4.0 release has problems in the interpro and GO > ontology parsers, and 1.4.1 was never released in anticipation of 1.5. > > -hilmar > > On Jan 10, 2005, at 5:10 PM, Barry Moore wrote: > >> I've just installed bioperl 1.4 (bioperl-core, bioperl-run and >> bioperl-db) on a new system (Debian woody). I run a test script that >> works fine on my old system and get an error that BioDB.pm can't be >> found. Sure enough BioDB.pm isn't on my new system, but it is on the >> old (also bioperl 1.4 Debian woody). I look in cvs and BioDB.pm is >> there, but I look in the distribution downloaded from bioperl.org and >> it seems to be missing BioDB.pm and several other files? I can get >> the files from cvs, but is this an error in the distribution file? >> >> Barry >> >> -- >> Barry Moore >> Dept. of Human Genetics >> University of Utah >> Salt Lake City, UT >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> -- Barry Moore Dept. 
of Human Genetics University of Utah Salt Lake City, UT _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From davidg at lsi.upc.edu Wed Jan 12 06:35:52 2005 From: davidg at lsi.upc.edu (=?iso-8859-1?Q?David_Garc=EDa_Cort=E9s?=) Date: Wed Jan 12 06:32:44 2005 Subject: [Bioperl-l] Getting entire descriptions from FASTA files Message-ID: <006e01c4f89a$e25108b0$cf1e5393@Davidg> Hello. I want to get the entire description line in FASTA format (i mean: description, accession number, etc...). Ive tried with display_id this way: my $seq_inIO = Bio::SeqIO->new(-file => "$proteasa", -format => 'Fasta'); my $seq_in = $seq_inIO->next_seq(); my $id_peptid = $seq_in->display_id; but I only obtain the gi and gb numbers, not the description line. Then, I tried with $seq_in->desc instead of $seq_in->display_id , but then I only obtain the description (or part of it). Is there a way to get the entire description line the same way you see it at the FASTA file? Thanks. -- David Garc?a Cort?s Instituto Nacional de Bioinform?tica (INB) Nodo Computacional GNHC-2 UPC-CIRI c/. Jordi Girona 1-3 Modul C6-E201 Tel. : 934 011 650 E-08034 Barcelona Fax : 934 017 014 Catalunya (Spain) e-mail: davidg@lsi.upc.edu From Marc.Logghe at devgen.com Wed Jan 12 07:53:21 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Wed Jan 12 07:50:18 2005 Subject: [Bioperl-l] Getting entire descriptions from FASTA files Message-ID: Hi David, > I want to get the entire description line in FASTA format (i > mean: description, accession number, etc...). Ive tried with > display_id this way: > > my $seq_inIO = Bio::SeqIO->new(-file => "$proteasa", > -format => 'Fasta'); > > my $seq_in = $seq_inIO->next_seq(); > > my $id_peptid = $seq_in->display_id; > > but I only obtain the gi and gb numbers, not the description line. 
> > Then, I tried with $seq_in->desc instead of > $seq_in->display_id , but then I only obtain the description > (or part of it). > > Is there a way to get the entire description line the same > way you see it at the FASTA file? You can reconstruct it by concatenating the id and description: my $fasta_line = join ' ', $seq_in->display_id, $seq_in->desc; Of course, I don't know what's the purpose of your script, but if it is only to fetch the > line, why not just a plain-ol' grep ? something like: grep '^>' /your/fastafile | sed "s/^>//" HTH, Marc From brian_osborne at cognia.com Wed Jan 12 08:33:27 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Jan 12 08:30:20 2005 Subject: [Bioperl-l] Getting entire descriptions from FASTA files In-Reply-To: <006e01c4f89a$e25108b0$cf1e5393@Davidg> Message-ID: David, $seq_in->display_id and $seq_in->desc together should constitute the entire line - you're not seeing this? Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of David Garc?a Cort?s Sent: Wednesday, January 12, 2005 6:36 AM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Getting entire descriptions from FASTA files Hello. I want to get the entire description line in FASTA format (i mean: description, accession number, etc...). Ive tried with display_id this way: my $seq_inIO = Bio::SeqIO->new(-file => "$proteasa", -format => 'Fasta'); my $seq_in = $seq_inIO->next_seq(); my $id_peptid = $seq_in->display_id; but I only obtain the gi and gb numbers, not the description line. Then, I tried with $seq_in->desc instead of $seq_in->display_id , but then I only obtain the description (or part of it). Is there a way to get the entire description line the same way you see it at the FASTA file? Thanks. -- David Garc?a Cort?s Instituto Nacional de Bioinform?tica (INB) Nodo Computacional GNHC-2 UPC-CIRI c/. Jordi Girona 1-3 Modul C6-E201 Tel. 
: 934 011 650 E-08034 Barcelona Fax : 934 017 014 Catalunya (Spain) e-mail: davidg@lsi.upc.edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From brian_osborne at cognia.com Wed Jan 12 09:37:53 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Jan 12 09:34:23 2005 Subject: [Bioperl-l] POD note Message-ID: bioperl-l, There are various short tags that you can use in POD to italicize, emphasize, etc. The POD utilities, like pod2html, will only interpret these tags if the POD line containing them is not indented. So, this works: The L module... But this doesn't: The L module... What happens in that last case is that the line is treated literally, the "L<" and ">" end up in the resultant HTML, if you've run pod2html. I only mention this because it seems to me that I'm removing tags that I've removed before - perhaps someone is putting these back? I could be wrong about that... Brian O. From amackey at pcbi.upenn.edu Wed Jan 12 11:31:42 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Wed Jan 12 11:28:11 2005 Subject: [Bioperl-l] POD note In-Reply-To: References: Message-ID: <70F13BBA-64B7-11D9-AC7B-000D93392082@pcbi.upenn.edu> POD markup (L<>, B<>, etc) is only valid in auto-formatted text (which in POD, is only non-indented text). It sounds like we have some indented text that shouldn't be indented (rather than removing otherwise valid markup). POD interprets any indented text as literal, pre-formatted text (much like
<pre> in HTML).  Is this why much of our 
documentation is so poorly line-wrapped !?!
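
A quick way to see the difference is to run both forms through a POD formatter. This is only a sketch, assuming a perl recent enough that the core Pod::Text is built on Pod::Simple; `render_pod` is a name made up here and `Bio::SeqIO` is just a placeholder link target.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Pod::Text;

# Hypothetical helper: format a POD string as plain text and return it.
sub render_pod {
    my ($pod) = @_;
    my $parser = Pod::Text->new;
    my $out = '';
    $parser->output_string(\$out);
    $parser->parse_string_document($pod);
    return $out;
}

# The same sentence twice: once as an ordinary paragraph, once indented.
my $pod = join "\n",
    '=pod', '',
    'The L<Bio::SeqIO> module...', '',
    '   The L<Bio::SeqIO> module...', '',
    '=cut', '';

print render_pod($pod);
```

The non-indented paragraph should come out with the link resolved to the bare module name, while the indented one is passed through as a verbatim paragraph with the L<> markup still visible.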

-Aaron

On Jan 12, 2005, at 9:37 AM, Brian Osborne wrote:

> bioperl-l,
>
> There are various short tags that you can use in POD to italicize,
> emphasize, etc. The POD utilities, like pod2html, will only interpret 
> these
> tags if the POD line containing them is not indented. So, this works:
>
> The L module...
>
> But this doesn't:
>
>    The L module...
>
> What happens in that last case is that the line is treated literally, 
> the
> "L<" and ">" end up in the resultant HTML, if you've run pod2html. I 
> only
> mention this because it seems to me that I'm removing tags that I've 
> removed
> before - perhaps someone is putting these back? I could be wrong about
> that...
>
>
> Brian O.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From amackey at pcbi.upenn.edu  Wed Jan 12 11:35:08 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Wed Jan 12 11:31:29 2005
Subject: [Bioperl-l] POD note
In-Reply-To: <70F13BBA-64B7-11D9-AC7B-000D93392082@pcbi.upenn.edu>
References: 
	<70F13BBA-64B7-11D9-AC7B-000D93392082@pcbi.upenn.edu>
Message-ID: 


Ahh, I see now, these are in our pre-formatted API summaries ...

-Aaron
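
To see the behaviour discussed in this thread for yourself, here is a minimal sketch that renders the same sentence twice, once flush-left and once indented. It uses Pod::Simple::Text, which is my choice for a self-contained demo (the thread itself only mentions pod2html), and the sample sentence is made up for illustration.

```perl
use strict;
use warnings;
use Pod::Simple::Text;

# Two paragraphs with the same text: the first is an ordinary (flush-left)
# paragraph, the second is indented and therefore a verbatim paragraph.
my $pod = join "\n",
    '=pod',
    '',
    'See the B<bold> word.',
    '',
    '    See the B<bold> word.',
    '',
    '=cut',
    '';

my $text   = '';
my $parser = Pod::Simple::Text->new;
$parser->output_string(\$text);          # collect the rendering in $text
$parser->parse_string_document($pod);

print $text;

# Count how many times the raw markup survived into the output.
my $literal = () = $text =~ /B</g;
print "literal B< occurrences: $literal\n";
```

Only the indented copy should keep its raw "B<" in the output, which is the effect Brian describes for L<> in the pre-formatted API summaries.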

On Jan 12, 2005, at 11:31 AM, Aaron J. Mackey wrote:

>
> POD markup (L<>, B<>, etc) is only valid in auto-formatted text (which 
> in POD, is only non-indented text).  It sounds like we have some 
> indented text that shouldn't be indented (rather than removing 
> otherwise valid markup).  POD interprets any indented text as literal, 
> pre-formatted text (much like <pre> in HTML).  Is this why much of our 
> documentation is so poorly line-wrapped !?!
>
> -Aaron
>
> On Jan 12, 2005, at 9:37 AM, Brian Osborne wrote:
>
>> bioperl-l,
>>
>> There are various short tags that you can use in POD to italicize,
>> emphasize, etc. The POD utilities, like pod2html, will only interpret 
>> these
>> tags if the POD line containing them is not indented. So, this works:
>>
>> The L module...
>>
>> But this doesn't:
>>
>>    The L module...
>>
>> What happens in that last case is that the line is treated literally, 
>> the
>> "L<" and ">" end up in the resultant HTML, if you've run pod2html. I 
>> only
>> mention this because it seems to me that I'm removing tags that I've 
>> removed
>> before - perhaps someone is putting these back? I could be wrong about
>> that...
>>
>>
>> Brian O.
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania       email:  amackey@pcbi.upenn.edu
> 415 S. University Avenue         office: 215-898-1205
> Philadelphia, PA  19104-6017     fax:    215-746-6697
>
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From brian_osborne at cognia.com  Wed Jan 12 15:07:10 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Jan 12 15:04:05 2005
Subject: [Bioperl-l] Storing Blast object in a local database
In-Reply-To: <1105400867.4274.4.camel@bacp4>
Message-ID: 

Tim,

One way is to "stringify" the object like so:

use Data::Dumper;
$str = Dumper($blast_object);

Then store the string in your database. To re-create the Blast object
retrieve the string, then something like:

$blast_object = eval "$str";
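
A slightly fuller sketch of the Data::Dumper round trip described above. Assumptions on my part: My::Result is a made-up stand-in for Bio::Search::Result::BlastResult so that the snippet runs without BioPerl, and the database step is left as a comment. Setting $Data::Dumper::Terse drops the "$VAR1 = " prefix, which otherwise makes the eval fail under "use strict".

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for a Bio::Search::Result::BlastResult object,
# so the sketch runs without BioPerl installed.
package My::Result;
sub new        { my ($class, %args) = @_; return bless {%args}, $class }
sub query_name { return $_[0]{query_name} }

package main;

my $blast_object = My::Result->new(
    query_name => 'seq1',
    hits       => [ 'hitA', 'hitB' ],
);

# Terse drops the "$VAR1 = " prefix so the string evals cleanly under strict;
# Indent = 0 keeps the dump on one line, convenient for a text column.
local $Data::Dumper::Terse  = 1;
local $Data::Dumper::Indent = 0;
my $str = Dumper($blast_object);

# ... store $str in the database here, and fetch it back later ...

my $restored = eval $str;
die "could not rebuild object: $@" if $@;

print ref($restored), " ", $restored->query_name, "\n";
```

Two caveats: eval runs whatever code is in the string, so only do this with data you wrote yourself, and the object's class has to be loaded before you call methods on the restored object. The core Storable module (freeze/thaw) is a commonly used alternative.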


Brian O.


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Tim Erwin
Sent: Monday, January 10, 2005 6:48 PM
To: Bioperl List
Subject: [Bioperl-l] Storing Blast object in a local database


Hi all,

Is it possible to store a blast object
(Bio::Search::Result::BlastResult) in a mysql database?

Any pointers would be appreciated.

Regards,

Tim



From jason.stajich at duke.edu  Wed Jan 12 15:14:17 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 12 15:10:53 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
Message-ID: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>

In preparation for Bioperl 1.5.0 developer release I have put up 
Release Candidate 2.

  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
  http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip


We need people to test this.  So download it, then run
  perl Makefile.PL
  make
  make test

Let us know what breaks.  I've tested on OS X and a few different Linux 
installs with different auxiliary modules installed.  It would be nice to 
have a few more combinations of OS, Perl version, and set of installed 
modules before we make a release.

Thanks for your help.
-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu  Wed Jan 12 15:46:45 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 12 15:43:12 2005
Subject: [Bioperl-l] Storing Blast object in a local database
In-Reply-To: <1105400867.4274.4.camel@bacp4>
References: <1105400867.4274.4.camel@bacp4>
Message-ID: <128A603A-64DB-11D9-A0F3-000393C44276@duke.edu>

Ensembl has a strategy.  Their objects extend Bio::Search and store the 
full data I believe.  Will could probably speak more to what the 
strategy is.

-jason

On Jan 10, 2005, at 6:47 PM, Tim Erwin wrote:

> Hi all,
>
> Is it possible to store a blast object
> (Bio::Search::Result::BlastResult) in a mysql database?
>
> Any pointers would be appreciated.
>
> Regards,
>
> Tim
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From allenday at ucla.edu  Wed Jan 12 17:48:23 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 12 17:44:48 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: 

.....
t/Ontology...................set_attribute: not a compat02 graph at 
/net/groove/lib/perl5/site_perl/5.8.0/Graph.pm line 2253,  line 10.
t/Ontology...................dubious                                         
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 1-50
        Failed 50/50 tests, 0.00% okay
t/OntologyEngine.............ok                                              
t/OntologyStore..............FAILED tests 3-6                                
        Failed 4/6 tests, 33.33% okay
.....
t/simpleGOparser.............set_attribute: not a compat02 graph at 
/net/groove/lib/perl5/site_perl/5.8.0/Graph.pm line 2253,  line 14.
t/simpleGOparser.............dubious                                         
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 1-101
        Failed 101/101 tests, 0.00% okay
.....
Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Ontology.t        255 65280    50  100 200.00%  1-50
t/OntologyStore.t                 6    4  66.67%  3-6
t/simpleGOparser.t  255 65280   101  202 200.00%  1-101
114 subtests skipped.
Failed 3/193 test scripts, 98.45% okay. 155/8964 subtests failed, 98.27% 
okay.
make: *** [test_dynamic] Error 29

~~~~~

This is perl, v5.8.0 built for i386-linux-thread-multi
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2002, Larry Wall

Perl may be copied only under the terms of either the Artistic License or 
the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'.  If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.

~~~~

Looks like this is caused by Graph.pm.  I've seen other reports about "not
a compat02 graph" recently, maybe there is a Graph.pm versioning problem?

-Allen
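
If the "not a compat02 graph" failures do come from a Graph.pm version difference, as suspected above, it helps to include the installed version in a report. A minimal sketch; module_version is a helper name I made up, and nothing here says which version is the right one:

```perl
use strict;
use warnings;

# Return the installed version of a module, 'unknown' if it loads but
# declares no $VERSION, or undef if it cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    return unless eval "require $module; 1";
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

# Graph is the module named in the error message above.
for my $module (qw(Graph Graph::Directed)) {
    my $version = module_version($module);
    print "$module: ", (defined $version ? $version : 'not installed'), "\n";
}
```

Posting this line together with the "perl -v" output makes it easier to compare machines that pass with machines that fail.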


On Wed, 12 Jan 2005, Jason Stajich wrote:

> In preparation for Bioperl 1.5.0 developer release I have put up 
> Release Candidate 2.
> 
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
> 
> 
> We need people to test on this.  So download, run
>   perl Makefile.PL
>   make
>   make test
> 
> Let us know what breaks.  I've tested on OS X and few different linux 
> installs with different auxiliary modules installed.  Would be nice to 
> have a few more combinations of OS, perl versions, and suite of modules 
> installed before we make a release.
> 
> Thanks for your help.
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
From nathanhaigh at ukonline.co.uk  Thu Jan 13 03:56:07 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Thu Jan 13 03:52:35 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: 

......
t/GOR4.......................ok 3/13Can't call method "start" on an undefined value at t/GOR4.t line 80,  line 1.
t/GOR4.......................dubious
        Test returned status 76 (wstat 19456, 0x4c00)
DIED. FAILED test 7
        Failed 1/13 tests, 92.31% okay
t/GOterm.....................ok
.......
t/HNN........................FAILED tests 7, 12
        Failed 2/13 tests, 84.62% okay
.......
t/Sopma......................FAILED tests 7-8, 14
        Failed 3/15 tests, 80.00% okay
.......
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/GOR4.t      76 19456    13    1   7.69%  7
t/HNN.t                   13    2  15.38%  7 12
t/Sopma.t                 15    3  20.00%  7-8 14
2 subtests skipped.

~~~~~~~
WinXP Pro v5.1.2600 Service Pack 1 Build 2600
~~~~~~~~
This is perl, v5.8.0 built for MSWin32-x86-multi-thread
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2002, Larry Wall

Binary build 804 provided by ActiveState Corp. http://www.ActiveState.com
Built 23:15:13 Dec  1 2002

If you need a hand working these problems out give me a shout!
Nathan
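
A note on reading the Stat and Wstat columns in the summaries posted so far: Wstat is the raw wait status of the test script, and Stat is that value shifted right by 8 bits, i.e. the script's exit code. A small sketch of the arithmetic, using figures copied from this thread (decode_wstat is just an illustrative helper name):

```perl
use strict;
use warnings;

# Wstat values copied from the failure summaries in this thread.
my %wstat = (
    't/Ontology.t' => 65280,
    't/GOR4.t'     => 19456,
);

# Split a raw wait status into exit code (high byte) and signal (low 7 bits).
sub decode_wstat {
    my ($wstat) = @_;
    return ($wstat >> 8, $wstat & 127);
}

for my $test (sort keys %wstat) {
    my ($exit, $signal) = decode_wstat($wstat{$test});
    printf "%-14s exit=%d signal=%d\n", $test, $exit, $signal;
}
```

An exit code of 255 is what Perl normally produces when a script dies with an uncaught error, which fits the "DIED" lines in Allen's output.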


> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason Stajich
> Sent: 12 January 2005 20:14
> To: Bioperl list
> Subject: [Bioperl-l] bioperl-1.5.0 RC2
> 
> In preparation for Bioperl 1.5.0 developer release I have put up
> Release Candidate 2.
> 
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
> 
> 
> We need people to test on this.  So download, run
>   perl Makefile.PL
>   make
>   make test
> 
> Let us know what breaks.  I've tested on OS X and few different linux
> installs with different auxiliary modules installed.  Would be nice to
> have a few more combinations of OS, perl versions, and suite of modules
> installed before we make a release.
> 
> Thanks for your help.
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l





From brian_osborne at cognia.com  Thu Jan 13 07:22:14 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Jan 13 07:19:24 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: 

Jason,

All tests pass on CYGWIN_NT-5.0.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jason Stajich
Sent: Wednesday, January 12, 2005 3:14 PM
To: Bioperl list
Subject: [Bioperl-l] bioperl-1.5.0 RC2


In preparation for Bioperl 1.5.0 developer release I have put up 
Release Candidate 2.

  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
  http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip


We need people to test on this.  So download, run
  perl Makefile.PL
  make
  make test

Let us know what breaks.  I've tested on OS X and few different linux 
installs with different auxiliary modules installed.  Would be nice to 
have a few more combinations of OS, perl versions, and suite of modules 
installed before we make a release.

Thanks for your help.
-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/



From razi at genet.sickkids.on.ca  Wed Jan 12 22:52:32 2005
From: razi at genet.sickkids.on.ca (Razi Khaja)
Date: Thu Jan 13 08:19:20 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <20050113035233.16569.qmail@web51602.mail.yahoo.com>

I've tested RC2 on FreeBSD 5.3 running perl 5.8.5 on i386.  This was
tested with all prerequisite modules installed (including Graph::Directed
(J/JH/JHI/Graph-0.51.tar.gz)), as per the output of 'perl Makefile.PL'. 

Attached is the output of make test (make_test.out.gz).

Summary of make test included here:
Failed 3/193 test scripts, 98.45% okay. 155/8956 subtests failed, 98.27%
okay.
Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Ontology.t        255 65280    50  100 200.00%  1-50
t/OntologyStore.t                 6    4  66.67%  3-6
t/simpleGOparser.t  255 65280   101  202 200.00%  1-101
2 subtests skipped.
*** Error code 25

Stop in /usr/home/bioperl/bioperl-1.5.0-RC2.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make_test.out.gz
Type: application/x-gzip
Size: 14925 bytes
Desc: make_test.out.gz
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050112/15f05e3f/make_test.out.bin
From danielucgbioinfo at yahoo.com.br  Thu Jan 13 09:05:50 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Thu Jan 13 09:03:28 2005
Subject: [Bioperl-l] Clickable Graphics
Message-ID: <20050113140550.22237.qmail@web53504.mail.yahoo.com>

Hi, I'm trying to render a sequence as a PNG
file, but clickable. I need to make each glyph
clickable (online, with CGI), but I haven't managed
to make even one glyph clickable.
Can anyone send me an example of this kind of code? I have
used Bio::Graphics::Panel.


Thank you,
Daniel Xavier - 
BioinfoUCG - Brazil


	
	
		
From paulo.david at netvisao.pt  Thu Jan 13 09:43:48 2005
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Thu Jan 13 09:41:25 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: 
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
	
Message-ID: <59455.193.137.94.3.1105627428.squirrel@193.137.94.3>

This is my error output on Linux (Debian testing freshly installed,
including the bioperl-1.4 package; kernel 2.6.8; perl v5.8.4), but it
skipped many tests:

Failed Test           Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/SeqFeatCollection.t              432    1   0.23%  423
109 subtests skipped.
Failed 1/193 test scripts, 99.48% okay. 1/8762 subtests failed, 99.99% okay.
make: *** [test_dynamic] Error 255


> -----Original Message-----
>
> In preparation for Bioperl 1.5.0 developer release I have put up
> Release Candidate 2.
>
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
>
>
> We need people to test on this.  So download, run
>   perl Makefile.PL
>   make
>   make test
>
> Let us know what breaks.  I've tested on OS X and few different linux
> installs with different auxiliary modules installed.  Would be nice to
> have a few more combinations of OS, perl versions, and suite of modules
> installed before we make a release.
>
> Thanks for your help.
> -jason

From e-just at northwestern.edu  Thu Jan 13 11:58:32 2005
From: e-just at northwestern.edu (Eric Just)
Date: Thu Jan 13 11:55:07 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <6.1.1.1.2.20050113105812.05ea36a8@hecky.it.northwestern.edu>




activestate 5.8, windows XP

Failed Test  Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t\Registry.t  255 65280    13   11  84.62%  8-13
t\flat.t        2   512    16   30 187.50%  2-16
4 subtests skipped.
Failed 2/193 test scripts, 98.96% okay. 21/8916 subtests failed, 99.76% okay.

D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/flat.t
1..16
ok 1

------------- EXCEPTION  -------------
MSG: flat file D:\tmp\New Folder\bioperl-1.5.0-RC2\tmpNew cannot be read: 
No such file or directory
STACK Bio::DB::Flat::add_flat_file D:/Perl/site/lib/Bio/DB/Flat.pm:358
STACK Bio::DB::Flat::_path2fileno D:/Perl/site/lib/Bio/DB/Flat.pm:519
STACK Bio::DB::Flat::BDB::_index_file D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:228
STACK Bio::DB::Flat::BDB::build_index D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:218
STACK toplevel t/flat.t:70

--------------------------------------

D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/Registry.t
1..13
ok 1
ok 2
ok 3 # DB_File or BerkeleyDB not found, skipping DB_File tests
ok 4 # DB_File or BerkeleyDB not found, skipping DB_File tests
This Perl doesn't implement function getpwuid(). Skipping...

-------------------- WARNING ---------------------
MSG: Couldn't call new_from_registry on [Bio::DB::Flat]

------------- EXCEPTION  -------------
MSG: No flatfile fileid files in config - check the index has been made 
correctly
STACK Bio::DB::Flat::BinarySearch::read_config_file 
D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:1297
STACK Bio::DB::Flat::BinarySearch::new 
D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:280
STACK Bio::DB::Flat::new D:/Perl/site/lib/Bio/DB/Flat.pm:181
STACK Bio::DB::Flat::new_from_registry D:/Perl/site/lib/Bio/DB/Flat.pm:256
STACK (eval) D:/Perl/site/lib/Bio/DB/Registry.pm:184
STACK Bio::DB::Registry::_load_registry D:/Perl/site/lib/Bio/DB/Registry.pm:183
STACK Bio::DB::Registry::new D:/Perl/site/lib/Bio/DB/Registry.pm:96
STACK toplevel t/Registry.t:69

--------------------------------------

---------------------------------------------------
ok 5
ok 6
ok 7
not ok 8
# Failed test 8 in t/Registry.t at line 77
Can't call method "seq" on an undefined value at t/Registry.t line 78.

D:\tmp\New Folder\bioperl-1.5.0-RC2>


At 02:14 PM 1/12/2005, Jason Stajich wrote:
>In preparation for Bioperl 1.5.0 developer release I have put up Release 
>Candidate 2.
>
>  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
>  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
>  http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
>
>
>We need people to test on this.  So download, run
>  perl Makefile.PL
>  make
>  make test
>
>Let us know what breaks.  I've tested on OS X and few different linux 
>installs with different auxiliary modules installed.  Would be nice to 
>have a few more combinations of OS, perl versions, and suite of modules 
>installed before we make a release.
>
>Thanks for your help.
>-jason
>--
>Jason Stajich
>jason.stajich at duke.edu
>http://www.duke.edu/~jes12/
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l

============================================

Eric Just
e-just@northwestern.edu
dictyBase Programmer
Center for Genetic Medicine
Northwestern University
http://dictybase.org

============================================ 

From Jan.Aerts at wur.nl  Thu Jan 13 12:20:41 2005
From: Jan.Aerts at wur.nl (Aerts, Jan)
Date: Thu Jan 13 12:17:23 2005
Subject: [Bioperl-l] Clickable Graphics
Message-ID: <7D030487F1A3D143A76F2A1E91F570350186DB62@scomp0010>

Hi Daniel,

Do you mean you want a graphical representation of your sequence in e.g. a web-browser, with the features linking to additional information or other websites? If so, I'd seriously suggest gbrowse (or Generic Genome Browser; http://www.gmod.org/ggb/gbrowse.shtml). Look at the website for the (very easy) installation instructions and the (equally simple) tutorial.

GBrowse actually uses the Bio::Graphics objects to build its graphics.

If you'd happen to be at the PAG meeting in San Diego: Scott Cain will give a demo of gbrowse.

Good luck,
Jan Aerts
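
If you would rather stay with plain Bio::Graphics, the usual approach is an HTML image map laid over the PNG. The sketch below only shows the map-building step and runs without BioPerl: the @boxes data is made up, and I am assuming the [feature, x1, y1, x2, y2] layout that the Bio::Graphics::Panel documentation describes for its boxes() method, so please check that against your installed version. image_map and the URL are illustrative names, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical glyph boxes: [ label, x1, y1, x2, y2 ]. With a real panel the
# first element would be a feature object and the list would come from
# $panel->boxes (assumed layout; verify in your Bio::Graphics version).
my @boxes = (
    [ 'geneA',  10, 20, 110, 30 ],
    [ 'geneB', 150, 20, 300, 30 ],
);

# Build a client-side image map with one rectangular <area> per glyph.
# Labels are inserted as-is; escape them if they can contain HTML characters.
sub image_map {
    my ($name, $url_base, @boxes) = @_;
    my $html = qq{<map name="$name">\n};
    for my $box (@boxes) {
        my ($label, @coords) = @$box;
        $html .= sprintf qq{  <area shape="rect" coords="%s" href="%s%s" alt="%s">\n},
            join(',', @coords), $url_base, $label, $label;
    }
    return $html . "</map>\n";
}

print image_map('features', '/cgi-bin/detail?id=', @boxes);
```

The CGI page would then print this map next to an img tag that points at the PNG and carries usemap="#features".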

-----Original Message-----
From:	bioperl-l-bounces@portal.open-bio.org on behalf of Danielucg Sousa
Sent:	Thu 13-Jan-05 15:05
To:	bioperl-l@portal.open-bio.org
Cc:	
Subject:	[Bioperl-l] Clickable Graphics
Hi, I'm trying to render a sequence as a PNG
file, but clickable. I need to make each glyph
clickable (online, with CGI), but I haven't managed
to make even one glyph clickable.
Can anyone send me an example of this kind of code? I have
used Bio::Graphics::Panel.


Thank you,
Daniel Xavier - 
BioinfoUCG - Brazil


	
	
		




From khh103 at york.ac.uk  Thu Jan 13 12:24:21 2005
From: khh103 at york.ac.uk (Kat Hull)
Date: Thu Jan 13 12:22:18 2005
Subject: [Bioperl-l] Getting started with Bio::Perl
Message-ID: <41E6AEC5.2050302@york.ac.uk>

Dear Users,
I have a newbie question!  I am interested in the following module 'Bio::Tools::Run::PiseApplication::codonw' but really don't
know how to start to use it.  I have looked at the documentation etc but am confused about how to pass my array of sequences to
the module and then how to call the individual functions to perform the calculations (e.g. gc, cai, fop...).

Does anyone have a simple script showing how to run this module with the input as an array of fasta format sequences?
Many thanks,

Kat


From jason.stajich at duke.edu  Thu Jan 13 12:34:41 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Jan 13 12:30:57 2005
Subject: [Bioperl-l] Getting started with Bio::Perl
In-Reply-To: <41E6AEC5.2050302@york.ac.uk>
References: <41E6AEC5.2050302@york.ac.uk>
Message-ID: <68529540-6589-11D9-9682-000393C44276@duke.edu>

See the documentation in the SYNOPSIS of
Bio::Tools::Run::PiseApplication


-jason
On Jan 13, 2005, at 12:24 PM, Kat Hull wrote:

> Bio::Tools::Run::PiseApplication
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From nathanhaigh at ukonline.co.uk  Thu Jan 13 14:49:00 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Thu Jan 13 14:45:54 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <6.1.1.1.2.20050113105812.05ea36a8@hecky.it.northwestern.edu>
Message-ID: 

I'm a little confused about where your results come from. Obviously the failed test table was from an "nmake test".
However, it appears that you then ran "perl t\flat.t". Why? Was it to get the full details of that particular test? If so, you need
to run:
"perl -I. -w t\flat.t"
The -I. ensures you use the bioperl modules from the bioperl-1.5.0-RC2 directory, not your installed version of bioperl (which may be 1.4 or
from the cvs). Also make sure that you are running from a path that contains no spaces. It appears that you unpacked the
contents of bioperl-1.5.0-RC2 into a folder called "New Folder"; this path contains a space, so it may (and probably will) cause
unexpected effects/results.

Nathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Eric Just
> Sent: 13 January 2005 16:59
> To: Jason Stajich; Bioperl list
> Subject: Re: [Bioperl-l] bioperl-1.5.0 RC2
> 
> 
> 
> 
> activestate 5.8, windows XP
> 
> Failed Test  Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t\Registry.t  255 65280    13   11  84.62%  8-13
> t\flat.t        2   512    16   30 187.50%  2-16
> 4 subtests skipped.
> Failed 2/193 test scripts, 98.96% okay. 21/8916 subtests failed, 99.76% okay.
> 
> D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/flat.t
> 1..16
> ok 1
> 
> ------------- EXCEPTION  -------------
> MSG: flat file D:\tmp\New Folder\bioperl-1.5.0-RC2\tmpNew cannot be read:
> No such file or directory
> STACK Bio::DB::Flat::add_flat_file D:/Perl/site/lib/Bio/DB/Flat.pm:358
> STACK Bio::DB::Flat::_path2fileno D:/Perl/site/lib/Bio/DB/Flat.pm:519
> STACK Bio::DB::Flat::BDB::_index_file D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:228
> STACK Bio::DB::Flat::BDB::build_index D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:218
> STACK toplevel t/flat.t:70
> 
> --------------------------------------
> 
> D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/Registry.t
> 1..13
> ok 1
> ok 2
> ok 3 # DB_File or BerkeleyDB not found, skipping DB_File tests
> ok 4 # DB_File or BerkeleyDB not found, skipping DB_File tests
> This Perl doesn't implement function getpwuid(). Skipping...
> 
> -------------------- WARNING ---------------------
> MSG: Couldn't call new_from_registry on [Bio::DB::Flat]
> 
> ------------- EXCEPTION  -------------
> MSG: No flatfile fileid files in config - check the index has been made
> correctly
> STACK Bio::DB::Flat::BinarySearch::read_config_file
> D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:1297
> STACK Bio::DB::Flat::BinarySearch::new
> D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:280
> STACK Bio::DB::Flat::new D:/Perl/site/lib/Bio/DB/Flat.pm:181
> STACK Bio::DB::Flat::new_from_registry D:/Perl/site/lib/Bio/DB/Flat.pm:256
> STACK (eval) D:/Perl/site/lib/Bio/DB/Registry.pm:184
> STACK Bio::DB::Registry::_load_registry D:/Perl/site/lib/Bio/DB/Registry.pm:183
> STACK Bio::DB::Registry::new D:/Perl/site/lib/Bio/DB/Registry.pm:96
> STACK toplevel t/Registry.t:69
> 
> --------------------------------------
> 
> ---------------------------------------------------
> ok 5
> ok 6
> ok 7
> not ok 8
> # Failed test 8 in t/Registry.t at line 77
> Can't call method "seq" on an undefined value at t/Registry.t line 78.
> 
> D:\tmp\New Folder\bioperl-1.5.0-RC2>
> 
> 
> At 02:14 PM 1/12/2005, Jason Stajich wrote:
> >In preparation for Bioperl 1.5.0 developer release I have put up Release
> >Candidate 2.
> >
> >  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
> >  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
> >  http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
> >
> >
> >We need people to test on this.  So download, run
> >  perl Makefile.PL
> >  make
> >  make test
> >
> >Let us know what breaks.  I've tested on OS X and few different linux
> >installs with different auxiliary modules installed.  Would be nice to
> >have a few more combinations of OS, perl versions, and suite of modules
> >installed before we make a release.
> >
> >Thanks for your help.
> >-jason
> >--
> >Jason Stajich
> >jason.stajich at duke.edu
> >http://www.duke.edu/~jes12/
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l@portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
> ============================================
> 
> Eric Just
> e-just@northwestern.edu
> dictyBase Programmer
> Center for Genetic Medicine
> Northwestern University
> http://dictybase.org
> 
> ============================================
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l




From amackey at pcbi.upenn.edu  Thu Jan 13 15:05:25 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Thu Jan 13 15:02:02 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: 
References: 
Message-ID: <76EF80F6-659E-11D9-AFAA-000D93392082@pcbi.upenn.edu>


To be pedantic, you should use "perl -Mblib t\flat.t" to ensure that  
all the right "build lib" files are being used.  But it looks like part  
of the problem is a mismatch between the expected number of tests and  
the number of tests actually run ...

-Aaron
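
One quick way to confirm which copy of a module a test run is really using is to look it up in %INC. The STACK lines in the output quoted below all point at D:/Perl/site/lib, i.e. the installed copy rather than the unpacked RC2 tree. A minimal sketch; loaded_from is a made-up helper, and File::Spec is used only because it is a core module that loads anywhere:

```perl
use strict;
use warnings;

# Load a module by name and report the file it was actually loaded from.
sub loaded_from {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::DB::Flat -> Bio/DB/Flat.pm
    require $file;
    return $INC{$file};
}

# In the RC2 directory you would pass e.g. 'Bio::DB::Flat' here instead.
print loaded_from('File::Spec'), "\n";
```

Run from the top of the unpacked distribution as "perl -Mblib script.pl" (after make) or "perl -I. script.pl", the printed path should sit inside that directory; if it points at site/lib, the installed version is shadowing the release candidate.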

On Jan 13, 2005, at 2:49 PM, Nathan Haigh wrote:

> I'm a little confused about what your results are from. Obviously the  
> failed test table was from an "nmake test".
> However, it appears that you then did a "perl t\flat.t" why? Was it to  
> get the full details of that particular test? If so you need
> to run:
> "perl -I. -w t\flat.t"
> The -I. ensures you use the bioperl modules from the bioperl-1.5.0-RC2  
> not your installed version of bioperl (which may be 1.4 or
> from the cvs). Also make sure that you are running from a path that  
> contains no spaces - it appears as though you unpacked the
> contents of bioperl-1.5.0-RC2 into a folder called "New Folder", this  
> path contains a space, so may (and probably will) cause
> unexpected effects/results.
>
> Nathan
>
>> -----Original Message-----
>> From: bioperl-l-bounces@portal.open-bio.org  
>> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Eric Just
>> Sent: 13 January 2005 16:59
>> To: Jason Stajich; Bioperl list
>> Subject: Re: [Bioperl-l] bioperl-1.5.0 RC2
>>
>>
>>
>>
>> activestate 5.8, windows XP
>>
>> Failed Test  Stat Wstat Total Fail  Failed  List of Failed
>> ---------------------------------------------------------------------- 
>> ---------
>> t\Registry.t  255 65280    13   11  84.62%  8-13
>> t\flat.t        2   512    16   30 187.50%  2-16
>> 4 subtests skipped.
>> Failed 2/193 test scripts, 98.96% okay. 21/8916 subtests failed,  
>> 99.76% okay.
>>
>> D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/flat.t
>> 1..16
>> ok 1
>>
>> ------------- EXCEPTION  -------------
>> MSG: flat file D:\tmp\New Folder\bioperl-1.5.0-RC2\tmpNew cannot be  
>> read:
>> No such file or directory
>> STACK Bio::DB::Flat::add_flat_file D:/Perl/site/lib/Bio/DB/Flat.pm:358
>> STACK Bio::DB::Flat::_path2fileno D:/Perl/site/lib/Bio/DB/Flat.pm:519
>> STACK Bio::DB::Flat::BDB::_index_file  
>> D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:228
>> STACK Bio::DB::Flat::BDB::build_index  
>> D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:218
>> STACK toplevel t/flat.t:70
>>
>> --------------------------------------
>>
>> D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/Registry.t
>> 1..13
>> ok 1
>> ok 2
>> ok 3 # DB_File or BerkeleyDB not found, skipping DB_File tests
>> ok 4 # DB_File or BerkeleyDB not found, skipping DB_File tests
>> This Perl doesn't implement function getpwuid(). Skipping...
>>
>> -------------------- WARNING ---------------------
>> MSG: Couldn't call new_from_registry on [Bio::DB::Flat]
>>
>> ------------- EXCEPTION  -------------
>> MSG: No flatfile fileid files in config - check the index has been  
>> made
>> correctly
>> STACK Bio::DB::Flat::BinarySearch::read_config_file
>> D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:1297
>> STACK Bio::DB::Flat::BinarySearch::new
>> D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:280
>> STACK Bio::DB::Flat::new D:/Perl/site/lib/Bio/DB/Flat.pm:181
>> STACK Bio::DB::Flat::new_from_registry  
>> D:/Perl/site/lib/Bio/DB/Flat.pm:256
>> STACK (eval) D:/Perl/site/lib/Bio/DB/Registry.pm:184
>> STACK Bio::DB::Registry::_load_registry  
>> D:/Perl/site/lib/Bio/DB/Registry.pm:183
>> STACK Bio::DB::Registry::new D:/Perl/site/lib/Bio/DB/Registry.pm:96
>> STACK toplevel t/Registry.t:69
>>
>> --------------------------------------
>>
>> ---------------------------------------------------
>> ok 5
>> ok 6
>> ok 7
>> not ok 8
>> # Failed test 8 in t/Registry.t at line 77
>> Can't call method "seq" on an undefined value at t/Registry.t line 78.
>>
>> D:\tmp\New Folder\bioperl-1.5.0-RC2>
>>
>>
>> At 02:14 PM 1/12/2005, Jason Stajich wrote:
>>> In preparation for Bioperl 1.5.0 developer release I have put up  
>>> Release
>>> Candidate 2.
>>>
>>>  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
>>>  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
>>>  http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
>>>
>>>
>>> We need people to test on this.  So download, run
>>>  perl Makefile.PL
>>>  make
>>>  make test
>>>
>>> Let us know what breaks.  I've tested on OS X and few different linux
>>> installs with different auxiliary modules installed.  Would be nice  
>>> to
>>> have a few more combinations of OS, perl versions, and suite of  
>>> modules
>>> installed before we make a release.
>>>
>>> Thanks for your help.
>>> -jason
>>> --
>>> Jason Stajich
>>> jason.stajich at duke.edu
>>> http://www.duke.edu/~jes12/
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>> ============================================
>>
>> Eric Just
>> e-just@northwestern.edu
>> dictyBase Programmer
>> Center for Genetic Medicine
>> Northwestern University
>> http://dictybase.org
>>
>> ============================================
>>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From davidg at lsi.upc.edu  Thu Jan 13 17:21:26 2005
From: davidg at lsi.upc.edu (=?iso-8859-1?Q?David_Garc=EDa_Cort=E9s?=)
Date: Thu Jan 13 17:19:02 2005
Subject: [Bioperl-l] Problems parsing PSI-BLAST results
Message-ID: <003d01c4f9be$56287150$30b01950@maxpower>

Hello.

I'm trying to parse the hits of all the iterations in a PSI-BLAST result 
(which I have in a variable, not in a file), but I can't make it work.
It gives me the following errors:

Use of uninitialized value in index at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Sbjct.pm line 271, 
 line 54.
Use of uninitialized value in length at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Sbjct.pm line 272, 
 line 54.
Use of uninitialized value in join or string at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Sbjct.pm line 283, 
 line 54.
Use of uninitialized value in join or string at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Sbjct.pm line 284, 
 line 54.
Use of uninitialized value in numeric gt (>) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/HSP.pm line 185,  
line 54.
Use of uninitialized value in numeric gt (>) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/HSP.pm line 197,  
line 54.

-------------------- WARNING ---------------------
MSG: Possible error (2) while parsing BLAST report!
---------------------------------------------------
Use of uninitialized value in substitution (s///) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Iteration.pm line 207, 
 line 54.
Use of uninitialized value in substitution (s///) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Iteration.pm line 208, 
 line 54.
Use of uninitialized value in substitution (s///) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Iteration.pm line 209, 
 line 54.
Use of uninitialized value in pattern match (m//) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Iteration.pm line 211, 
 line 54.


I don't know why it doesn't work... I looked at the bioperl API and there is 
a method "nextSbjct()" for Bio::Tools::BPlite::Iteration!
This is the relevant part of the code:

 my $seqsfich = Bio::SeqIO->new(-file => "$proteasa", '-format' => 'Fasta');

 # blast parameters
 my @pars = (
     'database' => "nr",
     'j'        => '2',
 );

 my $factory = Bio::Tools::Run::StandAloneBlast->new(@pars);

 while (my $seq = $seqsfich->next_seq()) {
     my $report   = $factory->blastpgp($seq);
     my $max_iter = $report->number_of_iterations;
     my $iter     = $report->round($max_iter);

     while (my $sbjct = $iter->nextSbjct()) {
         while (my $hsp = $sbjct->nextHSP()) {
             printf("%-70s   %s\n", substr($hsp->hit->seqname, 0, 70), $hsp->score);
         }
     }
 }


The error must be in this part:

     while (my $sbjct = $iter->nextSbjct()) {
         while (my $hsp = $sbjct->nextHSP()) {
             printf("%-70s   %s\n", substr($hsp->hit->seqname, 0, 70), $hsp->score);
         }
     }

because when I remove it from the code, the errors don't appear.
What am I doing wrong?

Thank you very much.




From golharam at umdnj.edu  Thu Jan 13 17:59:15 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu Jan 13 17:53:37 2005
Subject: [Bioperl-l] Losing STDOUT
Message-ID: <001c01c4f9c3$81546ff0$bb028a0a@GOLHARMOBILE1>

If I open a file using Bio::SearchIO, I'm unable to redirect STDOUT
anymore:

$searchio = new Bio::SearchIO (-format=>'blast', -file=>"myfile.blast");
$hit = $searchio->next_result->next_hit;
print "name", $hit->name, "\n";

This works from the shell.  If you put this in a script and redirect the
output, you don't get anything.  I'm wondering if SearchIO does
something with STDOUT?


-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam@umdnj.edu

From rob at salmonella.org  Thu Jan 13 18:35:40 2005
From: rob at salmonella.org (Rob Edwards)
Date: Thu Jan 13 18:32:06 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: 

Jason,

Thanks for herding us and release 1.5 together, and also (a little 
belatedly) thanks for the vision of Bioperl in 2005 you sent out a 
couple of weeks ago.

We would definitely all be lost without you.

On my Mac OSX machine I get

All tests successful, 114 subtests skipped.
Files=193, Tests=8956, 782 wallclock secs (352.60 cusr + 46.69 csys = 
399.29 CPU)

% uname -a
Darwin Robs-Computer.local 7.7.0 Darwin Kernel Version 7.7.0: Sun Nov  
7 16:06:51 PST 2004; root:xnu/xnu-517.9.5.obj~1/RELEASE_PPC  Power 
Macintosh powerpc

% perl -v

This is perl, v5.8.1-RC3 built for darwin-thread-multi-2level
(with 1 registered patch, see perl -V for more detail)



Rob

From e-just at northwestern.edu  Thu Jan 13 19:54:15 2005
From: e-just at northwestern.edu (Eric Just)
Date: Thu Jan 13 19:50:54 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: 
References: <6.1.1.1.2.20050113105812.05ea36a8@hecky.it.northwestern.edu>
	
Message-ID: <6.1.1.1.2.20050113185146.06145758@hecky.it.northwestern.edu>

hey
It was the directory name.  I moved it up one directory and the tests 
worked.  My apologies.
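
For anyone hitting the same thing, here is a minimal sketch of the two checks Nathan suggested: whether the working directory contains a space, and which copy of a module perl actually loaded. The helper names path_has_space and loaded_from are made up for illustration, and Cwd only stands in for whichever Bio:: module you want to check.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Return 1 when a path contains whitespace, 0 otherwise.
sub path_has_space {
    my ($path) = @_;
    return $path =~ /\s/ ? 1 : 0;
}

# Return the file a module was loaded from, or undef if it is not loaded.
sub loaded_from {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;
    return $INC{"$file.pm"};
}

my $cwd = getcwd();
print "working directory: $cwd\n";
print "warning: path contains whitespace\n" if path_has_space($cwd);

# Cwd is only a stand-in here; substitute the module you are testing.
print "Cwd loaded from: ", loaded_from('Cwd'), "\n";
```

Run from the unpacked distribution with "perl -I. script.pl", the reported location should point into the distribution rather than into site/lib if the local modules are the ones being picked up.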

Thanks for your responses.
Eric
At 01:49 PM 1/13/2005, Nathan Haigh wrote:
>I'm a little confused about what your results are from. Obviously the 
>failed test table was from an "nmake test".
>However, it appears that you then did a "perl t\flat.t" why? Was it to get 
>the full details of that particular test? If so you need
>to run:
>"perl -I. -w t\flat.t"
>The -I. ensures you use the bioperl modules from the bioperl-1.5.0-RC2 not 
>your installed version of bioperl (which may be 1.4 or
>from the cvs). Also make sure that you are running from a path that 
>contains no spaces - it appears as though you unpacked the
>contents of bioperl-1.5.0-RC2 into a folder called "New Folder", this path 
>contains a space, so may (and probably will) cause
>unexpected effects/results.
>
>Nathan
>
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org 
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Eric Just
> > Sent: 13 January 2005 16:59
> > To: Jason Stajich; Bioperl list
> > Subject: Re: [Bioperl-l] bioperl-1.5.0 RC2
> >
> >
> >
> >
> > activestate 5.8, windows XP
> >
> > Failed Test  Stat Wstat Total Fail  Failed  List of Failed
> > 
> -------------------------------------------------------------------------------
> > t\Registry.t  255 65280    13   11  84.62%  8-13
> > t\flat.t        2   512    16   30 187.50%  2-16
> > 4 subtests skipped.
> > Failed 2/193 test scripts, 98.96% okay. 21/8916 subtests failed, 99.76% 
> okay.
> >
> > D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/flat.t
> > 1..16
> > ok 1
> >
> > ------------- EXCEPTION  -------------
> > MSG: flat file D:\tmp\New Folder\bioperl-1.5.0-RC2\tmpNew cannot be read:
> > No such file or directory
> > STACK Bio::DB::Flat::add_flat_file D:/Perl/site/lib/Bio/DB/Flat.pm:358
> > STACK Bio::DB::Flat::_path2fileno D:/Perl/site/lib/Bio/DB/Flat.pm:519
> > STACK Bio::DB::Flat::BDB::_index_file 
> D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:228
> > STACK Bio::DB::Flat::BDB::build_index 
> D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:218
> > STACK toplevel t/flat.t:70
> >
> > --------------------------------------
> >
> > D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/Registry.t
> > 1..13
> > ok 1
> > ok 2
> > ok 3 # DB_File or BerkeleyDB not found, skipping DB_File tests
> > ok 4 # DB_File or BerkeleyDB not found, skipping DB_File tests
> > This Perl doesn't implement function getpwuid(). Skipping...
> >
> > -------------------- WARNING ---------------------
> > MSG: Couldn't call new_from_registry on [Bio::DB::Flat]
> >
> > ------------- EXCEPTION  -------------
> > MSG: No flatfile fileid files in config - check the index has been made
> > correctly
> > STACK Bio::DB::Flat::BinarySearch::read_config_file
> > D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:1297
> > STACK Bio::DB::Flat::BinarySearch::new
> > D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:280
> > STACK Bio::DB::Flat::new D:/Perl/site/lib/Bio/DB/Flat.pm:181
> > STACK Bio::DB::Flat::new_from_registry D:/Perl/site/lib/Bio/DB/Flat.pm:256
> > STACK (eval) D:/Perl/site/lib/Bio/DB/Registry.pm:184
> > STACK Bio::DB::Registry::_load_registry 
> D:/Perl/site/lib/Bio/DB/Registry.pm:183
> > STACK Bio::DB::Registry::new D:/Perl/site/lib/Bio/DB/Registry.pm:96
> > STACK toplevel t/Registry.t:69
> >
> > --------------------------------------
> >
> > ---------------------------------------------------
> > ok 5
> > ok 6
> > ok 7
> > not ok 8
> > # Failed test 8 in t/Registry.t at line 77
> > Can't call method "seq" on an undefined value at t/Registry.t line 78.
> >
> > D:\tmp\New Folder\bioperl-1.5.0-RC2>
> >
> >
> > At 02:14 PM 1/12/2005, Jason Stajich wrote:
> > >In preparation for Bioperl 1.5.0 developer release I have put up Release
> > >Candidate 2.
> > >
> > >  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
> > >  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
> > >  http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
> > >
> > >
> > >We need people to test on this.  So download, run
> > >  perl Makefile.PL
> > >  make
> > >  make test
> > >
> > >Let us know what breaks.  I've tested on OS X and few different linux
> > >installs with different auxiliary modules installed.  Would be nice to
> > >have a few more combinations of OS, perl versions, and suite of modules
> > >installed before we make a release.
> > >
> > >Thanks for your help.
> > >-jason
> > >--
> > >Jason Stajich
> > >jason.stajich at duke.edu
> > >http://www.duke.edu/~jes12/
> > >
> >
> > ============================================
> >
> > Eric Just
> > e-just@northwestern.edu
> > dictyBase Programmer
> > Center for Genetic Medicine
> > Northwestern University
> > http://dictybase.org
> >
> > ============================================
> >

============================================

Eric Just
e-just@northwestern.edu
dictyBase Programmer
Center for Genetic Medicine
Northwestern University
http://dictybase.org

============================================ 

From jason.stajich at duke.edu  Thu Jan 13 23:12:12 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Jan 13 23:09:17 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <20050113035233.16569.qmail@web51602.mail.yahoo.com>
References: <20050113035233.16569.qmail@web51602.mail.yahoo.com>
Message-ID: <77987E0F-65E2-11D9-A1F6-000393C44276@duke.edu>

Thanks Razi.

Tests were succeeding for me with Graph 0.20105-1 - when I upgraded to  
Graph 0.52 it also worked. I am running perl 5.8.3 though, so I don't know  
what the compatibility problem is.  Do you have the same problems  
with Graph 0.52?
We are going to need someone who can reproduce the bug to debug and fix it.

Since this is a developer release I am not going to hold out on this  
part too much.  We'll try to get it closed out; otherwise the release  
will ship with some tests turned off.

I would like to do the release on this coming Monday if possible.

-jason
On Jan 12, 2005, at 10:52 PM, Razi Khaja wrote:

> I've tested RC2 on FreeBSD 5.3 running perl5.8.5 on i386.  This has been
> tested with all prerequisite modules installed (including Graph::Directed
> (J/JH/JHI/Graph-0.51.tar.gz), as per the output of 'perl Makefile.PL').
>
> Attached is the output of make test (make_test.out.gz).
>
> Summary of make test included here:
> Failed 3/193 test scripts, 98.45% okay. 155/8956 subtests failed, 98.27% okay.
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/Ontology.t        255 65280    50  100 200.00%  1-50
> t/OntologyStore.t                 6    4  66.67%  3-6
> t/simpleGOparser.t  255 65280   101  202 200.00%  1-101
> 2 subtests skipped.
> *** Error code 25
>
> Stop in /usr/home/bioperl/bioperl-1.5.0-RC2.
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From taerwin at tpg.com.au  Fri Jan 14 00:32:14 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Fri Jan 14 00:31:18 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
Message-ID: <1105680734.4274.59.camel@bacp4>

I am using Fedora Core 3 with perl, v5.8.5 built for i386-linux-thread-multi.


make test output:



t/DBCUTG.....................ok
        22/24 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/DBFasta....................ok
t/DNAMutation................ok
t/Domcut.....................ok
        22/25 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/ECnumber...................ok
t/ELM........................ok 9/14
-------------------- WARNING ---------------------
MSG: Bio::Tools::Analysis::Protein::ELM Request Error:
500 (Internal Server Error) Can't connect to elm.eu.org:80 (connect:
timeout)
Content-Type: text/plain
Client-Date: Fri, 14 Jan 2005 01:29:27 GMT
Client-Warning: Internal response

500 Can't connect to elm.eu.org:80 (connect: timeout)

---------------------------------------------------
t/ELM........................ok
t/ESEfinder..................error is 0
t/ESEfinder..................ok
        10/12 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/Genewise...................ok
        2/51 skipped:
t/GOR4.......................ok
        10/13 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/HNN........................ok
        10/13 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/MitoProt...................ok
        5/8 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/protgraph..................Class::AutoClass or Clone not installed.
This means that the module is not usable. Skipping tests at
t/protgraph.t line 23.
t/RefSeq.....................ok
        10/13 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/RemoteBlast................ok
        4/6 skipped: to avoid timeout
t/SeqIO......................XML::DOM::XPath not found - skipping
interpro tests
XML::SAX::Base or XML::SAX or XML::SAX::Writer not found - skipping
BSML_SAX tests
t/SeqIO......................ok
t/simpleGOparser.............ok 88/101Use of uninitialized value in hash
element at /home/te07/bioperl-1.5.0-
RC2/blib/lib/Bio/Ontology/OntologyStore.pm line 263,  line 11.
t/Sopma......................ok
        12/15 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/Taxonomy...................ok
        7/8 skipped: to avoid blocking
t/tutorial...................ok 18/21Use of uninitialized value in print
at /home/te07/bioperl-1.5.0-RC2/blib/lib/bptutorial.pl line 4039,
 line 934.

All tests successful, 114 subtests skipped.
Files=193, Tests=8942, 959 wallclock secs (108.35 cusr +  7.82 csys =
116.17 CPU)
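
The "set env variable BIOPERLDEBUG to test" messages above come from tests that skip their network-dependent parts by default. As a rough sketch of that pattern (this is not how the Bioperl test scripts themselves are written; Test::More and the placeholder tests are assumptions made for illustration), a script can gate its remote tests on the environment variable like this:

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Assumption: remote tests are opted into by setting BIOPERLDEBUG,
# as the skip messages in the output above suggest.
my $run_remote = $ENV{BIOPERLDEBUG} ? 1 : 0;

ok(1, 'local test always runs');

SKIP: {
    skip 'tests which require remote servers - set env variable BIOPERLDEBUG to test', 2
        unless $run_remote;

    # Placeholders: a real script would contact the remote service here.
    ok(1, 'placeholder for a remote query');
    ok(1, 'placeholder for parsing the remote reply');
}
```

With the variable unset the two placeholder tests are reported as skipped; running with BIOPERLDEBUG=1 in the environment executes them.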



From nathanhaigh at ukonline.co.uk  Fri Jan 14 02:40:25 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Fri Jan 14 02:37:08 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: 
Message-ID: 

I ran the tests again and they pretty much completed without error! I say "pretty much" because perl sometimes crashes on a test
(which obviously results in that test failing), but running this/these tests separately using "perl -MBlib t\" resulted in
the test completing successfully.

Therefore, everything ok for me!

Nathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Nathan Haigh
> Sent: 13 January 2005 08:56
> To: 'Jason Stajich'; 'Bioperl list'
> Subject: RE: [Bioperl-l] bioperl-1.5.0 RC2
> 
> ......
> t/GOR4.......................ok 3/13Can't call method "start" on an undefined value at t/GOR4.t line 80,  line 1.
> t/GOR4.......................dubious
>         Test returned status 76 (wstat 19456, 0x4c00)
> DIED. FAILED test 7
>         Failed 1/13 tests, 92.31% okay
> t/GOterm.....................ok
> .......
> t/HNN........................FAILED tests 7, 12
>         Failed 2/13 tests, 84.62% okay
> .......
> t/Sopma......................FAILED tests 7-8, 14
>         Failed 3/15 tests, 80.00% okay
> .......
> Failed Test Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/GOR4.t      76 19456    13    1   7.69%  7
> t/HNN.t                   13    2  15.38%  7 12
> t/Sopma.t                 15    3  20.00%  7-8 14
> 2 subtests skipped.
> 
> ~~~~~~~
> WinXP Pro v5.1.2600 Service Pack 1 Build 2600
> ~~~~~~~~
> This is perl, v5.8.0 built for MSWin32-x86-multi-thread
> (with 1 registered patch, see perl -V for more detail)
> 
> Copyright 1987-2002, Larry Wall
> 
> Binary build 804 provided by ActiveState Corp. http://www.ActiveState.com
> Built 23:15:13 Dec  1 2002
> 
> If you need a hand working these problems out give me a shout!
> Nathan
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason Stajich
> > Sent: 12 January 2005 20:14
> > To: Bioperl list
> > Subject: [Bioperl-l] bioperl-1.5.0 RC2
> >
> > In preparation for Bioperl 1.5.0 developer release I have put up
> > Release Candidate 2.
> >
> >   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
> >   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
> >   http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
> >
> >
> > We need people to test on this.  So download, run
> >   perl Makefile.PL
> >   make
> >   make test
> >
> > Let us know what breaks.  I've tested on OS X and few different linux
> > installs with different auxiliary modules installed.  Would be nice to
> > have a few more combinations of OS, perl versions, and suite of modules
> > installed before we make a release.
> >
> > Thanks for your help.
> > -jason
> > --
> > Jason Stajich
> > jason.stajich at duke.edu
> > http://www.duke.edu/~jes12/
> >

From razi at genet.sickkids.on.ca  Fri Jan 14 03:52:06 2005
From: razi at genet.sickkids.on.ca (Razi Khaja)
Date: Fri Jan 14 03:48:29 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <77987E0F-65E2-11D9-A1F6-000393C44276@duke.edu>
Message-ID: <20050114085206.67889.qmail@web51601.mail.yahoo.com>

 --- Jason Stajich  wrote: 
> Thanks Razi.
> 
> Tests were succeeding for me with Graph 0.20105-1 - when I upgraded to  
> Graph 0.52 it also worked. I am running perl 5.8.3 though, so I don't know  
> what the compatibility problem is.  Do you have the same problems  
> with Graph 0.52?

Yes, I upgraded to Graph 0.52, ... same errors (installed by perl -MCPAN
-eshell).
I downgraded to Graph 0.20105 and no problems (installed by making port
/usr/ports/math/p5-Graph on FreeBSD5.3)
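
Since the outcome depends on which Graph release is installed, it helps to state the version perl actually loads when reporting results. A minimal sketch follows; module_version is a hypothetical helper name, and it relies only on require and the universal VERSION method:

```perl
use strict;
use warnings;

# Return the version of a module as perl sees it, or undef when the
# module cannot be loaded or declares no $VERSION.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    return $module->VERSION;
}

my $version = module_version('Graph');
print 'Graph: ', defined $version ? $version : 'not installed or no version found', "\n";
```

This reports the copy found first in @INC, which may differ from the one a port or CPAN shell installed most recently.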

This may only be a BSD/OSX problem ... might have to wait for Graph.pm 0.52
to get ported to BSD to run with the newest version of Graph.pm

Summary of 'make test' with BIOPERLDEBUG=1 (on FreeBSD5.3, perl5.8.5,
Graph.pm ver 0.20105):
...
t/Variation_IO...............ok
t/WABA.......................ok
t/XEMBL_DB...................ok
All tests successful, 2 subtests skipped.
Files=193, Tests=8956, 600 wallclock secs (310.09 cusr + 30.23 csys =
340.32 CPU)

> We are going to need someone who can reproduce the bug to debug and fix it.
> 
> Since this is a developer release I am not going to hold out on this  
> part too much.  We'll try to get it closed out; otherwise the release  
> will ship with some tests turned off.
> 
> I would like to do the release on this coming Monday if possible.

Great!

> -jason


=====
/**
 * Razi Khaja, Bioinformatics Analyst
 * The Hospital for Sick Children, Toronto
 * The Centre for Applied Genomics, www.tcag.ca
 * Tel 416-813-7032, Fax 416-813-8319
 */
From letondal at pasteur.fr  Fri Jan 14 04:29:54 2005
From: letondal at pasteur.fr (Catherine Letondal)
Date: Fri Jan 14 04:23:47 2005
Subject: [Bioperl-l] Getting started with Bio::Perl
In-Reply-To: <41E6AEC5.2050302@york.ac.uk>
References: <41E6AEC5.2050302@york.ac.uk>
Message-ID: 


On Jan 13, 2005, at 6:24 PM, Kat Hull wrote:

> *Dear Users,
> I have a newbie question!  I am interested in the following module 
> 'Bio::Tools::Run::PiseApplication::codonw' but really don't
> know how to start to use it.  I have looked at the documentation etc 
> but am confused about how to pass my array of sequences to
> the module and then how to call the individual functions to perform 
> the calculations (e.g. gc, cai, fop...).
>
> Does anyone have a simple script showing how to run this module with 
> the input as an array of fasta format sequences?
> Many thanks,
>
> Kat
> *

Hi again,

As Jason already answered, first look at 
Bio::Tools::Run::PiseApplication, where there is an example of use. You 
can also look at the examples/pise directory.
Regarding the parameters of a specific program (codonw), I suggest 
first looking at the interactive service 
(http://bioweb.pasteur.fr/seqanal/interfaces/codonw.html), since it is 
exactly the same as the one that is run through bioperl. You will 
get a better understanding of the available values and the output files 
of interest (which differ from one program to another).

--
Catherine Letondal -- Institut Pasteur

From jurgen.pletinckx at algonomics.com  Fri Jan 14 07:21:29 2005
From: jurgen.pletinckx at algonomics.com (Jurgen Pletinckx)
Date: Fri Jan 14 06:55:03 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: 

> perl -v

This is perl, v5.6.1 built for IP27-irix
...

> uname -a
IRIX64 deepskyblue 6.5 10181058 IP27

> make test

t/ESEfinder..................error is 0
t/ESEfinder..................ok
        10/12 skipped: tests which require remote servers - set env variable
BIOPERLDEBUG to test

t/Genewise...................ok
        2/51 skipped:

t/RestrictionIO..............FAILED test 10
        Failed 1/14 tests, 92.86% okay

t/SeqIO......................XML::DOM::XPath not found - skipping interpro tests
t/SeqIO......................ok

t/simpleGOparser.............ok 88/101Use of uninitialized value in hash element
at
/xlv2/users/jpletinc/00Perl/bioperl-1.5-rc2/bioperl-1.5.0-RC2/blib/lib/Bio/Ontol
ogy/OntologyStore.pm line 263,  line 11.
Use of uninitialized value in hash element at
/xlv2/users/jpletinc/00Perl/bioperl-1.5-rc2/bioperl-1.5.0-RC2/blib/lib/Bio/Ontol
ogy/OntologyStore.pm line 263,  line 11.

t/simpleGOparser.............ok
t/tutorial...................ok 18/21Use of uninitialized value in print at
/xlv2/users/jpletinc/00Perl/bioperl-1.5-rc2/bioperl-1.5.0-RC2/blib/lib/bptutoria
l.pl line 4039,  line 934.
t/tutorial...................ok

t/XEMBL_DB...................SOAP::Lite and/or XML::DOM not installed. This
means that Bio::DB::XEMBL module is not usable. Skipping tests.
t/XEMBL_DB...................ok

Failed Test       Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/RestrictionIO.t               14    1   7.14%  10
114 subtests skipped.
Failed 1/193 test scripts, 99.48% okay. 1/8956 subtests failed, 99.99% okay.
*** Error code 11 (bu21)


More specific:
> perl -w -Iblib/lib t/RestrictionIO.t
1..14
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
not ok 10
# Test 10 got: '9' (t/RestrictionIO.t at line 53)
#    Expected: '11'
ok 11
ok 12
ok 13
ok 14

The 'error is 0' message with ESEfinder seems to indicate correct test
completion.


(Thanks, Jason!)

--
Jurgen Pletinckx
AlgoNomics NV

From danielucgbioinfo at yahoo.com.br  Fri Jan 14 07:30:14 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Fri Jan 14 07:26:49 2005
Subject: [Bioperl-l] Method image_and_map
Message-ID: <20050114123014.21358.qmail@web53505.mail.yahoo.com>

Hi,

I would like to use the method image_and_map of the class
Bio::Graphics::Panel, but I get this message: Can't
locate object method "image_and_map" via package
"Bio::Graphics::Panel".

I'm using Bioperl 1.4, and I saw this method in
Bioperl-live and Bioperl 1.5 but not in Bioperl 1.4.
Where can I get these versions? Or how can I use
image_and_map?

Thanks,
Daniel

Part of my code:

my $panel = Bio::Graphics::Panel->new(
    -length       => $seq->length,
    -width        => 1000,
    -pad_left     => 10,
    -pad_right    => 10,
    -key_color    => 'white',
    -key_spacing  => 15,
    -key_style    => 'bottom',
    -spacing      => -0.25,
    -box_subparts => 'true',
);
my ($url, $map, $mapname) = $panel->image_and_map(
    -root => '/cgi-bin',
    -url  => '/tmpimages',
);

$panel->add_track($wholeseq,
    -glyph  => 'arrow',
    -bump   => +1,
    -double => 1,
    -tick   => 2,
);


From Marc.Logghe at devgen.com  Fri Jan 14 07:49:37 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Fri Jan 14 07:46:08 2005
Subject: [Bioperl-l] Method image_and_map
Message-ID: 

Hi,

> I would like to use the method image_and_map of the class
> Bio::Graphics::Panel, but I get this message: Can't
> locate object method "image_and_map" via package
> "Bio::Graphics::Panel". 
> 
> I'm using Bioperl 1.4, and I saw this method in
> Bioperl-live and Bioperl 1.5 but not in Bioperl 1.4.

That is correct: it was introduced in Bio::Graphics::Panel revision 1.74, that is, after bioperl-release-1-4-0, which contained revision 1.70.

> Where can I get these versions? Or how can I use
> image_and_map?

Bioperl 1.5 RC 2 can be downloaded from http://news.open-bio.org/archives/2005_01.html#000073
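
If a script has to run against both old and new installs, one option is to test for the method before calling it. The following is only a sketch: class_supports is a hypothetical helper, and it uses nothing beyond require and the universal can method.

```perl
use strict;
use warnings;

# Return 1 when $class can be loaded and provides $method, 0 otherwise.
sub class_supports {
    my ($class, $method) = @_;
    (my $file = "$class.pm") =~ s{::}{/}g;
    return 0 unless eval { require $file; 1 };
    return $class->can($method) ? 1 : 0;
}

if (class_supports('Bio::Graphics::Panel', 'image_and_map')) {
    print "image_and_map is available\n";
}
else {
    print "image_and_map is not available in this install\n";
}
```

On a Bioperl 1.4 install this should take the second branch, which matches the "Can't locate object method" error quoted above.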

HTH,
Marc

From paulo.david at netvisao.pt  Fri Jan 14 08:00:54 2005
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Fri Jan 14 08:08:41 2005
Subject: [Bioperl-l] ProtDist with Phylip 3.6
Message-ID: <10517.193.137.94.3.1105707654.squirrel@193.137.94.3>

I can't get ProtDist running, with Bioperl-run 1.4 and Phylip 3.6. I tried
setting PHYLIPVERSION = 3.6 , because I saw that on the mailing list, but
it didn't work. The test (perl -I. -w t/ProtDist.t) returns "Protdist
program not found". The phylip executable is at /usr/bin . Should there be
a protdist executable too? I don't have that.

-Paulo Almeida

From senger at ebi.ac.uk  Fri Jan 14 08:42:58 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Fri Jan 14 08:39:25 2005
Subject: [Bioperl-l] Re: Bio::Biblio
In-Reply-To: <200501061710.j06H8RKu023694@portal.open-bio.org>
Message-ID: 

> Since two weeks before Christmas, I have had a problem fetching articles.
>
   I am sorry about the delayed reply, but I was away, and as soon as I left
the disk crashed here :-(.
   The service is now back and running... but...
   ...for some citations there may be some errors caused by a bug in the
underlying conversion between HTML and XML. In other words, some returned
XML may not be valid (because some characters in it are not properly
escaped). I will fix it (and let you know) as soon as I get a response from
our SRS team.  Again, sorry for the delay...

   With regards,
   Martin



From nathanhaigh at ukonline.co.uk  Fri Jan 14 10:24:22 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Fri Jan 14 10:20:58 2005
Subject: [Bioperl-l] ProtDist with Phylip 3.6
In-Reply-To: <10517.193.137.94.3.1105707654.squirrel@193.137.94.3>
Message-ID: 

Phylip is a suite of programs that includes around 35 executables, such as:
Consense
Contml
Drawgram
Drawtree
Neighbor
Proml
Protdist
Protpars
Treedist

Maybe the downloaded file wasn't extracted? I can't think why else you might have a file called phylip.
Nathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Paulo Almeida
> Sent: 14 January 2005 13:01
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] ProtDist with Phylip 3.6
> 
> I can't get ProtDist running, with Bioperl-run 1.4 and Phylip 3.6. I tried
> setting PHYLIPVERSION = 3.6 , because I saw that on the mailing list, but
> it didn't work. The test (perl -I. -w t/ProtDist.t) returns "Protdist
> program not found". The phylip executable is at /usr/bin . Should there be
> a protdist executable too? I don't have that.
> 
> -Paulo Almeida
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l




From paulo.david at netvisao.pt  Fri Jan 14 10:40:08 2005
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Fri Jan 14 12:21:32 2005
Subject: [Bioperl-l] ProtDist with Phylip 3.6
In-Reply-To: 
References: <10517.193.137.94.3.1105707654.squirrel@193.137.94.3>
	
Message-ID: <32086.193.137.94.3.1105717208.squirrel@193.137.94.3>

Thanks, I just found the protdist executable, and the script works fine. I
installed phylip with a packaging program, and it put all the other
executables in /usr/lib/phylip ...

-Paulo


> Phylip is a suite of programs which include around 35 executables, things
> such as:
> Consense
> Contml
> Drawgram
> Drawtree
> Neighbor
> Proml
> Protdist
> Protpars
> Treedist
>
> Maybe the downloaded file wasn't extracted??? I can't think why else you
> might have a file called phylip?
> Nathan
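
For anyone else who hits "Protdist program not found", here is a minimal sketch in plain Perl of locating the directory that actually holds the protdist binary and exporting it before the wrapper runs. The helper name find_program_dir and the list of candidate directories are illustrative assumptions, not part of Bioperl-run. The sketch also assumes that the Phylip wrappers consult the PHYLIPDIR environment variable; please check the documentation of your installed Bio::Tools::Run::Phylip modules before relying on that.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first directory in @$dirs that contains
# an executable file called $program, or undef if none does.
sub find_program_dir {
    my ($program, $dirs) = @_;
    for my $dir (@$dirs) {
        my $path = File::Spec->catfile($dir, $program);
        return $dir if -f $path && -x _;
    }
    return;
}

# Candidate locations are assumptions; /usr/lib/phylip is where the
# packaging program put the binaries in the report above.
my @candidates = ('/usr/lib/phylip', '/usr/local/phylip/exe', '/usr/bin');

my $dir = find_program_dir('protdist', \@candidates);
if (defined $dir) {
    # Assumption: the Phylip wrappers look in $ENV{PHYLIPDIR}.
    $ENV{PHYLIPDIR} = $dir;
    print "Using PHYLIPDIR=$dir\n";
}
else {
    print "protdist not found in: @candidates\n";
}
```

If the wrapper reads the variable when the module is loaded, the assignment may need to sit in a BEGIN block ahead of the corresponding "use" line.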

From talcon at iastate.edu  Fri Jan 14 13:30:38 2005
From: talcon at iastate.edu (Tim Alcon)
Date: Fri Jan 14 13:27:04 2005
Subject: [Bioperl-l] t/data for bptutorial for windows
Message-ID: <41E80FCE.8000900@iastate.edu>

I downloaded ActivePerl 5.6.1 on Windows XP and used ppm to install 
bioperl and bundle-bioperl.  When I tried to run bptutorial.pl, it 
complained about not having IO::String, which I then installed.  My 
current problem is that now when I try to run bptutorial.pl, it 
complains about not finding stuff in folder t/data.  I checked the 
directory and tried reinstalling bioperl, but there's still no folder 
called t/data, which apparently contains data necessary for the examples 
in bptutorial.pl.  If anyone could help me with this, I'd appreciate 
it.  Thanks.

Tim
From gyang at plantbio.uga.edu  Fri Jan 14 14:01:29 2005
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Fri Jan 14 13:57:47 2005
Subject: [Bioperl-l] regular expression help?
In-Reply-To: <41E80FCE.8000900@iastate.edu>
Message-ID: <20050114140129.68047175@dogwood.plantbio.uga.edu>

Hi, Everybody,
I was trying to use a regex to recognize a pattern of inverted-repeat DNA sequence flanked by direct repeats (see below), but it returns an error saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
The regex I have is:
$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
Thank you,
Yang


From gyang at plantbio.uga.edu  Fri Jan 14 14:12:46 2005
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Fri Jan 14 14:09:28 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: <20050114141246.94c7cb46@dogwood.plantbio.uga.edu>

Hi, Everybody,
I was trying to use a regex to recognize a pattern of inverted-repeat DNA sequence flanked by direct repeats (see below), but it returns an error saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
The regex I have is:
$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
Thank you,
Yang


From brian_osborne at cognia.com  Fri Jan 14 14:33:51 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Jan 14 14:30:59 2005
Subject: [Bioperl-l] t/data for bptutorial for windows
In-Reply-To: <41E80FCE.8000900@iastate.edu>
Message-ID: 

Tim,

I'm not sure where that t/data directory ends up when you use ppm to install
Bioperl but it's in there somewhere. You'll need to find it and execute
bptutorial.pl from within the directory containing t/. A bit awkward, yes.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Tim Alcon
Sent: Friday, January 14, 2005 1:31 PM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] t/data for bptutorial for windows


I downloaded ActivePerl 5.6.1 on Windows XP and used ppm to install
bioperl and bundle-bioperl.  When I tried to run bptutorial.pl, it
complained about not having io::string, which I then installed.  My
current problem is that now when I try to run bptutorial.pl, it
complains about not finding stuff in folder t/data.  I checked the
directory and tried reinstalling bioperl, but there's still no folder
called t/data, which apparently contains data necessary for the examples
in bptutorial.pl.  If anyone could help me with this, I'd appreciate
it.  Thanks.

Tim



From fernan at iib.unsam.edu.ar  Fri Jan 14 14:40:22 2005
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Fri Jan 14 14:37:25 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <20050114194022.GC55770@iib.unsam.edu.ar>

+----[ Jason Stajich  (12.Jan.2005 17:25):
|
| We need people to test on this.  So download, run
|  perl Makefile.PL
|  make
|  make test
| 
| Let us know what breaks. 
|
+----]

This is on FreeBSD-4.10p5 (RELENG_4_10), i386, perl v5.8.5.

A summary of the failed tests follows (does anyone know why I'm
getting percentages over 100%?)

A complete log is available at 
http://genoma.unsam.edu.ar/~fernan/freebsd/bioperl-1.5.0-RC2.tests.gz

And the list of perl modules installed in my box and their
versions is here:
http://genoma.unsam.edu.ar/~fernan/freebsd/p5-ports.txt

If some tests can be fixed just by updating perl modules,
let me know. Perl modules in FreeBSD are installed from the
ports system, so when updating the bioperl port to 1.5 we
should also make sure to update dependencies as needed.
The versions of my installed perl modules are the latest
available in the ports tree (of course there could be newer
versions in CPAN that are not yet in the FreeBSD ports tree)

Fernan


Failed Test           Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/AlignIO.t            255 65280   152  230 151.32%  10-11 39-152
t/AlignUtil.t                       16   14  87.50%  2-15
t/AnnotationAdaptor.t  255 65280    19   36 189.47%  2-19
t/CodonTable.t         255 65280    44    6  13.64%  42-44
t/LocationFactory.t                179    1   0.56%  64
t/OntologyStore.t                    6    1  16.67%  6
t/PAML.t               255 65280   142    0   0.00%  ??
t/PopGen.t             255 65280    85  170 200.00%  1-85
t/ProtPsm.t            255 65280     5    6 120.00%  3-5
t/Registry.t           255 65280    13   11  84.62%  8-13
t/SearchIO.t           255 65280  1216   17   1.40%  170 211 264-265 469 580
                                                     582 584 598 600 643 685-686
                                                     713-714 1216
t/SeqFeature.t                     192    2   1.04%  76 81
t/SeqIO.t              255 65280   345  562 162.90%  65-345
t/Species.t            255 65280    21   22 104.76%  11-21
t/StandAloneBlast.t                 18    1   5.56%  3
t/Tree.t               255 65280    26   47 180.77%  3-26
t/TreeBuild.t          255 65280     7   14 200.00%  1-7
t/TreeIO.t             255 65280    50   43  86.00%  28-50
t/Unflattener2.t                    11    2  18.18%  7 10
t/UniGene.t                         63    1   1.59%  12
t/game.t                            23    1   4.35%  9
t/hmmer.t                          136   14  10.29%  8 13 125-136
t/primaryqual.t        255 65280    32    8  25.00%  29-32
t/psm.t                255 65280    48   78 162.50%  10-48
t/qual.t               255 65280    12    0   0.00%  ??
t/simpleGOparser.t                 101    5   4.95%  78-79 83-84 87
t/singlet.t                          3    2  66.67%  2-3
114 subtests skipped.
Failed 27/193 test scripts, 86.01% okay. 680/8956 subtests failed, 92.41% okay.
From danielucgbioinfo at yahoo.com.br  Fri Jan 14 15:22:24 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Fri Jan 14 15:19:09 2005
Subject: [Bioperl-l] A error with Bio::Graphics::Pane, help me!!!!!
Message-ID: <20050114202224.63661.qmail@web53503.mail.yahoo.com>

Hi,

I have written my code using image_and_map from the
Bio::Graphics::Panel class, and it produces this message:
> gd-png:  fatal libpng error: Image width or height is zero in IHDR
> gd-png error: setjmp returns error condition

My code up to the point of this error is:
#!/usr/bin/perl -wT

use strict;
use Bio::Graphics;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
#use CGI ':standard';
use CGI::Pretty;

my $file = '/var/www/cgi-bin/AL391145.gb';
my $io = Bio::SeqIO->new(-file=>$file);
my $seq = $io->next_seq;
#my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length);
my @features = $seq->all_SeqFeatures;
my $q = new CGI;

# sort features by their primary tags
my %sorted_features;
for my $f (@features) {
  my $tag = $f->primary_tag;
  push @{$sorted_features{$tag}},$f;
}

print $q->header('text/html');
print $q->start_html('A Vector Rendering ');

my $panel = Bio::Graphics::Panel->new(-length       => $seq->length,
                                      -width        => 1000,
                                      -pad_left     => 10,
                                      -pad_right    => 10,
                                      -key_color    => 'white',
                                      -key_spacing  => 15,
                                      -key_style    => 'bottom',
                                      -spacing      => -0.25,
                                      -box_subparts => 'true');
my ($url,$map,$mapname) = $panel->image_and_map(-root => '/home/bioinfo/cgi-bin',
                                                -url  => '/tmpimages');

----
I don't know what to do. Could somebody please help
me?

Bye,
Daniel



	
	
		
From danielucgbioinfo at yahoo.com.br  Fri Jan 14 15:25:58 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Fri Jan 14 15:22:09 2005
Subject: [Bioperl-l] gd-png: fatal libpng error: Image width or height is
	zero in IHDR
Message-ID: <20050114202558.33598.qmail@web53509.mail.yahoo.com>

Hi,

I have written my code using image_and_map from the
Bio::Graphics::Panel class, and it produces this message:
> gd-png:  fatal libpng error: Image width or height is zero in IHDR
> gd-png error: setjmp returns error condition

My code up to the point of this error is:
#!/usr/bin/perl -wT

use strict;
use Bio::Graphics;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
#use CGI ':standard';
use CGI::Pretty;

my $file = '/var/www/cgi-bin/AL391145.gb';
my $io = Bio::SeqIO->new(-file=>$file);
my $seq = $io->next_seq;
#my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length);
my @features = $seq->all_SeqFeatures;
my $q = new CGI;

# sort features by their primary tags
my %sorted_features;
for my $f (@features) {
  my $tag = $f->primary_tag;
  push @{$sorted_features{$tag}},$f;
}

print $q->header('text/html');
print $q->start_html('A Vector Rendering ');

my $panel = Bio::Graphics::Panel->new(-length       => $seq->length,
                                      -width        => 1000,
                                      -pad_left     => 10,
                                      -pad_right    => 10,
                                      -key_color    => 'white',
                                      -key_spacing  => 15,
                                      -key_style    => 'bottom',
                                      -spacing      => -0.25,
                                      -box_subparts => 'true');
my ($url,$map,$mapname) = $panel->image_and_map(-root => '/home/bioinfo/cgi-bin',
                                                -url  => '/tmpimages');

----
I don't know what to do. Could somebody please help
me?

Bye,
Daniel


From paulo.david at netvisao.pt  Fri Jan 14 14:53:59 2005
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Fri Jan 14 18:57:31 2005
Subject: [Bioperl-l] regular expression help?
In-Reply-To: <20050114140129.68047175@dogwood.plantbio.uga.edu>
References: <41E80FCE.8000900@iastate.edu>
	<20050114140129.68047175@dogwood.plantbio.uga.edu>
Message-ID: <15455.193.137.94.3.1105732439.squirrel@193.137.94.3>

Hi,

There is something I don't understand in your expression (actually, there
is a lot, but the rest I can't even comment on). If you have
\S+(\S+)(\S{10}) , isn't the first \S+ going to match through the whole
string?

-Paulo

> Hi, Everybody,
> I was trying to use a regex to recognize a pattern of inverted-repeat DNA
> sequence flanked by direct repeats (see below), but it returns an error
> saying "(?{...}) not terminated or {...} not balanced". Can anybody help me
> sort this out?
> The regex I have is:
> $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~
> tr/ATCG/TAGC/i);})\1.*/i;
> Thank you,
> Yang

From Marc.Logghe at devgen.com  Sat Jan 15 03:34:10 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Sat Jan 15 03:35:43 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: 

Hi,
In the part (??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);}) I have my doubts about the double question marks.
In case you don't want capturing braces and you want to execute code, I think it should look like:
(?:?{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})
In case you want to capture, then there is one ? too many because the syntax is '?{ code }' and not '??{ code }' for executing code in a regex.
HTH,
Marc 


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org on behalf of Guojun Yang
Sent: Fri 1/14/2005 8:12 PM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] regular expression help!
 
Hi, Everybody,
I was trying to use a regex to recognize a pattern of inverted-repeat DNA sequence flanked by direct repeats (see below), but it returns an error saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
The regex I have is:
$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
Thank you,
Yang






From Jan.Aerts at wur.nl  Sat Jan 15 04:24:39 2005
From: Jan.Aerts at wur.nl (Aerts, Jan)
Date: Sat Jan 15 04:21:18 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: <7D030487F1A3D143A76F2A1E91F570350186DB63@scomp0010>

Without taking the time to look at the actual expression (sorry): a nice aid for developing more complicated regexes is the Regex Coach (http://www.weitz.de/regex-coach/). It allows you to experiment with regex and shows you the result interactively.

Good luck,
Jan Aerts



From Marc.Logghe at devgen.com  Sat Jan 15 06:31:59 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Sat Jan 15 06:29:41 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: 

Hi Jan !
Nice goody indeed.
But I am afraid that the extended regular expression feature ${code} is not supported by regex-coach.
Maybe this has to do with the fact that this regex feature seems to be highly experimental and might be changed or even deleted in future Perl versions.
Cheers,
Marc



From nathanhaigh at ukonline.co.uk  Sat Jan 15 09:03:13 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Sat Jan 15 08:59:45 2005
Subject: [Bioperl-l] t/data for bptutorial for windows
In-Reply-To: 
Message-ID: 

Actually, t\data doesn't end up in the tar.gz file that is downloaded when using ppm to install bioperl. If you like I (or someone
else) could send you the data files and then you could proceed as per Brian's instructions.

Nathan  

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Brian Osborne
> Sent: 14 January 2005 19:34
> To: Tim Alcon; bioperl-l@portal.open-bio.org
> Subject: RE: [Bioperl-l] t/data for bptutorial for windows
> 
> Tim,
> 
> I'm not sure where that t/data directory ends up when you use ppm to install
> Bioperl but it's in there somewhere. You'll need to find it and execute
> bptutorial.pl from within the directory containing t/. A bit awkward, yes.
> 
> Brian O.
> 
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Tim Alcon
> Sent: Friday, January 14, 2005 1:31 PM
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] t/data for bptutorial for windows
> 
> 
> I downloaded ActivePerl 5.6.1 on Windows XP and used ppm to install
> bioperl and bundle-bioperl.  When I tried to run bptutorial.pl, it
> complained about not having io::string, which I then installed.  My
> current problem is that now when I try to run bptutorial.pl, it
> complains about not finding stuff in folder t/data.  I checked the
> directory and tried reinstalling bioperl, but there's still no folder
> called t/data, which apparently contains data necessary for the examples
> in bptutorial.pl.  If anyone could help me with this, I'd appreciate
> it.  Thanks.
> 
> Tim




From Jan.Aerts at wur.nl  Sat Jan 15 09:17:28 2005
From: Jan.Aerts at wur.nl (Aerts, Jan)
Date: Sat Jan 15 09:14:05 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: <7D030487F1A3D143A76F2A1E91F570350186DB64@scomp0010>

You're right... Should have looked at the actual expression.
Idea: is it possible in this case to call subroutines from within a regex and evaluate them using the 'e' switch?

j.




From zayed.albertyn at gmail.com  Fri Jan 14 01:50:40 2005
From: zayed.albertyn at gmail.com (zayed albertyn)
Date: Sat Jan 15 10:11:22 2005
Subject: [Bioperl-l] Finding Alignment overlaps
Message-ID: <81da19f3050113225018d1c01a@mail.gmail.com>

Dear Bioperl Community

I have output from an alignment program that produces coordinates with
reference to the query sequence e.g.

3665384,3665702-1770163,1770480
3665130,3665474-3695657,3696000
3665115,3665357-1770508,1770749

Each line represent ,-,

I know how to add each line as a sequence feature using
Bio::SeqFeature::Generic. Is there a bioperl class or associated
method that can be used for determining possible overlaps in these
alignments?
Eventually I would like to find all overlaps and merge them if possible.

Thanks for the help,
Zayed




-- 
-----------------------------------------------------------
Zayed Albertyn
From jason.stajich at duke.edu  Sat Jan 15 10:46:25 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat Jan 15 10:42:40 2005
Subject: [Bioperl-l] Finding Alignment overlaps
In-Reply-To: <81da19f3050113225018d1c01a@mail.gmail.com>
References: <81da19f3050113225018d1c01a@mail.gmail.com>
Message-ID: <9D4002D6-670C-11D9-83B1-000393C44276@duke.edu>

Bio::SeqFeature::Collection lets you efficiently extract subsets of 
Features or Locations that overlap, using Lincoln's binning algorithm 
from Bio::DB::GFF.  It works by storing the data in flat-file 
BerkeleyDB B-trees via the DB_File module.

-jason
On Jan 14, 2005, at 1:50 AM, zayed albertyn wrote:

> Dear Bioperl Community
>
> I have output from an alignment program that produces coordinates with
> reference to the query sequence e.g.
>
> 3665384,3665702-1770163,1770480
> 3665130,3665474-3695657,3696000
> 3665115,3665357-1770508,1770749
>
> Each line represent ,-,
>
> I know how to add each line as a sequence feature using
> Bio::SeqFeature::Generic. Is there a bioperl class or associated
> method that can be used for determining possible overlaps in these
> alignments?
> Eventually I would like to find all overlaps and merge them if 
> possible.
>
> Thanks for the help,
> Zayed
>
>
>
>
> -- 
> -----------------------------------------------------------
> Zayed Albertyn
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/
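
For the "find all overlaps and merge them" part of the question, a minimal plain-Perl sketch of the usual sort-and-sweep merge may help as a starting point. It does not use Bio::SeqFeature::Collection or any other Bioperl class, and the name merge_intervals is made up for illustration. It also assumes that the first pair of numbers on each line is the query start and end, which is a guess because the field description in the original message did not survive.

```perl
use strict;
use warnings;

# Merge overlapping [start, end] intervals (inclusive coordinates).
# Returns a new list of merged intervals sorted by start.
sub merge_intervals {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $iv (@sorted) {
        if (@merged && $iv->[0] <= $merged[-1][1]) {
            # Overlaps the previous interval: extend its end if needed.
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        }
        else {
            push @merged, [ @$iv ];    # copy, so the input is not modified
        }
    }
    return @merged;
}

# Query-side coordinates taken from the three example lines above.
my @query = ([3665384, 3665702], [3665130, 3665474], [3665115, 3665357]);

# The three intervals chain into one, so this prints: 3665115,3665702
print join(',', @$_), "\n" for merge_intervals(@query);
```

Sorting first means each interval only has to be compared with the most recently merged one, so the sweep itself is a single pass. For large feature sets that need repeated range queries, the indexed approach described above is likely the better fit.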

From sac at portal.open-bio.org  Sat Jan 15 12:50:25 2005
From: sac at portal.open-bio.org (Steve Chervitz)
Date: Sat Jan 15 16:55:57 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: 

I just committed a small fix to Bio::DB::NCBIHelper, to deal with
downloading gbwithparts records.

It now allows you to deal with some genbank nucleotide records that by
default don't contain all CDS features, such as L42023.

To force the Bio::DB object to get all the features, you can do the
following:

my $gb = new Bio::DB::GenBank;
$gb->request_format('gbwithparts');

Not sure if this is the best approach, but it seems reasonable.

Steve

> From: Jason Stajich 
> Date: Wed, 12 Jan 2005 15:14:17 -0500
> To: Bioperl list 
> Subject: [Bioperl-l] bioperl-1.5.0 RC2
> 
> In preparation for Bioperl 1.5.0 developer release I have put up
> Release Candidate 2.
> 
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
> 
> 
> We need people to test on this.  So download, run
>   perl Makefile.PL
>   make
>   make test
> 
> Let us know what breaks.  I've tested on OS X and few different linux
> installs with different auxiliary modules installed.  Would be nice to
> have a few more combinations of OS, perl versions, and suite of modules
> installed before we make a release.
> 
> Thanks for your help.
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 


From perlguy at hotmail.com  Sat Jan 15 17:50:44 2005
From: perlguy at hotmail.com (Philip Parker)
Date: Sat Jan 15 17:48:39 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: 

If posting a question regarding a questionable regexp, it would be a good 
idea if you'd include a small sample of the text you're running it on. That 
way one can get a better idea of what you're after and possibly help you 
create a more efficient regexp. Those first \S+ have me worried.

Philip Parker -  perlguy@hotmail.com
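
In that spirit, here is a small self-contained sketch with sample text, offered as one possible reading of the inverted-repeat question rather than as the poster's intended pattern. The perlre documentation describes (??{ code }) as a postponed regular subexpression whose result is matched as a pattern. Inside the code block the captured text is $2, not \2, and since tr/// returns a count rather than the translated string, the reverse complement is built in a helper. The helper names, the fixed lengths (3-base direct repeat, 10-base arm) and the sample sequence are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;          # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # tr/// edits $rc in place
    return $rc;
}

# Hypothetical helper: look for
#   direct repeat, arm, spacer, revcomp(arm), same direct repeat.
# Returns (direct_repeat, arm) on success, or an empty list.
sub find_flanked_inverted_repeat {
    my ($seq) = @_;
    if ($seq =~ /([ACGT]{3})([ACGT]{10})[ACGT]*?(??{ revcomp($2) })\1/i) {
        return ($1, $2);
    }
    return;
}

# Made-up sample: TT GAC CAGGTTCAAG AAAAA CTTGAACCTG GAC TT
my $seq = 'TTGACCAGGTTCAAGAAAAACTTGAACCTGGACTT';
my ($direct, $arm) = find_flanked_inverted_repeat($seq);

# prints: direct=GAC arm=CAGGTTCAAG
print defined $direct ? "direct=$direct arm=$arm\n" : "no match\n";
```

Using bounded character classes instead of a leading \S+ keeps the backtracking manageable, which addresses the worry about the first \S+ swallowing the whole string.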


From rob at salmonella.org  Sat Jan 15 18:22:23 2005
From: rob at salmonella.org (Rob Edwards)
Date: Sat Jan 15 18:18:43 2005
Subject: [Bioperl-l] GFF3
Message-ID: <4FC537A9-674C-11D9-9C9B-000A959E1622@salmonella.org>

Because I need it for some things that I am doing, I have worked quite 
a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
written this module, I have just made some cosmetic changes:

I have improved the validation processes that are applied as a gff3 
file is parsed, and the module should now validate essentially 
everything in the file except alignments. Validation is optional and is 
based on the specification described at : 
http://song.sourceforge.net/gff3.shtml

For clarification and edification I have created a couple of tables 
describing the module and the validation that is applied to GFF3 files, 
which you can see online: http://www.salmonella.org/bioperl/gff3.html

I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
sequences, it seems that you'd want to be able to call the next_seq 
methods, and therefore SeqIO is more appropriate than FeatureIO for 
those aspects. Currently the SeqIO module uses the FeatureIO module for 
parsing the file, it just reorganizes things.

This provides two different interfaces for getting objects out of GFF3 
files:
	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
representing the annotations.
	Bio::SeqIO::gff will return Bio::Seq objects representing the 
sequences with all the annotations attached.

The other difference between the two is that the former passes out the 
objects as they are read, but the latter has to read the whole file to 
get the annotations and the sequences.
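
To make the line-by-line reading style concrete, here is a minimal sketch in plain Perl. It is not the code of either module: the names gff3_unescape and parse_gff3_line are made up for illustration, and the only things assumed are the nine tab-separated GFF3 columns and tag=value attributes separated by semicolons, with %XX escapes.

```perl
use strict;
use warnings;

# Decode %XX escapes as used inside GFF3 fields.
sub gff3_unescape {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Parse one GFF3 feature line into a hash reference.
# Returns undef for blank lines, comments/directives and malformed rows.
sub parse_gff3_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*$/ || $line =~ /^#/;
    my @col = split /\t/, $line, -1;
    return unless @col == 9;

    my %attr;
    for my $pair (split /;/, $col[8]) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value;
        # several values for one tag are separated by commas
        $attr{ gff3_unescape($key) } =
            [ map { gff3_unescape($_) } split /,/, $value ];
    }

    my %feature;
    @feature{qw(seqid source type start end score strand phase)} = @col[0 .. 7];
    $feature{attributes} = \%attr;
    return \%feature;
}

my $f = parse_gff3_line(
    "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN\n");
print "$f->{type} $f->{start}-$f->{end} ID=$f->{attributes}{ID}[0]\n";
```

Because each line is handled independently, a caller can hand features on as soon as they are read; attaching them to sequences, as the SeqIO-style interface does, needs the whole file first.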

At the moment I have focused on reading GFF3 files.

I have not committed these to cvs yet, pending comments from others. I 
have some specific questions:
	Should I wait until after 1.5 is out?
	Are two separate modules really the right way to go about this?
	What about other GFF modules (like Bio::Tools::GFF)?
	Could someone give the modules a workout and let me know about bugs? I 
am sure there are many.

I have posted these modules online via anonymous ftp at 
ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
Take a look and let me know what you do and don't like!

Rob

From george_titus6 at yahoo.com  Sun Jan 16 01:29:33 2005
From: george_titus6 at yahoo.com (george titus)
Date: Sun Jan 16 09:45:23 2005
Subject: [Bioperl-l] Drawing Chromosomes
Message-ID: <20050116062933.1054.qmail@web52210.mail.yahoo.com>

Hi,
Please help me with drawing ideograms. I got the data from NCBI; which module should I use? I am confused by the data.
 
 

		
From lstein at cshl.edu  Sat Jan 15 22:14:56 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Sun Jan 16 09:45:55 2005
Subject: [Bioperl-l] A error with Bio::Graphics::Pane, help me!!!!!
In-Reply-To: <20050114202224.63661.qmail@web53503.mail.yahoo.com>
References: <20050114202224.63661.qmail@web53503.mail.yahoo.com>
Message-ID: <200501151914.56616.lstein@cshl.edu>

This happens when you try to create an image with height or width of 
zero.  Check to make sure that your sequence has positive length.
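
A minimal sketch of such a check, in plain Perl so that it runs without Bio::Graphics installed. The helper name check_panel_dimensions is hypothetical; the idea is only to fail early with a readable message instead of letting libpng report the zero-sized image.

```perl
use strict;
use warnings;

# Hypothetical helper: refuse to go on when the image would be empty.
sub check_panel_dimensions {
    my ($length, $width) = @_;
    die "sequence length must be positive (got "
        . (defined $length ? $length : 'undef') . ")\n"
        unless defined $length && $length > 0;
    die "panel width must be positive\n"
        unless defined $width && $width > 0;
    return 1;
}

# In the CGI script this would run just before Bio::Graphics::Panel->new,
# for example: check_panel_dimensions($seq->length, 1000);
my $ok = eval { check_panel_dimensions(0, 1000) };
print $ok ? "dimensions ok\n" : "refused: $@";
```

With a length of 0 this prints the "refused" message; a sequence that failed to load (so that its length is 0 or undefined) is caught the same way.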

Lincoln

On Friday 14 January 2005 12:22 pm, Danielucg Sousa wrote:
> Hi,
>
> I wrote my code using the image_and_map method of the
> Bio::Graphics::Panel class, and it produces this message:
>
> > gd-png:  fatal libpng error: Image width or height is zero in IHDR
> > gd-png error: setjmp returns error condition
>
> My code up to the point of the error is:
> #!/usr/bin/perl -wT
>
> use strict;
> use Bio::Graphics;
> use Bio::SeqIO;
> use Bio::SeqFeature::Generic;
> #use CGI ':standard';
> use CGI::Pretty;
>
> my $file = '/var/www/cgi-bin/AL391145.gb';
> my $io = Bio::SeqIO->new(-file=>$file);
> my $seq = $io->next_seq;
> #my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length);
> my @features = $seq->all_SeqFeatures;
>  my $q = new CGI;
>
> # sort features by their primary tags
> my %sorted_features;
> for my $f (@features) {
>   my $tag = $f->primary_tag;
>   push @{$sorted_features{$tag}},$f;
> }
>
> print $q->header('text/html');
> print $q->start_html('A Vector Rendering ');
>
> my $panel = Bio::Graphics::Panel->new(
>     -length       => $seq->length,
>     -width        => 1000,
>     -pad_left     => 10,
>     -pad_right    => 10,
>     -key_color    => 'white',
>     -key_spacing  => 15,
>     -key_style    => 'bottom',
>     -spacing      => -0.25,
>     -box_subparts => 'true',
> );
> my ($url,$map,$mapname) = $panel->image_and_map(
>     -root => '/home/bioinfo/cgi-bin',
>     -url  => '/tmpimages',
> );
>
> ----
> I don't know what I have to do. Please somebody help
> me.
>
> By,
> Daniel
>
>
>
>
>
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen  on
all emails regarding scheduling and other time-critical topics.
From robinxml at yahoo.com  Sun Jan 16 01:20:05 2005
From: robinxml at yahoo.com (Robin XML)
Date: Sun Jan 16 09:45:57 2005
Subject: [Bioperl-l] bioperl
Message-ID: <20050116062005.37768.qmail@web30103.mail.mud.yahoo.com>

Dear Sir,
I am a beginner in bioinformatics and am excited by the
fantastic Bioperl functions, but a few questions confuse me:
1. Is it possible to call Bioperl functions from Java
under Windows? I need a GUI, and I need Java to
handle XML template modification.
2. Is it correct that with Bio::DB::GenBank and
Bio::SeqIO I can get full GenBank data in XML format?
Does that include the features part?


Thank you!!!!!!

Best regards,
Robin



	
		
From corenth at gmail.com  Sun Jan 16 09:53:55 2005
From: corenth at gmail.com (Willy West)
Date: Sun Jan 16 09:50:03 2005
Subject: [Bioperl-l] regular expression help!
In-Reply-To: <7D030487F1A3D143A76F2A1E91F570350186DB65@scomp0010>
References: <7D030487F1A3D143A76F2A1E91F570350186DB65@scomp0010>
Message-ID: <4f10f19405011606531737d90@mail.gmail.com>

oops- i'd forgotten to "reply to all" with this... i apologize.


On Sun, 16 Jan 2005 11:13:45 +0100, Aerts, Jan  wrote:
> The problem is (or I might miss something here), that he wants to _test_ a regex. It's not possible to write something like
> $_ =~ /(.*)(.*)foo(\2)(.*)/e
> I think...
> 
> jan.

now i'm trying to do this with the test regex and am not successful :(
  this is an interesting problem and i really would love to find a
way..

one solution would be to explode the whole thing in another
subroutine... but if it's
not  what you want, i'm not yet sure how to do it.

good challenge though.....

:)
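
For reference, the /e modifier belongs to the substitution operator only (s///e); a plain match does not accept it. A minimal sketch of calling a subroutine on captured text during a substitution, where the complement() helper and the sample string are made up for illustration:

```perl
use strict;
use warnings;

# Complement a DNA string without reversing it.
sub complement {
    my ($dna) = @_;
    (my $copy = $dna) =~ tr/acgtACGT/tgcaTGCA/;
    return $copy;
}

my $data = "id42 acgt id43 ttga";

# /e evaluates the replacement as Perl code; /g repeats it for every match.
(my $changed = $data) =~ s/\b([acgt]+)\b/complement($1)/ge;

print "$changed\n";    # prints "id42 tgca id43 aact"
```

For a test rather than a replacement, it is often simpler to capture the pieces with an ordinary match and compare them in Perl code afterwards.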

> 
> 
> -----Original Message-----
> From:   Willy West [mailto:corenth@gmail.com]
> Sent:   Sun 16-Jan-05 00:09
> To:     Aerts, Jan
> Cc:
> Subject:        Re: [Bioperl-l] regular expression help!
> On Sat, 15 Jan 2005 15:17:28 +0100, Aerts, Jan  wrote:
> > You're right... Should have looked at the actual expression.
> > Idea: is it possible in this case to call subroutines from within a regex and evaluating them using the 'e' switch?
> 
> if i recall::
> 
> sub foo {
>            return 'hello genome';
> }
> 
> $data = "ih ho hum bababa";
> 
> $data =~ s/ih/foo/e; #one way to do it.
> 
> print "$data\n";
> 
> seems to work..


-- 
Willy
http://www.hackswell.com/corenth
From senger at ebi.ac.uk  Sun Jan 16 11:29:37 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Sun Jan 16 11:25:54 2005
Subject: [Bioperl-l] Re: Bio::Biblio
In-Reply-To: 
Message-ID: 

Hi again,

>    ...for some citations there may be some errors caused by the bug in the
> underlying conversion between html and xml. In other words, some returned
> XML may not be valid (because some characters there are not properly
> escaped). I will fix it (and let you know) as soon as I get response from
> our SRS team.
> 
   This was fixed over the weekend. You should no longer get back bad XML
entries.

   With regards,
   Martin

-- 
Martin Senger

EMBL Outstation - Hinxton                Senger@EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger

From mlemieux at bioinfo.ca  Sun Jan 16 14:00:45 2005
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Sun Jan 16 13:57:00 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: 

I'm not sure if this is the sort of thing you mean:

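#!/usr/bin/perl -w
use strict;

my @test_strings = ("acgttgcaacgt", "acgtacgt", "acgttgca", "ata");

foreach my $seq (@test_strings) {
     # force case change here, if necessary
     # skip strings that do not match, so $1 and $2 are always defined below
     next unless $seq =~ /([acgt]+)(?=([acgt]+)\1)/;
     my ($fwd, $mid) = ($1, $2);
     # complement the middle segment and compare it with the flanking repeat
     (my $rev = $mid) =~ tr/acgt/tgca/;
     if ($fwd eq $rev) {
         print $seq, ' ', $fwd, ' ', $mid, "\n";
     }
}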
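
If the target is a true inverted repeat (the reverse complement of an upstream segment) rather than a plain complement, one alternative is to skip the embedded-code regex and compare substrings directly. A minimal sketch, assuming a fixed arm length; the names revcomp and find_inverted_repeats and the test sequence are made up for illustration, and the flanking direct repeats from the original question are not checked here:

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (upper or lower case).
sub revcomp {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Return [arm_start, partner_start, arm] for every window of $arm_len bases
# whose reverse complement occurs somewhere downstream of that window.
sub find_inverted_repeats {
    my ($seq, $arm_len) = @_;
    my @hits;
    for my $i (0 .. length($seq) - 2 * $arm_len) {
        my $arm = substr($seq, $i, $arm_len);
        my $j   = index($seq, revcomp($arm), $i + $arm_len);
        push @hits, [$i, $j, $arm] if $j >= 0;
    }
    return @hits;
}

my @hits = find_inverted_repeats("GGACGTAACCCCTTACGTGG", 6);
printf "arm %s at %d, reverse complement at %d\n", @{$_}[2, 0, 1] for @hits;
```

On this made-up sequence the only hit is the arm ACGTAA at offset 2, whose reverse complement TTACGT starts at offset 12 (offsets are 0-based).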

HTH,
Madeleine

> Hi, Everybody,
> I was trying to use a regex recognizing a patter of inverted repeat 
> DNA seq flanked by direct repeats (see below), it returns errors 
> saying "(?{...}) not terminated or {...} not balanced. Can anybody 
> help me sorting this out?
> The regex I have is:
> $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ 
> tr/ATCG/TAGC/i);})\1.*/i;
> Thank you,
> Yang
>
>

From senger at ebi.ac.uk  Mon Jan 17 06:42:24 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Mon Jan 17 06:38:41 2005
Subject: [Bioperl-l] Re: Bio::Biblio
In-Reply-To: 
Message-ID: 

Hi once again,

   There is one thing that is different from the previous citations
returned in XML format: now the returned XML starts with the full XML
declaration. Something like this:




   This probably does not break any of your code - but it may (e.g. when
your code adds the XML declarations on its own).
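
   If your code does prepend its own declaration, one way to stay safe
is to drop any declaration that is already there first. A minimal sketch
in plain Perl; the function name strip_xml_declaration and the sample
element are made up for illustration and say nothing about what the
service actually returns:

```perl
use strict;
use warnings;

# Remove a leading XML declaration, if present, so that a caller can
# add its own without ending up with two of them.
sub strip_xml_declaration {
    my ($xml) = @_;
    $xml =~ s/\A\s*<\?xml\s.*?\?>\s*//s;
    return $xml;
}

my $with    = qq{<?xml version="1.0" encoding="UTF-8"?>\n<record/>};
my $without = qq{<record/>};

print strip_xml_declaration($with),    "\n";   # prints "<record/>"
print strip_xml_declaration($without), "\n";   # prints "<record/>"
```

   Input without a declaration passes through unchanged, so the same call
works for both the old and the new responses.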

   With regards,
   Martin

-- 
Martin Senger

EMBL Outstation - Hinxton                Senger@EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger

From Peter.Robinson at t-online.de  Mon Jan 17 06:06:02 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Mon Jan 17 08:33:52 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: 
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net>
	<1104792001.3186.17.camel@localhost.localdomain>
	<0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu>
	<1104871954.3102.24.camel@localhost.localdomain>
	<1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu>
	<1105044266.3084.27.camel@localhost.localdomain>
	
Message-ID: <1105959962.21808.15.camel@localhost.localdomain>

Hi list,

here's an update on Entrez Gene. 
1) NCBI apparently does not have plans to offer the files in XML format
for FTP download. It is possible to download the files in XML format
from the website, even including the files for an entire species with
corresponding queries (although I haven't tried this yet). It seems this
might be too complicated for many users, and there could be stability
issues for browsers downloading files of that size.


2) I have completed two reasonably simple modules for parsing gene_info
and gene2accession using the SeqIO interface. These are attached
together with simple demo programs. These modules can be used to do some
useful things. For instance, we often want to generate a list of
correspondences between NCBI accession numbers and MGI accession numbers
so as to be able to use MGI's Gene Ontology annotations for the mouse. I
have included a script (accession2mgi.pl) that uses the above modules to
parse gene_info and gene2accession to do this (you need to use both
files).

3) In the meantime I have also gotten a lex/yacc parser in C to parse
the species-specific Gene files (which are by far the most interesting
files in the Entrez Gene system). In principle this approach could be
done in Perl: straightforward, but a lot of detail work. I will be
needing this kind of thing for my work, so I will continue to work on
this, and once it is bug-free in C I will think about ways of porting it
to Bioperl (this might take a while). As I mentioned before on this
list, if anybody else can do this more quickly please go ahead (but drop
me a line); on the other hand, collaborators who like the idea of
writing a grammar in the style of lex/yacc or ANTLR are also welcome.
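
The idea behind point 2 (joining the two tab-delimited files on GeneID) can be sketched in plain Perl as follows. This is not the attached accession2mgi.pl or the SeqIO modules: the function names, the column positions passed in, the use of '-' for an empty field and the sample rows are all assumptions made for illustration, and the real files have more columns than shown.

```perl
use strict;
use warnings;

# Map GeneID -> MGI id from gene_info-style lines.
# $id_col and $xref_col are 0-based column indexes chosen by the caller.
sub geneid_to_mgi {
    my ($lines, $id_col, $xref_col) = @_;
    my %mgi_for;
    for my $line (@$lines) {
        next if $line =~ /^#/;                    # skip header/comment lines
        (my $row = $line) =~ s/\r?\n\z//;
        my @field = split /\t/, $row, -1;
        next unless defined $field[$id_col] && defined $field[$xref_col];
        next unless $field[$xref_col] =~ /(MGI:\d+)/;
        $mgi_for{ $field[$id_col] } = $1;
    }
    return \%mgi_for;
}

# Map accession -> MGI id by joining gene2accession-style lines on GeneID.
sub accession_to_mgi {
    my ($lines, $id_col, $acc_col, $mgi_for) = @_;
    my %result;
    for my $line (@$lines) {
        next if $line =~ /^#/;
        (my $row = $line) =~ s/\r?\n\z//;
        my @field = split /\t/, $row, -1;
        my ($gene_id, $acc) = @field[$id_col, $acc_col];
        next unless defined $gene_id && defined $acc && $acc ne '-';
        next unless exists $mgi_for->{$gene_id};
        $result{$acc} = $mgi_for->{$gene_id};
    }
    return \%result;
}

# Made-up rows, not real NCBI records.
my @gene_info      = ("10090\t1001\tGeneA\tMGI:12345\n",
                      "10090\t1002\tGeneB\t-\n");
my @gene2accession = ("10090\t1001\tNM_000001.1\n",
                      "10090\t1002\tNM_000002.1\n");

my $mgi = geneid_to_mgi(\@gene_info, 1, 3);
my $map = accession_to_mgi(\@gene2accession, 1, 2, $mgi);
print "$_ => $map->{$_}\n" for sort keys %$map;
```

Only accessions whose GeneID carries an MGI cross-reference end up in the result, which is why both files are needed.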

--peter


On Tue, 2005-01-11 at 02:33, Chris Mungall wrote: 
> Hi Peter
> 
> Have you tried asking NCBI to make XML available as well as ASN? In
> general they seem keen to offer both for most of their datasets. If not, I
> believe the NCBI toolkit has an ASN->XML converter.
> 
> Cheers
> Chris
> 
> On Thu, 6 Jan 2005, Peter Robinson wrote:
> 
> > Dear Bioperlers,
> >
> > I have started looking at writing some modules to parse the new Entrez
> > gene, which is kind of an expanded LocusLink. The really interesting
> > files are species specific and are in the ASN.1 format, and I am still
> > experimenting around with the best way of parsing them. To get started,
> > I am looking at the tab-delimited flat files. It seems to me that it
> > would be interesting to be able to parse gene_info and gene2accession
> > using the Bio::SeqIO system, the other files such as gene2unigene seem
> > less suited for this (the latter has just two entries which could be
> > parsed ad hoc easily enough).
> >
> > In any case, I am sending a proposed module Bio::SeqIO::geneinfo.pm as
> > well as a test script (which contains a small excerpt of gene_info in
> > the data section) for comments and criticism to the list. I am presently
> > working on another module for Bio::SeqIO::gene2accession and plan to
> > write a demo script using both modules to convert NCBI accession numbers
> > to MGI accession numbers (which is something one might want to do in
> > order to use Gene Ontology for affymetrix data, although one needs
> > additional work for probesets which are only related to ESTs).
> >
> > For the moment it seemed better to just parse in the NCBI taxon id into
> > the Bio::Species object (only this info is supplied by gene_info), and
> > expect users who need the information to use the taxonomy support of
> > other Bioperl modules in their scripts.
> >
> > I will continue to work on parsing the species specific ASN.1 files, but
> > I will be trying a combination of lex/yacc/C to do this. If that works I
> > will look into trying perl support for lex/yacc for potential use in
> > Bioperl, but since I am not sure how long this will take me, I do not
> > want to scare off anyone else who would like to give this a shot.
> >
> > best,
> > peter
> >
> >
> > On Tue, 2005-01-04 at 22:03, Jason Stajich wrote:
> > > On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > thanks for the advice. It seems as if the documentation of
> > > > Bio::DB::Taxonomy is a bit out of sync.
> > > >  my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
> > > >                                  -nodesfile => $nodesfile,
> > > >                                  -namesfile => $namefile);
> > > > What does 'flatfile' refer to here? It is not apparent upon looking at
> > > > the code for new.
> > > >
> > > See Bio::DB::Taxonomy::flatfile for more information.  As I mentioned
> > > in the mail I sent, flatfile is for downloading the taxonomy DB from
> > > NCBI.  This lets you run it locally using an indexed  (BerkelyDB via
> > > DB_File) version of the file.
> > >
> > > You must need the most up-to-date verion of the modules - works fine
> > > for me for both the entrez and flatfile code, but you may have to
> > > upgrade off of the 1.4.0 release. Code from CVS or the bioperl-1.5 RC1
> > > code should work fine.
> > >
> > >
> > >
> > > > I had somewhat better luck using the entrez version, but I got a
> > > > pretty amusing error
> > > > message:
> > > >
> > > > MSG: can't create a species object for Homo sapiens (human) because it
> > > > isn't a species but is a '' instead
> > > >
> > > > ###
> > > > Full error and a dump of the script follow:
> > > >
> > > > my $db = new Bio::DB::Taxonomy(-source => 'entrez'); #
> > > > my $taxaid = $db->get_taxonid('Homo sapiens');
> > > > my $species = $db->get_Taxonomy_Node(-taxonid => '9606');
> > > > print Dumper($species);
> > > >
> > > > ###
> > > >
> > > > Use of uninitialized value in string eq at
> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
> > > > Use of uninitialized value in sprintf at
> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
> > > >
> > > > -------------------- WARNING ---------------------
> > > > MSG: can't create a species object for Homo sapiens (human) because it
> > > > isn't a species but is a '' instead
> > > > ---------------------------------------------------
> > > > Use of uninitialized value in string eq at
> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
> > > > Use of uninitialized value in sprintf at
> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
> > > >
> > > > -------------------- WARNING ---------------------
> > > > MSG: can't create a species object for Homo sapiens (human) because it
> > > > isn't a species but is a '' instead
> > > > ---------------------------------------------------
> > > > $VAR1 = {
> > > >           'TaxId' => '9606',
> > > >           'Division' => 'mammals',
> > > >           'GeneNumber' => '32775',
> > > >           'Rank' => 'species',
> > > >           'ProtNumber' => '247791',
> > > >           'ScientificName' => 'Homo sapiens',
> > > >           'CommonName' => 'human',
> > > >           'NucNumber' => '9025800',
> > > >           'GenNumber' => '25',
> > > >           'StructNumber' => '5638'
> > > >         };
> > > > peter@anna:~/programs/bioperlTest$
> > > >
> > > >
> > > > --best, peter
> > > >
> > > > On Mon, 2005-01-03 at 23:51, Jason Stajich wrote:
> > > >> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a
> > > >> species object (or equivalent) using this code.  But you cannot (or
> > > >> could not when I wrote this, not sure of the current status) get the
> > > >> full classification from the NCBI taxonomy retrieval via cgi.  i.e.
> > > >> you
> > > >> can only get genus and species for a taxon id and I don't know how to
> > > >> walk up the hierarchy using the web API.  Earlier emails to NCBI
> > > >> seemed
> > > >> to indicate this is all they intended to provide, but not sure what
> > > >> the
> > > >> current status is.
> > > >>
> > > >>   my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI
> > > >> Entrez
> > > >> over HTTP
> > > >>    my $taxaid = $db->get_taxonid('Homo sapiens');
> > > >>    my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606');
> > > >>
> > > >> You can get the full classification if you use the
> > > >> Bio::DB::Taxonomy::flatfile factory which requires you to have
> > > >> downloaded the taxonomy db flatfile from NCBI.  Since this is more
> > > >> reliable (and faster) it is what I have tended to use for grouping
> > > >> sets
> > > >> of seqDB search results, etc.
> > > >>
> > > >> -jason
> > > >> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote:
> > > >>
> > > >>> Hi Bioperlers, hi Hilmar,
> > > >>>
> > > >>> after some thinking I have embarked on a lex/yacc parser for the
> > > >>> Entrez
> > > >>> Gene ASN.1 format as the way of least resistance, although I am not
> > > >>> sure
> > > >>> how that would fit in to BioPerl. If anyone is interested in this (or
> > > >>> has a better idea of how to go about it..), please drop me a line.
> > > >>>
> > > >>> In the meantime I have been looking at writing code to parse some of
> > > >>> the
> > > >>> "easy" Entrez gene documents, starting off with gene_info. This file
> > > >>> includes the NCBI taxon id for each entry. I would like to convert
> > > >>> this
> > > >>> to a Bio::Species object to pass to the following
> > > >>> 	my $seq = $self->sequence_factory->create(
> > > >>> 			     -verbose => $self->verbose(),
> > > >>> 			     -accession_number => $geneID,
> > > >>> 			     -desc => $description,
> > > >>> 			     -display_id => $symbol,
> > > >>> 			     -species =>  ???
> > > >>> 			     -annotation => $ann);
> > > >>>
> > > >>> and saw the Bio::Taxonomy::FactoryI code, which appears to want to do
> > > >>> this sort of thing. However, the code for that is pretty preliminary.
> > > >>> Is
> > > >>> anyone working on this at the moment? Or is there a better way of
> > > >>> doing
> > > >>> this (it seems a shame not to provide the actual species name if one
> > > >>> has
> > > >>> the taxid...)
> > > >>>
> > > >>> best
> > > >>>
> > > >>> Peter
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
> > > >>>> Great to hear that someone is giving this a shot. Yes at this point
> > > >>>> is
> > > >>>> appears that NCBI is only offering the ASN.1, not a conversion to
> > > >>>> XML.
> > > >>>> Their asn2xml tool will not work with this ASN.1 format either, just
> > > >>>> checked it to be sure. They do seem to be mulling the option of XML
> > > >>>> though on the Gene FAQ. Maybe if enough people get in their ears
> > > >>>> they
> > > >>>> will spend some effort towards that. After all, the entrez gene web
> > > >>>> interface can display XML on demand - even though it looks fairly
> > > >>>> hideous.
> > > >>>>
> > > >>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in
> > > >>>> perl is actually thin - there is Convert::ASN1 at version 0.18 two
> > > >>>> years ago that I could find ... doesn't make me feel warm and fuzzy.
> > > >>>>
> > > >>>> In the absence of any XML available from NCBI, gene_info might be
> > > >>>> the
> > > >>>> best start. An option could be to check for the presence of the
> > > >>>> other
> > > >>>> tab-delimited files and use those that are present. These are
> > > >>>> tab-delimited and hence the format itself is trivial so you can
> > > >>>> focus
> > > >>>> entirely on setting up a Bio::Seq plus annotation that's
> > > >>>> comparable/compatible to what the current SeqIO::locuslink does.
> > > >>>>
> > > >>>> My $0.02 (worth less and less almost every day).
> > > >>>>
> > > >>>> 	-hilmar
> > > >>>>
> > > >>>> On Thursday, December 23, 2004, at 10:51  AM, Peter Robinson wrote:
> > > >>>>
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> I have been thinking about given a BioPerl EntrezGene parser a try
> > > >>>>> since
> > > >>>>> I have been a heavy user of locus link to date. One issue is that
> > > >>>>> the
> > > >>>>> files that correspond to LL_tmpl (which was a flat file) are now in
> > > >>>>> asn
> > > >>>>> format
> > > >>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/
> > > >>>>> genehelp.html#query
> > > >>>>> Although I saw some mention of ASN support in Bioperl by googling,
> > > >>>>> I
> > > >>>>> can't seem to find any module that does this in the present
> > > >>>>> distribution. What is the status on that? In any case, I will be
> > > >>>>> working
> > > >>>>> on this in the next month or two and if anything nice comes of it I
> > > >>>>> will
> > > >>>>> send it to you / BioPerpl.
> > > >>>>>
> > > >>>>> best wishes & happy holidays
> > > >>>>>
> > > >>>>> Peter
> > > >>>>>
> > > >>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
> > > >>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for
> > > >>>>>> parsing
> > > >>>>>> any input file, what you're asking is whether or not there is a
> > > >>>>>> SeqIO
> > > >>>>>> parser for NCBI Gene.
> > > >>>>>>
> > > >>>>>> The answer to that question is no, not yet. Anybody who feels
> > > >>>>>> motivated
> > > >>>>>> is welcome to give it a try ... Since I'll need it, I'll write the
> > > >>>>>> parser if nobody else does within the next 3 months, but I'm not
> > > >>>>>> going
> > > >>>>>> to promise when exactly this will happen.
> > > >>>>>>
> > > >>>>>> 	-hilmar
> > > >>>>>>
> > > >>>>>> On Monday, December 13, 2004, at 08:03  AM, Law, Annie wrote:
> > > >>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>>
> > > >>>>>>> I was wondering with regards to bioperl-db the scripts and schema
> > > >>>>>>> and
> > > >>>>>>> load_seqdatabase.pl has there been preparation for integration of
> > > >>>>>>> Entrez
> > > >>>>>>> gene information when locuslink is phased out?  Or if it has
> > > >>>>>>> already
> > > >>>>>>> been
> > > >>>>>>> changed could somebody point
> > > >>>>>>> me to the documentation or changed code?
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Annie.
> > > >>>>>>> _______________________________________________
> > > >>>>>>> Bioperl-l mailing list
> > > >>>>>>> Bioperl-l@portal.open-bio.org
> > > >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>> --
> > > >>>>> Peter N. Robinson
> > > >>>>> peter.robinson@t-online.de
> > > >>>>> peter.robinson@charite.de
> > > >>>>> http://www.charite.de/ch/medgen/robinson/
> > > >>>>>
> > > >>>>>
> > > >>> --
> > > >>> Peter N. Robinson
> > > >>> peter.robinson@t-online.de
> > > >>> peter.robinson@charite.de
> > > >>> http://www.charite.de/ch/medgen/robinson/
> > > >>>
> > > >>> _______________________________________________
> > > >>> Bioperl-l mailing list
> > > >>> Bioperl-l@portal.open-bio.org
> > > >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >>>
> > > >>>
> > > >> --
> > > >> Jason Stajich
> > > >> jason.stajich at duke.edu
> > > >> http://www.duke.edu/~jes12/
> > > > --
> > > > Peter N. Robinson
> > > > peter.robinson@t-online.de
> > > > peter.robinson@charite.de
> > > > http://www.charite.de/ch/medgen/robinson/
> > > >
> > > >
> > > --
> > > Jason Stajich
> > > jason.stajich at duke.edu
> > > http://www.duke.edu/~jes12/
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >

-- 
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: accession2mgi.pl
Type: application/x-perl
Size: 2507 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/2c5de30b/accession2mgi-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gene2accession.pm
Type: application/x-perl
Size: 8148 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/2c5de30b/gene2accession-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gene2accession_test.pl
Type: application/x-perl
Size: 5968 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/2c5de30b/gene2accession_test-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: geneinfo.pm
Type: application/x-perl
Size: 10515 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/2c5de30b/geneinfo-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: geneinfotest.pl
Type: application/x-perl
Size: 11225 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/2c5de30b/geneinfotest-0001.bin
From sdavis2 at mail.nih.gov  Mon Jan 17 09:09:37 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon Jan 17 09:08:03 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net><1104792001.3186.17.camel@localhost.localdomain><0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu><1104871954.3102.24.camel@localhost.localdomain><1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu><1105044266.3084.27.camel@localhost.localdomain>
	<1105959962.21808.15.camel@localhost.localdomain>
Message-ID: <009101c4fc9e$2dcb7f80$7d75f345@WATSON>

Peter,

Thanks for doing all this!

Just a bit more by way of an update.  I checked with some folks in our (NHGRI) 
bioinformatics core.  It sounds like the closest thing to XML that NCBI 
might offer would be an ASN.1 to XML converter and NOT the XML files, as 
Peter already stated.  They have one (meant for public consumption) that 
works for every ASN.1 file except the gene files.  There is no definite 
date for completion as far as I know.  They have also mentioned a bulk ASN.1 
to XML web-based tool, but I agree with Peter that this will have 
significant limitations for "online" use for large datasets like 
human/mouse/rat (but might work well with a user agent).

Sean

----- Original Message ----- 
From: "Peter Robinson" 
To: "Bioperl list" 
Cc: "Peter Robinson" 
Sent: Monday, January 17, 2005 6:06 AM
Subject: Re: [Bioperl-l] Entrez Gene and bioperl-db


> Hi list,
>
> here's an update on Entrez Gene.
> 1) NCBI apparently does not have plans to offer the files in XML format
> for FTP download. It is possible to download the files in XML format
> from the website, even including the files for the entire species with
> corresponding queries (although I havent tried this yet). It seems this
> might be too complicated for many users and there could be issues of
> stability for browsers downloading files of that size.
>
>
> 2) I have completed two reasonably simple modules for parsing gene_info
> and gene2accession using the SeqIO interface. These are attached
> together with simple demo programs. These modules can be used to do some
> useful things. For instance, we often want to generate a list of
> correspondences between NCBI accession numbers and MGI accession numbers
> so as to be able to use MGI's Gene Ontology annotations for the mouse.I
> have included a script (accession2mgi.pl) that uses the above modules to
> parse gene_info and gene2accession to do this (you need to use both
> files)
>
> 3) In the meantime I have also gotten a lex/yacc parser in C to parse
> the species-specific Gene files (which is by far the most interesting
> file in the Entrez gene system). In principle this approach could be
> done in Perl -- straightforward but a lot of detail work. I will be
> needing this kind of thing for my work, so I will continue to work on
> this, and once it is bug-free in C I will think about ways of porting it
> to Bioperl (this might take a while). As I mentioned before on this
> list, if anybody else can do this more quickly please go ahead (but drop
> me a line); on the other hand, collaborators who like the idea of
> writing a grammer in the style of lex/yacc or ANTLR are also welcome.
>
> --peter
>
>
> On Tue, 2005-01-11 at 02:33, Chris Mungall wrote:
>> Hi Peter
>>
>> Have you tried asking NCBI to make XML available as well as ASN? In
>> general they seem keen to offer both for most of their datasets. If not, 
>> I
>> believe the NCBI toolkit has an ASN->XML converter.
>>
>> Cheers
>> Chris
>>
>> On Thu, 6 Jan 2005, Peter Robinson wrote:
>>
>> > Dear Bioperlers,
>> >
>> > I have started looking at writing some modules to parse the new Entrez
>> > gene, which is kind of an expanded LocusLink. The really interesting
>> > files are species specific and are in the ASN.1 format, and I am still
>> > experimenting around with the best way of parsing them. To get started,
>> > I am looking at the tab-delimited flat files. It seems to me that it
>> > would be interesting to be able to parse gene_info and gene2accession
>> > using the Bio::SeqIO system, the other files such as gene2unigene seem
>> > less suited for this (the latter has just two entries which could be
>> > parsed ad hoc easily enough).
>> >
>> > In any case, I am sending a proposed module Bio::SeqIO::geneinfo.pm as
>> > well as a test script (which contains a small excerpt of gene_info in
>> > the data section) for comments and criticism to the list. I am 
>> > presently
>> > working on another module for Bio::SeqIO::gene2accession and plan to
>> > write a demo script using both modules to convert NCBI accession 
>> > numbers
>> > to MGI accession numbers (which is something one might want to do in
>> > order to use Gene Ontology for affymetrix data, although one needs
>> > additional work for probesets which are only related to ESTs).
>> >
>> > For the moment it seemed better to just parse in the NCBI taxon id into
>> > the Bio::Species object (only this info is supplied by gene_info), and
>> > expect users who need the information to use the taxonomy support of
>> > other Bioperl modules in their scripts.
>> >
>> > I will continue to work on parsing the species specific ASN.1 files, 
>> > but
>> > I will be trying a combination of lex/yacc/C to do this. If that works 
>> > I
>> > will look into trying perl support for lex/yacc for potential use in
>> > Bioperl, but since I am not sure how long this will take me, I do not
>> > want to scare off anyone else who would like to give this a shot.
>> >
>> > best,
>> > peter
>> >
>> >
>> > On Tue, 2005-01-04 at 22:03, Jason Stajich wrote:
>> > > On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote:
>> > >
>> > > > Hi Jason,
>> > > >
>> > > > thanks for the advice. It seems as if the documentation of
>> > > > Bio::DB::Taxonomy is a bit out of sync.
>> > > >  my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
>> > > >                                  -nodesfile => $nodesfile,
>> > > >                                  -namesfile => $namefile);
>> > > > What does 'flatfile' refer to here? It is not apparent upon looking 
>> > > > at
>> > > > the code for new.
>> > > >
>> > > See Bio::DB::Taxonomy::flatfile for more information.  As I mentioned
>> > > in the mail I sent, flatfile is for downloading the taxonomy DB from
>> > > NCBI.  This lets you run it locally using an indexed  (BerkelyDB via
>> > > DB_File) version of the file.
>> > >
>> > > You must need the most up-to-date verion of the modules - works fine
>> > > for me for both the entrez and flatfile code, but you may have to
>> > > upgrade off of the 1.4.0 release. Code from CVS or the bioperl-1.5 
>> > > RC1
>> > > code should work fine.
>> > >
>> > >
>> > >
>> > > > I had somewhat better luck using the entrez version, but I got a
>> > > > pretty amusing error
>> > > > message:
>> > > >
>> > > > MSG: can't create a species object for Homo sapiens (human) because 
>> > > > it
>> > > > isn't a species but is a '' instead
>> > > >
>> > > > ###
>> > > > Full error and a dump of the script follow:
>> > > >
>> > > > my $db = new Bio::DB::Taxonomy(-source => 'entrez'); #
>> > > > my $taxaid = $db->get_taxonid('Homo sapiens');
>> > > > my $species = $db->get_Taxonomy_Node(-taxonid => '9606');
>> > > > print Dumper($species);
>> > > >
>> > > > ###
>> > > >
>> > > > Use of uninitialized value in string eq at
>> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
>> > > > Use of uninitialized value in sprintf at
>> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>> > > >
>> > > > -------------------- WARNING ---------------------
>> > > > MSG: can't create a species object for Homo sapiens (human) because 
>> > > > it
>> > > > isn't a species but is a '' instead
>> > > > ---------------------------------------------------
>> > > > Use of uninitialized value in string eq at
>> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
>> > > > Use of uninitialized value in sprintf at
>> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>> > > >
>> > > > -------------------- WARNING ---------------------
>> > > > MSG: can't create a species object for Homo sapiens (human) because 
>> > > > it
>> > > > isn't a species but is a '' instead
>> > > > ---------------------------------------------------
>> > > > $VAR1 = {
>> > > >           'TaxId' => '9606',
>> > > >           'Division' => 'mammals',
>> > > >           'GeneNumber' => '32775',
>> > > >           'Rank' => 'species',
>> > > >           'ProtNumber' => '247791',
>> > > >           'ScientificName' => 'Homo sapiens',
>> > > >           'CommonName' => 'human',
>> > > >           'NucNumber' => '9025800',
>> > > >           'GenNumber' => '25',
>> > > >           'StructNumber' => '5638'
>> > > >         };
>> > > > peter@anna:~/programs/bioperlTest$
>> > > >
>> > > >
>> > > > --best, peter
>> > > >
>> > > > On Mon, 2005-01-03 at 23:51, Jason Stajich wrote:
>> > > >> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a
>> > > >> species object (or equivalent) using this code.  But you cannot 
>> > > >> (or
>> > > >> could not when I wrote this, not sure of the current status) get 
>> > > >> the
>> > > >> full classification from the NCBI taxonomy retrieval via cgi. 
>> > > >> i.e.
>> > > >> you
>> > > >> can only get genus and species for a taxon id and I don't know how 
>> > > >> to
>> > > >> walk up the hierarchy using the web API.  Earlier emails to NCBI
>> > > >> seemed
>> > > >> to indicate this is all they intended to provide, but not sure 
>> > > >> what
>> > > >> the
>> > > >> current status is.
>> > > >>
>> > > >>   my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI
>> > > >> Entrez
>> > > >> over HTTP
>> > > >>    my $taxaid = $db->get_taxonid('Homo sapiens');
>> > > >>    my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606');
>> > > >>
>> > > >> You can get the full classification if you use the
>> > > >> Bio::DB::Taxonomy::flatfile factory which requires you to have
>> > > >> downloaded the taxonomy db flatfile from NCBI.  Since this is more
>> > > >> reliable (and faster) it is what I have tended to use for grouping
>> > > >> sets
>> > > >> of seqDB search results, etc.
>> > > >>
>> > > >> -jason
>> > > >> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote:
>> > > >>
>> > > >>> Hi Bioperlers, hi Hilmar,
>> > > >>>
>> > > >>> after some thinking I have embarked on a lex/yacc parser for the
>> > > >>> Entrez
>> > > >>> Gene ASN.1 format as the way of least resistance, although I am 
>> > > >>> not
>> > > >>> sure
>> > > >>> how that would fit in to BioPerl. If anyone is interested in this 
>> > > >>> (or
>> > > >>> has a better idea of how to go about it..), please drop me a 
>> > > >>> line.
>> > > >>>
>> > > >>> In the meantime I have been looking at writing code to parse some 
>> > > >>> of
>> > > >>> the
>> > > >>> "easy" Entrez gene documents, starting off with gene_info. This 
>> > > >>> file
>> > > >>> includes the NCBI taxon id for each entry. I would like to 
>> > > >>> convert
>> > > >>> this
>> > > >>> to a Bio::Species object to pass to the following
>> > > >>> my $seq = $self->sequence_factory->create(
>> > > >>>      -verbose => $self->verbose(),
>> > > >>>      -accession_number => $geneID,
>> > > >>>      -desc => $description,
>> > > >>>      -display_id => $symbol,
>> > > >>>      -species =>  ???
>> > > >>>      -annotation => $ann);
>> > > >>>
>> > > >>> and saw the Bio::Taxonomy::FactoryI code, which appears to want 
>> > > >>> to do
>> > > >>> this sort of thing. However, the code for that is pretty 
>> > > >>> preliminary.
>> > > >>> Is
>> > > >>> anyone working on this at the moment? Or is there a better way of
>> > > >>> doing
>> > > >>> this (it seems a shame not to provide the actual species name if 
>> > > >>> one
>> > > >>> has
>> > > >>> the taxid...)
>> > > >>>
>> > > >>> best
>> > > >>>
>> > > >>> Peter
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
>> > > >>>> Great to hear that someone is giving this a shot. Yes at this 
>> > > >>>> point
>> > > >>>> is
>> > > >>>> appears that NCBI is only offering the ASN.1, not a conversion 
>> > > >>>> to
>> > > >>>> XML.
>> > > >>>> Their asn2xml tool will not work with this ASN.1 format either, 
>> > > >>>> just
>> > > >>>> checked it to be sure. They do seem to be mulling the option of 
>> > > >>>> XML
>> > > >>>> though on the Gene FAQ. Maybe if enough people get in their ears
>> > > >>>> they
>> > > >>>> will spend some effort towards that. After all, the entrez gene 
>> > > >>>> web
>> > > >>>> interface can display XML on demand - even though it looks 
>> > > >>>> fairly
>> > > >>>> hideous.
>> > > >>>>
>> > > >>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 support 
>> > > >>>> in
>> > > >>>> perl is actually thin - there is Convert::ASN1 at version 0.18 
>> > > >>>> two
>> > > >>>> years ago that I could find ... doesn't make me feel warm and 
>> > > >>>> fuzzy.
>> > > >>>>
>> > > >>>> In the absence of any XML available from NCBI, gene_info might 
>> > > >>>> be
>> > > >>>> the
>> > > >>>> best start. An option could be to check for the presence of the
>> > > >>>> other
>> > > >>>> tab-delimited files and use those that are present. These are
>> > > >>>> tab-delimited and hence the format itself is trivial so you can
>> > > >>>> focus
>> > > >>>> entirely on setting up a Bio::Seq plus annotation that's
>> > > >>>> comparable/compatible to what the current SeqIO::locuslink does.
>> > > >>>>
>> > > >>>> My $0.02 (worth less and less almost every day).
>> > > >>>>
>> > > >>>> -hilmar
>> > > >>>>
>> > > >>>> On Thursday, December 23, 2004, at 10:51  AM, Peter Robinson 
>> > > >>>> wrote:
>> > > >>>>
>> > > >>>>> Hi,
>> > > >>>>>
>> > > >>>>> I have been thinking about given a BioPerl EntrezGene parser a 
>> > > >>>>> try
>> > > >>>>> since
>> > > >>>>> I have been a heavy user of locus link to date. One issue is 
>> > > >>>>> that
>> > > >>>>> the
>> > > >>>>> files that correspond to LL_tmpl (which was a flat file) are 
>> > > >>>>> now in
>> > > >>>>> asn
>> > > >>>>> format
>> > > >>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/
>> > > >>>>> genehelp.html#query
>> > > >>>>> Although I saw some mention of ASN support in Bioperl by 
>> > > >>>>> googling,
>> > > >>>>> I
>> > > >>>>> can't seem to find any module that does this in the present
>> > > >>>>> distribution. What is the status on that? In any case, I will 
>> > > >>>>> be
>> > > >>>>> working
>> > > >>>>> on this in the next month or two and if anything nice comes of 
>> > > >>>>> it I
>> > > >>>>> will
>> > > >>>>> send it to you / BioPerpl.
>> > > >>>>>
>> > > >>>>> best wishes & happy holidays
>> > > >>>>>
>> > > >>>>> Peter
>> > > >>>>>
>> > > >>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
>> > > >>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for
>> > > >>>>>> parsing
>> > > >>>>>> any input file, what you're asking is whether or not there is 
>> > > >>>>>> a
>> > > >>>>>> SeqIO
>> > > >>>>>> parser for NCBI Gene.
>> > > >>>>>>
>> > > >>>>>> The answer to that question is no, not yet. Anybody who feels
>> > > >>>>>> motivated
>> > > >>>>>> is welcome to give it a try ... Since I'll need it, I'll write 
>> > > >>>>>> the
>> > > >>>>>> parser if nobody else does within the next 3 months, but I'm 
>> > > >>>>>> not
>> > > >>>>>> going
>> > > >>>>>> to promise when exactly this will happen.
>> > > >>>>>>
>> > > >>>>>> -hilmar
>> > > >>>>>>
>> > > >>>>>> On Monday, December 13, 2004, at 08:03  AM, Law, Annie wrote:
>> > > >>>>>>
>> > > >>>>>>> Hi,
>> > > >>>>>>>
>> > > >>>>>>> I was wondering with regards to bioperl-db the scripts and 
>> > > >>>>>>> schema
>> > > >>>>>>> and
>> > > >>>>>>> load_seqdatabase.pl has there been preparation for 
>> > > >>>>>>> integration of
>> > > >>>>>>> Entrez
>> > > >>>>>>> gene information when locuslink is phased out?  Or if it has
>> > > >>>>>>> already
>> > > >>>>>>> been
>> > > >>>>>>> changed could somebody point
>> > > >>>>>>> me to the documentation or changed code?
>> > > >>>>>>>
>> > > >>>>>>> Thanks,
>> > > >>>>>>> Annie.
>> > > >>>>>>> _______________________________________________
>> > > >>>>>>> Bioperl-l mailing list
>> > > >>>>>>> Bioperl-l@portal.open-bio.org
>> > > >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>> --
>> > > >>>>> Peter N. Robinson
>> > > >>>>> peter.robinson@t-online.de
>> > > >>>>> peter.robinson@charite.de
>> > > >>>>> http://www.charite.de/ch/medgen/robinson/
>> > > >>>>>
>> > > >>>>>
>> > > >>> --
>> > > >>> Peter N. Robinson
>> > > >>> peter.robinson@t-online.de
>> > > >>> peter.robinson@charite.de
>> > > >>> http://www.charite.de/ch/medgen/robinson/
>> > > >>>
>> > > >>> _______________________________________________
>> > > >>> Bioperl-l mailing list
>> > > >>> Bioperl-l@portal.open-bio.org
>> > > >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>> > > >>>
>> > > >>>
>> > > >> --
>> > > >> Jason Stajich
>> > > >> jason.stajich at duke.edu
>> > > >> http://www.duke.edu/~jes12/
>> > > > --
>> > > > Peter N. Robinson
>> > > > peter.robinson@t-online.de
>> > > > peter.robinson@charite.de
>> > > > http://www.charite.de/ch/medgen/robinson/
>> > > >
>> > > >
>> > > --
>> > > Jason Stajich
>> > > jason.stajich at duke.edu
>> > > http://www.duke.edu/~jes12/
>> > >
>> > > _______________________________________________
>> > > Bioperl-l mailing list
>> > > Bioperl-l@portal.open-bio.org
>> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>> >
>
> -- 
> Peter N. Robinson
> peter.robinson@t-online.de
> peter.robinson@charite.de
> http://www.charite.de/ch/medgen/robinson/
>
>


--------------------------------------------------------------------------------


> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l 


From gyang at plantbio.uga.edu  Mon Jan 17 11:17:31 2005
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon Jan 17 11:15:41 2005
Subject: [Bioperl-l] regular expression help!
In-Reply-To: <4f10f19405011606531737d90@mail.gmail.com>
Message-ID: <20050117111731.58739c14@dogwood.plantbio.uga.edu>

Thanks for everybody's comments. The only thing I am interested in is a regular expression that recognizes the pattern (it should not be confined to certain sequences, as some have suggested). For example, in tttaatatcaaAGCATgggaaaggatat....atatcctttcccGCATacatataccata, the regex should recognize AGCATgggaaaggatat....atatcctttcccGCAT. The problem is not the direct repeat AGCAT, but how to match the atatcctttccc with the gggaaaggatat. I guess there must be a way to do it. I tried the following and obtained weird results:
/.*(\S+)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S).*(??{convert(\11);})(??{convert(\10);})(??{convert(\9);})(??{convert(\8);})(??{convert(\7);})(??{convert(\6);})(??{convert(\5);})(??{convert(\4);})(??{convert(\3);})(??{convert(\2);})\1.*/i
...

sub convert{
my $return=$_[0];
$return =~ tr/ATCG/TAGC/;
$return =reverse($return);
return $return;
}

Can anybody give me a hint on the /e modifier (the "e switch") when using Perl code inside a regex?

Yang
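
One way to express this, offered as a minimal sketch rather than a tested answer for real data: Perl's postponed subexpression (??{ ... }) runs code during the match and uses the returned string as a pattern, so the reverse complement of a captured group can be required further downstream. The /e modifier only affects the replacement side of s///, so it does not help here. The arm length of 12, the 'cccc' spacer standing in for the elided "....", and the name revcomp are assumptions made for illustration.

```perl
use strict;
use warnings;

# Reverse-complement a lowercase DNA string.
sub revcomp {
    my ($s) = @_;
    $s = reverse $s;          # scalar context: reverses the string
    $s =~ tr/acgt/tgca/;
    return $s;
}

# Example from the message, lowercased; 'cccc' is an assumed spacer.
my $dna = lc 'tttaatatcaaAGCATgggaaaggatatccccatatcctttcccGCATacatataccata';

# $1 = left arm (12 bases, an assumed length), $2 = spacer.
# (??{ ... }) is evaluated while matching; its result is matched as a pattern.
my ($arm, $spacer);
if ($dna =~ /([acgt]{12})([acgt]*?)(??{ revcomp($1) })/) {
    ($arm, $spacer) = ($1, $2);
    print "left arm:  $arm\n";
    print "spacer:    $spacer\n";
    print "right arm: ", revcomp($arm), "\n";
}
```

Because the engine has to try every start position and spacer length, this is likely to be slow on long sequences.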





----- Original Message -----
From: Willy West 
To: Jan.Aerts@wur.nl, bioperl-l@portal.open-bio.org
Sent: Sun, 16 Jan 2005 09:53:55 -0500
Subject: Re: [Bioperl-l] regular expression help!


> oops- i'd forgotten to "reply to all" with this... i apologize.
> 
> 
> On Sun, 16 Jan 2005 11:13:45 +0100, Aerts, Jan  wrote:
> > The problem is (or I might miss something here), that he wants to _test_ a
> regex. It's not possible to write something like
> > $_ =~ /(.*)(.*)foo(\2)(.*)/e
> > I think...
> > 
> > jan.
> 
> now i'm trying to do this with the test regex and am not successful :(
>   this is an interesting problem and i really would love to find a
> way..
> 
> one solution would be to explode the whole thing in another
> subroutine... but if it's
> not  what you want, i'm not yet sure how to do it.
> 
> good challenge though.....
> 
> :)
> 
> > 
> > 
> > -----Original Message-----
> > From:   Willy West [mailto:corenth@gmail.com]
> > Sent:   Sun 16-Jan-05 00:09
> > To:     Aerts, Jan
> > Cc:
> > Subject:        Re: [Bioperl-l] regular expression help!
> > On Sat, 15 Jan 2005 15:17:28 +0100, Aerts, Jan  wrote:
> > > You're right... Should have looked at the actual expression.
> > > Idea: is it possible in this case to call subroutines from within a regex
> and evaluating them using the 'e' switch?
> > 
> > if i recall::
> > 
> > sub foo {
> >            return 'hello genome';
> > }
> > 
> > $data = "ih ho hum bababa";
> > 
> > $data =~ s/ih/foo/e; #one way to do it.
> > 
> > print "$data\n";
> > 
> > seems to work..
> 
> 
> -- 
> Willy
> http://www.hackswell.com/corenth
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 


From danielucgbioinfo at yahoo.com.br  Mon Jan 17 11:19:31 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Mon Jan 17 11:16:23 2005
Subject: [Bioperl-l] method "link_pattern" and Bio::Graphics::Panel
Message-ID: <20050117161932.15787.qmail@web53502.mail.yahoo.com>

Hi,

I am having some difficulty
with the Bio::Graphics::Panel class. I only want to show
a short sequence in the browser and make it
clickable, as an HTTP link. Please look at my short
code and tell me what is wrong. I am using Bioperl
1.5 RC 2.

The error message is:
Can't locate object method "link_pattern" via package
"Bio::Graphics::FeatureFile" at
/usr/lib/perl5/site_perl/5.8.3/Bio/Graphics/Panel.pm
line 981,  line 191,.

My little code :
#!/usr/bin/perl -wT

use strict;
use Bio::Graphics;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
use CGI  qw / :standard /;
use CGI::Pretty;

my $wholeseq =
Bio::SeqFeature::Generic->new(-start=>1,-end=>600);

my $q = new CGI;

print $q->header('text/html');
print $q->start_html('A Vector Rendering ');

my $panel = Bio::Graphics::Panel->new(
    -length => 600, -width => 1000, -pad_left => 10, -pad_right => 10,
    -key_style => 'none', -spacing => -0.25, -box_subparts => 'true');

$panel->add_track($wholeseq,
    -glyph => 'arrow', -bump => +1, -double => 1, -tick => 2,
    -title => 'test 1', -link => 'www.perl.org');

$panel->add_track($wholeseq,
    -glyph => 'transcript2', -bgcolor => 'orange', -bump => 0,
    -height => 12, -title => 'test 2', -link => 'http://www.google.com.br');

my ($url, $map, $mapname) = $panel->image_and_map(
    -root => '/home/bioinfo/cgi-bin', -url => '/tmpimages');

print $q->img({-src=>$url,-usemap=>"#$mapname"});
print $q->$map;
print $q->($panel->png);

print $q->exit_html;

exit;

Thank you very much,
Daniel Xavier 


	
	
		
From cldwalker at chwhat.com  Mon Jan 17 12:49:43 2005
From: cldwalker at chwhat.com (Gabriel Horner)
Date: Mon Jan 17 12:43:08 2005
Subject: [Bioperl-l] Announcing bioperl shell, Fry::Lib::BioPerl
Message-ID: <20050117174943.GA29769@bigmama.chwhat.com>

Hi All,
  I'm announcing that I put up a module, Fry::Lib::BioPerl,
for my shell framework, Fry::Shell, a few days ago. The result
is a set of commands for viewing and obtaining sequences and alignments.
It's definitely a step up from the shell in examples/bioperl.pl.
See http://search.cpan.org/perldoc?Fry::Shell for details.
It is fairly easy to write new libraries to use with Fry::Shell. 
Since the shell framework has no dependencies, a Fry::Shell bundle along with a script that loads
only Fry::Lib::BioPerl could be included in the examples directory if desired.

Gabriel
-- 
my looovely website -- http://www.chwhat.com
BTW, IF chwhat.com goes down email me at gabriel.horner@cern.ch
From Peter.Robinson at t-online.de  Mon Jan 17 14:21:44 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Mon Jan 17 14:17:35 2005
Subject: [Bioperl-l] regular expression help!
In-Reply-To: <20050117111731.58739c14@dogwood.plantbio.uga.edu>
References: <20050117111731.58739c14@dogwood.plantbio.uga.edu>
Message-ID: <1105989704.8090.11.camel@localhost.localdomain>

Just a suggestion, but I don't think regular expressions are the best
way to do this. You might want to take a look at some of the programs
at www.emboss.org, which can find repeats, inverted repeats /
palindromes in DNA sequences. The EMBOSS programs are open-source, easy
to use and quite useful, although the EMBOSS group is unfortunately now
having difficulties with funding.

-peter
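
For readers who want to stay in Perl without a regex, the search can also be written as a plain scan. This is only a sketch under assumptions: a fixed arm length, exact matching, and hypothetical names (find_inverted_repeats, $max_gap). Dedicated tools such as the EMBOSS programs mentioned above are probably the better fit once mismatches or variable arm lengths matter.

```perl
use strict;
use warnings;

# Reverse-complement a lowercase DNA string.
sub revcomp {
    my ($s) = @_;
    $s = reverse $s;
    $s =~ tr/acgt/tgca/;
    return $s;
}

# Report [left_start, right_start, left_arm] for every window of length $k
# whose reverse complement occurs downstream within $max_gap bases.
# Positions are 0-based.
sub find_inverted_repeats {
    my ($seq, $k, $max_gap) = @_;
    my $dna = lc $seq;
    my @hits;
    for my $i (0 .. length($dna) - 2 * $k) {
        my $left  = substr($dna, $i, $k);
        my $right = revcomp($left);
        my $from  = $i + $k;               # the right arm must not overlap the left
        while ((my $j = index($dna, $right, $from)) >= 0) {
            last if $j - ($i + $k) > $max_gap;
            push @hits, [$i, $j, $left];
            $from = $j + 1;
        }
    }
    return @hits;
}

# 'cccc' is an assumed spacer standing in for the "...." of the example.
my $example = 'tttaatatcaaAGCATgggaaaggatatccccatatcctttcccGCATacatataccata';
for my $hit (find_inverted_repeats($example, 12, 20)) {
    my ($left_pos, $right_pos, $arm) = @$hit;
    print "arm $arm at $left_pos, reverse complement at $right_pos\n";
}
```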

On Mon, 2005-01-17 at 17:17, Guojun Yang wrote:
> Thanks for everybody's comments, the only thing I am interested in is a regular expression to recognize the pattern (it should not be confined to certain sequences as have suggested by some). For example: in tttaatatcaaAGCATgggaaaggatat....atatcctttcccGCATacatataccata, the regex should recognize AGCATgggaaaggatat....atatcctttcccGCAT. The problem is not the direct repeat AGCAT, but how to match the atatcctttccc with the gggaaaggatat. I guess there must be a way to do it. I tried the following and obtained weird results:
> /.*(\S+)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S).*(??{convert(\11);})(??{convert(\10);})(??{convert(\9);})(??{convert(\8);})(??{convert(\7);})(??{convert(\6);})(??{convert(\5);})(??{convert(\4);})(??{convert(\3);})(??{convert(\2);})\1.*/i
> ...
> 
> sub convert{
> my $return=$_[0];
> $return =~ tr/ATCG/TAGC/;
> $return =reverse($return);
> return $return;
> }
> 
> Can anybody give me a hint on the -e switch when using perl script inside a regex?
> 
> Yang
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: Willy West 
> To: Jan.Aerts@wur.nl, bioperl-l@portal.open-bio.org
> Sent: Sun, 16 Jan 2005 09:53:55 -0500
> Subject: Re: [Bioperl-l] regular expression help!
> 
> 
> > oops- i'd forgotten to "reply to all" with this... i apologize.
> > 
> > 
> > On Sun, 16 Jan 2005 11:13:45 +0100, Aerts, Jan  wrote:
> > > The problem is (or I might miss something here), that he wants to _test_ a
> > regex. It's not possible to write something like
> > > $_ =~ /(.*)(.*)foo(\2)(.*)/e
> > > I think...
> > > 
> > > jan.
> > 
> > now i'm trying to do this with the test regex and am not successful :(
> >   this is an interesting problem and i really would love to find a
> > way..
> > 
> > one solution would be to explode the whole thing in another
> > subroutine... but if it's
> > not  what you want, i'm not yet sure how to do it.
> > 
> > good challenge though.....
> > 
> > :)
> > 
> > > 
> > > 
> > > -----Original Message-----
> > > From:   Willy West [mailto:corenth@gmail.com]
> > > Sent:   Sun 16-Jan-05 00:09
> > > To:     Aerts, Jan
> > > Cc:
> > > Subject:        Re: [Bioperl-l] regular expression help!
> > > On Sat, 15 Jan 2005 15:17:28 +0100, Aerts, Jan  wrote:
> > > > You're right... Should have looked at the actual expression.
> > > > Idea: is it possible in this case to call subroutines from within a regex
> > and evaluating them using the 'e' switch?
> > > 
> > > if i recall::
> > > 
> > > sub foo {
> > >            return 'hello genome';
> > > }
> > > 
> > > $data = "ih ho hum bababa";
> > > 
> > > $data =~ s/ih/foo/e; #one way to do it.
> > > 
> > > print "$data\n";
> > > 
> > > seems to work..
> > 
> > 
> > -- 
> > Willy
> > http://www.hackswell.com/corenth
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
-- 
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/

From babenko at ncbi.nlm.nih.gov  Mon Jan 17 14:23:12 2005
From: babenko at ncbi.nlm.nih.gov (Babenko, Vladimir (NIH/NLM/NCBI))
Date: Mon Jan 17 14:19:24 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
Message-ID: <69BA0F938FAC6A4CBEF49461720696F20796569C@nihexchange16.nih.gov>

    Greetings,
While parsing a genbank file taken from:
ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.0.dat
as of Jan 2005,
I'm getting the following unflattening error:
--------------------------------------------------------
Processing file /ENSEMBL/Homo_sapiens.0.dat...
working on contig
chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening
error:
Details: 
------------- EXCEPTION  -------------
MSG: PROBLEM, SEVERITY==2
no containers possible for SeqFeature of type: CDS; this SF is being placed
at root level
SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556

STACK Bio::SeqFeature::Tools::Unflattener::problem
/Bio/SeqFeature/Tools/Unflattener.pm:940
STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
/Bio/SeqFeature/Tools/Unflattener.pm:1983
STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
/Bio/SeqFeature/Tools/Unflattener.pm:1744
STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
/Bio/SeqFeature/Tools/Unflattener.pm:1449
STACK (eval) genbank2gff3.PLS:345
STACK main::unflatten_seq genbank2gff3.PLS:344
STACK toplevel genbank2gff3.PLS:209

--------------------------------------

Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult
STDERR

Using bioperl-1.5.0.RC2 under Linux.

    Would be grateful for the hint,
      Vladimir
From cjm at fruitfly.org  Mon Jan 17 14:51:37 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon Jan 17 14:47:48 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: <69BA0F938FAC6A4CBEF49461720696F20796569C@nihexchange16.nih.gov>
References: <69BA0F938FAC6A4CBEF49461720696F20796569C@nihexchange16.nih.gov>
Message-ID: 


Hi Vladimir

The genbank2gff3 script, in scripts/Bio-DB-GFF, attempts to recover
information which the genbank flat file format often loses; this is the
information about which mRNA relates to which CDS. You may or may not need
this information, it depends why you are doing the conversion. If you
don't need this, you may want just a straightforward genbank->gff
conversion. Let me know if this is what you want to do and I can help with
that.

If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
isn't always possible to recover these with 100% fidelity from the genbank
flat files. You may wish to pursue alternate approaches, such as
downloading ensembl as a mysql dump (any ensembl folks around.. any plans
to offer downloads in alternate formats such as gff3? This would be
fantastic)

If you'd prefer to carry on via the genbank flat file route, here's what
you should do:

* get the latest version of genbank2gff3.PLS I have just checked into cvs
(I can send you a copy if you are using a bioperl release and not cvs)

* run the script with the "--ethresh 3" option. This will raise the error
severity threshold at which problems with the genbank file become
showstoppers.

In addition, I will take a look at this particular file and see what it is
that is causing problems and get back to you.

Cheers
Chris

On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:

>     Greetings,
> While parsing a genbank file taken from:
> ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.
> 0.dat as of Jan 2005,
> I'm getting the following unflattening error:
> --------------------------------------------------------
> Processing file /ENSEMBL/Homo_sapiens.0.dat...
> working on contig
> chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening
> error:
> Details:
> ------------- EXCEPTION  -------------
> MSG: PROBLEM, SEVERITY==2
> no containers possible for SeqFeature of type: CDS; this SF is being placed
> at root level
> SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
>
> STACK Bio::SeqFeature::Tools::Unflattener::problem
> /Bio/SeqFeature/Tools/Unflattener.pm:940
> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
> /Bio/SeqFeature/Tools/Unflattener.pm:1983
> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
> /Bio/SeqFeature/Tools/Unflattener.pm:1744
> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> /Bio/SeqFeature/Tools/Unflattener.pm:1449
> STACK (eval) genbank2gff3.PLS:345
> STACK main::unflatten_seq genbank2gff3.PLS:344
> STACK toplevel genbank2gff3.PLS:209
>
> --------------------------------------
>
> Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult
> STDERR
>
> Using bioperl-1.5.0.RC2 under Linux.
>
>     Would be grateful for the hint,
>       Vladimir
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
From osmany.guirola at cigb.edu.cu  Mon Jan 17 10:58:14 2005
From: osmany.guirola at cigb.edu.cu (Osmany Guirola Cruz)
Date: Mon Jan 17 14:54:15 2005
Subject: [Bioperl-l] buried surface
Message-ID: <1105977494.2482.11.camel@draco.cigb.edu.cu>

Hi 
I am new to the list and I need to know how I can calculate the buried
surface of the residues in my PDB file. I want to select some residues of my
PDB file that have a specific buried surface.

Thanks

Osmany
 


From osmany.guirola at cigb.edu.cu  Mon Jan 17 11:02:20 2005
From: osmany.guirola at cigb.edu.cu (Osmany Guirola Cruz)
Date: Mon Jan 17 14:58:20 2005
Subject: [Bioperl-l] buried surface calculation
Message-ID: <1105977740.2482.15.camel@draco.cigb.edu.cu>

Hi 
I am new to the list and I need to know how I can calculate the buried
surface for each residue. How can I do that? I need to select some
residues from a PDB file with a specific value.

Thanks 
Osmany




From R.J.Minshall at pgr.salford.ac.uk  Mon Jan 17 08:39:45 2005
From: R.J.Minshall at pgr.salford.ac.uk (Robert Minshall)
Date: Mon Jan 17 19:59:57 2005
Subject: [Bioperl-l] Feature table comparison
Message-ID: <1105969185.41ebc021ee0a5@webmail.salford.ac.uk>


Hi, does anyone know of, or have, a script that can compare the feature tables of
genomes and show what appears in one and not the other? I.e., I want to find the
differences between the feature tables. Is this possible? I'm new to Perl and was
hoping that someone could point me in the right direction. My email is
r.j.minshall@pgr.salford.ac.uk
Thanks in advance
Rob Minshall
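
As a starting point only, the comparison itself can be treated as a set difference once each feature has been reduced to a string key. The sketch below assumes hypothetical keys of the form "type:name"; with Bioperl these might be built from each feature's primary tag and a gene or product tag read via Bio::SeqIO, but that step is not shown and the key scheme is an assumption.

```perl
use strict;
use warnings;

# Return the keys present in the first list but absent from the second.
sub only_in_first {
    my ($first, $second) = @_;
    my %seen = map { $_ => 1 } @$second;
    return grep { !$seen{$_} } @$first;
}

# Hypothetical feature keys; real ones would come from parsing each
# genome's feature table.
my @genome_a = ('CDS:dnaA', 'CDS:recA', 'tRNA:trnL');
my @genome_b = ('CDS:dnaA', 'tRNA:trnL', 'CDS:gyrB');

print "only in A: $_\n" for only_in_first(\@genome_a, \@genome_b);
print "only in B: $_\n" for only_in_first(\@genome_b, \@genome_a);
```

Matching on names alone will miss features that are the same gene under different labels, so the choice of key matters more than the comparison code.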

--
Robert J Minshall
Postgraduate Researcher in Microbiology,
Biosciences Research Institute,
School of Environment and Life Sciences,
Lab 209 Cockcroft Building,
University of Salford,
Salford,
Greater Manchester.
M5 4WT
UK
0161 2952652
r.j.mishall@pgr.salford.ac.uk






From cain at cshl.edu  Mon Jan 17 12:04:54 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Jan 17 20:00:00 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <200501161451.j0GEpNKs028052@portal.open-bio.org>
Message-ID: 

Hi Rob,

Thanks for your work on this--I've put several comments in your
original message below.

Scott

---------Original Message--------
Date: Sat, 15 Jan 2005 15:22:23 -0800
From: Rob Edwards 
Subject: [Bioperl-l] GFF3
To: Bioperl list 

Because I need it for some things that I am doing, I have worked quite 
a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
written this module, I have just made some cosmetic changes:

I have improved the validation processes that are applied as a gff3 
file is parsed, and the module should now validate essentially 
everything in the file except alignments. Validation is optional and is 
based on the specification described at : 
http://song.sourceforge.net/gff3.shtml

SC> Excellent--Did you happen to relax the requirement that ID be unique
SC> for each line of the GFF?  Allen and I put that in due to a misreading
SC> of the spec.  The ID has to be unique for a *feature*, which can be
SC> spread across several lines.
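
The point about IDs can be illustrated with a small sketch that collects lines by ID instead of rejecting a repeated ID. This is plain Perl written for illustration, not the Bio::FeatureIO::gff code; the sub name group_by_id and the sample records are hypothetical.

```perl
use strict;
use warnings;

# Group GFF3 feature lines by their ID attribute, so that one feature
# spread over several lines is collected rather than rejected.
sub group_by_id {
    my (@lines) = @_;
    my %by_id;
    for my $line (@lines) {
        chomp(my $text = $line);
        next if $text =~ /^#/ or $text !~ /\S/;    # skip directives and blanks
        my @col = split /\t/, $text;
        next unless @col == 9;                      # GFF3 has nine columns
        my ($id) = $col[8] =~ /(?:^|;)ID=([^;]+)/;
        next unless defined $id;
        push @{ $by_id{$id} }, { start => $col[3], end => $col[4] };
    }
    return \%by_id;
}

# Hypothetical records: one CDS written on two lines with the same ID.
my @gff = (
    "##gff-version 3",
    join("\t", 'ctg1', '.', 'CDS',  100, 200, '.', '+', 0,   'ID=cds1;Parent=mrna1'),
    join("\t", 'ctg1', '.', 'CDS',  300, 400, '.', '+', 1,   'ID=cds1;Parent=mrna1'),
    join("\t", 'ctg1', '.', 'mRNA', 100, 400, '.', '+', '.', 'ID=mrna1'),
);

my $features = group_by_id(@gff);
for my $id (sort keys %$features) {
    printf "%s: %d line(s)\n", $id, scalar @{ $features->{$id} };
}
```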

For clarification and edification I have created a couple of tables 
describing the module and the validation that is applied to GFF3 files, 
which you can see online: http://www.salmonella.org/bioperl/gff3.html

SC> Very nice and well done--do you happen to have a pod-ified version 
SC> of this page?  It would be nice to include in the pod for 
SC> Bio::FeatureIO::gff.

I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
sequences, it seems that you'd want to be able to call the next_seq 
methods, and therefore SeqIO is more appropriate than FeatureIO for 
those aspects. Currently the SeqIO module uses the FeatureIO module for 
parsing the file, it just reorganizes things.

This provides two different interfaces for getting objects out of GFF3 
files:
	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
representing the annotations.
	Bio::SeqIO::gff will return Bio::Seq objects representing the 
sequences with all the annotations attached.

The other difference between the two is that the former passes out the 
objects as they are read, but the latter has to read the whole file to 
get the annotations and the sequences.

SC> I thought about doing something similar with SeqIO, but I am worried 
SC> about the case where somebody tries to use SeqIO on a well 
SC> annotated human Chr1 GFF3 file (if one were ever to exist :-) ,
SC> but I suppose the same machine killing thing could be done if
SC> someone tried to use SeqIO on a genbank file of Chr1.

At the moment I have focused only on reading GFF3 files.

I have not committed these to cvs yet, pending comments from others. I 
have some specific questions:
	Should I wait until after 1.5 is out?

SC> I don't have the definitive answer, but I would say it doesn't
SC> matter much, as long as it passes tests.  Bio::FeatureIO::gff is
SC> hardly a fully functional module as it is, so if we could 
SC> squeeze a little more functionality into it before we
SC> release it, that would be fine with me.

	Is two separate modules really the right way to go about this?

SC> As long as it works for this case, I don't mind:  calling
SC> 'next_feature' on a FeatureIO object until I run out of features
SC> and then calling 'next_sequence' (and getting a Bio::PrimarySeq) on
SC> the same FeatureIO object until I run out of sequences.

	What about other GFF modules (like Bio::Tools::GFF)?

SC> I am willing to let Bio::Tools::GFF die a terrible death.  While
SC> it will have to be kept around for apps that depend on it, I don't
SC> see adding any major functionality as time well spent.

	Could someone give the modules a workout and let me know about bugs? I 
am sure there are many.

SC> I will try to soon, but it won't be until next week at 
SC> the earliest.

I have posted these modules online via anonymous ftp at 
ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
Take a look and let me know what you do and don't like!

Rob


----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------



From palmeida at igc.gulbenkian.pt  Mon Jan 17 12:56:07 2005
From: palmeida at igc.gulbenkian.pt (palmeida@igc.gulbenkian.pt)
Date: Mon Jan 17 20:00:02 2005
Subject: [Bioperl-l] regular expression help! (attached script)
Message-ID: <20050117175606.GB5318@bioinf.igc.gulbenkian.pt>


-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.pl
Type: text/x-perl
Size: 235 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/0ff6cec2/test.bin
-------------- next part --------------
tttaatatcaaagcatgggaaaggatatatcgatcgatgctacgatcatatcctttcccagcatacatataccata
From smarkel at scitegic.com  Mon Jan 17 21:00:18 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Mon Jan 17 20:56:58 2005
Subject: [Bioperl-l] possible to skip parsing features when calling
	Bio::SeqIO::new?
Message-ID: <41EC6DB2.10205@scitegic.com>

I'm using BioPerl 1.4 to read a GenBank sequence file.  Is there
an option or parameter I can set in the

my $seqIterator = Bio::SeqIO->new("-file"   => "$file",
                                  "-format" => "genbank");

call that will cause the parser to skip the features?

I checked BioPerl 1.5RC2 and didn't see any changes there
that would address my question.

Scott

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com


From cain at cshl.edu  Mon Jan 17 21:10:19 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Jan 17 21:06:42 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: 
Message-ID: 

Hi Vladimir,

Not to ask a question on the level of "is it plugged in", but are you sure
it is a genbank formatted file?  I think you would get a different error
if it weren't, but I just wanted to make sure.

Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------


On Mon, 17 Jan 2005, Chris Mungall wrote:

> 
> Hi Vladimir
> 
> The genbank2gff3 script, in scripts/Bio-DB-GFF, is attempting to recover
> information which the genbank flat file format often loses; this is the
> information about which mRNA relates to which CDS. You may or may not need
> this information, it depends why you are doing the conversion. If you
> don't need this, you may want just a straightforward genbank->gff
> conversion. Let me know if this is what you want to do and I can help with
> that.
> 
> If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
> isn't always possible to recover these with 100% fidelity from the genbank
> flat files. You may wish to pursue alternate approaches, such as
> downloading ensembl as a mysql dump (any ensembl folks around.. any plans
> to offer downloads in alternate formats such as gff3? This would be
> fantastic)
> 
> If you'd prefer to carry on via the genbank flat file route, here's what
> you should do:
> 
> * get the latest version of genbank2gff3.PLS I have just checked into cvs
> (I can send you a copy if you are using a bioperl release and not cvs)
> 
> * run the script with the "--ethresh 3" option. This will raise the error
> severity threshold at which problems with genbank file become
> showstoppers.
> 
> In addition, I will take a look at this particular file and see what it is
> that is causing problems and get back to you.
> 
> Cheers
> Chris
> 
> On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
> 
> >     Greetings,
> > While parsing a genbank file taken from:
> > ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.
> > 0.dat as of Jan 2005,
> > I'm getting the following unflattening error:
> > --------------------------------------------------------
> > Processing file /ENSEMBL/Homo_sapiens.0.dat...
> > working on contig
> > chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening
> > error:
> > Details:
> > ------------- EXCEPTION  -------------
> > MSG: PROBLEM, SEVERITY==2
> > no containers possible for SeqFeature of type: CDS; this SF is being placed
> > at root level
> > SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
> >
> > STACK Bio::SeqFeature::Tools::Unflattener::problem
> > /Bio/SeqFeature/Tools/Unflattener.pm:940
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
> > /Bio/SeqFeature/Tools/Unflattener.pm:1983
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
> > /Bio/SeqFeature/Tools/Unflattener.pm:1744
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> > /Bio/SeqFeature/Tools/Unflattener.pm:1449
> > STACK (eval) genbank2gff3.PLS:345
> > STACK main::unflatten_seq genbank2gff3.PLS:344
> > STACK toplevel genbank2gff3.PLS:209
> >
> > --------------------------------------
> >
> > Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult
> > STDERR
> >
> > Using bioperl-1.5.0.RC2 under Linux.
> >
> >     Would be grateful for the hint,
> >       Vladimir
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> 

From brian_osborne at cognia.com  Mon Jan 17 21:20:47 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Jan 17 21:19:47 2005
Subject: [Bioperl-l] possible to skip parsing features when
	callingBio::SeqIO::new?
In-Reply-To: <41EC6DB2.10205@scitegic.com>
Message-ID: 

Scott,

Why would you want to do this? I can imagine one reason, that there's some
problem with a feature causing the script to exit. In that case do something
like:

my $seq;
eval { $seq = $seqIterator->next_seq; };
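
A sketch of how that eval might sit inside a read loop. So that it runs without BioPerl, StubIterator below is a hypothetical stand-in for the Bio::SeqIO stream (it simply dies on one record); only the loop is the point. Whether a real parser can carry on cleanly with the next record after throwing is not something this sketch can promise.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::SeqIO stream: next_seq hands back
# records in turn and dies on any record whose name starts with "bad".
package StubIterator;
sub new {
    my ($class, @records) = @_;
    return bless { queue => [@records] }, $class;
}
sub next_seq {
    my $self = shift;
    my $rec = shift @{ $self->{queue} };
    return unless defined $rec;                # end of stream
    die "could not parse feature table of $rec\n" if $rec =~ /^bad/;
    return $rec;
}

package main;

my $seqIterator = StubIterator->new(qw(seq1 bad2 seq3));
my (@good, @errors);
while (1) {
    my $seq;
    # The trailing 1 lets us tell "threw an exception" from "returned undef".
    my $ok = eval { $seq = $seqIterator->next_seq; 1 };
    if (!$ok) {
        chomp(my $err = $@);
        push @errors, $err;                    # note the failure and move on
        next;
    }
    last unless defined $seq;                  # clean end of stream
    push @good, $seq;
}
print "read: @good\n";
print "skipped: $_\n" for @errors;
```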


Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Scott Markel
Sent: Monday, January 17, 2005 9:00 PM
To: Bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] possible to skip parsing features when
callingBio::SeqIO::new?


I'm using BioPerl 1.4 to read a GenBank sequence file.  Is there
an option or parameter I can set in the

my $seqIterator = Bio::SeqIO->new("-file"   => "$file",
                                  "-format" => "genbank");

call that will cause the parser to skip the features?

I checked BioPerl 1.5RC2 and didn't see any changes there
that would address my question.

Scott

--
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com


_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at duke.edu  Mon Jan 17 21:21:57 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 17 21:19:54 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: 
References: <69BA0F938FAC6A4CBEF49461720696F20796569C@nihexchange16.nih.gov>
	
Message-ID: 

I have been using EnsMart to grab GFF2/GTF (or GFF-like output, 
reformatting it for GFF3) with reasonable success.  You probably want  
just the output columns so you can reformat things to have CDS  
start/end and the Gene, Exon->Transcript->Peptide identifiers all in the  
same report.

This is a lot easier than parsing genbank flatfiles, and it is the whole  
point of EnsMart.

-jason
On Jan 17, 2005, at 2:51 PM, Chris Mungall wrote:

>
> Hi Vladimir
>
> The genbank2gff3 script, in scripts/Bio-DB-GFF, is attempting to recover
> information which the genbank flat file format often loses; this is the
> information about which mRNA relates to which CDS. You may or may not  
> need
> this information, it depends why you are doing the conversion. If you
> don't need this, you may want just a straightforward genbank->gff
> conversion. Let me know if this is what you want to do and I can help  
> with
> that.
>
> If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
> isn't always possible to recover these with 100% fidelity from the  
> genbank
> flat files. You may wish to pursue alternate approaches, such as
> downloading ensembl as a mysql dump (any ensembl folks around.. any  
> plans
> to offer downloads in alternate formats such as gff3? This would be
> fantastic)
>
> If you'd prefer to carry on via the genbank flat file route, here's  
> what
> you should do:
>
> * get the latest version of genbank2gff3.PLS I have just checked into  
> cvs
> (I can send you a copy if you are using a bioperl release and not cvs)
>
> * run the script with the "--ethresh 3" option. This will raise the  
> error
> severity threshold at which problems with genbank file become
> showstoppers.
>
> In addition, I will take a look at this particular file and see what  
> it is
> that is causing problems and get back to you.
>
> Cheers
> Chris
>
> On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
>
>>     Greetings,
>> While parsing a genbank file taken from:
>> ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/ 
>> Homo_sapiens.
>> 0.dat as of Jan 2005,
>> I'm getting the following unflattening error:
>> --------------------------------------------------------
>> Processing file /ENSEMBL/Homo_sapiens.0.dat...
>> working on contig
>> chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1  
>> Unflattening
>> error:
>> Details:
>> ------------- EXCEPTION  -------------
>> MSG: PROBLEM, SEVERITY==2
>> no containers possible for SeqFeature of type: CDS; this SF is being  
>> placed
>> at root level
>> SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
>>
>> STACK Bio::SeqFeature::Tools::Unflattener::problem
>> /Bio/SeqFeature/Tools/Unflattener.pm:940
>> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
>> /Bio/SeqFeature/Tools/Unflattener.pm:1983
>> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
>> /Bio/SeqFeature/Tools/Unflattener.pm:1744
>> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>> /Bio/SeqFeature/Tools/Unflattener.pm:1449
>> STACK (eval) genbank2gff3.PLS:345
>> STACK main::unflatten_seq genbank2gff3.PLS:344
>> STACK toplevel genbank2gff3.PLS:209
>>
>> --------------------------------------
>>
>> Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1:  
>> consult
>> STDERR
>>
>> Using bioperl-1.5.0.RC2 under Linux.
>>
>>     Would be grateful for the hint,
>>       Vladimir
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu  Mon Jan 17 21:32:06 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 17 21:28:41 2005
Subject: [Bioperl-l] possible to skip parsing features when calling
	Bio::SeqIO::new?
In-Reply-To: <41EC6DB2.10205@scitegic.com>
References: <41EC6DB2.10205@scitegic.com>
Message-ID: <2580819E-68F9-11D9-83BC-000393C44276@duke.edu>

See the docs for Bio::Seq::SeqBuilder

I think this will work:
my $seqIterator = Bio::SeqIO->new("-file"   => "$file",
                                     "-format" => "genbank");
$seqIterator->sequence_builder->add_unwanted_slot('features');

If you additionally don't want the Annotations (references, etc.):
$seqIterator->sequence_builder->add_unwanted_slot('features', 
'annotation');

[don't ask why one is plural and other singular... =)]

-jason
On Jan 17, 2005, at 9:00 PM, Scott Markel wrote:

> I'm using BioPerl 1.4 to read a GenBank sequence file.  Is there
> an option or parameter I can set in the
>
> my $seqIterator = Bio::SeqIO->new("-file"   => "$file",
>                                   "-format" => "genbank");
>
> call that will cause the parser to skip the features?
>
> I checked BioPerl 1.5RC2 and didn't see any changes there
> that would address my question.
>
> Scott
>
> -- 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel@scitegic.com
> SciTegic Inc.                       mobile: +1 858 205 3653
> 9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
> San Diego, CA 92123                 fax:    +1 858 279 8804
> USA                                 web:    http://www.scitegic.com
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From cjm at fruitfly.org  Mon Jan 17 21:33:31 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon Jan 17 21:29:39 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: 
References: 
Message-ID: 


It is a genbank formatted file - you can download it from the url Vladimir
provides below.

There seem to be a few oddities to do with the ensembl-flavour genbank
format which may be causing problems for the unflattener:

* There don't appear to be any 'gene' features - a gene model is just
mRNAs and CDSs. This means the files don't even contain essential stuff
like the gene symbol!

* In the feature entry, for the reverse strand, ensembl nests the
complement function inside the join function, listing sublocations in a
3'->5' direction. This is unusual, but not problematic in itself.
However, I'm not 100% convinced that the bioperl genbank parser handles
these cases correctly - I will expand on this in another email. It's not
a problem for the vast majority of cases, but it will be problematic for
certain rare situations where the sublocations are of mixed strand (eg
trans-spliced genes).
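
To make the two location styles concrete, here is a plain-Perl sketch that flattens a location string into (start, end, strand) parts and flags mixed-strand joins. It is not the BioPerl genbank parser, and sublocations, split_top_level and is_mixed_strand are hypothetical helpers. It assumes whitespace has already been removed and handles only join(), complement() and plain start..end ranges; the coordinates are made up.

```perl
use strict;
use warnings;

# Split a comma-separated list on commas that are not inside parentheses.
sub split_top_level {
    my ($text) = @_;
    my @parts;
    my ($depth, $current) = (0, '');
    for my $char (split //, $text) {
        $depth++ if $char eq '(';
        $depth-- if $char eq ')';
        if ($char eq ',' and $depth == 0) {
            push @parts, $current;
            $current = '';
        }
        else {
            $current .= $char;
        }
    }
    push @parts, $current if length $current;
    return @parts;
}

# Return a list of [start, end, strand] in the order the parts are read.
sub sublocations {
    my ($loc, $strand) = @_;
    $strand = 1 unless defined $strand;
    if ($loc =~ /^complement\((.*)\)$/) {
        # complement() flips the strand and reverses the order of its parts
        return reverse sublocations($1, -$strand);
    }
    if ($loc =~ /^join\((.*)\)$/) {
        return map { sublocations($_, $strand) } split_top_level($1);
    }
    if ($loc =~ /^(\d+)\.\.(\d+)$/) {
        return [$1, $2, $strand];
    }
    die "unsupported location: $loc\n";
}

# True if the parts do not all lie on the same strand.
sub is_mixed_strand {
    my %strands = map { $_->[2] => 1 } @_;
    return keys(%strands) > 1;
}

for my $loc ('complement(join(100..200,300..400))',
             'join(complement(300..400),complement(100..200))',
             'join(complement(300..400),500..600)') {
    my @parts = sublocations($loc);
    printf "%-50s %s%s\n", $loc,
        join(' ', map { sprintf '%d..%d(%+d)', @$_ } @parts),
        is_mixed_strand(@parts) ? '  MIXED' : '';
}
```

Under these assumptions the first two strings flatten to the same parts, which is why the nesting order alone is harmless; only the third, mixed-strand form needs the per-part strand to be kept.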

I can implement a hack in the unflattener for the first problem. However,
the question is - is it worth it? Without the gene feature the
ensembl-flavoured genbank files seem not particularly useful (granted it
is possible to get the gene data by integrating with LocusLink/EntrezGene
but is it worth it?). I know for a fact that the data structures
underlying ensembl are sound, so it seems counterproductive to use nothing
but genbank/embl as a flat file distribution format (and to drop the gene
features on top of that!). I know ensembl use GTF a lot internally, it
would be great to see use made of this format (or even better, GFF3) for
data distribution. Perhaps there's something I'm missing here.. I'll wait
for comment from someone from ensembl before progressing here, to avoid
any pointless work...

Cheers
Chris

On Mon, 17 Jan 2005, Scott Cain wrote:

> Hi Vladimir,
>
> Not to ask a question on the level of "is it plugged in", but are you sure
> it is a genbank formatted file?  I think you would get a different error
> if it weren't, but I just wanted to make sure.
>
> Scott
>
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.				 	 cain@cshl.org
> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> ----------------------------------------------------------------------
>
>
> On Mon, 17 Jan 2005, Chris Mungall wrote:
>
> >
> > Hi Vladimir
> >
> > The genbank2gff3 script, in scripts/Bio-DB-GFF, is attempting to recover
> > information which the genbank flat file format often loses; this is the
> > information about which mRNA relates to which CDS. You may or may not need
> > this information, it depends why you are doing the conversion. If you
> > don't need this, you may want just a straightforward genbank->gff
> > conversion. Let me know if this is what you want to do and I can help with
> > that.
> >
> > If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
> > isn't always possible to recover these with 100% fidelity from the genbank
> > flat files. You may wish to pursue alternate approaches, such as
> > downloading ensembl as a mysql dump (any ensembl folks around.. any plans
> > to offer downloads in alternate formats such as gff3? This would be
> > fantastic)
> >
> > If you'd prefer to carry on via the genbank flat file route, here's what
> > you should do:
> >
> > * get the latest version of genbank2gff3.PLS I have just checked into cvs
> > (I can send you a copy if you are using a bioperl release and not cvs)
> >
> > * run the script with the "--ethresh 3" option. This will raise the error
> > severity threshold at which problems with genbank file become
> > showstoppers.
> >
> > In addition, I will take a look at this particular file and see what it is
> > that is causing problems and get back to you.
> >
> > Cheers
> > Chris
> >
> > On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
> >
> > >     Greetings,
> > > While parsing a genbank file taken from:
> > > ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.
> > > 0.dat as of Jan 2005,
> > > I'm getting the following unflattening error:
> > > --------------------------------------------------------
> > > Processing file /ENSEMBL/Homo_sapiens.0.dat...
> > > working on contig
> > > chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening
> > > error:
> > > Details:
> > > ------------- EXCEPTION  -------------
> > > MSG: PROBLEM, SEVERITY==2
> > > no containers possible for SeqFeature of type: CDS; this SF is being placed
> > > at root level
> > > SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
> > >
> > > STACK Bio::SeqFeature::Tools::Unflattener::problem
> > > /Bio/SeqFeature/Tools/Unflattener.pm:940
> > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
> > > /Bio/SeqFeature/Tools/Unflattener.pm:1983
> > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
> > > /Bio/SeqFeature/Tools/Unflattener.pm:1744
> > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> > > /Bio/SeqFeature/Tools/Unflattener.pm:1449
> > > STACK (eval) genbank2gff3.PLS:345
> > > STACK main::unflatten_seq genbank2gff3.PLS:344
> > > STACK toplevel genbank2gff3.PLS:209
> > >
> > > --------------------------------------
> > >
> > > Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult
> > > STDERR
> > >
> > > Using bioperl-1.5.0.RC2 under Linux.
> > >
> > >     Would be grateful for the hint,
> > >       Vladimir
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
>
>
From smarkel at scitegic.com  Mon Jan 17 21:37:10 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Mon Jan 17 21:33:35 2005
Subject: [Bioperl-l] possible to skip parsing features when
	callingBio::SeqIO::new?
In-Reply-To: 
References: 
Message-ID: <41EC7656.5040204@scitegic.com>

Brian,

The use case is when a user has many sequences to read and
is only interested in the sequence data for use in predicting
new features.  The user is likely to come back later and
look at some sequences in detail, so they only want to parse
the GenBank features then.  For the first pass, they would
like the reading sped up by omitting some of the parsing.

Scott

Brian Osborne wrote:

> Scott,
> 
> Why would you want to do this? I can imagine one reason, that there's some
> problem with a feature causing the script to exit. In that case do something
> like:
> 
> my $seq;
> eval { $seq = $seqIterator->next_seq; };
> 
> 
> Brian O.
> 
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Scott Markel
> Sent: Monday, January 17, 2005 9:00 PM
> To: Bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] possible to skip parsing features when
> callingBio::SeqIO::new?
> 
> 
> I'm using BioPerl 1.4 to read a GenBank sequence file.  Is there
> an option or parameter I can set in the
> 
> my $seqIterator = Bio::SeqIO->new("-file"   => "$file",
>                                   "-format" => "genbank");
> 
> call that will cause the parser to skip the features?
> 
> I checked BioPerl 1.5RC2 and didn't see any changes there
> that would address my question.
> 
> Scott

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From neil.saunders at unsw.edu.au  Mon Jan 17 21:40:01 2005
From: neil.saunders at unsw.edu.au (Neil Saunders)
Date: Mon Jan 17 21:35:32 2005
Subject: [Bioperl-l] re:  buried surface calculation
Message-ID: <20050118024001.GA2699@psychro>

hi Osmany,

There are a couple of solutions to your problem.  First, you will need 
to process your PDB file with something that calculates the required 
surface areas.  You may be able to use DSSP for that, which would be 
nice because Bioperl includes a module for parsing DSSP output.  It is 
documented here:

http://doc.bioperl.org/releases/bioperl-1.4/Bio/Structure/SecStr/DSSP/toc.html

The method:
$solv_acc = $dssp_obj->resSolvAcc( RESIDUE_ID );

returns the solvent-accessible area of a residue, so if you knew the 
total surface area, you could calculate what was buried.

Another option might be to use the program naccess:

http://wolf.bms.umist.ac.uk/naccess/

There is no Bioperl module for this output so far as I know (it's 
something I'd like to write one day).  But the output (a .rsa file) is 
quite easy to parse, as it is mostly space-delimited columns.
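
As a rough sketch of such parsing in plain Perl: the column order assumed below (RES, residue name, chain, residue number, then absolute and relative all-atom accessibility) is an assumption about the .rsa layout and should be checked against a real naccess file, in particular for entries with a blank chain field. The sample lines and the 5% cutoff are invented for illustration, and parse_rsa is a hypothetical helper.

```perl
use strict;
use warnings;

# Parse the RES lines of a naccess-style .rsa file into a list of hashes.
# Assumed columns: RES name chain number abs_all rel_all ...
sub parse_rsa {
    my @lines = @_;
    my @residues;
    for my $line (@lines) {
        next unless $line =~ /^RES\s/;         # skip REM, END, CHAIN, TOTAL lines
        my (undef, $name, $chain, $num, $abs, $rel) = split ' ', $line;
        push @residues, {
            name => $name, chain => $chain, num => $num,
            abs  => $abs,  rel   => $rel,
        };
    }
    return @residues;
}

# Invented sample lines, not real naccess output.
my @rsa = (
    "REM                ABS   REL    ABS   REL",
    "RES ASP A   1   123.97  88.3 101.84  99.0",
    "RES LEU A   2     3.10   1.7   0.00   0.0",
    "RES GLY A   3    40.25  50.2   0.00   0.0",
    "END  Absolute sums over single chains surface",
);

my $cutoff = 5.0;   # relative accessibility (%) below which we call a residue buried
my @buried = grep { $_->{rel} < $cutoff } parse_rsa(@rsa);
print "$_->{name}$_->{num} (chain $_->{chain}): $_->{abs} A^2, $_->{rel}%\n"
    for @buried;
```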

I wrote a few scripts to process naccess output some years ago.  You 
might get some ideas from them, see 'surface_charge.pl' and 
'parse_nacc_core2.pl' at my CVS server:

http://psychro.bioinformatics.unsw.edu.au/cgi-bin/viewcvs.cgi/GenRes2003/scripts/#dirlist

I knew very little Perl when I wrote these so they are embarrassingly 
awful, but they may give you an idea of how .rsa files can be parsed.


Neil
-- 
 School of Biotechnology and Biomolecular Sciences,
 The University of New South Wales,
 Sydney 2052,
 Australia

http://psychro.bioinformatics.unsw.edu.au/neil/index.php
From smarkel at scitegic.com  Mon Jan 17 21:53:05 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Mon Jan 17 21:49:35 2005
Subject: [Bioperl-l] possible to skip parsing features when calling
	Bio::SeqIO::new?
In-Reply-To: <2580819E-68F9-11D9-83BC-000393C44276@duke.edu>
References: <41EC6DB2.10205@scitegic.com>
	<2580819E-68F9-11D9-83BC-000393C44276@duke.edu>
Message-ID: <41EC7A11.6050903@scitegic.com>

Jason,

Excellent!  Thank you.

Scott

Jason Stajich wrote:

> See the docs for Bio::Seq::SeqBuilder
> 
> I think this will work:
> my $seqIterator = Bio::SeqIO->new("-file"   => "$file",
>                                     "-format" => "genbank");
> $seqIterator->sequence_builder->add_unwanted_slot('features');
> 
> If you additionally don't want the Annotations (references,etc)
> $seqIterator->sequence_builder->add_unwanted_slot('features', 
> 'annotation');
> 
> [don't ask why one is plural and other singular... =)]
> 
> -jason
> On Jan 17, 2005, at 9:00 PM, Scott Markel wrote:
> 
>> I'm using BioPerl 1.4 to read a GenBank sequence file.  Is there
>> an option or parameter I can set in the
>>
>> my $seqIterator = Bio::SeqIO->new("-file"   => "$file",
>>                                   "-format" => "genbank");
>>
>> call that will cause the parser to skip the features?
>>
>> I checked BioPerl 1.5RC2 and didn't see any changes there
>> that would address my question.
>>
>> Scott
>>
> -- 
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
> 

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From allenday at ucla.edu  Tue Jan 18 00:27:21 2005
From: allenday at ucla.edu (Allen Day)
Date: Tue Jan 18 00:23:31 2005
Subject: [Bioperl-l] GFF3
In-Reply-To: <4FC537A9-674C-11D9-9C9B-000A959E1622@salmonella.org>
References: <4FC537A9-674C-11D9-9C9B-000A959E1622@salmonella.org>
Message-ID: 

Hi Rob,

I looked at FeatureIO::gff and merged in your changes with some
modifications.

I also added a next_seq() method to FeatureIO::gff that is activated when
a /^##FASTA/ or /^>/ line is encountered.  Functionality delegates to
Bio::SeqIO's fasta parser.  I think this obviates the need for
Bio::SeqIO::gff.
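
The switch-over can be pictured with a plain-Perl sketch that reads feature lines until it meets ##FASTA or a '>' header and treats everything after that as sequence. The real method delegates to Bio::SeqIO's fasta parser; the hand-rolled FASTA reading here is a stand-in, and split_gff3 is a hypothetical helper, not the FeatureIO code.

```perl
use strict;
use warnings;

# Split the lines of a GFF3 document into feature lines and FASTA records.
# Returns (\@feature_lines, \%sequences).
sub split_gff3 {
    my @lines = @_;
    my (@features, %seqs);
    my ($in_fasta, $current_id) = (0, undef);
    for my $line (@lines) {
        chomp(my $l = $line);
        if (!$in_fasta) {
            if ($l =~ /^##FASTA/ or $l =~ /^>/) {
                $in_fasta = 1;                 # everything from here on is sequence
                next if $l =~ /^##FASTA/;      # the directive itself carries no data
            }
            else {
                # keep feature lines; drop other directives, comments and blanks
                push @features, $l unless $l =~ /^#/ or $l !~ /\S/;
                next;
            }
        }
        if ($l =~ /^>(\S+)/) {
            $current_id = $1;                  # start a new FASTA record
            $seqs{$current_id} = '';
        }
        elsif (defined $current_id) {
            $l =~ s/\s+//g;
            $seqs{$current_id} .= $l;          # append residues to the open record
        }
    }
    return (\@features, \%seqs);
}

# Made-up example document.
my @doc = (
    "##gff-version 3",
    join("\t", qw(ctg1 . gene 1 8 . + .), "ID=gene1"),
    "##FASTA",
    ">ctg1",
    "acgt",
    "ttga",
);
my ($features, $seqs) = split_gff3(@doc);
print scalar(@$features), " feature line(s)\n";
print "$_: $seqs->{$_}\n" for sort keys %$seqs;
```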

Please update your repository and have a look at t/FeatureIO.t (unit test
for FeatureIO, also added).

-Allen


On Sat, 15 Jan 2005, Rob Edwards wrote:

> Because I need it for some things that I am doing, I have worked quite 
> a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
> written this module, I have just made some cosmetic changes:
> 
> I have improved the validation processes that are applied as a gff3 
> file is parsed, and the module should now validate essentially 
> everything in the file except alignments. Validation is optional and is 
> based on the specification described at : 
> http://song.sourceforge.net/gff3.shtml
> 
> For clarification and edification I have created a couple of tables 
> describing the module and the validation that is applied to GFF3 files, 
> which you can see online: http://www.salmonella.org/bioperl/gff3.html
> 
> I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
> sequences, it seems that you'd want to be able to call the next_seq 
> methods, and therefore SeqIO is more appropriate than FeatureIO for 
> those aspects. Currently the SeqIO module uses the FeatureIO module for 
> parsing the file, it just reorganizes things.
> 
> This provides two different interfaces for getting objects out of GFF3 
> files:
> 	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
> representing the annotations.
> 	Bio::SeqIO::gff will return Bio::Seq objects representing the 
> sequences with all the annotations attached.
> 
> The other difference between the two is that the former passes out the 
> objects as they are read, but the latter has to read the whole file to 
> get the annotations and the sequences.
> 
> At the moment I focussed on reading GFF3 files.
> 
> I have not committed these to cvs yet, pending comments from others. I 
> have some specific questions:
> 	Should I wait until after 1.5 is out?
> 	Is two separate modules really the right way to go about this?
> 	What about other GFF modules (like Bio::Tools::GFF)?
> 	Could someone give the modules a workout and let me know about bugs? I 
> am sure there are many.
> 
> I have posted these modules online via anonymous ftp at 
> ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
> Take a look and let me know what you do and don't like!
> 
> Rob
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
From allenday at ucla.edu  Tue Jan 18 00:34:01 2005
From: allenday at ucla.edu (Allen Day)
Date: Tue Jan 18 00:30:16 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Mon, 17 Jan 2005, Scott Cain wrote:

> Hi Rob,
> 
> Thanks for your work on this--I've put several comments in your
> original message below.
> 
> Scott
> 
> ---------Original Message--------
> Date: Sat, 15 Jan 2005 15:22:23 -0800
> From: Rob Edwards 
> Subject: [Bioperl-l] GFF3
> To: Bioperl list 
> 
> Because I need it for some things that I am doing, I have worked quite 
> a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
> written this module, I have just made some cosmetic changes:
> 
> I have improved the validation processes that are applied as a gff3 
> file is parsed, and the module should now validate essentially 
> everything in the file except alignments. Validation is optional and is 
> based on the specification described at : 
> http://song.sourceforge.net/gff3.shtml
> 
> SC> Excellent--Did you happen to relax the requirement that ID be unique
> SC> for each line of the GFF?  Allen and I put that in due to a misreading
> SC> of the spec.  The ID has to be unique for a *feature*, which can be
> SC> spread across several lines.

I'm not sure if this is taken care of in the code... actually, I'm a bit 
foggy on exactly what the problem is.

> For clarification and edification I have created a couple of tables
> describing the module and the validation that is applied to GFF3 files,
> which you can see online: http://www.salmonella.org/bioperl/gff3.html
> 
> SC> Very nice and well done--do you happen to have a pod-ified version
> SC> of this page?  It would be nice to include in the pod for
> SC> Bio::FeatureIO::gff.

That's nice, I'd like to see it folded into the gff.pm perldoc as well.

> I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
> sequences, it seems that you'd want to be able to call the next_seq 
> methods, and therefore SeqIO is more appropriate than FeatureIO for 
> those aspects. Currently the SeqIO module uses the FeatureIO module for 
> parsing the file, it just reorganizes things.
> 
> This provides two different interfaces for getting objects out of GFF3 
> files:
> 	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
> representing the annotations.
> 	Bio::SeqIO::gff will return Bio::Seq objects representing the 
> sequences with all the annotations attached.
> 
> The other difference between the two is that the former passes out the 
> objects as they are read, but the latter has to read the whole file to 
> get the annotations and the sequences.
> 
> SC> I thought about doing something similar with SeqIO, but I am worried 
> SC> about the case where somebody tries to use SeqIO on a well 
> SC> annotated human Chr1 GFF3 file (if one were ever to exist :-) ,
> SC> but I suppose the same machine killing thing could be done if
> SC> someone tried to use SeqIO on a genbank file of Chr1.

See my previous email, I don't think we need the SeqIO module.

> At the moment I focussed on reading GFF3 files.
> 
> I have not committed these to cvs yet, pending comments from others. I 
> have some specific questions:
> 	Should I wait until after 1.5 is out?
> 
> SC> I don't have the definitive answer, but I would say it doesn't
> SC> matter much, as long as it passes tests.  Bio::FeatureIO::gff is
> SC> hardly a fully functional module as it is, so if we could 
> SC> squeeze a little more functionality into it before we
> SC> release it, that would be fine with me.

well it's in now.  and it passes tests.  there weren't any before, but i 
wrote some.  look in t/FeatureIO.t

> 	Is two separate modules really the right way to go about this?
> 
> SC> As long as it works for this case, I don't mind:  calling
> SC> 'next_feature' on a FeatureIO object until I run out of features
> SC> and then calling 'next_sequence' (and get a Bio::PrimarySeq) on
> SC> the same FeatureIO object until I run out of sequences.
> 
> 	What about other GFF modules (like Bio::Tools::GFF)?
> 
> SC> I am willing to let Bio::Tools::GFF die a terrible death.  While
> SC> it will have to be kept around for apps that depend on it, I don't
> SC> see adding any major functionality as time well spent.
> 
> 	Could someone give the modules a workout and let me know about bugs? I 
> am sure there are many.
> 
> SC> I will try to soon, but it won't be until next week at 
> SC> the earliest.
> 
> I have posted these modules online via anonymous ftp at 
> ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
> Take a look and let me know what you do and don't like!
> 
> Rob
> 
> 
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.				 	 cain@cshl.org
> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> ----------------------------------------------------------------------
> 
> 
> 
> 
From rob at salmonella.org  Tue Jan 18 03:46:05 2005
From: rob at salmonella.org (Rob Edwards)
Date: Tue Jan 18 03:42:17 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: 
References: 
	
Message-ID: <64201ADE-692D-11D9-9265-000A959E1622@salmonella.org>

Thanks for your help and comments. Here are a couple of points, and I'll 
work on filling in some of the other gaps.


>> SC> Excellent--Did you happen to relax the requirement that ID be unique
>> SC> for each line of the GFF?  Allen and I put that in due to a misreading
>> SC> of the spec.  The ID has to be unique for a *feature*, which can be
>> SC> spread across several lines.
>
> I'm not sure if this is taken care of in the code... actually, I'm a bit
> foggy on exactly what the problem is.

It is not corrected yet. The problem is this section, around line 669. 
It's not true that each line can only have one ID, and this can be 
removed.

   if($attr{ID}){
     if(scalar( @{ $attr{ID} } ) > 1){
       $self->throw("Error in line:\n$feature_string\nA feature may have at most one ID value");
     }
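
To make the per-feature reading of the rule concrete, here is a minimal
sketch in plain Perl. It is not the code in Bio::FeatureIO::gff; the helper
name check_feature_ids and the sample lines are invented for illustration.
It assumes that lines sharing an ID belong to the same feature only when
they also share the seqid and type columns, and it reports an ID only when
a different feature reuses it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns a list of problems.
# Assumption: lines that share an ID are one feature only if they also
# share seqid (column 1) and type (column 3).
sub check_feature_ids {
    my @lines = @_;
    my (%seen, @problems);
    for my $line (@lines) {
        next if $line =~ /^#/ or $line !~ /\S/;    # skip directives and blanks
        my @col = split /\t/, $line;
        next unless @col == 9;                     # not a feature line
        my ($id) = $col[8] =~ /(?:^|;)ID=([^;]+)/;
        next unless defined $id;                   # ID is optional
        my $key = "$col[0]\t$col[2]";
        if (exists $seen{$id} and $seen{$id} ne $key) {
            push @problems, "ID '$id' reused by a different feature";
        }
        $seen{$id} //= $key;                       # remember the first owner
    }
    return @problems;
}

# Invented sample data: one CDS spread over two lines, then a clash.
my @gff = (
    "chr1\t.\tCDS\t100\t200\t.\t+\t0\tID=cds1;Parent=mrna1",
    "chr1\t.\tCDS\t300\t400\t.\t+\t1\tID=cds1;Parent=mrna1",
    "chr1\t.\tgene\t50\t500\t.\t+\t.\tID=cds1",
);
print "$_\n" for check_feature_ids(@gff);
```

The two CDS lines pass because they describe one feature; only the gene
line that borrows the same ID is reported.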


>> For clarification and edification I have created a couple of tables
>> describing the module and the validation that is applied to GFF3 files,
>> which you can see online: http://www.salmonella.org/bioperl/gff3.html
>>
>> SC> Very nice and well done--do you happen to have a pod-ified version
>> SC> of this page?  It would be nice to include in the pod for
>> SC> Bio::FeatureIO::gff.
>
> That's nice, I'd like to see it folded into the gff.pm perldoc as well.

I'll take care of PODifying it over the next couple of days.


>> SC> I don't have the definitive answer, but I would say it doesn't
>> SC> matter much, as long as it passes tests.  Bio::FeatureIO::gff is
>> SC> hardly a fully functional module as it is, so if we could
>> SC> squeeze a little more functionality into it before we
>> SC> release it, that would be fine with me.
>
> well it's in now.  and it passes tests.  there weren't any before, but i
> wrote some.  look in t/FeatureIO.t

Thanks for those; however, at the moment the tests fail. See below.

The first series of errors die because the feature ID=AB000114 in 
t/data/knownGene.gff3 has several Dbxrefs separated with ';' instead of 
','
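
For readers unfamiliar with the attribute column, the following sketch
shows why the separator matters. In GFF3, ';' separates tag=value pairs
and ',' separates multiple values of one tag. The helper name
parse_gff3_attributes and the Dbxref values are invented for illustration
(the real contents of t/data/knownGene.gff3 are not reproduced here), and
URL-unescaping is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: parse column 9 of a GFF3 line into tag => [values].
sub parse_gff3_attributes {
    my ($column9) = @_;
    my %attr;
    for my $pair (split /;/, $column9) {
        my ($tag, $value) = split /=/, $pair, 2;
        # A fragment without '=' has no value; a parser that does not
        # guard against this ends up working on an undefined value.
        next unless defined $tag and defined $value;
        push @{ $attr{$tag} }, split /,/, $value;
    }
    return \%attr;
}

# Invented example values, comma-separated (well formed) ...
my $good = parse_gff3_attributes('ID=AB000114;Dbxref=GenBank:AB000114,EMBL:AB000114');
# ... and semicolon-separated (the second xref becomes a stray fragment).
my $bad  = parse_gff3_attributes('ID=AB000114;Dbxref=GenBank:AB000114;EMBL:AB000114');

print scalar @{ $good->{Dbxref} }, " Dbxref value(s) with ','\n";
print scalar @{ $bad->{Dbxref} },  " Dbxref value(s) with ';'\n";
```

With ';' the second cross-reference is split off as a pair that has no
value, which is one plausible source of "uninitialized value" warnings in
a parser that does not check for it.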

The second failure is because  hybrid1.gff3 isn't in cvs

Rob



% perl -I. -w t/FeatureIO.t
1..19
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590,  line 10.
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591,  line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593,  line 10.
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590,  line 10.
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591,  line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593,  line 10.
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590,  line 10.
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591,  line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593,  line 10.
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590,  line 10.
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591,  line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593,  line 10.
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590,  line 10.
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591,  line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593,  line 10.
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590,  line 10.
Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591,  line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593,  line 10.
ok 7
ok 8

------------- EXCEPTION  -------------
MSG: Could not open t/data/hybrid1.gff3: No such file or directory
STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:314
STACK Bio::FeatureIO::_initialize Bio/FeatureIO.pm:345
STACK Bio::FeatureIO::gff::_initialize Bio/FeatureIO/gff.pm:92
STACK Bio::FeatureIO::new Bio/FeatureIO.pm:268
STACK Bio::FeatureIO::new Bio/FeatureIO.pm:288
STACK toplevel t/FeatureIO.t:83

--------------------------------------

From rob at salmonella.org  Tue Jan 18 03:46:11 2005
From: rob at salmonella.org (Rob Edwards)
Date: Tue Jan 18 03:42:28 2005
Subject: [Bioperl-l] GFF3
In-Reply-To: 
References: <4FC537A9-674C-11D9-9C9B-000A959E1622@salmonella.org>
	
Message-ID: <67606A0B-692D-11D9-9265-000A959E1622@salmonella.org>

I don't really feel that strongly about this, but it seems that if I 
were downloading a gff3 file and wanted to read the sequence I would 
probably look in SeqIO for a reader. That was my primary rationale for 
Bio::SeqIO::gff.

Rob


On Jan 17, 2005, at 9:27 PM, Allen Day wrote:

> Hi Rob,
>
> I looked at FeatureIO::gff and merged in your changes with some
> modifications.
>
> I also added a next_seq() method to FeatureIO::gff that is activated when
> a /^##FASTA/ or /^>/ line is encountered.  Functionality delegates to
> Bio::SeqIO's fasta parser.  I think this obviates the need for
> Bio::SeqIO::gff.
>
> Please update your repository and have a look at t/FeatureIO.t (unit test
> for FeatureIO, also added).
>
> -Allen
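
The switch from feature lines to sequence data described above can be
sketched in plain Perl as follows. This is only an illustration of the
idea, not the code in FeatureIO::gff: the helper name split_gff3 is
invented, and where the real module is said to delegate to Bio::SeqIO's
fasta parser, this sketch simply concatenates the sequence lines.

```perl
use strict;
use warnings;

# Hypothetical helper: separate feature lines from embedded FASTA records.
# Returns (\@feature_lines, \%sequence_by_id).
sub split_gff3 {
    my ($fh) = @_;
    my (@features, %seq, $in_fasta, $current);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^##FASTA/) { $in_fasta = 1; next; }
        if ($line =~ /^>(\S+)/) {          # a header also starts FASTA mode
            $in_fasta = 1;
            $current  = $1;
            $seq{$current} = '';
            next;
        }
        if ($in_fasta) {
            $seq{$current} .= $line if defined $current;
        }
        elsif ($line !~ /^#/ and $line =~ /\S/) {
            push @features, $line;         # still in the feature section
        }
    }
    return (\@features, \%seq);
}

# Invented sample input, read through an in-memory filehandle.
my $gff = join "\n",
    '##gff-version 3',
    "ctg1\t.\tgene\t1\t8\t.\t+\t.\tID=gene1",
    '##FASTA',
    '>ctg1',
    'ACGT',
    'ACGT';
open my $fh, '<', \$gff or die "Cannot open in-memory handle: $!";
my ($features, $seq) = split_gff3($fh);
print scalar(@$features), " feature line(s); ctg1 = $seq->{ctg1}\n";
```

Because the features come first and the sequences last, a reader of this
kind can hand out features as it goes and only reaches the sequences at
the end of the file.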


From allenday at ucla.edu  Tue Jan 18 03:54:31 2005
From: allenday at ucla.edu (Allen Day)
Date: Tue Jan 18 03:50:41 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <64201ADE-692D-11D9-9265-000A959E1622@salmonella.org>
References: 
	
	<64201ADE-692D-11D9-9265-000A959E1622@salmonella.org>
Message-ID: 

> The first series of errors die because the feature ID=AB000114 in 
> t/data/knownGene.gff3 has several Dbxrefs separated with ';' instead of 
> ','

i'm not getting these errors, are you in sync with cvs HEAD?

> The second failure is because  hybrid1.gff3 isn't in cvs

gff files are in cvs now.

From birney at ebi.ac.uk  Tue Jan 18 04:05:49 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Tue Jan 18 04:03:42 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: 
Message-ID: 

On Mon, 17 Jan 2005, Chris Mungall wrote:

> 
> Hi Vladimir
> 
> The genbank2gff3 script, in scripts/Bio-DB-GFF, is attempting to recover
> information which the genbank flat file format often loses; this is the
> information about which mRNA relates to which CDS. You may or may not need
> this information, it depends why you are doing the conversion. If you
> don't need this, you may want just a straightforward genbank->gff
> conversion. Let me know if this is what you want to do and I can help with
> that.
> 
> If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
> isn't always possible to recover these with 100% fidelity from the genbank
> flat files. You may wish to pursue alternate approaches, such as
> downloading ensembl as a mysql dump (any ensembl folks around.. any plans
> to offer downloads in alternate formats such as gff3? This would be
> fantastic)

This is on the road map for Ensembl due to Vectorbase, but don't forget we 
offer GTF format, which is a different and well established GFF derived 
format and very clean to parse.

Go to Ensembl website --> Click on EnsMart, select your genome, in Filter,
unselect the filter by genomic region (to get the entire region) then in
Output select structure and select "GTF" format.

> 
> If you'd prefer to carry on via the genbank flat file route, here's what
> you should do:
> 
> * get the latest version of genbank2gff3.PLS I have just checked into cvs
> (I can send you a copy if you are using a bioperl release and not cvs)
> 
> * run the script with the "--ethresh 3" option. This will raise the error
> severity threshold at which problems with the genbank file become
> showstoppers.
> 
> In addition, I will take a look at this particular file and see what it is
> that is causing problems and get back to you.
> 
> Cheers
> Chris
> 
> On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
> 
> >     Greetings,
> > While parsing a genbank file taken from:
> > ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.0.dat
> > as of Jan 2005,
> > I'm getting the following unflattening error:
> > --------------------------------------------------------
> > Processing file /ENSEMBL/Homo_sapiens.0.dat...
> > working on contig
> > chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening
> > error:
> > Details:
> > ------------- EXCEPTION  -------------
> > MSG: PROBLEM, SEVERITY==2
> > no containers possible for SeqFeature of type: CDS; this SF is being placed
> > at root level
> > SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
> >
> > STACK Bio::SeqFeature::Tools::Unflattener::problem
> > /Bio/SeqFeature/Tools/Unflattener.pm:940
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
> > /Bio/SeqFeature/Tools/Unflattener.pm:1983
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
> > /Bio/SeqFeature/Tools/Unflattener.pm:1744
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> > /Bio/SeqFeature/Tools/Unflattener.pm:1449
> > STACK (eval) genbank2gff3.PLS:345
> > STACK main::unflatten_seq genbank2gff3.PLS:344
> > STACK toplevel genbank2gff3.PLS:209
> >
> > --------------------------------------
> >
> > Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult
> > STDERR
> >
> > Using bioperl-1.5.0.RC2 under Linux.
> >
> >     Would be grateful for the hint,
> >       Vladimir
> 

-----------------------------------------------------------------
Ewan Birney.  Work:  +44 1223 494420
             Email:  birney "at" ebi.ac.uk 
Clerical Assistant:  shelley "at" ebi.ac.uk
Please cc shelley for urgent or diary-dependent requests
-----------------------------------------------------------------

From birney at ebi.ac.uk  Tue Jan 18 04:16:29 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Tue Jan 18 04:16:06 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: 
Message-ID: 

On Mon, 17 Jan 2005, Chris Mungall wrote:

> 
> It is a genbank formatted file - you can download it from the url Vladimir
> provides below.
> 
> There seem to be a few oddities to do with the ensembl-flavour genbank
> format which may be causing problems for the unflattener:
> 
> * There don't appear to be any 'gene' features - a gene model is just
> mRNAs and CDSs. This means the files don't even contain essential stuff
> like the gene symbol!

The symbols are on the mRNA and CDS (in fact most identifiers map to the
mRNA and CDS). Each mRNA and CDS has the ENSG identifier in there. We
could of course put in a Gene line as well, and I can flag this up to the
guys. We should do this as it is easy enough to do.


However Chris, as you imply, we don't consider our EMBL or GenBank flat
files somehow definitive - the Mart tool allows highly flexible
downloading of gene structure (GTF) and other things and if we do
implement a GFF3 dumper it is likely to be via the Mart tool again.


Underneath this, the database and the Perl and Java APIs allow nearly any 
sort of information to be yanked out, and the database is internet 
accessible directly at ensembldb.ensembl.org.


   --> I'll ask the guys here to put in a gene line - Chris - what 
precisely do you need in the format to tickle your unflattener right?

   --> GFF3 direct dumping is on the 2005 todo list, but not at the top at 
the moment. 




> 
> * In the feature entry, for the reverse strand, ensembl nests the
> complement function inside the join function, listing sublocations in a
> 3'->5' direction. This is unusual, but not problematic in itself.
> However, I'm not 100% convinced that the bioperl genbank parser handles
> these cases correctly - I will expand on this in another email. It's not
> a problem for the vast majority of cases, but it will be problematic for
> certain rare situations where the sublocations are of mixed strand (eg
> trans-spliced genes).
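
The two nestings mentioned in the second point can be compared with a
small plain-Perl sketch. It is not the bioperl genbank parser: the helper
name sublocations and the coordinates are invented, the order in which the
sublocations are listed is an assumption, and only plain ranges inside
join() and complement() are handled.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a simple location string into
# [start, end, strand] triples sorted by start coordinate.
sub sublocations {
    my ($loc) = @_;
    # An outer complement() applies to every range inside it.
    my $outer = $loc =~ /^complement\(/ ? -1 : 1;
    my @parts;
    while ($loc =~ /(complement\()?(\d+)\.\.(\d+)/g) {
        # An inner complement() applies to the one range it wraps.
        my $strand = $1 ? -1 : $outer;
        push @parts, [ $2, $3, $strand ];
    }
    return sort { $a->[0] <=> $b->[0] } @parts;
}

for my $loc ('complement(join(100..200,300..400))',
             'join(complement(300..400),complement(100..200))',
             'join(100..200,complement(300..400))') {
    my @flat = map { sprintf '%d..%d:%d', @$_ } sublocations($loc);
    print "$loc => @flat\n";
}
```

The first two strings flatten to the same reverse-strand ranges. The third
is a mixed-strand case, where join(complement(...)) is the only way to
express the location and a parser cannot push the complement outward.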
> 
> I can implement a hack in the unflattener for the first problem. However,
> the question is - is it worth it? Without the gene feature the
> ensembl-flavoured genbank files seem not particularly useful (granted it
> is possible to get the gene data by integrating with LocusLink/EntrezGene
> but is it worth it?). I know for a fact that the data structures
> underlying ensembl are sound, so it seems counterproductive to use nothing
> but genbank/embl as a flat file distribution format (and to drop the gene
> features on top of that!). I know ensembl use GTF a lot internally, it
> would be great to see use made of this format (or even better, GFF3) for
> data distribution. Perhaps there's something I'm missing here.. I'll wait
> for comment from someone from ensembl before progressing here, to avoid
> any pointless work...
> 
> Cheers
> Chris
> 
> On Mon, 17 Jan 2005, Scott Cain wrote:
> 
> > Hi Vladimir,
> >
> > Not to ask a question on the level of "is it plugged in", but are you sure
> > it is a genbank formatted file?  I think you would get a different error
> > if it weren't, but I just wanted to make sure.
> >
> > Scott
> >
> > ----------------------------------------------------------------------
> > Scott Cain, Ph. D.				 	 cain@cshl.org
> > GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> > ----------------------------------------------------------------------
> >
> 

-----------------------------------------------------------------
Ewan Birney.  Work:  +44 1223 494420
             Email:  birney "at" ebi.ac.uk 
Clerical Assistant:  shelley "at" ebi.ac.uk
Please cc shelley for urgent or diary-dependent requests
-----------------------------------------------------------------

From danielucgbioinfo at yahoo.com.br  Tue Jan 18 05:34:50 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Tue Jan 18 05:31:09 2005
Subject: [Bioperl-l] My last email about Bio::Graphics::Panel, please HELP
Message-ID: <20050118103450.97318.qmail@web53503.mail.yahoo.com>

Hi,

I'm displaying a sequence in the browser, but I can't get an HTTP link
to work.
When I use: print $q->$map;
The output message is: 
Undefined subroutine CGI::
test 2
test 2


Please, what should I do?
I am using Bioperl 1.5 RC 2.
Thanks to all.

My little code :
#!/usr/bin/perl -wT

use strict;
use Bio::Graphics;
use Bio::Graphics::FeatureFile;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
use CGI  qw / :standard /;
use CGI::Pretty;

my $wholeseq =
Bio::SeqFeature::Generic->new(-start=>1,-end=>600);

my $q = new CGI;

print $q->header('text/html');
print $q->start_html('A Vector Rendering ');

print $q->h1('teste');
my $panel = Bio::Graphics::Panel->new(-length  => 
1000, -width  => 800, -pad_left     => 10,  -pad_right
   => 10,  -key_style =>'none', -spacing => -0.25, 
-box_subparts => 'true',-link =>
"http://www.google.com");

my $track =  $panel->add_track($wholeseq,  -glyph  =>
'transcript2', -bgcolor =>'orange', -bump   => 0,
-height =>12,-title=>'test 2', -link
=>'http://www.google.com.br' );

my $feature =
Bio::SeqFeature::Generic->new(-display_name=>'teste',
-score=>20, -start=>400, -end=>800,
-url=>'http://www.google.com' );
 $track -> add_feature($feature);
      
 my ($url,$map,$mapname) = $panel->image_and_map(-root
=> '/var/www/html',-url => '/tmpimages', -link =>
"http://www.google.com" );
 
print $q->img({-src=>$url,-usemap=>"#$mapname", -link
=> "http://www.google.com" });
print $q->$map;
print $q->($panel->png);
$panel->finished;
print $q->exit_html;

exit;

Thank you very much,
Daniel Xavier
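
A likely cause of the "Undefined subroutine CGI::" message is the line
print $q->$map;. In Perl, $object->$name calls the method whose name is
stored in $name, so the HTML held in $map is treated as a method name. The
sketch below reproduces the effect with a tiny invented class (Demo::Page)
rather than CGI.pm, so the exact wording of the error differs; that CGI.pm
turns the failed lookup into the message above is an assumption.

```perl
use strict;
use warnings;

package Demo::Page;    # invented stand-in for the CGI object
sub new  { return bless {}, shift }
sub bold { my ($self, $text) = @_; return "<b>$text</b>" }

package main;

my $page = Demo::Page->new;
my $map  = '<map name="demo"></map>';    # already a finished HTML string

# $page->$name calls the method whose name is stored in $name.
my $name = 'bold';
print $page->$name('ok'), "\n";

# With HTML in the variable, Perl looks for a method with that "name".
my $result = eval { $page->$map };
print "failed: $@" unless defined $result;

# Printing the string directly is what is usually intended.
print $map, "\n";
```

In the script above, the corresponding change would be print $map; on its
own. The same reasoning applies to print $q->($panel->png);, which is not
a method call at all, and the usual CGI.pm call to close the page is
end_html rather than exit_html.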


	
	
		
From sdavis2 at mail.nih.gov  Tue Jan 18 06:17:56 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Jan 18 06:15:54 2005
Subject: [Bioperl-l] Feature table comparison
In-Reply-To: <1106041017.41ecd8b96aa2b@webmail.salford.ac.uk>
References: <1105969185.41ebc021ee0a5@webmail.salford.ac.uk>
	<000801c4fd01$6caa6f00$7d75f345@WATSON>
	<1106041017.41ecd8b96aa2b@webmail.salford.ac.uk>
Message-ID: <9A5CF428-6942-11D9-B052-000D933565E8@mail.nih.gov>

Rob,

If you have files in EMBL format, you can use Bio::SeqIO to read them.  
What is in the EMBL files--protein or DNA?  Are the features named in a 
systematic manner (are the same genes called the same thing in both 
strains if they are present)?  If they are, can you simply do an ID 
matching between the two strains?  Judging from your email below, 
probably not.
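
If the names do turn out to be systematic, the ID matching amounts to a
set difference. Here is a minimal sketch in plain Perl; the helper name
only_in_first and the gene names are invented, and in practice the two
lists would be collected from the feature tables (for example via
Bio::SeqIO), with the choice of qualifier left as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: names present in the first list but not the second,
# in their original order and without repeats.
sub only_in_first {
    my ($first, $second) = @_;
    my %in_second = map { $_ => 1 } @$second;
    my %seen;
    return grep { !$in_second{$_} && !$seen{$_}++ } @$first;
}

# Invented feature names for two strains.
my @strain_a = qw(dnaA dnaN recF gyrB);
my @strain_b = qw(dnaA recF gyrB gyrA);

print "only in A: @{[ only_in_first(\@strain_a, \@strain_b) ]}\n";
print "only in B: @{[ only_in_first(\@strain_b, \@strain_a) ]}\n";
```

This only works when the same gene carries the same name in both files;
otherwise the comparison has to fall back on the sequences themselves.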

If the question you are asking is truly the opposite of an alignment, 
then you will need to do more work.  This is beyond my usual 
bioinformatics realm, but I would imagine that you would need to align 
the two genomes first (and how you do this will greatly affect your 
results, I would suppose) and then look for what didn't align in each 
strain.  I'm sure others on the list have done this kind of thing 
before.  I'm just not sure what the state-of-the-art is for 
whole-genome alignments these days.

Sean

On Jan 18, 2005, at 4:36 AM, Robert Minshall wrote:

> I am basically trying to find the differences between 2 strains of
> bacteria in EMBL format. What I really need is an inverted ACT (Artemis
> comparison tool). diffseq from EMBOSS won't do what I need; I just need
> to somehow get a list of proteins that are on one strain and not the
> other. This can be done by hand but will take months. I was wondering
> if there was a program out there where I can input the 2 EMBL files and
> get a list of feature differences, or the opposite of an alignment.
> Thanks
> Rob
> --
> Robert J Minshall
> Postgraduate Researcher in Microbiology,
> Biosciences Research Institute,
> School of Environment and Life Sciences,
> Lab 209 Cockcroft Building,
> University of Salford,
> Salford,
> Greater Manchester.
> M5 4WT
> UK
> 0161 2952652
> r.j.mishall@pgr.salford.ac.uk
>
>
>
> Quoting Sean Davis :
>
>> Rob,
>>
>> You will probably need to be a bit more specific.  What constitutes a
>> "genome" in your email below?  What are the features?  In what form 
>> are you
>> getting the data?  Do you have a specific question you are trying to 
>> answer?
>>
>> Sean
>>
>> ----- Original Message -----
>> From: "Robert Minshall" 
>> To: 
>> Sent: Monday, January 17, 2005 8:39 AM
>> Subject: [Bioperl-l] Feature table comparison
>>
>>
>>>
>>> Hi, does anyone know of or have a script that can compare the feature
>>> tables of genomes and show what appears on one and not the other? I.e.
>>> I want to find the differences in the feature tables. Is this possible?
>>> I'm new to Perl and was hoping that someone could point me in the right
>>> direction. My email is r.j.minshall@pgr.salford.ac.uk
>>> Thanks in advance
>>> Rob Minshall
>>>
>>> --
>>> Robert J Minshall
>>> Postgraduate Researcher in Microbiology,
>>> Biosciences Research Institute,
>>> School of Environment and Life Sciences,
>>> Lab 209 Cockcroft Building,
>>> University of Salford,
>>> Salford,
>>> Greater Manchester.
>>> M5 4WT
>>> UK
>>> 0161 2952652
>>> r.j.mishall@pgr.salford.ac.uk
>>>
>>>
>>>
>>>
>>>
>>>
>>> ----------------------------------------------------------------
>>> Concerns about content should be sent to abuse@salford.ac.uk
>>>
>>
>>
>>
>
>

From sdavis2 at mail.nih.gov  Tue Jan 18 06:43:54 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Jan 18 06:41:37 2005
Subject: [Bioperl-l] Feature table comparison
In-Reply-To: <1106047454.41ecf1de788e5@webmail.salford.ac.uk>
References: <1105969185.41ebc021ee0a5@webmail.salford.ac.uk>
	<000801c4fd01$6caa6f00$7d75f345@WATSON>
	<1106041017.41ecd8b96aa2b@webmail.salford.ac.uk>
	<9A5CF428-6942-11D9-B052-000D933565E8@mail.nih.gov>
	<1106047454.41ecf1de788e5@webmail.salford.ac.uk>
Message-ID: <3B416620-6946-11D9-B052-000D933565E8@mail.nih.gov>

Rob,

Perhaps others have done something similar.  In general, it helps to 
post back to the list so we both benefit from others' knowledge and 
others benefit from your thoughts on what is not a straightforward 
problem.

As for my two-cents worth, can't you just go through the alignments for 
each strain, sort them in genomic order of one strain, and determine 
the segments not aligning based on the end of one alignment and the 
beginning of the next?  Do the sort for the other strain to get the 
same unaligned blocks for the other strain.  Then, move on to the next 
pairing of strains and repeat.  That will give you the unaligned blocks 
for each strain with respect to each other strain.  Then you can go 
back to your feature table for each strain and look for overlaps 
between the unaligned segments and the annotated features--for this 
there are tools in bioperl.  See Bio::DB::GFF and ? others?
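
The procedure above can be sketched in plain Perl, without any bioperl
modules. The helper names unaligned_blocks and features_in_gaps, the
coordinates and the gene names are all invented for illustration;
coordinates are assumed to be 1-based and inclusive, and the aligned
blocks are assumed to have been extracted already from whatever alignment
program was used.

```perl
use strict;
use warnings;

# Hypothetical helper: given aligned [start, end] blocks on one genome of
# length $length, return the unaligned gaps between and around them.
sub unaligned_blocks {
    my ($length, @aligned) = @_;
    my @gaps;
    my $next = 1;                         # first position not yet covered
    for my $block (sort { $a->[0] <=> $b->[0] } @aligned) {
        my ($start, $end) = @$block;
        push @gaps, [ $next, $start - 1 ] if $start > $next;
        $next = $end + 1 if $end + 1 > $next;   # tolerate overlapping blocks
    }
    push @gaps, [ $next, $length ] if $next <= $length;
    return @gaps;
}

# Hypothetical helper: features ([name, start, end]) overlapping any gap.
sub features_in_gaps {
    my ($features, @gaps) = @_;
    return grep {
        my $f = $_;
        grep { $f->[1] <= $_->[1] && $f->[2] >= $_->[0] } @gaps;
    } @$features;
}

# Invented example: a 1000 bp genome with three aligned blocks.
my @gaps = unaligned_blocks(1000, [1, 300], [250, 400], [601, 900]);
my @hits = features_in_gaps(
    [ [ 'geneA', 100, 200 ], [ 'geneB', 450, 550 ], [ 'geneC', 880, 950 ] ],
    @gaps,
);
print 'unaligned: ', join(', ', map { "$_->[0]-$_->[1]" } @gaps), "\n";
print 'features:  ', join(', ', map { $_->[0] } @hits), "\n";
```

Running the same two steps with the roles of the strains swapped gives the
blocks, and the features, unique to the other strain.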

Sean

On Jan 18, 2005, at 6:24 AM, Robert Minshall wrote:

> Thanks for your help. So far I can align the DNA EMBL files no problem
> and work out which bits are not aligned by hand, but I have 6 strains to
> compare against each other, and the first alignment has taken me months
> to work out so far and I'm only 1/2 way through the thing. All I wanted
> to do was find sections of DNA not on one strain but on another and work
> out what the proteins were. The feature table on most of the strains I
> have is not well annotated, and therefore I think that a feature table
> comparison is now not the correct way forward. I just want to separate
> out the sections of DNA that are "unique" to one particular strain from
> the other, get the protein, and see if it appears on other strains or
> not. It seems simple in my head but in practice it's not.
> --
> Robert J Minshall
> Postgraduate Researcher in Microbiology,
> Biosciences Research Institute,
> School of Environment and Life Sciences,
> Lab 209 Cockcroft Building,
> University of Salford,
> Salford,
> Greater Manchester.
> M5 4WT
> UK
> 0161 2952652
> r.j.mishall@pgr.salford.ac.uk
>
>
>
> Quoting Sean Davis :
>
>> Rob,
>>
>> If you have files in EMBL format, you can use Bio::SeqIO to read them.
>> What is in the EMBL files--protein or DNA?  Are the features named in 
>> a
>> systematic manner (are the same genes called the same thing in both
>> strains if they are present)?  If they are, can you simply do an ID
>> matching between the two strains?  Judging from your email below,
>> probably not.
>>
>> If the question you are asking is truly the opposite of an alignment,
>> then you will need to do more work.  This is beyond my usual
>> bioinformatics realm, but I would imagine that you would need to align
>> the two genomes first (and how you do this will greatly affect your
>> results, I would suppose) and then look for what didn't align in each
>> strain.  I'm sure others on the list have done this kind of thing
>> before.  I'm just not sure what the state-of-the-art is for
>> whole-genome alignments these days.
>>
>> Sean
>>
>> On Jan 18, 2005, at 4:36 AM, Robert Minshall wrote:
>>
>>> I am basically trying to find the differences between 2 strains of
>>> bacteria in EMBL format. What I really need is an inverted ACT (Artemis
>>> Comparison Tool). diffseq from EMBOSS won't do what I need; I just need
>>> to somehow get a list of proteins that are on one strain and not the
>>> other. This can be done by hand but will take months. I was wondering
>>> if there was a program out there where I can input the 2 EMBL files and
>>> get a list of feature differences, or the opposite of an alignment.
>>> Thanks
>>> Rob
>>> --
>>> Robert J Minshall
>>> Postgraduate Researcher in Microbiology,
>>> Biosciences Research Institute,
>>> School of Environment and Life Sciences,
>>> Lab 209 Cockcroft Building,
>>> University of Salford,
>>> Salford,
>>> Greater Manchester.
>>> M5 4WT
>>> UK
>>> 0161 2952652
>>> r.j.mishall@pgr.salford.ac.uk
>>>
>>>
>>>
>>> Quoting Sean Davis :
>>>
>>>> Rob,
>>>>
>>>> You will probably need to be a bit more specific.  What constitutes 
>>>> a
>>>> "genome" in your email below?  What are the features?  In what form
>>>> are you
>>>> getting the data?  Do you have a specific question you are trying to
>>>> answer?
>>>>
>>>> Sean
>>>>
>>>> ----- Original Message -----
>>>> From: "Robert Minshall" 
>>>> To: 
>>>> Sent: Monday, January 17, 2005 8:39 AM
>>>> Subject: [Bioperl-l] Feature table comparison
>>>>
>>>>
>>>>>
>>>>> Hi, does anyone know of or have a script that can compare the
>>>>> feature tables of genomes and show what appears on one and not the
>>>>> other? I.e. I want to find the differences in the feature tables. Is
>>>>> this possible? I'm new to Perl and was hoping that someone could
>>>>> point me in the right direction. My email is
>>>>> r.j.minshall@pgr.salford.ac.uk
>>>>> Thanks in advance
>>>>> Rob Minshall
>>>>>
>>>>> --
>>>>> Robert J Minshall
>>>>> Postgraduate Researcher in Microbiology,
>>>>> Biosciences Research Institute,
>>>>> School of Environment and Life Sciences,
>>>>> Lab 209 Cockcroft Building,
>>>>> University of Salford,
>>>>> Salford,
>>>>> Greater Manchester.
>>>>> M5 4WT
>>>>> UK
>>>>> 0161 2952652
>>>>> r.j.mishall@pgr.salford.ac.uk
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> Concerns about content should be sent to abuse@salford.ac.uk
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l@portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

From palmeida at igc.gulbenkian.pt  Tue Jan 18 07:01:21 2005
From: palmeida at igc.gulbenkian.pt (palmeida@igc.gulbenkian.pt)
Date: Tue Jan 18 06:56:45 2005
Subject: [Bioperl-l] My last email about Bio::Graphics::Panel, please HELP
In-Reply-To: <20050118103450.97318.qmail@web53503.mail.yahoo.com>
References: <20050118103450.97318.qmail@web53503.mail.yahoo.com>
Message-ID: <20050118120121.GD5318@bioinf.igc.gulbenkian.pt>

Hi,

Have you tried: print $map;

You are using it as if $map were a subroutine of CGI, but you just want
to print whatever is in the variable $map.
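
To illustrate the difference, here is a small self-contained sketch. It does 
not use CGI or Bio::Graphics at all: Demo::Query is a hypothetical stand-in 
class and the markup in $map is made up, so the exact error text will differ 
from what CGI reports.

```perl
use strict;
use warnings;

package Demo::Query;    # hypothetical stand-in for the CGI object
sub new { return bless {}, shift }

package main;

my $q   = Demo::Query->new;
my $map = '<map name="demo"></map>';    # made-up markup, not real panel output

# $q->$map asks Perl to call a method whose *name* is the string in $map,
# so it fails because no such method exists.
my $error = '';
eval { $q->$map; 1 } or $error = $@;
print "Method-call form failed: $error";

# Printing the variable directly just emits the markup.
print $map, "\n";
```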

-Paulo

On Tue, Jan 18, 2005 at 07:34:50AM -0300, Danielucg Sousa wrote:
> Hi,
> 
> I'm displaying a sequence in the browser, but I can't get an
> http link to work.
> When I use: print $q->$map;
> the output message is: 
> Undefined subroutine CGI:: id="bgmap00001">
>  href="http://www.google.com.br" title="test 2"
> alt="test 2" />
>  href="http://www.google.com.br" title="test 2"
> alt="test 2" />
> 
> 
> Please, what should I do?
> I am using Bioperl 1.5 RC 2.
> Thanks for everything.
> 
> My little code :
> #!/usr/bin/perl -wT
> 
> use strict;
> use Bio::Graphics;
> use Bio::Graphics::FeatureFile;
> use Bio::SeqIO;
> use Bio::SeqFeature::Generic;
> use CGI  qw / :standard /;
> use CGI::Pretty;
> 
> my $wholeseq =
> Bio::SeqFeature::Generic->new(-start=>1,-end=>600);
> 
> my $q = new CGI;
> 
> print $q->header('text/html');
> print $q->start_html('A Vector Rendering ');
> 
> print $q->h1('teste');
> my $panel = Bio::Graphics::Panel->new(-length  => 
> 1000, -width  => 800, -pad_left     => 10,  -pad_right
>    => 10,  -key_style =>'none', -spacing => -0.25, 
> -box_subparts => 'true',-link =>
> "http://www.google.com");
> 
> my $track =  $panel->add_track($wholeseq,  -glyph  =>
> 'transcript2', -bgcolor =>'orange', -bump   => 0,
> -height =>12,-title=>'test 2', -link
> =>'http://www.google.com.br' );
> 
> my $feature =
> Bio::SeqFeature::Generic->new(-display_name=>'teste',
> -score=>20, -start=>400, -end=>800,
> -url=>'http://www.google.com' );
>  $track -> add_feature($feature);
>       
>  my ($url,$map,$mapname) = $panel->image_and_map(-root
> => '/var/www/html',-url => '/tmpimages', -link =>
> "http://www.google.com" );
>  
> print $q->img({-src=>$url,-usemap=>"#$mapname", -link
> => "http://www.google.com" });
> print $q->$map;
> print $q->($panel->png);
> $panel->finished;
> print $q->exit_html;
> 
> exit;
> 
> Thank you very much,
> Daniel Xavier

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
From avilella at ebi.ac.uk  Tue Jan 18 10:57:44 2005
From: avilella at ebi.ac.uk (Albert Vilella)
Date: Tue Jan 18 10:53:56 2005
Subject: [Bioperl-l] negative and decimal values in Bio::Graphics xyplot
Message-ID: <1106063864.5345.33.camel@magneto>

Hi,

I was uploading an xyplot file for Hapmap's GBrowse browser that
contains negative decimal numbers. From what I saw, there seems to be a
problem with the plotting of negative values.

I assume that decimal values are allowed, as I don't see any problem in
Bio::Graphics::Glyph::xyplot. This would make it feasible to plot things
like:

----------------
[expression]
glyph = xyplot
graph_type=boxes
fgcolor = black
bgcolor = darkslateblue
height=100
min_score = 0.000001
max_score = 0.001
label=1
key=variscan_MRA_plots_for_genotypes_chr1_YRI.w100000
reference=chr1

##mra_levels_1-9
expression      mra_levels_1-9_YRI      1750001..1850000        0.000109
expression      mra_levels_1-9_YRI      1850001..1950000        0.000003
expression      mra_levels_1-9_YRI      1950001..2050000        0.000022
expression      mra_levels_1-9_YRI      2050001..2150000        0.000053
[...]
----------------

But negative values are a problem either in xyplot or, less likely, in
some other step in the process of importing the data into GBrowse.
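
One thing that may be worth checking first is whether min_score and
max_score in the stanza actually span the data once negative values are
present. I have not confirmed that this is the cause; it is only an
assumption. A minimal sketch for finding the score range of the data
lines (the subroutine name and the rows are made up):

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Return (lowest, highest) score found in feature-file data lines of the form
#   type  name  start..end  score
# Stanza headers, options and comments do not match and are skipped.
sub score_range {
    my @scores;
    for my $line (@_) {
        my @fields = split ' ', $line;
        next unless @fields == 4 && $fields[2] =~ /^\d+\.\.\d+$/;
        push @scores, $fields[3];
    }
    return @scores ? (min(@scores), max(@scores)) : ();
}

# Made-up rows, including a negative value, for illustration only.
my @rows = (
    "[expression]",
    "glyph = xyplot",
    "expression\tdemo_track\t1..100\t0.000109",
    "expression\tdemo_track\t101..200\t-0.000250",
);
my ($low, $high) = score_range(@rows);
print "min_score = $low\nmax_score = $high\n";
```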

Any hint?

Thanks,

    Albert.

-- 
Albert Vilella Bertran    avilella_at_ub_edu
--------------------------------------------
Departament de Genetica
Universitat de Barcelona
Diagonal 645 08028, Barcelona
Tel: +34 934035306 Fax: +34 934034420
--------------------------------------------
avilella_at_ebi_ac_uk
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambs. CB10 1SD, United Kingdom
--------------------------------------------------
From cjm at fruitfly.org  Tue Jan 18 11:20:15 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Tue Jan 18 11:18:19 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: 
References: 
Message-ID: 


OK, so it looks like EnsMart may solve Vladimir's problem by bypassing the
genbank-format files altogether

Ewan - it'd be nice to see the GFF/GTFs appear in the main ftp download
area too at some point, as well as via dynamic EnsMart download. As far as
tweaking the ensembl genbank output, I think the addition of a feature of
type 'gene', with a single location covering the maximal extent of all
mRNAs, as is fairly-standard with genbank-format files - that should do
it.
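
As a rough illustration of what I mean by "maximal extent", here is a
sketch that derives such a gene line from the exon coordinates of a
gene's mRNAs. The subroutine name and the coordinates are hypothetical,
and this is not the ensembl dumper's code; the only thing taken from the
thread is the identifier, which appears in Vladimir's error output.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Build a GenBank-style 'gene' feature line spanning all mRNA exons.
# $strand is 1 or -1; each exon is a [start, end] pair.
sub gene_feature_line {
    my ($strand, @exons) = @_;
    my $start    = min(map { $_->[0] } @exons);
    my $end      = max(map { $_->[1] } @exons);
    my $location = "$start..$end";
    $location = "complement($location)" if $strand < 0;
    # Feature key starts in column 6, location in column 22.
    return sprintf("     %-16s%s", 'gene', $location);
}

# Hypothetical exon coordinates from two mRNAs of one reverse-strand gene.
my @exons = ([ 5000, 5200 ], [ 4100, 4300 ], [ 3900, 4000 ], [ 5000, 5350 ]);
my $line  = gene_feature_line(-1, @exons);
print "$line\n";
print ' ' x 21, qq{/gene="ENSG00000146556"\n};
```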

Cheers
Chris

On Tue, 18 Jan 2005, Ewan Birney wrote:

> On Mon, 17 Jan 2005, Chris Mungall wrote:
>
> >
> > It is a genbank formatted file - you can download it from the url Vladimir
> > provides below.
> >
> > There seem to be a few oddities to do with the ensembl-flavour genbank
> > format which may be causing problems for the unflattener:
> >
> > * There don't appear to be any 'gene' features - a gene model is just
> > mRNAs and CDSs. This means the files don't even contain essential stuff
> > like the gene symbol!
>
> The symbols are on the mRNA and CDS (in fact most identifiers map to the
> mRNA and CDS). Each mRNA and CDS has the ENSG identifier in there. We
> could of course put in a Gene line as well, and I can flag this up to the
> guys. We should do this as it is easy enough to do.
>
>
> However Chris, as you imply, we don't consider our EMBL or GenBank flat
> files somehow definitive - the Mart tool allows highly flexible
> downloading of gene structure (GTF) and other things and if we do
> implement a GFF3 dumper it is likely to be via the Mart tool again.
>
>
> Underneath this the database and Perl and Java API allows nearly any sort
> of information to be yanked out, and the database is internet accessible
> directly at ensembldb.ensembl.org.
>
>
>    --> I'll ask the guys here to put in a gene line - Chris - what
> precisely do you need in the format to tickle your unflattener right?
>
>    --> GFF3 direct dumping is in 2005 todo list, but not at the top at the
> moment.
>
>
>
>
> >
> > * In the feature entry, for the reverse strand, ensembl nests the
> > complement function inside the join function, listing sublocations in a
> > 3'->5' direction. This is unusual, but not problematic in itself.
> > However, I'm not 100% convinced that the bioperl genbank parser handles
> > these cases correctly - I will expand on this in another email. It's not
> > a problem for the vast majority of cases, but it will be problematic for
> > certain rare situations where the sublocations are of mixed strand (eg
> > trans-spliced genes).
> >
> > I can implement a hack in the unflattener for the first problem. However,
> > the question is - is it worth it? Without the gene feature the
> > ensembl-flavoured genbank files seem not particularly useful (granted it
> > is possible to get the gene data by integrating with LocusLink/EntrezGene
> > but is it worth it?). I know for a fact that the data structures
> > underlying ensembl are sound, so it seems counterproductive to use nothing
> > but genbank/embl as a flat file distribution format (and to drop the gene
> > features on top of that!). I know ensembl use GTF a lot internally, it
> > would be great to see use made of this format (or even better, GFF3) for
> > data distribution. Perhaps there's something I'm missing here.. I'll wait
> > for comment from someone from ensembl before progressing here, to avoid
> > any pointless work...
> >
> > Cheers
> > Chris
> >
> > On Mon, 17 Jan 2005, Scott Cain wrote:
> >
> > > Hi Vladimir,
> > >
> > > Not to ask a question on the level of "is it plugged in", but are you sure
> > > it is a genbank formatted file?  I think you would get a different error
> > > if it weren't, but I just wanted to make sure.
> > >
> > > Scott
> > >
> > > ----------------------------------------------------------------------
> > > Scott Cain, Ph. D.				 	 cain@cshl.org
> > > GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> > > ----------------------------------------------------------------------
> > >
> > >
> > > On Mon, 17 Jan 2005, Chris Mungall wrote:
> > >
> > > >
> > > > Hi Vladimir
> > > >
> > > > The genbank2gff3 script, in scripts/Bio-DB-GFF, is attempting to recover
> > > > information which the genbank flat file format often loses; this is the
> > > > information about which mRNA relates to which CDS. You may or may not need
> > > > this information, it depends why you are doing the conversion. If you
> > > > don't need this, you may want just a straightforward genbank->gff
> > > > conversion. Let me know if this is what you want to do and I can help with
> > > > that.
> > > >
> > > > If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
> > > > isn't always possible to recover these with 100% fidelity from the genbank
> > > > flat files. You may wish to pursue alternate approaches, such as
> > > > downloading ensembl as a mysql dump (any ensembl folks around.. any plans
> > > > to offer downloads in alternate formats such as gff3? This would be
> > > > fantastic)
> > > >
> > > > If you'd prefer to carry on via the genbank flat file route, here's what
> > > > you should do:
> > > >
> > > > * get the latest version of genbank2gff3.PLS I have just checked into cvs
> > > > (I can send you a copy if you are using a bioperl release and not cvs)
> > > >
> > > > * run the script with the "--ethresh 3" option. This will raise the error
> > > > severity threshold at which problems with the genbank file become
> > > > showstoppers.
> > > >
> > > > In addition, I will take a look at this particular file and see what it is
> > > > that is causing problems and get back to you.
> > > >
> > > > Cheers
> > > > Chris
> > > >
> > > > On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
> > > >
> > > > >     Greetings,
> > > > > While parsing a genbank file taken from:
> > > > > ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.
> > > > > 0.dat as of Jan 2005,
> > > > > I'm getting the following unflattening error:
> > > > > --------------------------------------------------------
> > > > > Processing file /ENSEMBL/Homo_sapiens.0.dat...
> > > > > working on contig
> > > > > chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening
> > > > > error:
> > > > > Details:
> > > > > ------------- EXCEPTION  -------------
> > > > > MSG: PROBLEM, SEVERITY==2
> > > > > no containers possible for SeqFeature of type: CDS; this SF is being placed
> > > > > at root level
> > > > > SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
> > > > >
> > > > > STACK Bio::SeqFeature::Tools::Unflattener::problem
> > > > > /Bio/SeqFeature/Tools/Unflattener.pm:940
> > > > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
> > > > > /Bio/SeqFeature/Tools/Unflattener.pm:1983
> > > > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
> > > > > /Bio/SeqFeature/Tools/Unflattener.pm:1744
> > > > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> > > > > /Bio/SeqFeature/Tools/Unflattener.pm:1449
> > > > > STACK (eval) genbank2gff3.PLS:345
> > > > > STACK main::unflatten_seq genbank2gff3.PLS:344
> > > > > STACK toplevel genbank2gff3.PLS:209
> > > > >
> > > > > --------------------------------------
> > > > >
> > > > > Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult
> > > > > STDERR
> > > > >
> > > > > Using bioperl-1.5.0.RC2 under Linux.
> > > > >
> > > > >     Would be grateful for the hint,
> > > > >       Vladimir
> > > > >
> > > >
> > >
> > >
> >
>
> -----------------------------------------------------------------
> Ewan Birney.  Work:  +44 1223 494420
>              Email:  birney "at" ebi.ac.uk
> Clerical Assistant:  shelley "at" ebi.ac.uk
> Please cc shelley for urgent or diary-dependent requests
> -----------------------------------------------------------------
>
>
From birney at ebi.ac.uk  Tue Jan 18 11:27:40 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Tue Jan 18 11:23:53 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: 
Message-ID: 

On Tue, 18 Jan 2005, Chris Mungall wrote:

> 
> OK, so it looks like EnsMart may solve Vladimir's problem by bypassing the
> genbank-format files altogether
> 
> Ewan - it'd be nice to see the GFF/GTFs appear in the main ftp download
> area too at some point, as well as via dynamic EnsMart download. As far as
> tweaking the ensembl genbank output, I think the addition of a feature of
> type 'gene', with a single location covering the maximal extent of all
> mRNAs, as is fairly-standard with genbank-format files - that should do
> it.

We can't put the GTF file on the ftp site as a matter of general
principle - too many people ask "why can't you just put XXXX on the
ftp site" - and then we would run out of disk space too fast. Mart is a far
more scalable solution. We can't just keep putting every possible format
combination on our ftp site - it won't scale. (Admittedly GTF files won't 
dent our disk space much, but you get the idea.) 

Mart should work well for people and we have command line tools to address 
mart as well as the web form. Vladimir - does this work for you?



I've cc'd in Arne (software lead) and Glenn (release coordinator for 
March) - guys - we need to add a "gene" line in our EMBL and GenBank 
dumper so it plays better with parsing scripts out there. Chris - just for 
the avoidance of us screwing up could you give a concrete example of the 
right sort of gene line?


From MEC at Stowers-Institute.org  Tue Jan 18 11:42:12 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Tue Jan 18 11:38:55 2005
Subject: [Bioperl-l] Feature table comparison
Message-ID: <200501181638.j0IGcpKr027393@portal.open-bio.org>

You might find gffintersect.pl from
http://www.sanger.ac.uk/Software/formats/GFF/ relevant. It is
described as:

	gffintersect.pl - efficiently finds the intersection (or
exclusion) of two GFF streams, reporting intersection information in the
Group field. Definition of "intersection" allows for near-neighbours and
minimum-overlap
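
To give an idea of the "exclusion" case, here is a minimal sketch in plain 
Perl. It is not how gffintersect.pl is implemented and does not read GFF; it 
assumes both feature sets have already been loaded into hashes and placed on 
a shared coordinate system, and all names and coordinates are made up. It 
compares every pair, so it is only suitable for small feature sets.

```perl
use strict;
use warnings;

# Return the features in @$set_a that overlap nothing in @$set_b.
# Each feature is a hash with seqid, start and end (1-based, inclusive).
sub exclusive_features {
    my ($set_a, $set_b) = @_;
    my @unique;
    FEATURE: for my $fa (@$set_a) {
        for my $fb (@$set_b) {
            next unless $fa->{seqid} eq $fb->{seqid};
            # Two intervals overlap when each starts before the other ends.
            next FEATURE if $fa->{start} <= $fb->{end} && $fb->{start} <= $fa->{end};
        }
        push @unique, $fa;
    }
    return @unique;
}

# Hypothetical features, for illustration only.
my @strain_a = (
    { seqid => 'chr', start => 100, end => 400,  name => 'geneA1' },
    { seqid => 'chr', start => 900, end => 1200, name => 'geneA2' },
);
my @strain_b = ({ seqid => 'chr', start => 350, end => 600, name => 'geneB1' });

my @only_a = exclusive_features(\@strain_a, \@strain_b);
print "$_->{name}\n" for @only_a;
```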


--Malcolm Cook


>-----Original Message-----
>From: bioperl-l-bounces@portal.open-bio.org 
>[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of 
>Robert Minshall
>Sent: Monday, January 17, 2005 7:40 AM
>To: bioperl-l@bioperl.org
>Subject: [Bioperl-l] Feature table comparison
>
>
>
>Hi, does anyone know of or have a script that can compare the 
>feature tables of genomes and show what appears on one and not 
>the other? I.e. I want to find the differences in the feature 
>tables. Is this possible? I'm new to Perl and was hoping that 
>someone could point me in the right direction. My email is
>r.j.minshall@pgr.salford.ac.uk
>Thanks in advance
>Rob Minshall
>
>--
>Robert J Minshall
>Postgraduate Researcher in Microbiology,
>Biosciences Research Institute,
>School of Environment and Life Sciences,
>Lab 209 Cockcroft Building,
>University of Salford,
>Salford,
>Greater Manchester.
>M5 4WT
>UK
>0161 2952652
>r.j.mishall@pgr.salford.ac.uk
>
>
>
>
>
>
>

From rob at salmonella.org  Tue Jan 18 13:54:21 2005
From: rob at salmonella.org (Rob Edwards)
Date: Tue Jan 18 13:50:31 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: 
References: 
	
	<64201ADE-692D-11D9-9265-000A959E1622@salmonella.org>
	
Message-ID: <5D5654B8-6982-11D9-9265-000A959E1622@salmonella.org>

I checked out bioperl-live again, and got the same errors. I just fixed 
the file and checked it back in. All tests pass for me.
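
For anyone wondering why the separator matters: in the GFF3 attribute 
column ';' separates tag=value pairs and ',' separates multiple values of 
one tag, so a ';' between Dbxrefs leaves a fragment with no tag. The sketch 
below is only an illustration, not the parser bioperl uses; the subroutine 
name and the cross-reference values are made up, and URL-escaping is ignored.

```perl
use strict;
use warnings;

# Split a GFF3 attribute column into a hash of tag => [values].
# ';' separates tag=value pairs; ',' separates multiple values of one tag.
sub parse_gff3_attributes {
    my ($column9) = @_;
    my %attr;
    for my $pair (split /;/, $column9) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;    # fragment without '=' is dropped here
        push @{ $attr{$tag} }, split /,/, $value;
    }
    return \%attr;
}

# Made-up cross-references, for illustration only.
my $good = parse_gff3_attributes('ID=AB000114;Dbxref=DB1:111,DB2:222');
my $bad  = parse_gff3_attributes('ID=AB000114;Dbxref=DB1:111;DB2:222');
print scalar @{ $good->{Dbxref} }, " Dbxref value(s) with ',' as separator\n";
print scalar @{ $bad->{Dbxref} },  " Dbxref value(s) with ';' as separator\n";
```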

Rob


On Jan 18, 2005, at 12:54 AM, Allen Day wrote:

>> The first series of errors die because the feature ID=AB000114 in
>> t/data/knownGene.gff3 has several Dbxrefs separated with ';' instead 
>> of
>> ','
>
> i'm not getting these errors, are you are in sync with cvs HEAD?
>

From babenko at ncbi.nlm.nih.gov  Tue Jan 18 14:20:11 2005
From: babenko at ncbi.nlm.nih.gov (Babenko, Vladimir (NIH/NLM/NCBI))
Date: Tue Jan 18 14:17:23 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file wi
	th genbank2gff3. pls
Message-ID: <69BA0F938FAC6A4CBEF49461720696F2079659C2@nihexchange16.nih.gov>

    Sorry Ewan, 
Now I get it: checking "multiple transcripts" means genes with no
fewer than 2 transcripts.
     The Mart is amazing.
    Regards,
	Vladimir

>-----Original Message-----
>From: Ewan Birney [mailto:birney@ebi.ac.uk] 
>Sent: Tuesday, January 18, 2005 1:35 PM
>To: Babenko, Vladimir (NIH/NLM/NCBI)
>Cc: 'cjm@fruitfly.org'; 'cain@cshl.edu'; 'jason.stajich@duke.edu'
>Subject: RE: [Bioperl-l] Problem with parsing ENSEMBL genbank 
>flat file wi th genbank2gff3. pls
>
>On Tue, 18 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
>
>>        Hi all,
>> Thank you for your prompt response:
>> I believe that gff* is one of the ways to manage data, so this is the 
>> reason I'm up to this.
>>      I played around for a while with all your suggestions, so what 
>> follows is a short response:
>> 1) Chris - I checked the script, it works fine, thank you. I'm 
>> currently exploring this option.
>> The point is that I do need to have both mRNA and CDS linked to check 
>> for UTR, introns, and it looks like genbank2gff3 works fine here.
>> 2) Ensmart - this is a great suggestion. I haven't come to the end of 
>> my investigation of this sound product, but it is my sneaking suspicion 
>> that I need some kind of mysql dump to manage the stuff by myself in 
>> the same way bioperl does, but again it's a compromise between 
>> complexity and simplicity that I cannot fully embrace for a while.
>> Still the option of species comparison may annihilate my suspicions 
>> momentarily if I am able to manage it.
>> BTW, Ewan, 'unchecking' for the entire genome and setting multiple 
>> transcripts for human (all other options unchecked) yields, after 
>> the Filters stage:
>> 7185 Entries pass Filters - that looks a bit few for human. Probably I 
>> am missing something, sorry.
>
>You want to uncheck the "multiple transcripts" (this means 
>genes with more than one transcript: you want all genes).
>
>
>
From barry.moore at genetics.utah.edu  Tue Jan 18 14:21:47 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 18 14:18:21 2005
Subject: [Bioperl-l] bioperl
In-Reply-To: <20050116062005.37768.qmail@web30103.mail.mud.yahoo.com>
References: <20050116062005.37768.qmail@web30103.mail.mud.yahoo.com>
Message-ID: <41ED61CB.7010700@genetics.utah.edu>

Robin-

Have you checked out the BioJava project?  http://www.biojava.org/.  
Yes, the RichSeq objects used by bioperl contain the information from the 
GenBank features table.  Bio::SeqIO understands a variety of XML formats.

Barry

Robin XML wrote:

>Dear Sir,
>I am a beginner in bioinformatics. I am excited
>by your fantastic bioperl functions. But some questions
>confuse me:
>1. Is it possible to call bioperl functions from Java
>under Windows? I need a GUI and need Java to
>handle XML template modification.
>2. Is it correct that with Bio::DB::GenBank() and
>Bio::SeqIO, I can get full GenBank data in XML format?
>Does that include the features part?
>
>
>Thank you!!!!!!
>
>Best regards,
>Robin
>
>
>
>	
>		
>  
>

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT



From talcon at iastate.edu  Tue Jan 18 17:20:05 2005
From: talcon at iastate.edu (Tim Alcon)
Date: Tue Jan 18 17:16:12 2005
Subject: [Bioperl-l] accessing GenBank
Message-ID: <41ED8B95.8020506@iastate.edu>

I seem unable to access GenBank.  When running bptutorial.exe, it seems 
like all the other examples run fine except that one.  Anyone know why 
that would be?  I'm using ActivePerl on Windows XP.  I have whichever 
version of bioperl is the current default using ppm (it's at least 
1.0).   When I run the exact same code from my campus Unix account, it 
works fine.

Tim

From barry.moore at genetics.utah.edu  Tue Jan 18 18:19:08 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 18 18:15:38 2005
Subject: [Bioperl-l] accessing GenBank
In-Reply-To: <41ED8B95.8020506@iastate.edu>
References: <41ED8B95.8020506@iastate.edu>
Message-ID: <41ED996C.8000301@genetics.utah.edu>

Tim,

If you just typed install bioperl at the ppm prompt you may well have 
1.2.x.  That doesn't necessarily explain why your tutorial script 
doesn't work, but it might.  You probably want to install at least 
bioperl 1.4 (1.5 is on the way soon).  Try the following script as 
another way to check if you've got bioperl working on your windows machine.

Barry

#!/usr/bin/perl

#A short script to demonstrate how to download sequences from GenBank and
#access the sequence and some associated annotations using Bioperl.

use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::GenBank; #use Bio::DB::GenPept or Bio::DB::RefSeq if needed

#Get some sequence IDs either like below, or read in from a file.  Note that
#this sample script works with the accession numbers below (at least at the
#time it was written).  If you add different accession numbers, and you get
#errors, you may be calling for something that the sequence doesn't have.
#You'll have to add your own error trapping code to handle that.
my @ids = ('K03160', 'AB039327', 'BC035972');

#Create the GenBank database object to read from the database.
my $gb = new Bio::DB::GenBank();

#Create a sequence stream to pass the sequences from the database to the
#program.
my $seqio = $gb->get_Stream_by_id(\@ids);

#Loop over all of the sequences that you requested.
while (my $seq = $seqio->next_seq) {

  #Here is how you get methods directly from the RichSeq object.  Replace
  #'display_name' with any other method in Table 2. that can be called on
  #either the RichSeq object directly, or the PrimarySeq object which it has
  #inherited.
  print "Display Name:  ", $seq->display_name,"\n";
  print "Sequence Date:  ",$seq->get_dates,"\n";

  #Here is how to access the classification data from the species object.
  my $species = $seq->species;
  print "Species  :", $species->common_name,"\n";
  my @class = $species->classification;
  print "Classification:  @class\n";

  #Here is a general way to call things that are stored as a
  #Bio::SeqFeature::Generic object.  Replace 'source' with any other of the
  #"major" headings in the feature table (e.g. gene, CDS, etc.) and replace
  #'mol_type' with any of the tag values found under that heading (organism,
  #locus_tag, gene, etc.)
  my @source_feats = grep { $_->primary_tag eq 'source' } $seq->get_SeqFeatures();
  my $source_feat = shift @source_feats;
  my @mol_type = $source_feat->get_tag_values('mol_type');
  print "Molecule Type:  @mol_type\n";
 
  #Here is a general way to call things that are stored as some type of a
  #Bio::Annotation object.  This includes reference information, and comments.
  #Replace 'reference' with 'comment' to get the comment, and replace
  #$ref->authors with $ref->title (or location, medline, etc.) to get other
  #reference categories
  my $ann = $seq->annotation();
  my @references = ($ann->get_Annotations('reference'));
  my $ref = shift @references;
  my ($title, $authors, $location, $pubmed, $reference);
  if (defined $ref) {
    $authors = $ref->authors;
    print "Authors:  $authors\n";
  }
  print "Sequence:  \n", $seq->seq, "\n\n";
}

Tim Alcon wrote:

> I seem unable to access GenBank.  When running bptutorial.exe, it 
> seems like all the other examples run fine except that one.  Anyone 
> know why that would be?  I'm using ActivePerl on Windows XP.  I have 
> whichever version of bioperl is the current default using ppm (it's at 
> least 1.0).   When I run the exact same code from my campus Unix 
> account, it works fine.
>
> Tim
>


-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT



From palmeida at igc.gulbenkian.pt  Tue Jan 18 14:49:08 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Tue Jan 18 21:36:46 2005
Subject: [Bioperl-l] NCBI BLink
Message-ID: <25361.192.168.50.3.1106077748.squirrel@192.168.50.3>

Is anyone working on a parser for BLink? I found a module by Rob Edwards
(http://www.salmonella.org/bioperl/Blink.pm), but I wanted to have the
Best Hits page, so I added an extra parameter (-besthits) to that module,
which you set to '1' to have the desired behavior.

I'm attaching the .diff file and the module itself.

--
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BlinkNew.pm
Type: text/x-perl
Size: 11386 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050118/b721ccdc/BlinkNew.bin
-------------- next part --------------
115c115
<     my ($gi, $cutoff) = $self->_rearrange([qw(GI CUTOFF)], @args);
---
>     my ($gi, $cutoff, $besthits) = $self->_rearrange([qw(GI CUTOFF BESTHITS)], @args);
117a118
>     $self->{besthits}=$besthits || 0;
154a156,171
> =head2 besthits
> 
> Title	: besthits
> Usage	: $blink->besthits($besthits)
> Function: Get/Set All Hits or Best Hits
> Returns	: Current status
> Args	: 1 for best hits, anything else for all hits
> 
> =cut
> 
> sub besthits {
>  my ($self, $val)=@_;
>  if ($val) {$self->{besthits}=$val}
>  return $self->{besthits}
> }
> 
208a226
>  $header{'org'}=$self->{besthits};
256,257c274,278
<  return ($self->{$r2r}->{bl2sqlurl}, $self->{$r2r}->{score}, $self->{$r2r}->{p}, $self->{$r2r}->{prot2url}, 
<          $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc});
---
>  
>   return ($self->{$r2r}->{bl2sqlurl}, $self->{$r2r}->{score}, $self->{$r2r}->{p}, $self->{$r2r}->{prot2url},
>           $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc}) if $self->{besthits} != 1;
>   return ($self->{$r2r}->{bl2sqlurl}, $self->{$r2r}->{score}, $self->{$r2r}->{p}, $self->{$r2r}->{prot2url},
>           $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc}, $self->{$r2r}->{hitsurl}, $self->{$r2r}->{hits}) if $self->{besthits} == 1;
259c280
< 
---
> 		
284c305,307
<          $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc});
---
>          $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc}) if $self->{besthits} != 1;
>   return ($self->{$r2r}->{bl2sqlurl}, $self->{$r2r}->{score}, $self->{$r2r}->{p}, $self->{$r2r}->{prot2url},
> 	           $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc}, $self->{$r2r}->{hitsurl}, $self->{$r2r}->{hits}) if $self->{besthits} == 1;
317a341
>    next if (m#SCORE.*P.*ACCESSION#);
326,328c350
<    (m#^.*?onclick.*?href=(\S+?)>(\d+)\s+(\d+)(\S+).*?(\d+)(.*?)$#i);
<      
< # fix vi!
---
> #   print STDERR "\n", $self->{besthits}, "\n";
330c352,359
<    unless ($1 && $2 && $3 && $4 && $5 && $6 && $7 && $8) {print STDERR "Couldn't parse\n$_\n"; next}
---
>    (m#^.*?onclick.*?href=(\S+?)>(\d+)\s+(\d+)(\S+).*?(\d+).*?(\d+).*?(.*?)$#i) if $self->{besthits} ==1;  
>    unless  (($1 && $2 && $3 && $4 && $5 && $6 && $7 && $8 && $9 && $10) || ($self->{besthits} !=1))
>    {print STDERR "Couldn't parse\n$_\n"; next}
> 	 
>    (m#^.*?onclick.*?href=(\S+?)>(\d+)\s+(\d+)(\S+).*?(\d+)(.*?)$#i) if $self->{besthits} != 1;
>    unless (($1 && $2 && $3 && $4 && $5 && $6 && $7 && $8) || ($self->{besthits}==1))
>    {print STDERR "Couldn't parse\n$_\n"; next}
>    
341c370,375
<    $self->{$rcount}->{p2desc}=$8;
---
>    if ($self->besthits != 1 ) { $self->{$rcount}->{p2desc}=$8; }
>    else {
>    $self->{$rcount}->{hitsurl}=$8;
>    $self->{$rcount}->{hits}=$9;
>    $self->{$rcount}->{p2desc}=$10;
>    }
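
A note on the besthits accessor in the diff above: because it tests
"if ($val)", calling besthits(0) cannot switch back to all hits once best
hits has been enabled. A minimal sketch of an accessor that stores any
value passed to it follows. The package name My::BlinkSketch is
hypothetical and only stands in for the real module; nothing here is
taken from the attached BlinkNew.pm beyond the method name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the patched module; only the accessor is shown.
package My::BlinkSketch;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->{besthits} = $args{-besthits} || 0;
    return $self;
}

# Get/set accessor: a value is stored whenever an argument is passed,
# so besthits(0) can turn the option off again.
sub besthits {
    my $self = shift;
    if (@_) {
        my $val = shift;
        $self->{besthits} = defined $val ? $val : 0;
    }
    return $self->{besthits};
}

package main;

my $blink = My::BlinkSketch->new(-besthits => 1);
print "best hits on:  ", $blink->besthits, "\n";
$blink->besthits(0);
print "best hits off: ", $blink->besthits, "\n";
```

Testing the argument count rather than the truth of the value keeps the
documented behaviour (1 for best hits, anything else for all hits) while
letting a caller reset the flag.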
From rob at salmonella.org  Wed Jan 19 00:05:01 2005
From: rob at salmonella.org (Rob Edwards)
Date: Wed Jan 19 00:01:17 2005
Subject: [Bioperl-l] NCBI BLink
In-Reply-To: <25361.192.168.50.3.1106077748.squirrel@192.168.50.3>
References: <25361.192.168.50.3.1106077748.squirrel@192.168.50.3>
Message-ID: 

I wrote that when I needed some BLink functionality, and the module did  
exactly what I wanted. However, I never really rolled it into bioperl  
proper, never committed it, and never pursued it. If you'd like to add  
more functionality go ahead.

Rob


On Jan 18, 2005, at 11:49 AM, Paulo Almeida wrote:

> Is anyone working on a parser for BLink? I found a module by Rob  
> Edwards
> (http://www.salmonella.org/bioperl/Blink.pm), but I wanted to have the
> Best Hits page, so I added an extra parameter (-besthits) to that  
> module,
> which you set to '1' to have the desired behavior.
>
> I'm attaching the .diff file and the module itself.
>
> --
> Paulo Almeida
> Instituto Gulbenkian de Ciencia
> Apartado 14, 2781-901, Oeiras, PORTUGAL
> tel  +351 21 446 46 35
> fax  +351 21 440 79 70
> http://www.igc.gulbenkian.pt
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From nathanhaigh at ukonline.co.uk  Wed Jan 19 04:02:10 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 19 03:58:41 2005
Subject: [Bioperl-l] Installing Bioperl using PPM
In-Reply-To: <41ED8B95.8020506@iastate.edu>
Message-ID: 

Please read this even if you think you know how to install modules via PPM!

This is just a note on how to install the latest version of Bioperl (or any other module) via PPM.
Because of inconsistencies in the way PPM determines module names, versions, etc. (see ActiveState's comment on this at the bottom),
it is NOT WISE to install modules with:
    "install bioperl"
OR
    "upgrade bioperl"

You are very likely NOT to get the most recent version of a particular module this way! Instead, do the
following:
    "search bioperl"
This gives a numbered list of the matching modules in the repositories searched by your PPM (you can add repositories beyond
the defaults set up during installation, and this is advised). Choose the number of the correct module from
the list and do:
    "install <number>"
where <number> is the number of the module you wish to install. This way you ensure that you install the module/version YOU
want, not the arbitrary one that PPM seems to want to install most of the time!

As soon as the official Bioperl 1.5 is released, I'll make the ppd and tar.gz files so it can be installed via PPM.

Nathan

ActiveState's comment on PPM's inconsistencies in determining module names/versions:
"Sorry for the confusion, ppm3 is kind of inconsistent in spots."


> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Tim Alcon
> Sent: 18 January 2005 22:20
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] accessing GenBank
> 
> I seem unable to access GenBank.  When running bptutorial.exe, it seems
> like all the other examples run fine except that one.  Anyone know why
> that would be?  I'm using ActivePerl on Windows XP.  I have whichever
> version of bioperl is the current default using ppm (it's at least
> 1.0).   When I run the exact same code from my campus Unix account, it
> works fine.
> 
> Tim
> 

From nathanhaigh at ukonline.co.uk  Wed Jan 19 04:05:10 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 19 04:01:52 2005
Subject: [Bioperl-l] accessing GenBank
In-Reply-To: <41ED8B95.8020506@iastate.edu>
Message-ID: 

You should double-check the versions you have installed on both systems; it may well be that one is out of date with respect to
connecting to GenBank and the other is not. If you do indeed have a version of Bioperl earlier than 1.4 installed on your Windows machine,
follow my instructions to install 1.4 (1.5 should be available via PPM shortly after its official release, some time soon!).

Nathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Tim Alcon
> Sent: 18 January 2005 22:20
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] accessing GenBank
> 
> I seem unable to access GenBank.  When running bptutorial.exe, it seems
> like all the other examples run fine except that one.  Anyone know why
> that would be?  I'm using ActivePerl on Windows XP.  I have whichever
> version of bioperl is the current default using ppm (it's at least
> 1.0).   When I run the exact same code from my campus Unix account, it
> works fine.
> 
> Tim
> 

From michael.watson at bbsrc.ac.uk  Wed Jan 19 04:17:30 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed Jan 19 04:17:01 2005
Subject: [Bioperl-l] My last email about Bio::Graphics::Panel, please HELP
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89BC4@iahce2knas1.iah.bbsrc.reserved>

Just for completeness: Dan and I looked at this off-list and
finally discovered that what he actually wanted was:

$q->print($map);
$q->print($panel->png);

Which makes a LOT more sense.... :-)
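
For the archive, a small sketch of why the original "print $q->$map;"
failed: when the thing after the arrow is a scalar holding a string, Perl
uses that string as a method name, so the HTML stored in $map was looked
up as a subroutine in the CGI package. The class My::Page below is a
hypothetical stand-in for the CGI object, and the markup in $map is made
up, so that the example runs without CGI.pm.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the CGI query object.
package My::Page;
sub new { return bless {}, shift }
sub h1  { my ($self, $text) = @_; return "<h1>$text</h1>" }

package main;

my $q = My::Page->new;

# A scalar after the arrow is treated as a method NAME.
my $method = 'h1';
print $q->$method('teste'), "\n";    # same call as $q->h1('teste')

# So a scalar holding markup is looked up as a method, and the call dies.
my $map = '<map id="bgmap00001"></map>';
my $ok  = eval { $q->$map; 1 };
print "calling \$q->\$map failed: $@" unless $ok;

# What was wanted is simply to print the string itself.
print $map, "\n";
```

The fix quoted above works for the same reason: $q->print($map) calls a
real method and passes the string as an argument.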

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of
palmeida@igc.gulbenkian.pt
Sent: 18 January 2005 12:01
To: Danielucg Sousa
Cc: bioperl-l@portal.open-bio.org
Subject: Re: [Bioperl-l] My last email about Bio::Graphics::Panel,
please HELP


Hi,

Have you tried: print $map;

You are using it as if $map were a subroutine of CGI, but you just want
to print whatever is in the variable $map.

-Paulo

On Tue, Jan 18, 2005 at 07:34:50AM -0300, Danielucg Sousa wrote:
> Hi,
> 
> I'm displaying a sequence in the browser, but I can't get the
> HTTP link to work.
> When I use: print $q->$map;
> the output message is:
> Undefined subroutine CGI:: id="bgmap00001">
>  href="http://www.google.com.br" title="test 2"
> alt="test 2" />
>  href="http://www.google.com.br" title="test 2"
> alt="test 2" />
> 
> 
> Please, what should I do?
> I am using Bioperl 1.5 RC 2.
> Thanks for everything.
> 
> My little code :
> #!/usr/bin/perl -wT
> 
> use strict;
> use Bio::Graphics;
> use Bio::Graphics::FeatureFile;
> use Bio::SeqIO;
> use Bio::SeqFeature::Generic;
> use CGI  qw / :standard /;
> use CGI::Pretty;
> 
> my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>600);
> 
> my $q = new CGI;
> 
> print $q->header('text/html');
> print $q->start_html('A Vector Rendering ');
> 
> print $q->h1('teste');
> my $panel = Bio::Graphics::Panel->new(-length  => 
> 1000, -width  => 800, -pad_left     => 10,  -pad_right
>    => 10,  -key_style =>'none', -spacing => -0.25,
> -box_subparts => 'true',-link =>
> "http://www.google.com");
> 
> my $track =  $panel->add_track($wholeseq,  -glyph  =>
> 'transcript2', -bgcolor =>'orange', -bump   => 0,
> -height =>12,-title=>'test 2', -link =>'http://www.google.com.br' );
> 
> my $feature = Bio::SeqFeature::Generic->new(-display_name=>'teste',
> -score=>20, -start=>400, -end=>800,
> -url=>'http://www.google.com' );
>  $track -> add_feature($feature);
>       
>  my ($url,$map,$mapname) = $panel->image_and_map(-root
> => '/var/www/html',-url => '/tmpimages', -link => 
> "http://www.google.com" );
>  
> print $q->img({-src=>$url,-usemap=>"#$mapname", -link
> => "http://www.google.com" });
> print $q->$map;
> print $q->($panel->png);
> $panel->finished;
> print $q->exit_html;
> 
> exit;
> 
> Thank you very much,
> Daniel Xavier

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt

From palmeida at igc.gulbenkian.pt  Wed Jan 19 06:48:16 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Wed Jan 19 06:43:34 2005
Subject: [Bioperl-l] Sequence features - complete sequences
Message-ID: <20050119114816.GA2618@bioinf.igc.gulbenkian.pt>

Hi,

I want to retrieve only complete sequences from GenPept records. I'm not
sure this is possible, because the notation may not be consistent, but I
was thinking of checking the 'Protein' feature for something like
<1..>952 (protein here:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=34223805), which says
there is more sequence upstream. I have been looking at BioPerl feature
objects but I couldn't find a way of doing this. What module would need
to be changed, or what code could be written, to accomplish this? Maybe
the 'tag' feature of SeqFeature::Generic.pm could be used, but then there
would have to be some code attached to the tag to parse the required
information.

Thank you,
Paulo

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
From Marc.Logghe at devgen.com  Wed Jan 19 07:32:39 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Jan 19 07:31:34 2005
Subject: [Bioperl-l] Sequence features - complete sequences
Message-ID: 

> I want to retrieve only complete sequences from GenPept 
> records. I'm not
> sure this is possible, because the notation may not be 
> consistent, but I
> was thinking of checking the 'Protein' feature for something like
> <1..>952 (protein here:
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=34223805), 
> which says there is more sequence upstream. I have

I think in a lot of cases this means that there _is_ more sequence in _reality_, but it is not available in the databases (e.g. not a full-length cDNA clone), simply because that part was never isolated and thus never sequenced. That means there is no way to fetch the missing sequence information.
Regards,
Marc

From palmeida at igc.gulbenkian.pt  Wed Jan 19 07:52:26 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Wed Jan 19 07:47:24 2005
Subject: [Bioperl-l] Sequence features - complete sequences
In-Reply-To: 
References: 
Message-ID: <20050119125226.GB2618@bioinf.igc.gulbenkian.pt>

That's what I thought, but I don't want the rest of the information; I
just want to skip those sequences, because I am using them to generate
ProtDist matrices, and they may distort the results.

-Paulo

> I think in a lot of cases this means that there _is_ more sequence in _reality_, but not available in the databases (e.g. not a full length cDNA clone). Just because that part was never isolated and thus, not sequenced. Meaning there is no way to fetch the missing sequence information.
> Regards,
> Marc

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
From jason.stajich at duke.edu  Wed Jan 19 08:17:46 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 19 08:15:00 2005
Subject: [Bioperl-l] Sequence features - complete sequences
In-Reply-To: <20050119125226.GB2618@bioinf.igc.gulbenkian.pt>
References: 
	<20050119125226.GB2618@bioinf.igc.gulbenkian.pt>
Message-ID: <8297945F-6A1C-11D9-8F42-000393C44276@duke.edu>

This is encoded in the Location object. In fact, if the location is not 
exact we create a "fuzzy" location, so you can just test whether it is-a 
Bio::Location::FuzzyLocationI.

More properly (if you only care about proteins that are incomplete at the 
C-terminus or N-terminus), you just need to check the start_pos_type 
and end_pos_type of the location. If they are 'EXACT' then the 
position is, well, exact.
if( $f->location->start_pos_type eq 'EXACT' && 
$f->location->end_pos_type eq 'EXACT' ) {
}
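
A minimal sketch of how that test could be used to skip incomplete
proteins. Only start_pos_type and end_pos_type come from the message
above. The helper name is_complete and the My::MockLocation class are
hypothetical; the mock stands in for a real location object so that the
example runs without BioPerl installed, and the 'BEFORE'/'AFTER' values
are what I would expect a fuzzy <1..>952 location to report.

```perl
use strict;
use warnings;

# Hypothetical stand-in exposing the two methods used below.
package My::MockLocation;
sub new            { my ($class, %a) = @_; return bless {%a}, $class }
sub start_pos_type { $_[0]{start_pos_type} }
sub end_pos_type   { $_[0]{end_pos_type} }

package main;

# True only when both ends of the location are exact.
sub is_complete {
    my ($loc) = @_;
    return $loc->start_pos_type eq 'EXACT'
        && $loc->end_pos_type   eq 'EXACT';
}

my %locations = (
    '1..952'   => My::MockLocation->new(start_pos_type => 'EXACT',
                                        end_pos_type   => 'EXACT'),
    '<1..>952' => My::MockLocation->new(start_pos_type => 'BEFORE',
                                        end_pos_type   => 'AFTER'),
);

for my $name (sort keys %locations) {
    my $verdict = is_complete($locations{$name}) ? 'keep' : 'skip';
    print "$name: $verdict\n";
}
```

With real data the argument would be $f->location for the 'Protein'
feature of each record, and records for which the helper returns false
would be left out of the ProtDist input.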

-jason
On Jan 19, 2005, at 7:52 AM, Paulo Almeida wrote:

> That's what I thought, but I don't want the rest of the information; I
> just want to skip those sequences, because I am using them to generate
> ProtDist matrices, and they may distort the results.
>
> -Paulo
>
>> I think in a lot of cases this means that there _is_ more sequence in 
>> _reality_, but not available in the databases (e.g. not a full length 
>> cDNA clone). Just because that part was never isolated and thus, not 
>> sequenced. Meaning there is no way to fetch the missing sequence 
>> information.
>> Regards,
>> Marc
>
> -- 
> Paulo Almeida
> Instituto Gulbenkian de Ciencia
> Apartado 14, 2781-901, Oeiras, PORTUGAL
> tel  +351 21 446 46 35
> fax  +351 21 440 79 70
> http://www.igc.gulbenkian.pt
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From g0404203 at nus.edu.sg  Wed Jan 19 01:57:34 2005
From: g0404203 at nus.edu.sg (Lee Ping Alison)
Date: Wed Jan 19 08:16:17 2005
Subject: [Bioperl-l] Attribute Tags in Bio::Tools::GFF &
	Bio::SeqFeature::Generic
Message-ID: <000201c4fdf4$4d8536c0$7347d90a@imcb.astar.edu.sg>

Hi,

Referring to the last column of the GFF file format which holds various user-specified attributes (tags), if I need to read in a GFF file and output the information to another GFF file, how do I retain the tags in the output?

For example, I've used the following code:

my $gff = Bio::Tools::GFF->new(-file => $file, -gff_version => 2);
my $f = $gff->next_feature;
print $f->gff_string, "\n";


input is:
chr13   hg15.chr13      transcript      17950005        17951026        .       .       .       -name "chr13.0"

the output becomes:
chr13   hg15.chr13      transcript      17950005        17951026        .       .       .


Is there some way to retain the tag information? I figure this is related to the way the GFF line is parsed and the way the Generic feature object is created.

Thanks a lot in advance!

Best Regards,
Alison.
From malatorr at genoma.ciencias.uchile.cl  Wed Jan 19 08:57:00 2005
From: malatorr at genoma.ciencias.uchile.cl (Mariano Latorre A)
Date: Wed Jan 19 08:55:04 2005
Subject: [Bioperl-l] help with PHRAP assembly & Bio::Graphics
Message-ID: <1106143020.8976.17.camel@peach4>

Hi!

I'm developing a Bioperl CGI script to show a PHRAP assembly as a PNG,
using Bio::Graphics + Perl CGI.

The problem is that the contig defines the zero position, and usually
the aligned ESTs start before the contig. This means I need to
use negative positions, but Bio::Graphics doesn't allow negative
positions; it just cuts them off.

PS: My source code is pasted below.
-- 
Mariano Latorre A 
Universidad de Chile

######################################################################################
#THE CGI "render.pl"

#!/usr/bin/perl


use CGI;
use lib "$ENV{HOME}/projects/bioperl-live";
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $form = new CGI;
print "Content-type: image/png\n\n";

my $panel = Bio::Graphics::Panel->new(-length => 2000, -width => 1800,
                                      -pad_left => 10, -pad_right => 10);

my $full_length = Bio::SeqFeature::Generic->new(-start => $form->param("start"),
                                                -end   => $form->param("end"));

$panel->add_track($full_length,
                  -glyph   => 'arrow',
                  -tick    => 2,
                  -fgcolor => 'black',
                  -double  => 1,
                 );

my $track = $panel->add_track(-glyph => 'graded_segments',
                              -label  => 1,
                              -bgcolor => 'blue',
                              -min_score => -200,
                              -max_score => 1000);


for($i=1;defined($form->param("est$i"));$i++){
  my($name,$score,$start,$end) = split /\@/,$form->param("est$i");
  my $feature = Bio::SeqFeature::Generic->new(-display_name=>$name,
            -score=>$score,
            -start=>$start,
            -end=>$end);
  $track->add_feature($feature);
}
print $panel->png;


######################################################################################
# The Url to call the CGI
render.pl?est1=hola@300@-200@367@&est2=chau@50@300@600@&est3=nada@50@310@25@&start=1&end=800

######################################################################################


From crabtree at tigr.org  Wed Jan 19 10:07:19 2005
From: crabtree at tigr.org (Crabtree, Jonathan)
Date: Wed Jan 19 10:05:06 2005
Subject: [Bioperl-l] help with PHRAP assembly & Bio::Graphics
Message-ID: 


Mariano-

You should be able to use negative coordinates by setting the -offset
parameter (to the absolute value of the smallest negative coordinate
that you want to use in your image) when you call Panel->new().  Someone
else asked about this a few months ago and reported that this solution
worked for them:

http://bioperl.org/pipermail/bioperl-l/2004-July/016538.html

Jonathan



From crabtree at tigr.org  Wed Jan 19 10:22:24 2005
From: crabtree at tigr.org (Crabtree, Jonathan)
Date: Wed Jan 19 10:20:44 2005
Subject: [Bioperl-l] help with PHRAP assembly & Bio::Graphics
Message-ID: 


Mariano-

I just realized that I got the sign wrong (again!).  You'll want to set -offset to a *negative* number, not a positive number.  For example:

#!/usr/bin/perl

use Bio::Graphics::Panel;
use Bio::SeqFeature::Generic;

my $panel = Bio::Graphics::Panel->new(-length=> 1000, -offset=> -100, -width=> 600);

my $scale = Bio::SeqFeature::Generic->new(-start => -75, -end => 100);
$panel->add_track($scale, 
		  -glyph => 'anchored_arrow',
		  -tick => 2,
		  -fontcolor => '#3d5315',
		  -fgcolor => '#3d5315',
		  -bgcolor => '#e3ffb7',
		  );

my $gd =  $panel->gd();
print $gd->png();
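
The offset does not have to be hard-coded. A rough sketch, assuming (as I
read the Panel documentation) that -offset is the zero-based coordinate
placed at the left edge of the image: derive it, and a matching -length,
from the feature coordinates before creating the panel. The helper name
panel_range is made up for illustration, and the input strings are the
est parameters from the example URL in the original question.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Given "name@score@start@end@" strings, return an -offset and -length
# that cover every feature, including those with negative coordinates.
sub panel_range {
    my @ests = @_;
    my @coords;
    for my $est (@ests) {
        my ($name, $score, $start, $end) = split /\@/, $est;
        push @coords, $start, $end;    # start and end may be reversed
    }
    my $left  = min(@coords);
    my $right = max(@coords);
    # Zero-based offset of the leftmost base (an assumption, see above).
    my $offset = $left - 1;
    my $length = $right - $left + 1;
    return ($offset, $length);
}

my @ests = ('hola@300@-200@367@', 'chau@50@300@600@', 'nada@50@310@25@');
my ($offset, $length) = panel_range(@ests);
print "-offset => $offset, -length => $length\n";
```

The two values would then be passed to Bio::Graphics::Panel->new()
in place of the fixed -offset and -length used above.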




From kmdaily at indiana.edu  Wed Jan 19 11:48:38 2005
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Wed Jan 19 11:44:49 2005
Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in SwissProt
	file
Message-ID: 

I want to work with a local copy of the SwissProt database, and need to
search through all of the entries. I only see methods to return sequences
by accession. However, I cannot use just the FASTA format of the SwissProt
records, as I need the feature fields. What I need to learn is how to do
a DB search on the features field of the SwissProt records, if that is
possible. Would there be any advantage to doing it with the DB instead of
just using SeqIO as an input stream? I think there might be, since with
SeqIO I must read in the entire file again every time I want to do a
search, which is very costly. Thank you.

Kenny Daily
Indiana University
School of Informatics
kmdaily [at] indiana [dot] edu

From sdavis2 at mail.nih.gov  Wed Jan 19 13:01:21 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Jan 19 12:57:35 2005
Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in
	SwissProt file
In-Reply-To: 
References: 
Message-ID: <2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>

Kenny,

If this is something you are going to be doing often, you might want to 
look at bioperl-db.  Alternatively, UCSC maintains a fully-relational 
swissprot database 
(http://hgdownload.cse.ucsc.edu/goldenPath/swissProt/database/) that 
you could pretty easily load into a mysql server.  You can access their 
mysql server directly (let me know if you want to do this), also, but 
if you are running any kind of batch query, I would suggest you 
download the tables and load them yourself (really pretty easy to do).

Sean

On Jan 19, 2005, at 11:48 AM, Daily, Kenneth Michael wrote:

> I want to work with a local copy of the SwissProt database, and need 
> to search through all of the entries. I only see methods to return 
> sequences by accession. However, I cannot use just FASTA format of the 
> SwissProt records, as I need to use the feature fields. What I need to 
> learn is how to do a DB search on the features field of the SwissProt 
> records, if its possible. Would there be any advantage do doing it 
> with the DB instead of just using SeqIO as an input stream? I think it 
> might, since every time I want to do a search I must read in the 
> entire file again, which is very costly. Thank you.
>
> Kenny Daily
> Indiana University
> School of Informatics
> kmdaily [at] indiana [dot] edu
>

From jason.stajich at duke.edu  Wed Jan 19 13:30:39 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 19 13:34:47 2005
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 1734] New: a RefSeq entry
	not converted to SwissProt format
Message-ID: <37FB8937-6A48-11D9-88CC-000393C44276@duke.edu>


Do any Swissprot experts know what the Swissprot feature table should 
look like for this type of GenPept feature?

      Site            join(243,538)
                      /note="involved in regulation of interaction with
                      glutamyl substrate"
                      /site_type="unclassified"
      Site            join(279,338,361)
                      /note="catalytic triad"
                      /site_type="active"
      Site            join(403,450,455)
                      /note="involved in Ca2+ complexation"
                      /site_type="unclassified"

-jason


Begin forwarded message:

> From: bugzilla-daemon@portal.open-bio.org
> Date: January 19, 2005 8:34:10 AM EST
> To: bioperl-guts-l@bioperl.org
> Cc:
> Subject: [Bioperl-guts-l] [Bug 1734] New: a RefSeq entry not 
> converted to SwissProt format
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1734
>
>            Summary: a RefSeq entry not converted to SwissProt format
>            Product: Bioperl
>            Version: 1.4 branch
>           Platform: Sun
>         OS/Version: Solaris
>             Status: NEW
>           Severity: normal
>           Priority: P2
>          Component: Bio::SeqIO
>         AssignedTo: bioperl-guts-l@bioperl.org
>         ReportedBy: laurent.falquet@isb-sib.ch
>
>
> The RefSeq entry NP_443187 generates an error when I tried to convert 
> it to SwissProt format using
> Bio::SeqIO and genbank format as input. (all other RefSeq entries are 
> converted normally using the
> same method).
>
> Here is the error message:
> len 1 is 56 len 2 is 34
> Error sequence not parsable
> Programming error - cannot called write_line_swissprot_regex with 
> different length
> pre1 (FT   Site     join(279,338,361) join(279,338,361)       ) and
> pre2 (FT                                ) tags! at 
> /usr/local/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line
> 1124,  line 98.
>
>
>
> ------- You are receiving this mail because: -------
> You are the assignee for the bug, or are watching the assignee.
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/
From cjm at fruitfly.org  Wed Jan 19 15:37:25 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Wed Jan 19 15:33:38 2005
Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in	SwissProt
	file
In-Reply-To: <2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>
References: 
	<2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>
Message-ID: 


This is a good solution. If you're after something a bit more lightweight
than a relational solution, which typically involves a lot of admin and
(often slow) database loading (although this isn't a problem here as the
UCSC folks are nice enough to make their SP db available), then you may
want to look into an XML database solution.

For example, you can download the swiss xml from the EBI and stick it into
something like Apache Xindice, then grab the sequences you want using an
arbitrary XPath query, and transform the results with something like XSLT
or XML::Twig. There's more of an initial learning curve but the same
solution pattern is reusable in lots of other contexts.
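
A minimal sketch of the query step, in Python for illustration only. The XML layout below (`entries`, `entry`, `accession`, `feature`, `sequence`) is a made-up stand-in and not the actual EBI schema, and the standard-library ElementTree (which supports only a subset of XPath) is used here in place of an XML database such as Xindice:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified records; the real Swiss-Prot XML is richer
# and uses different element names.
SAMPLE_XML = """
<entries>
  <entry>
    <accession>P00001</accession>
    <feature type="TRANSMEM" begin="2" end="8"/>
    <sequence>MKTAYIAKQR</sequence>
  </entry>
  <entry>
    <accession>P00002</accession>
    <feature type="SIGNAL" begin="1" end="5"/>
    <sequence>MALWMRLLPL</sequence>
  </entry>
</entries>
"""


def sequences_with_feature(xml_text, feature_type):
    """Return {accession: sequence} for entries carrying the given feature type."""
    root = ET.fromstring(xml_text.strip())
    hits = {}
    for entry in root.findall("entry"):
        # XPath-style predicate: a child <feature> whose type attribute matches.
        if entry.find(f"feature[@type='{feature_type}']") is not None:
            hits[entry.findtext("accession")] = entry.findtext("sequence")
    return hits
```

Calling `sequences_with_feature(SAMPLE_XML, "TRANSMEM")` selects only the first sample entry; the same pattern applies to any other feature type present in the data.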

XPath isn't as powerful as SQL, but on the other hand the admin & coding
overhead is lower. It's very similar to the Bio::Index solution, with the
additional advantage of more queries & indexing.
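
For comparison, the general idea behind an index-based lookup can be sketched as follows. This is not Bio::Index itself, only an illustration (in Python) that assumes a flat file in which each record carries `AC` lines and ends with `//`; the function names and sample records are hypothetical. The file is scanned once to record byte offsets, and later lookups seek directly to a record instead of re-reading the whole file:

```python
import io


def build_index(handle):
    """Scan a binary flat-file handle once; map each accession to its record's byte offset."""
    index = {}
    record_start = handle.tell()
    while True:
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"AC   "):
            # An AC line may list several accessions separated by ';'.
            for acc in line[5:].split(b";"):
                acc = acc.strip()
                if acc:
                    index.setdefault(acc.decode("ascii"), record_start)
        elif line.startswith(b"//"):
            # The next record starts right after the terminator line.
            record_start = handle.tell()
    return index


def fetch(handle, index, accession):
    """Seek to a record by accession and return its text, terminator included."""
    handle.seek(index[accession])
    lines = []
    for line in iter(handle.readline, b""):
        lines.append(line)
        if line.startswith(b"//"):
            break
    return b"".join(lines).decode("ascii")


# Two made-up records in a Swiss-Prot-like layout.
SAMPLE = (
    b"ID   TEST1_HUMAN\n"
    b"AC   P00001;\n"
    b"FT   TRANSMEM      2      8\n"
    b"//\n"
    b"ID   TEST2_HUMAN\n"
    b"AC   P00002; Q00002;\n"
    b"FT   SIGNAL        1      5\n"
    b"//\n"
)
```

With `handle = io.BytesIO(SAMPLE)` and `index = build_index(handle)`, `fetch(handle, index, "P00002")` returns the second record without touching the first. Searching on feature content would still need either a scan or an additional index keyed on the feature type.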

There's also SRS, which gives you fairly flexible querying
capabilities. YMMV.

Cheers
Chris

On Wed, 19 Jan 2005, Sean Davis wrote:

> Kenny,
>
> If this is something you are going to be doing often, you might want to
> look at bioperl-db.  Alternatively, UCSC maintains a fully-relational
> swissprot database
> (http://hgdownload.cse.ucsc.edu/goldenPath/swissProt/database/) that
> you could pretty easily load into a mysql server.  You can access their
> mysql server directly (let me know if you want to do this), also, but
> if you are running any kind of batch query, I would suggest you
> download the tables and load them yourself (really pretty easy to do).
>
> Sean
>
> On Jan 19, 2005, at 11:48 AM, Daily, Kenneth Michael wrote:
>
> > I want to work with a local copy of the SwissProt database, and need
> > to search through all of the entries. I only see methods to return
> > sequences by accession. However, I cannot use just FASTA format of the
> > SwissProt records, as I need to use the feature fields. What I need to
> > learn is how to do a DB search on the features field of the SwissProt
> > records, if it's possible. Would there be any advantage to doing it
> > with the DB instead of just using SeqIO as an input stream? I think it
> > might, since every time I want to do a search I must read in the
> > entire file again, which is very costly. Thank you.
> >
> > Kenny Daily
> > Indiana University
> > School of Informatics
> > kmdaily [at] indiana [dot] edu
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
From yguo at vbi.vt.edu  Wed Jan 19 16:16:02 2005
From: yguo at vbi.vt.edu (yguo@vbi.vt.edu)
Date: Wed Jan 19 16:12:41 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the publisher website.
In-Reply-To: 
References: <2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>
	
Message-ID: <1280.198.82.192.209.1106169362.squirrel@webmail.vbi.vt.edu>

Hi,

While working on the BRC project at VBI, I wrote a Perl module for retrieving
full-text PDF files from the publisher's website using the information on the
PubMed abstract page. If anyone wants to use it, please contact me.
I will see if it is worthwhile to put it in Bioperl.

Yongjian
at
Virginia Bioinformatics Institute.


From Peter.Robinson at t-online.de  Wed Jan 19 16:53:04 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Wed Jan 19 16:48:29 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the publisher
	website.
In-Reply-To: <1280.198.82.192.209.1106169362.squirrel@webmail.vbi.vt.edu>
References: 
	<2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>
	
	<1280.198.82.192.209.1106169362.squirrel@webmail.vbi.vt.edu>
Message-ID: <1106171584.3667.16.camel@localhost.localdomain>

That sounds extremely interesting and I would appreciate getting a copy
for testing.
-peter


On Wed, 2005-01-19 at 22:16, yguo@vbi.vt.edu wrote:
> Hi,
> 
> While working on the BRC project at VBI, I wrote a Perl module for retrieving
> full-text PDF files from the publisher's website using the information on the
> PubMed abstract page. If anyone wants to use it, please contact me.
> I will see if it is worthwhile to put it in Bioperl.
> 
> Yongjian
> at
> Virginia Bioinformatics Institute.
> 
> 
-- 
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/

From cain at cshl.edu  Wed Jan 19 16:56:07 2005
From: cain at cshl.edu (Scott Cain)
Date: Wed Jan 19 16:52:19 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: 
Message-ID: 

I just did a cvs update and the last few tests are failing on MacOSX 10.3.
I'll try to sort it out over the next couple of days.

Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------


On Tue, 18 Jan 2005, Allen Day wrote:

> > The first series of errors die because the feature ID=AB000114 in 
> > t/data/knownGene.gff3 has several Dbxrefs separated with ';' instead of 
> > ','
> 
> i'm not getting these errors, are you in sync with cvs HEAD?
> 
> > The second failure is because  hybrid1.gff3 isn't in cvs
> 
> gff files are in cvs now.
> 
> > 
> > Rob
> > 
> > 
> > 
> > % perl -I. -w t/FeatureIO.t
> > 1..19
> > ok 1
> > ok 2
> > ok 3
> > ok 4
> > ok 5
> > ok 6
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590,  line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591,  line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> >  line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590,  line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591,  line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> >  line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590,  line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591,  line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> >  line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590,  line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591,  line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> >  line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590,  line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591,  line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> >  line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590,  line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591,  line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> >  line 10.
> > ok 7
> > ok 8
> > 
> > ------------- EXCEPTION  -------------
> > MSG: Could not open t/data/hybrid1.gff3: No such file or directory
> > STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:314
> > STACK Bio::FeatureIO::_initialize Bio/FeatureIO.pm:345
> > STACK Bio::FeatureIO::gff::_initialize Bio/FeatureIO/gff.pm:92
> > STACK Bio::FeatureIO::new Bio/FeatureIO.pm:268
> > STACK Bio::FeatureIO::new Bio/FeatureIO.pm:288
> > STACK toplevel t/FeatureIO.t:83
> > 
> > --------------------------------------
> > 
> 

From cain at cshl.edu  Wed Jan 19 16:57:54 2005
From: cain at cshl.edu (Scott Cain)
Date: Wed Jan 19 16:54:02 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: 
Message-ID: 

Allen,

Sorry about the ID problem/question--FeatureIO is fine in that respect.  I
was misremembering a problem with a chado loader as a bioperl problem.
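
For readers following the ID question quoted below: the point is that a GFF3 ID identifies a feature, and one feature may be written across several lines (a CDS in several segments, for example). A minimal sketch of grouping lines by ID, in Python for illustration only; the function name and sample lines are hypothetical and this is not how Bio::FeatureIO::gff is implemented:

```python
from collections import defaultdict


def group_by_id(gff3_text):
    """Group GFF3 feature lines by their ID attribute; one ID may span several lines."""
    features = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        # Column 9 holds tag=value pairs separated by ';'.
        attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair)
        if "ID" in attrs:
            features[attrs["ID"]].append((int(cols[3]), int(cols[4])))
    return dict(features)


# Made-up example: a two-segment CDS sharing one ID, plus a single-line exon.
SAMPLE = (
    "chr1\t.\tCDS\t100\t200\t.\t+\t0\tID=cds1;Parent=mrna1\n"
    "chr1\t.\tCDS\t300\t400\t.\t+\t1\tID=cds1;Parent=mrna1\n"
    "chr1\t.\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=mrna1\n"
)
```

Here `group_by_id(SAMPLE)` yields two features, with `cds1` holding both segments; a check that rejects the second `cds1` line as a duplicate ID would be stricter than the spec requires.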

Thanks,
Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------


On Mon, 17 Jan 2005, Allen Day wrote:

> Hi,
> 
> On Mon, 17 Jan 2005, Scott Cain wrote:
> 
> > Hi Rob,
> > 
> > Thanks for your work on this--I've put several comments in your
> > original message below.
> > 
> > Scott
> > 
> > ---------Original Message--------
> > Date: Sat, 15 Jan 2005 15:22:23 -0800
> > From: Rob Edwards 
> > Subject: [Bioperl-l] GFF3
> > To: Bioperl list 
> > 
> > Because I need it for some things that I am doing, I have worked quite 
> > a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
> > written this module, I have just made some cosmetic changes:
> > 
> > I have improved the validation processes that are applied as a gff3 
> > file is parsed, and the module should now validate essentially 
> > everything in the file except alignments. Validation is optional and is 
> > based on the specification described at : 
> > http://song.sourceforge.net/gff3.shtml
> > 
> > SC> Excellent--Did you happen to relax the requirement that ID be unique
> > SC> for each line of the GFF?  Allen and I put that in due to a misreading
> > SC> of the spec.  The ID has to be unique for a *feature*, which can be
> > SC> spread across several lines.
> 
> I'm not sure if this is taken care of in the code... actually, I'm a bit 
> foggy on exactly what the problem is.
> 
> > For clarification and edification I have created a couple of tables
> > describing the module and the validation that is applied to GFF3 files,
> > which you can see online: http://www.salmonella.org/bioperl/gff3.html
> > 
> > SC> Very nice and well done--do you happen to have a pod-ified version
> > SC> of this page?  It would be nice to include in the pod for
> > SC> Bio::FeatureIO::gff.
> 
> That's nice, I'd like to see it folded into the gff.pm perldoc as well.
> 
> > I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
> > sequences, it seems that you'd want to be able to call the next_seq 
> > methods, and therefore SeqIO is more appropriate than FeatureIO for 
> > those aspects. Currently the SeqIO module uses the FeatureIO module for 
> > parsing the file, it just reorganizes things.
> > 
> > This provides two different interfaces for getting objects out of GFF3 
> > files:
> > 	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
> > representing the annotations.
> > 	Bio::SeqIO::gff will return Bio::Seq objects representing the 
> > sequences with all the annotations attached.
> > 
> > The other difference between the two is that the former passes out the 
> > objects as they are read, but the latter has to read the whole file to 
> > get the annotations and the sequences.
> > 
> > SC> I thought about doing something similar with SeqIO, but I am worried 
> > SC> about the case where somebody tries to use SeqIO on a well 
> > SC> annotated human Chr1 GFF3 file (if one were ever to exist :-) ,
> > SC> but I suppose the same machine killing thing could be done if
> > SC> someone tried to use SeqIO on a genbank file of Chr1.
> 
> See my previous email, I don't think we need the SeqIO module.
> 
> > At the moment I have focused on reading GFF3 files.
> > 
> > I have not committed these to cvs yet, pending comments from others. I 
> > have some specific questions:
> > 	Should I wait until after 1.5 is out?
> > 
> > SC> I don't have the definitive answer, but I would say it doesn't
> > SC> matter much, as long as it passes tests.  Bio::FeatureIO::gff is
> > SC> hardly a fully functional module as it is, so if we could 
> > SC> squeeze a little more functionality into it before we
> > SC> release it, that would be fine with me.
> 
> well it's in now.  and it passes tests.  there weren't any before, but i 
> wrote some.  look in t/FeatureIO.t
> 
> > 	Is two separate modules really the right way to go about this?
> > 
> > SC> As long as it works for this case, I don't mind:  calling
> > SC> 'next_feature' on a FeatureIO object until I run out of features
> > SC> and then calling 'next_sequence' (and get a Bio::PrimarySeq) on
> > SC> the same FeatureIO object until I run out of sequences.
> > 
> > 	What about other GFF modules (like Bio::Tools::GFF)?
> > 
> > SC> I am willing to let Bio::Tools::GFF die a terrible death.  While
> > SC> it will have to be kept around for apps that depend on it, I don't
> > SC> see adding any major functionality as time well spent.
> > 
> > 	Could someone give the modules a workout and let me know about bugs? I 
> > am sure there are many.
> > 
> > SC> I will try to soon, but it won't be until next week at 
> > SC> the earliest.
> > 
> > I have posted these modules online via anonymous ftp at 
> > ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
> > Take a look and let me know what you do and don't like!
> > 
> > Rob
> > 
> > 
> > ----------------------------------------------------------------------
> > Scott Cain, Ph. D.				 	 cain@cshl.org
> > GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> > ----------------------------------------------------------------------
> > 
> > 
> > 
> > 
> 

From allenday at ucla.edu  Wed Jan 19 17:28:40 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 19 17:27:08 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the publisher
	website.
In-Reply-To: <1106171584.3667.16.camel@localhost.localdomain>
References: 
	<2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>
	
	<1280.198.82.192.209.1106169362.squirrel@webmail.vbi.vt.edu>
	<1106171584.3667.16.camel@localhost.localdomain>
Message-ID: 

please post the code here.  i've been meaning to add that functionality
into Bio::DB::Biblio::eutils.

do you have a list of which publishers are usable in this way?

-allen

On Wed, 19 Jan 2005, Peter Robinson wrote:

> That sounds extremely interesting and I would appreciate getting a copy
> for testing.
> -peter
> 
> 
> On Wed, 2005-01-19 at 22:16, yguo@vbi.vt.edu wrote:
> > Hi,
> > 
> > While working on the BRC project at VBI, I wrote a Perl module for retrieving
> > full-text PDF files from the publisher's website using the information on the
> > PubMed abstract page. If anyone wants to use it, please contact me.
> > I will see if it is worthwhile to put it in Bioperl.
> > 
> > Yongjian
> > at
> > Virginia Bioinformatics Institute.
> > 
> > 
> 
From allenday at ucla.edu  Wed Jan 19 17:31:26 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 19 17:28:07 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: 
References: 
Message-ID: 

okay, let me know.  we should probably add some validation tests as well, 
right now i'm just making sure the lines can be processed but don't do any 
typechecking on the document.

Rob, would you mind writing some tests into FeatureIO.t for your 
validation code?
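
As a rough illustration of the kind of per-line checks such tests could exercise, here is a sketch in Python (not the module's actual validation code; the function names and sample lines are hypothetical). It assumes the GFF3 conventions of nine tab-separated columns, attributes separated by ';' and multiple values of one attribute separated by ',':

```python
VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2", "."}


def parse_attributes(column9):
    """Split the attributes column into {tag: [values]}."""
    if column9 == ".":
        return {}
    attrs = {}
    for pair in column9.split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        if "=" not in pair:
            # Typical symptom of multiple values joined with ';' instead of ','.
            raise ValueError(f"attribute without '=': {pair!r}")
        tag, value = pair.split("=", 1)
        attrs[tag] = value.split(",")
    return attrs


def validate_line(line):
    """Check one GFF3 feature line and return its parsed attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    start, end = int(cols[3]), int(cols[4])
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    if cols[6] not in VALID_STRANDS:
        raise ValueError(f"bad strand: {cols[6]!r}")
    if cols[7] not in VALID_PHASES:
        raise ValueError(f"bad phase: {cols[7]!r}")
    return parse_attributes(cols[8])


# Made-up lines: the second joins two Dbxref values with ';' instead of ','.
GOOD = "seq1\t.\tgene\t1\t1000\t.\t+\t.\tID=gene1;Dbxref=DB1:x1,DB2:x2"
BAD = "seq1\t.\tgene\t1\t1000\t.\t+\t.\tID=gene1;Dbxref=DB1:x1;DB2:x2"
```

`validate_line(GOOD)` returns both Dbxref values under one tag, while `validate_line(BAD)` raises `ValueError` because the second value is read as an attribute with no tag.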

-allen


On Wed, 19 Jan 2005, Scott Cain wrote:

> I just did a cvs update and the last few tests are failing on MacOSX 10.3.
> I'll try to sort it out over the next couple of days.
> 
> Scott
> 
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.				 	 cain@cshl.org
> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> ----------------------------------------------------------------------
> 
> 
> On Tue, 18 Jan 2005, Allen Day wrote:
> 
> > > The first series of errors die because the feature ID=AB000114 in 
> > > t/data/knownGene.gff3 has several Dbxrefs separated with ';' instead of 
> > > ','
> > 
> > i'm not getting these errors, are you in sync with cvs HEAD?
> > 
> > > The second failure is because  hybrid1.gff3 isn't in cvs
> > 
> > gff files are in cvs now.
> > 
> > > 
> > > Rob
> > > 
> > > 
> > > 
> > > % perl -I. -w t/FeatureIO.t
> > > 1..19
> > > ok 1
> > > ok 2
> > > ok 3
> > > ok 4
> > > ok 5
> > > ok 6
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > >  line 10.
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > >  line 10.
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > >  line 10.
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > >  line 10.
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > >  line 10.
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > Use of uninitialized value in substitution (s///) at 
> > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > >  line 10.
> > > ok 7
> > > ok 8
> > > 
> > > ------------- EXCEPTION  -------------
> > > MSG: Could not open t/data/hybrid1.gff3: No such file or directory
> > > STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:314
> > > STACK Bio::FeatureIO::_initialize Bio/FeatureIO.pm:345
> > > STACK Bio::FeatureIO::gff::_initialize Bio/FeatureIO/gff.pm:92
> > > STACK Bio::FeatureIO::new Bio/FeatureIO.pm:268
> > > STACK Bio::FeatureIO::new Bio/FeatureIO.pm:288
> > > STACK toplevel t/FeatureIO.t:83
> > > 
> > > --------------------------------------
> > > 
> > 
> 
From cain at cshl.edu  Wed Jan 19 18:30:35 2005
From: cain at cshl.edu (Scott Cain)
Date: Wed Jan 19 18:26:58 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: 
Message-ID: 

Weird: when I run the FeatureIO test on the command line (via `perl
t/FeatureIO.t`), all tests pass.  When I run it as part of 'make test',
tests 20-22 fail.  Does anyone know why that sort of thing might happen?

thanks,
Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------


On Wed, 19 Jan 2005, Allen Day wrote:

> okay, let me know.  we should probably add some validation tests as well, 
> right now i'm just making sure the lines can be processed but don't do any 
> typechecking on the document.
> 
> Rob, would you mind writing some tests into FeatureIO.t for your 
> validation code?
> 
> -allen
> 
> 
> On Wed, 19 Jan 2005, Scott Cain wrote:
> 
> > I just did a cvs update and the last few tests are failing on MacOSX 10.3.
> > I'll try to sort it out over the next couple of days.
> > 
> > Scott
> > 
> > ----------------------------------------------------------------------
> > Scott Cain, Ph. D.				 	 cain@cshl.org
> > GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> > ----------------------------------------------------------------------
> > 
> > 
> > On Tue, 18 Jan 2005, Allen Day wrote:
> > 
> > > > The first series of errors die because the feature ID=AB000114 in 
> > > > t/data/knownGene.gff3 has several Dbxrefs separated with ';' instead of 
> > > > ','
> > > 
> > > i'm not getting these errors, are you in sync with cvs HEAD?
> > > 
> > > > The second failure is because  hybrid1.gff3 isn't in cvs
> > > 
> > > gff files are in cvs now.
> > > 
> > > > 
> > > > Rob
> > > > 
> > > > 
> > > > 
> > > > % perl -I. -w t/FeatureIO.t
> > > > 1..19
> > > > ok 1
> > > > ok 2
> > > > ok 3
> > > > ok 4
> > > > ok 5
> > > > ok 6
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > > >  line 10.
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > > >  line 10.
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > > >  line 10.
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > > >  line 10.
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > > >  line 10.
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 590,  line 10.
> > > > Use of uninitialized value in substitution (s///) at 
> > > > Bio/FeatureIO/gff.pm line 591,  line 10.
> > > > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > > >  line 10.
> > > > ok 7
> > > > ok 8
> > > > 
> > > > ------------- EXCEPTION  -------------
> > > > MSG: Could not open t/data/hybrid1.gff3: No such file or directory
> > > > STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:314
> > > > STACK Bio::FeatureIO::_initialize Bio/FeatureIO.pm:345
> > > > STACK Bio::FeatureIO::gff::_initialize Bio/FeatureIO/gff.pm:92
> > > > STACK Bio::FeatureIO::new Bio/FeatureIO.pm:268
> > > > STACK Bio::FeatureIO::new Bio/FeatureIO.pm:288
> > > > STACK toplevel t/FeatureIO.t:83
> > > > 
> > > > --------------------------------------
> > > > 
> > > 
> > 
> 

From jason.stajich at duke.edu  Wed Jan 19 18:41:00 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 19 18:37:12 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: 
References: 
Message-ID: <92F4C972-6A73-11D9-B5B1-000393C44276@duke.edu>

The test count was off: the plan declared 19 tests, but the script runs 22.
1..19
[SNIP]
ok 20
ok 21
ok 22

You can also try
% make test_FeatureIO
to run just a specific test within the test framework to look at things.

Fixed.
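
A small sketch of the check a test harness effectively performs, which a bare `perl t/FeatureIO.t` run does not: compare the declared plan (`1..N`) with the test numbers actually reported. This is written in Python for illustration and is not the Test::Harness implementation; the function name is hypothetical.

```python
import re


def check_plan(tap_output):
    """Return (planned, seen, extra): the plan size, reported test numbers,
    and any test numbers beyond the plan."""
    planned = None
    seen = []
    for raw in tap_output.splitlines():
        line = raw.strip()
        plan = re.match(r"1\.\.(\d+)$", line)
        if plan:
            planned = int(plan.group(1))
            continue
        result = re.match(r"(not ok|ok)\s+(\d+)", line)
        if result:
            seen.append(int(result.group(2)))
    extra = [n for n in seen if planned is not None and n > planned]
    return planned, seen, extra


# Output shaped like the run above: a plan of 19 but 22 reported tests.
OUTPUT = "1..19\n" + "".join(f"ok {n}\n" for n in range(1, 23))
```

For this output, `check_plan(OUTPUT)` reports tests 20, 21 and 22 as falling outside the plan, which matches the tests flagged under 'make test' even though every individual line says `ok`.
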
On Jan 19, 2005, at 6:30 PM, Scott Cain wrote:

> Weird: when I run the FeatureIO test on the command line (via `perl
> t/FeatureIO.t`), all tests pass.  When I run it as part of 'make test',
> tests 20-22 fail.  Does anyone know why that sort of thing might  
> happen?
>
> thanks,
> Scott
>
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.				 	 cain@cshl.org
> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> ----------------------------------------------------------------------
>
>
> On Wed, 19 Jan 2005, Allen Day wrote:
>
>> okay, let me know.  we should probably add some validation tests as  
>> well,
>> right now i'm just making sure the lines can be processed but don't  
>> do any
>> typechecking on the document.
>>
>> Rob, would you mind writing some tests into FeatureIO.t for your
>> validation code?
>>
>> -allen
>>
>>
>> On Wed, 19 Jan 2005, Scott Cain wrote:
>>
>>> I just did a cvs update and the last few tests are failing on MacOSX  
>>> 10.3.
>>> I'll try to sort it out over the next couple of days.
>>>
>>> Scott
>>>
>>> --------------------------------------------------------------------- 
>>> -
>>> Scott Cain, Ph. D.				 	 cain@cshl.org
>>> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
>>> --------------------------------------------------------------------- 
>>> -
>>>
>>>
>>> On Tue, 18 Jan 2005, Allen Day wrote:
>>>
>>>>> The first series of errors die because the feature ID=AB000114 in
>>>>> t/data/knownGene.gff3 has several Dbxrefs separated with ';'  
>>>>> instead of
>>>>> ','
>>>>
>>>> i'm not getting these errors, are you in sync with cvs HEAD?
>>>>
>>>>> The second failure is because  hybrid1.gff3 isn't in cvs
>>>>
>>>> gff files are in cvs now.
>>>>
>>>>>
>>>>> Rob
>>>>>
>>>>>
>>>>>
>>>>> % perl -I. -w t/FeatureIO.t
>>>>> 1..19
>>>>> ok 1
>>>>> ok 2
>>>>> ok 3
>>>>> ok 4
>>>>> ok 5
>>>>> ok 6
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
>>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
>>>>> 593,
>>>>>  line 10.
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
>>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
>>>>> 593,
>>>>>  line 10.
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
>>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
>>>>> 593,
>>>>>  line 10.
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
>>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
>>>>> 593,
>>>>>  line 10.
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
>>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
>>>>> 593,
>>>>>  line 10.
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
>>>>> Use of uninitialized value in substitution (s///) at
>>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
>>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
>>>>> 593,
>>>>>  line 10.
>>>>> ok 7
>>>>> ok 8
>>>>>
>>>>> ------------- EXCEPTION  -------------
>>>>> MSG: Could not open t/data/hybrid1.gff3: No such file or directory
>>>>> STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:314
>>>>> STACK Bio::FeatureIO::_initialize Bio/FeatureIO.pm:345
>>>>> STACK Bio::FeatureIO::gff::_initialize Bio/FeatureIO/gff.pm:92
>>>>> STACK Bio::FeatureIO::new Bio/FeatureIO.pm:268
>>>>> STACK Bio::FeatureIO::new Bio/FeatureIO.pm:288
>>>>> STACK toplevel t/FeatureIO.t:83
>>>>>
>>>>> --------------------------------------
>>>>>
>>>>
>>>
>>
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From allenday at ucla.edu  Wed Jan 19 20:14:25 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 19 20:10:58 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <92F4C972-6A73-11D9-B5B1-000393C44276@duke.edu>
References: 
	<92F4C972-6A73-11D9-B5B1-000393C44276@duke.edu>
Message-ID: 

doh!  my bad.

On Wed, 19 Jan 2005, Jason Stajich wrote:

> test count was off.
> 1..19
> [SNIP]
> ok 20
> ok 21
> ok 22
> 
> You can also try
> % make test_FeatureIO
> to run just a specific test within the test framework to look at things.
> 
> Fixed.
> On Jan 19, 2005, at 6:30 PM, Scott Cain wrote:
> 
> > Weird: when I run the FeatureIO test on the command line (via `perl
> > t/FeatureIO.t`), all tests pass.  When I run it as part of 'make test',
> > tests 20-22 fail.  Does anyone know why that sort of thing might  
> > happen?
> >
> > thanks,
> > Scott
> >
> > ----------------------------------------------------------------------
> > Scott Cain, Ph. D.				 	 cain@cshl.org
> > GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> > ----------------------------------------------------------------------
> >
> >
> > On Wed, 19 Jan 2005, Allen Day wrote:
> >
> >> okay, let me know.  we should probably add some validation tests as  
> >> well,
> >> right now i'm just making sure the lines can be processed but don't  
> >> do any
> >> typechecking on the document.
> >>
> >> Rob, would you mind writing some tests into FeatureIO.t for your
> >> validation code?
> >>
> >> -allen
> >>
> >>
> >> On Wed, 19 Jan 2005, Scott Cain wrote:
> >>
> >>> I just did a cvs update and the last few tests are failing on MacOSX  
> >>> 10.3.
> >>> I'll try to sort it out over the next couple of days.
> >>>
> >>> Scott
> >>>
> >>> --------------------------------------------------------------------- 
> >>> -
> >>> Scott Cain, Ph. D.				 	 cain@cshl.org
> >>> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> >>> --------------------------------------------------------------------- 
> >>> -
> >>>
> >>>
> >>> On Tue, 18 Jan 2005, Allen Day wrote:
> >>>
> >>>>> The first series of errors die because the feature ID=AB000114 in
> >>>>> t/data/knownGene.gff3 has several Dbxrefs separated with ';'  
> >>>>> instead of
> >>>>> ','
> >>>>
> >>>> i'm not getting these errors, are you in sync with cvs HEAD?
> >>>>
> >>>>> The second failure is because  hybrid1.gff3 isn't in cvs
> >>>>
> >>>> gff files are in cvs now.
> >>>>
> >>>>>
> >>>>> Rob
> >>>>>
> >>>>>
> >>>>>
> >>>>> % perl -I. -w t/FeatureIO.t
> >>>>> 1..19
> >>>>> ok 1
> >>>>> ok 2
> >>>>> ok 3
> >>>>> ok 4
> >>>>> ok 5
> >>>>> ok 6
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
> >>>>> 593,
> >>>>>  line 10.
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
> >>>>> 593,
> >>>>>  line 10.
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
> >>>>> 593,
> >>>>>  line 10.
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
> >>>>> 593,
> >>>>>  line 10.
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
> >>>>> 593,
> >>>>>  line 10.
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 590,  line 10.
> >>>>> Use of uninitialized value in substitution (s///) at
> >>>>> Bio/FeatureIO/gff.pm line 591,  line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line  
> >>>>> 593,
> >>>>>  line 10.
> >>>>> ok 7
> >>>>> ok 8
> >>>>>
> >>>>> ------------- EXCEPTION  -------------
> >>>>> MSG: Could not open t/data/hybrid1.gff3: No such file or directory
> >>>>> STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:314
> >>>>> STACK Bio::FeatureIO::_initialize Bio/FeatureIO.pm:345
> >>>>> STACK Bio::FeatureIO::gff::_initialize Bio/FeatureIO/gff.pm:92
> >>>>> STACK Bio::FeatureIO::new Bio/FeatureIO.pm:268
> >>>>> STACK Bio::FeatureIO::new Bio/FeatureIO.pm:288
> >>>>> STACK toplevel t/FeatureIO.t:83
> >>>>>
> >>>>> --------------------------------------
> >>>>>
> >>>>
> >>>
> >>
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
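
The Dbxref failure quoted above turns on GFF3 attribute syntax: in column 9, tag=value pairs are separated by ';' and multiple values of one tag by ','. The sketch below is plain Perl and is not the code in Bio/FeatureIO/gff.pm; the sub name parse_gff3_attributes and the Dbxref values are made up for illustration, and URL-escaping is ignored. It only shows why a ';' between Dbxrefs leaves a fragment with no '=' and therefore no value, which would be consistent with the "uninitialized value" warnings in the quoted output.

```perl
use strict;
use warnings;

# Split a GFF3 column-9 string into a hash of tag => [values].
# Pairs are separated by ';', multiple values of one tag by ','.
# (Hypothetical helper, not part of BioPerl.)
sub parse_gff3_attributes {
    my ($column9) = @_;
    my %attr;
    for my $pair (split /;/, $column9) {
        next unless length $pair;
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;    # fragment without '=' has no value
        push @{ $attr{$tag} }, split /,/, $value;
    }
    return \%attr;
}

# Dbxref values below are invented examples.
my $good = parse_gff3_attributes('ID=AB000114;Dbxref=GenBank:AB000114,UniGene:Hs.1');
my $bad  = parse_gff3_attributes('ID=AB000114;Dbxref=GenBank:AB000114;UniGene:Hs.1');

print "comma-separated: ", scalar @{ $good->{Dbxref} }, " Dbxref value(s)\n";
print "semicolon-separated: ", scalar @{ $bad->{Dbxref} }, " Dbxref value(s)\n";
```

With ',' both cross-references end up under Dbxref; with ';' the second one is a stray fragment and is dropped by this sketch.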
From allenday at ucla.edu  Wed Jan 19 20:20:12 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 19 20:16:17 2005
Subject: [Bioperl-l] Finding Alignment overlaps
In-Reply-To: <81da19f3050113225018d1c01a@mail.gmail.com>
References: <81da19f3050113225018d1c01a@mail.gmail.com>
Message-ID: 

yeah, look in Bio::Range.  specifically:

Geometrical methods
       These methods do things to the geometry of ranges, and return triplets
       (start, end, strand) from which new ranges could be built.

       intersection

         Title    : intersection
         Usage    : ($start, $stop, $strand) = $r1->intersection($r2)
         Function : gives the range that is contained by both ranges
         Args     : a range to compare this one to
         Returns  : nothing if they do not overlap, or the range that they do overlap
         Inherited: Bio::RangeI::intersection


       union

         Title    : union
         Usage    : ($start, $stop, $strand) = $r1->union($r2);
                  : ($start, $stop, $strand) = Bio::Range->union(@ranges);
         Function : finds the minimal range that contains all of the ranges
         Args     : a range or list of ranges
         Returns  : the range containing all of the ranges
         Inherited: Bio::RangeI::union
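
For the "find all overlaps and merge them" part of the question, here is a minimal sketch in plain Perl of what intersection and union compute, assuming the first pair on each line of the quoted output is the query start,end. The sub names overlaps, intersection and merge_ranges are illustrative stand-ins written for this example, not the Bio::Range methods themselves; with BioPerl installed, the documented methods above would take their place.

```perl
use strict;
use warnings;

# Each range is [start, end]; coordinates are inclusive.
sub overlaps {
    my ($r1, $r2) = @_;
    return ($r1->[0] <= $r2->[1] && $r2->[0] <= $r1->[1]) ? 1 : 0;
}

# Returns (start, end) of the shared region, or nothing if disjoint.
sub intersection {
    my ($r1, $r2) = @_;
    return unless overlaps($r1, $r2);
    my $start = $r1->[0] > $r2->[0] ? $r1->[0] : $r2->[0];
    my $end   = $r1->[1] < $r2->[1] ? $r1->[1] : $r2->[1];
    return ($start, $end);
}

# Sort by start, then fold each range into the previous one if they overlap.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $r (@sorted) {
        if (@merged && $r->[0] <= $merged[-1][1]) {
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        }
        else {
            push @merged, [@$r];
        }
    }
    return @merged;
}

# Query coordinates taken from the three example lines in the question.
my @query = ([3665384, 3665702], [3665130, 3665474], [3665115, 3665357]);

my @shared = intersection($query[0], $query[1]);
my @merged = merge_ranges(@query);

print "shared: @shared\n";
print "merged: ", join(', ', map { "$_->[0]-$_->[1]" } @merged), "\n";
```

Sorting first means each range only has to be compared with the last merged range, so one pass is enough.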

-allen


On Fri, 14 Jan 2005, zayed albertyn wrote:

> Dear Bioperl Community
> 
> I have output from an alignment program that produces coordinates with
> reference to the query sequence e.g.
> 
> 3665384,3665702-1770163,1770480
> 3665130,3665474-3695657,3696000
> 3665115,3665357-1770508,1770749
> 
> Each line represent ,-,
> 
> I know how to add each line as a sequence feature using
> Bio::SeqFeature::Generic. Is there a bioperl class or associated
> method that can be used for determining possible overlaps in these
> alignments?
> Eventually I would like to find all overlaps and merge them if possible.
> 
> Thanks for the help,
> Zayed
> 
> 
> 
> 
> 
From yguo at vbi.vt.edu  Wed Jan 19 21:12:47 2005
From: yguo at vbi.vt.edu (yguo@vbi.vt.edu)
Date: Wed Jan 19 21:10:29 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the 
	publisherwebsite.
Message-ID: <1109.151.199.12.38.1106187167.squirrel@webmail.vbi.vt.edu>

Ok, I will first add more comments and instructions to the module and post
the code to this list. I can get this done before the weekend.

I do not have a list of publishers for which retrieval succeeds. But it is
a good idea to make one.


Yongjian
at
Virginia Bioinformatics Institute

> please post the code here.  i've been meaning to add that functionality
> into Bio::DB::Biblio::eutils.
>
> do you have a list of which publishers are usable in this way?
>
> -allen
>
> On Wed, 19 Jan 2005, Peter Robinson wrote:
>
>> That sounds extremely interesting and I would appreciate getting a copy
>> for testing.
>> -peter
>>
>>
>> On Wed, 2005-01-19 at 22:16, yguo@vbi.vt.edu wrote:
>> > Hi,
>> >
>> > While working on the BRC project at VBI, I wrote a perl module for
>> > retrieving full text pdf files from the publisher website using the
>> > information on the pubmed abstract page. If anyone wants to use it,
>> > you can contact me.
>> > I will see if it is worthwhile to put it in Bioperl.
>> >
>> > Yongjian
>> > at
>> > Virginia Bioinformatics Institute.
>> >
>> >
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l@portal.open-bio.org
>> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From sdavis2 at mail.nih.gov  Wed Jan 19 22:06:28 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Jan 19 22:02:56 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the publisherwebsite.
References: <1109.151.199.12.38.1106187167.squirrel@webmail.vbi.vt.edu>
Message-ID: <001c01c4fe9d$08c64a70$7d75f345@WATSON>

----- Original Message ----- 
From: 
To: 
Sent: Wednesday, January 19, 2005 9:12 PM
Subject: Re: [Bioperl-l] Automatic retrieve pdf file from the 
publisherwebsite.


> Ok, I will first add more comments and instructions to the module and post
> the code to this list. I can get this done before the weekend.
>
> I do not have a list of publishers for which retrieval succeeds. But it is
> a good idea to make one.

Wouldn't such a publisher list depend somewhat on institutional 
subscriptions--just curious?

Sean

> Yongjian
> at
> Virginia Bioinformatics Institute


From allenday at ucla.edu  Thu Jan 20 00:23:46 2005
From: allenday at ucla.edu (Allen Day)
Date: Thu Jan 20 00:20:07 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the publisherwebsite.
In-Reply-To: <001c01c4fe9d$08c64a70$7d75f345@WATSON>
References: <1109.151.199.12.38.1106187167.squirrel@webmail.vbi.vt.edu>
	<001c01c4fe9d$08c64a70$7d75f345@WATSON>
Message-ID: 

On Wed, 19 Jan 2005, Sean Davis wrote:

> ----- Original Message ----- 
> From: 
> To: 
> Sent: Wednesday, January 19, 2005 9:12 PM
> Subject: Re: [Bioperl-l] Automatic retrieve pdf file from the 
> publisherwebsite.
> 
> 
> > Ok, I will first add more comments and instructions to the module and post
> > the code to this list. I can get this done before the weekend.
> >
> > I do not have a list of publishers for which retrieval succeeds. But it is
> > a good idea to make one.
> 
> Wouldn't such a publisher list depend somewhat on institutional 
> subscriptions--just curious?

sure, there are two separate issues:

#1 is the ip/gateway/proxy allowed to access the host with the resource.
#2 if so, is the module able to find the resource on the host.

i was asking about #2.  of course #1 depends on where you are.  this could
make it difficult to do extensive unit tests.

-allen

> Sean
> 
> > Yongjian
> > at
> > Virginia Bioinformatics Institute
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
From dlondon at ebi.ac.uk  Thu Jan 20 12:58:59 2005
From: dlondon at ebi.ac.uk (Darin London)
Date: Thu Jan 20 13:11:56 2005
Subject: [Bioperl-l] BOSC 2005
Message-ID: <20050120175859.GA7254@parrot.ebi.ac.uk>

 {Please pass the word!}
 
 MEETING ANNOUNCEMENT & CALL FOR SPEAKERS

 The 6th annual Bioinformatics Open Source Conference (BOSC'2005) is organized by the
 not-for-profit Open Bioinformatics Foundation. The meeting will take place
 June 23-24, 2005 in Detroit, Michigan, USA, and is one of several Special Interest
 Group (SIG) meetings occurring in conjunction with the 13th International Conference
 on Intelligent Systems for Molecular Biology.

 see http://www.iscb.org/ismb2005 for more information.

 Because of the power of many Open Source bioinformatics packages in
 use by the Research Community today, it is not too presumptuous to say 
 that the work of the Open Source Bioinformatics Community represents 
 the cutting edge of Bioinformatics in general. This has been repeatedly 
 demonstrated by the quality of presentations at previous BOSC conferences.
 This year, at BOSC 2005, we want to continue this tradition of excellence, 
 while presenting this message to a wider part of the Research Community.  
 Please, pass this message on to anyone you know that is interested in
 Bioinformatics software. 


 BOSC PROGRAM & CONTACT INFO
 
 * Web: http://www.open-bio.org/bosc2005/
 * Email: bosc@open-bio.org
 
 FEES

  TO BE ANNOUNCED. Watch the bosc website for more information.
 
 
 SPEAKERS & ABSTRACTS WANTED
 
 The program committee is currently seeking abstracts for talks at BOSC 
 2005. BOSC is a great opportunity for you to tell the community about 
 your use, development, or philosophy of open source software development 
 in bioinformatics. The committee will select several submitted abstracts 
 for 25-minute talks and others for shorter "lightning" talks. Accepted 
 abstracts will be published on the BOSC web site.
 
 If you are interested in speaking at BOSC 2005, 
 please send us before April 26, 2005:
 
 * an abstract (no more than a few paragraphs)
 * a URL for the project page, if applicable
 * information about the open source license used for your software or 
   your release plans.

 Abstracts will be accepted for submission until April 26, 2005.
 Abstracts chosen for presentation will be announced May 12, 2005 
 (before the ISMB Early Registration Deadline).

 LIGHTNING-TALK SPEAKERS WANTED!
 
 The program committee is currently seeking speakers for the lightning 
 talks at BOSC 2005. Lightning talks are quick - only five minutes 
 long - and a great opportunity for you to give people a quick 
 summary of your open source project, code, idea, or vision of the future.

 If you are interested in giving a lightning talk at BOSC 2005, 
 please send us:

 * a brief title and summary (one or two lines)
 * a URL for the project page, if applicable
 * information about the open source license used for your software or 
   your release plans.

 We will accept entries on-line until BOSC starts, but
 space for demos and lightning talks is limited.

 SOFTWARE DEMONSTRATIONS WANTED!

 If you are involved in the development of Open Source Bioinformatics
 Software, you are invited to provide a short demonstration to attendees
 of BOSC 2005.

 If you are interested in giving a software demonstration at BOSC 2005,
 please send us:

 * a brief title and summary (one or two lines)
 * a URL for the project page, if applicable
 * Internet connectivity requirements (e.g. website Application served
   on the world wide web, or web based client application).

 We will accept entries on-line until the BOSC starts, but
 space for demos and lightning talks is limited.

 ** Because the mission of the OBF is to promote Open Source software,
 we will favor submissions for projects that apply a recognized Open
 Source License, or adhere to the general Open Source Philosophy.
 See the following websites for further details:

 http://www.opensource.org/licenses/
 http://www.opensource.org/docs/definition.php

 SESSION CHAIRS WANTED

 If you would like to be involved in BOSC 2005, we invite you to chair a
 session. This will not require much of your time. You will be given a
 schedule of presenters during your session. You simply introduce each
 speaker, and manage the time of their presentation (25 minutes for full
 presentations, 5-10 minutes for lightning talks/demos, depending on the
 number of entries).

 If you are interested in chairing a session, please send us your name
 and affiliation (if applicable).

-- 
cheers,
Darin London
dlondon@ebi.ac.uk
European Bioinformatics Institute,        +44 (0)1223 49 2566
Wellcome Trust Genome Campus, Hinxton     +44 (0)1223 49 4468 (fax)
Cambridgeshire CB10 1SD, UK
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050120/9bb1da79/attachment.bin
From gyang at plantbio.uga.edu  Thu Jan 20 14:30:29 2005
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Thu Jan 20 14:27:12 2005
Subject: [Bioperl-l] standaloneblast large seq retrieving
In-Reply-To: 
Message-ID: <20050120143029.2ce67836@dogwood.plantbio.uga.edu>

Hi, all,

I was trying to use the following sub to get seq after a
standaloneblast. It worked with DB with short entries (~200kb), but it
failed to work with a DB with much longer entries (up to ~30 Mb an
entry). Can anybody give me a hint?

sub getseq {
    my $name      = $_[0];
    my $file_name = $_[1];
    my $inx = Bio::Index::Fasta->new(-filename   => $file_name.".idx",
                                     -write_flag => 1);
    $inx->id_parser(\&get_id);
    $inx->make_index($file_name);
    $seq = $inx->fetch($name);
    return $seq;
}

Thanks,
Yang

From talcon at iastate.edu  Thu Jan 20 18:34:06 2005
From: talcon at iastate.edu (Tim Alcon)
Date: Thu Jan 20 18:30:55 2005
Subject: [Bioperl-l] accessing GenBank
In-Reply-To: 
References: 
Message-ID: <41F03FEE.6080907@iastate.edu>

Thanks Barry and Nathan.  I installed version 1.4, and the remote
GenBank access now works.

Tim

Nathan Haigh wrote:

>You should double check the versions you have installed on both systems, it may well be that one is out-of-date with respect to
>connecting to genbank and the other is not. If you do indeed have a version of bioperl <1.4 installed on your windows machine,
>follow my instructions to install 1.4 (1.5 should be available via PPM shortly after it's official release - some time soon!)
>
>Nathan
>
>
>
>>-----Original Message-----
>>From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Tim Alcon
>>Sent: 18 January 2005 22:20
>>To: bioperl-l@portal.open-bio.org
>>Subject: [Bioperl-l] accessing GenBank
>>
>>I seem unable to access GenBank.
>>When running bptutorial.exe, it seems
>>like all the other examples run fine except that one.  Anyone know why
>>that would be?  I'm using ActivePerl on Windows XP.  I have whichever
>>version of bioperl is the current default using ppm (it's at least
>>1.0).  When I run the exact same code from my campus Unix account, it
>>works fine.
>>
>>Tim
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l@portal.open-bio.org
>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>---
>>avast! Antivirus: Inbound message clean.
>>Virus Database (VPS): 0503-0, 18/01/2005
>>Tested on: 19/01/2005 08:41:49
>>avast! is copyright (c) 2000-2003 ALWIL Software.
>>http://www.avast.com
>>
>
>---
>avast! Antivirus: Outbound message clean.
>Virus Database (VPS): 0503-0, 18/01/2005
>Tested on: 19/01/2005 09:05:03
>avast! is copyright (c) 2000-2003 ALWIL Software.
>http://www.avast.com
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From talcon at iastate.edu  Thu Jan 20 18:35:55 2005
From: talcon at iastate.edu (Tim Alcon)
Date: Thu Jan 20 18:33:53 2005
Subject: [Bioperl-l] Installing Bioperl using PPM
In-Reply-To: 
References: 
Message-ID: <41F0405B.2050301@iastate.edu>

Typing "install 1.4" didn't work, but typing "install Bioperl-1.4" did.
Thanks.

Tim

Nathan Haigh wrote:

>Please read this even if you think you know how to install modules via PPM!
>
>This is just a note on what to do to install the latest version of Bioperl (or any other module) via PPM:
>Because of inconsistencies (see ActiveStates comments on this at the bottom) with the way PPM determines modules names/versions etc
>it is NOT WISE to install modules by going:
>    "install bioperl"
>OR
>    "upgrade bioperl"
>
>You are very likely NOT to install the most recent version of a particular module by doing this!
>Instead you should do the
>following:
>    "search bioperl"
>This gives a numbered list of the available modules in the repository's searched by your PPM (you can add additional repositories in
>addition to the defaults given during installation - and this is advised). Chose the number of the correct module to install from
>the list and do:
>    "install "
>Where is the number of the module you wish to install. This way you will ensure you install the correct module/version YOU
>want not the arbitrary module that PPM seems to want to install most of the time!
>
>As soon as the official Bioperl 1.5 is released, I'll make the ppd and tar.gz files so it can be installed via PPM.
>
>Nathan
>
>ActiveStates comment on PPM's inconsistencies for determining module name/versions:
>"Sorry for the confusion, ppm3 is kind of inconsistent in spots."
>
>
>>-----Original Message-----
>>From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Tim Alcon
>>Sent: 18 January 2005 22:20
>>To: bioperl-l@portal.open-bio.org
>>Subject: [Bioperl-l] accessing GenBank
>>
>>I seem unable to access GenBank.  When running bptutorial.exe, it seems
>>like all the other examples run fine except that one.  Anyone know why
>>that would be?  I'm using ActivePerl on Windows XP.  I have whichever
>>version of bioperl is the current default using ppm (it's at least
>>1.0).  When I run the exact same code from my campus Unix account, it
>>works fine.
>>
>>Tim
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l@portal.open-bio.org
>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>---
>>avast! Antivirus: Inbound message clean.
>>Virus Database (VPS): 0503-0, 18/01/2005
>>Tested on: 19/01/2005 08:41:49
>>avast! is copyright (c) 2000-2003 ALWIL Software.
>>http://www.avast.com
>>
>
>---
>avast! Antivirus: Outbound message clean.
>Virus Database (VPS): 0503-0, 18/01/2005
>Tested on: 19/01/2005 09:00:08
>avast! is copyright (c) 2000-2003 ALWIL Software.
>http://www.avast.com
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From peter.robinson at charite.de  Fri Jan 21 08:30:14 2005
From: peter.robinson at charite.de (Robinson, Peter)
Date: Fri Jan 21 08:26:53 2005
Subject: [Bioperl-l] dssp script
Message-ID: <5F7CE35370B6CF429AA3CA960ECC278001638D65@EXCHANGE2.charite.de>

Dear BioPerlers,

I am writing a script to use the BioPerl DSSP module to print out a list
of phi and psi angles for all applicable residues of all chains.
Although the results are correct, I get the following error message at
the end of each chain:

Argument "" isn't numeric in numeric eq (==) at
/usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line 1168.

and I am not quite sure where it is coming from. Perhaps I am using the
wrong part of the API, but I am trying to get a list of all residues for
each chain as follows:

foreach my $ch (@chains) {
    my $ss_elements_pts = $dssp->secBounds($ch);
    print "Chain $ch:\n";
    my $pos = 0;
    my $max = 0;
    foreach my $stretch (@{$ss_elements_pts}) {
        my $start = $stretch->[0];
        my $end = $stretch->[1];
        if ($end =~ m/(\d+)/) { $end = $1; }
        if ($end > $max) { $max = $end; }
    }
    ## END is now the last residue in this chain
    for my $res (1..$max) {
        my $residueID = $res . ":" . $ch;
        my ($phi,$psi,$SS,$SSsum,$AA);
        eval { $phi = $dssp->resPhi($residueID);};
        etc.

The full script is appended to the bottom of this mail.

I also noticed what might be a minor bug in the module DSSP/Res.pm; when
I use dsspcmbi to analyze a PDB file, it produces a results file with an
empty last line. This causes a crash:

Use of uninitialized value in chomp at
/usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line 1284,
line 955.

If I manually remove this last empty line, there was no error.
By adding the following line at Res.pm l.1284, you can fix the problem:

    while ( chomp( $cur = <$file> ) ) {
        next if ($cur =~ m/^\s*$/);   # ********* added line *********
        $res_num = substr( $cur, 0, 5 );
        $res_num =~ s/\s//g;
        $self->{ 'Res' }->[ $res_num ] = &_parseResLine( $cur );
    }
}

Thanks in advance for any tips!

Peter

Peter N. Robinson, M.D.
Institute of Medical Genetics
Charité University Hospital
Augustenburger Platz 1
13353 Berlin
Germany
++49-30-450 569124
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson

Beware of bugs in the above code; I have only proved it correct, not
tried it.
-Donald Knuth, computer scientist (1938- )

########################
#!/usr/bin/perl -w
use IO::File;
use Bio::Structure::SecStr::DSSP::Res;
use Data::Dumper;

=pod

parseDSSP.pl
Script to parse the output of DSSP using the BioPerl module
Bio::Structure::SecStr::DSSP::Res.
To use it, process a PDB file with dssp or dsspcmbi, and pass the
resulting file to this script.
For more information on dssp and BioPerl see the module documentation
at http://bioperl.org
@email peter.robinson@charite.de
21 January, 2005

=cut

my $file = "pdb43ca.dssp";
my $dssp = new Bio::Structure::SecStr::DSSP::Res('-file'=> "$file");

my $pdbID = $dssp->pdbID();
my $auth = $dssp->pdbAuthor();
my $cmpd = $dssp->pdbCompound();
my $pdb_date = $dssp->pdbDate();
my $header = $dssp->pdbHeader();
my $pdbSource = $dssp->pdbSource();
print "PDB entry $pdbID \n\tauthor:\t$auth",
    "\n\tCompound:\t$cmpd",
    "\n\tDate:\t$pdb_date",
    "\n\tHeader:\t$header",
    "\n\tsource:\t$pdbSource\n\n";

my $totalRes = $dssp->numResidues();
print "Total residue count (all chains):$totalRes\n";
my $surArea= $dssp->totSurfArea();
print "Total accessible surface area:\t$surArea (square Ang)\n";
my $chainRef = $dssp->chains();
my @chains = sort @{$chainRef};
print "Chain[s]:\n";
foreach my $ch (@chains) {
    print "\t$ch";
}
print "\n";
my $hb = $dssp->hBonds();
print "H BONDS.\n";
print "TYPE O(I)-->H-N(J): $hb->[0]\n",
    "IN PARALLEL BRIDGES: $hb->[1]\n",
    "IN ANTIPARALLEL BRIDGES $hb->[2]\n",
    "TYPE O(I)-->H-N(I-5) $hb->[3]\n",
    "TYPE O(I)-->H-N(I-4) $hb->[4]\n",
    "TYPE O(I)-->H-N(I-3) $hb->[5]\n",
    "TYPE O(I)-->H-N(I-2) $hb->[6]\n",
    "TYPE O(I)-->H-N(I-1) $hb->[7]\n",
    "TYPE O(I)-->H-N(I+0) $hb->[8]\n",
    "TYPE O(I)-->H-N(I+1) $hb->[9]\n",
    "TYPE O(I)-->H-N(I+2) $hb->[10]\n",
    "TYPE O(I)-->H-N(I+3) $hb->[11]\n",
    "TYPE O(I)-->H-N(I+4) $hb->[12]\n",
    "TYPE O(I)-->H-N(I+5) $hb->[13]\n",
    "\n";

foreach my $ch (@chains) {
    my $ss_elements_pts = $dssp->secBounds($ch);
    print "Chain $ch:\n";
    my $pos = 0;
    my $max = 0;
    foreach my $stretch (@{$ss_elements_pts}) {
        my $start = $stretch->[0];
        my $end = $stretch->[1];
        if ($end =~ m/(\d+)/) { $end = $1; }
        if ($end > $max) { $max = $end; }
    }
    ## END is now the last residue in this chain
    for my $res (1..$max) {
        my $residueID = $res . ":" . $ch;
        my ($phi,$psi,$SS,$SSsum,$AA);
        eval { $phi = $dssp->resPhi($residueID);};
        eval { $psi = $dssp->resPsi($residueID);};
        eval { $SS = $dssp->resSecStr($residueID);};
        eval { $SSsum = $dssp->resSecStrSum($residueID);};
        $AA = $dssp->resAA($residueID);
        $phi = $phi || "n/a";
        $psi = $psi || "n/a";
        $SS = $SS || "-";
        my $SSclass;
        if ($SSsum eq "H") { $SSclass = "helix"; }
        elsif ($SSsum eq "T") { $SSclass = "turn"; }
        elsif ($SSsum eq "B") { $SSclass = "beta"; }
        else { $SSclass = $SSsum; }
        print "$residueID) [$AA] phi:$phi psi:$psi SecStruct: $SS ($SSclass) \n";
    }
}

From cjfields at uiuc.edu  Fri Jan 21 09:44:39 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri Jan 21 09:43:02 2005
Subject: [Bioperl-l] Installing Bioperl using PPM
In-Reply-To: <41F0405B.2050301@iastate.edu>
References: <41F0405B.2050301@iastate.edu>
Message-ID: <6.1.1.1.2.20050121084352.01a67ec8@express.cites.uiuc.edu>

I think he means that you should do the following:

1) use "search bioperl"
2) pick the number of the correct bioperl from the list (NOT the version
   number) and type "install #"

Here's what it looks like when I use PPM3

C:\Documents and Settings\Chris Fields>ppm
PPM - Programmer's Package Manager version 3.1.
Copyright (c) 2001 ActiveState SRL. All Rights Reserved.

Entering interactive shell. Using Term::ReadLine::Stub as readline library.

Type 'help' to get started.

ppm> rep
Repositories:
[1] bioperl
[ ] ActiveState Package Repository
[ ] ActiveState PPM2 Repository
[ ] gmod
[ ] kobes
[ ] local
ppm> search bioperl
Searching in Active Repositories
  1. Bioperl-1.2   [1.2] Bioperl 1.2 PPM3 Archive
  2. Bioperl-1.2.1 [1.2.1] Bioperl 1.2.1 PPM3 Archive
  3. Bioperl-1.2.3 [1.2.3] Bioperl 1.2.3 PPM3 Archive
  4. Bioperl-1.4   [1.4] Bioperl 1.4 PPM3 Archive
ppm> install 4
....

Chris

At 05:35 PM 1/20/2005, Tim Alcon wrote:
>Typing "install 1.4" didn't work, but typing "install Bioperl-1.4" did.
>Thanks.
>
>Tim
>
>
>
>Nathan Haigh wrote:
>
>>Please read this even if you think you know how to install modules via PPM!
>>
>>This is just a note on what to do to install the latest version of
>>Bioperl (or any other module) via PPM:
>>Because of inconsistencies (see ActiveStates comments on this at the
>>bottom) with the way PPM determines modules names/versions etc
>>it is NOT WISE to install modules by going:
>>    "install bioperl"
>>OR
>>    "upgrade bioperl"
>>
>>You are very likely NOT to install the most recent version of a
>>particular module by doing this! Instead you should do the
>>following:
>>    "search bioperl"
>>This gives a numbered list of the available modules in the repository's
>>searched by your PPM (you can add additional repositories in
>>addition to the defaults given during installation - and this is
>>advised). Chose the number of the correct module to install from
>>the list and do:
>>    "install "
>>Where is the number of the module you wish to install. This way
>>you will ensure you install the correct module/version YOU
>>want not the arbitrary module that PPM seems to want to install most of
>>the time!
>>
>>As soon as the official Bioperl 1.5 is released, I'll make the ppd and
>>tar.gz files so it can be installed via PPM.
>>
>>Nathan
>>
>>ActiveStates comment on PPM's inconsistencies for determining module
>>name/versions:
>>"Sorry for the confusion, ppm3 is kind of inconsistent in spots."
>>
>>
>>>-----Original Message-----
>>>From: bioperl-l-bounces@portal.open-bio.org
>>>[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Tim Alcon
>>>Sent: 18 January 2005 22:20
>>>To: bioperl-l@portal.open-bio.org
>>>Subject: [Bioperl-l] accessing GenBank
>>>
>>>I seem unable to access GenBank.  When running bptutorial.exe, it seems
>>>like all the other examples run fine except that one.  Anyone know why
>>>that would be?  I'm using ActivePerl on Windows XP.  I have whichever
>>>version of bioperl is the current default using ppm (it's at least
>>>1.0).  When I run the exact same code from my campus Unix account, it
>>>works fine.
>>>
>>>Tim
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l@portal.open-bio.org
>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>---
>>>avast! Antivirus: Inbound message clean.
>>>Virus Database (VPS): 0503-0, 18/01/2005
>>>Tested on: 19/01/2005 08:41:49
>>>avast! is copyright (c) 2000-2003 ALWIL Software.
>>>http://www.avast.com
>>>
>>
>>---
>>avast! Antivirus: Outbound message clean.
>>Virus Database (VPS): 0503-0, 18/01/2005
>>Tested on: 19/01/2005 09:00:08
>>avast! is copyright (c) 2000-2003 ALWIL Software.
>>http://www.avast.com
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l@portal.open-bio.org
>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l

__________________________________

Chris Fields - Postdoctoral Researcher
Lab of Dr. Robert Switzer

Address:
University of Illinois at Urbana-Champaign
Dept. of Biochemistry - 323 RAL
600 S. Mathews Ave.
Urbana, IL 61801

Phone : (217) 333-7098
Fax : (217) 244-5858

From raoul.bonnal at itb.cnr.it  Fri Jan 21 10:11:11 2005
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Fri Jan 21 10:08:01 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <1106320271.7583.10.camel@localhost>

This is perl, v5.8.4 built for i386-linux-thread-multi
Debian Unstable/Amd Athlon XP

PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/AAChange...................ok
t/AAReverseMutate............ok
t/AlignIO....................ok
t/AlignStats.................ok
t/AlignUtil..................ok
t/Allele.....................ok
t/Alphabet...................ok
t/Annotation.................ok
t/AnnotationAdaptor..........ok
t/Assembly...................ok
t/Biblio.....................ok
t/Biblio_biofetch............ok
t/Biblio_eutils..............ok
t/BiblioReferences...........ok
t/BioDBGFF...................ok
t/BioFetch_DB................ok
t/BioGraphics................ok
t/BlastIndex.................ok
t/BPbl2seq...................ok
t/BPlite.....................ok
t/BPpsilite..................ok
t/Chain......................ok
t/cigarstring................ok
t/ClusterIO..................ok
t/Coalescent.................ok
t/CodonTable.................ok
t/consed.....................ok
t/CoordinateGraph............ok
t/CoordinateMapper...........ok
t/Correlate..................ok
t/CytoMap....................ok
t/DB.........................ok
t/DBCUTG.....................ok
        22/24 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/DBFasta....................ok
t/DNAMutation................ok
t/Domcut.....................ok
        22/25 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/ECnumber...................ok
t/ELM........................ok
t/EMBL_DB....................ok
t/EMBOSS_Tools...............ok
t/EncodedSeq.................ok
t/ePCR.......................ok
t/ESEfinder..................error is 0
t/ESEfinder..................ok
        10/12 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/est2genome.................ok
t/Exception..................ok
t/Exonerate..................ok
t/flat.......................ok
t/FootPrinter................ok
t/game.......................ok
t/GDB........................ok
t/GeneCoordinateMapper.......ok
t/Geneid.....................ok
t/Genewise...................ok
        2/51 skipped: 
t/Genomewise.................ok
t/Genpred....................ok
t/GFF........................ok
t/GOR4.......................ok
        10/13 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/GOterm.....................ok
t/GuessSeqFormat.............ok
t/hmmer......................ok
t/HNN........................ok
        10/13 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/HtSNP......................ok
t/Index......................ok
t/InstanceSite...............ok
t/InterProParser.............ok
t/IUPAC......................ok
t/largefasta.................ok
t/LargeLocatableSeq..........ok
t/largepseq..................ok
t/LinkageMap.................ok
t/LiveSeq....................ok
t/LocatableSeq...............ok
t/Location...................ok
t/LocationFactory............ok
t/LocusLink..................ok
t/lucy.......................ok
t/Map........................ok
t/MapIO......................ok
t/Matrix.....................ok
t/Measure....................ok
t/MeSH.......................ok
t/MetaSeq....................ok
t/MicrosatelliteMarker.......ok
t/MiniMIMentry...............ok
t/MitoProt...................ok
        5/8 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/Molphy.....................ok
t/multiple_fasta.............ok
t/Mutation...................ok
t/Mutator....................ok
t/NetPhos....................ok
t/Node.......................ok
t/OddCodes...................ok
t/OMIMentry..................ok
t/OMIMentryAllelicVariant....ok
t/OMIMparser.................ok
t/Ontology...................ok
t/OntologyEngine.............ok
t/OntologyStore..............ok
t/PAML.......................ok
t/Perl.......................ok
t/phd........................ok
t/Phenotype..................ok
t/PhylipDist.................ok
t/pICalculator...............ok
t/Pictogram..................SVG not installed, skipping tests at t/Pictogram.t line 29.
t/Pictogram..................ok
t/PopGen.....................ok
t/PopGenSims.................ok
t/primaryqual................ok
t/PrimarySeq.................ok
t/primedseq..................ok
t/Primer.....................ok
t/primer3....................ok
t/Promoterwise...............ok
t/ProtDist...................ok
t/protgraph..................Class::AutoClass or Clone not installed. This means that the module is not usable. Skipping tests at t/protgraph.t line 23.
t/protgraph..................ok t/ProtMatrix.................ok t/ProtPsm....................ok t/psm........................ok t/QRNA.......................ok t/qual.......................ok t/RandDistFunctions..........ok t/RandomTreeFactory..........ok t/Range......................ok t/RangeI.....................ok t/RefSeq.....................ok 10/13 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test t/Registry...................ok t/Relationship...............ok t/RelationshipType...........ok t/RemoteBlast................ok 4/6 skipped: to avoid timeout t/RepeatMasker...............ok t/RestrictionAnalysis........ok t/RestrictionEnzyme..........ok t/RestrictionIO..............ok t/RNAChange..................ok t/RootI......................ok t/RootIO.....................ok t/RootStorable...............ok t/Scansite...................ok t/scf........................ok t/SearchDist.................ok t/SearchIO...................ok t/Seq........................ok t/SeqAnalysisParser..........ok t/SeqBuilder.................ok t/SeqDiff....................ok t/SeqFeatCollection..........ok t/SeqFeature.................ok t/seqfeaturePrimer...........ok t/SeqIO......................XML::DOM::XPath not found - skipping interpro tests XML::SAX::Base or XML::SAX or XML::SAX::Writer not found - skipping BSML_SAX tests t/SeqIO......................ok t/SeqPattern.................ok t/seqread_fail...............ok t/SeqStats...................ok t/SequenceFamily.............ok t/sequencetrace..............ok t/SeqUtils...................ok t/seqwithquality.............ok t/SeqWords...................ok t/Sigcleave..................ok t/Sim4.......................ok t/SimilarityPair.............ok t/SimpleAlign................ok t/simpleGOparser.............ok 88/101Use of uninitialized value in hash element at /home/febo/DownLoad/bioperl-1.5.0-RC2/blib/lib/Bio/Ontology/OntologyStore.pm line 263, line 11. 
t/simpleGOparser.............ok t/singlet....................ok t/sirna......................ok t/SiteMatrix.................ok t/SNP........................ok t/Sopma......................ok 12/15 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test t/Species....................ok t/splicedseq.................ok t/StandAloneBlast............ok t/StructIO...................ok t/Structure..................ok t/Swiss......................ok t/Symbol.....................ok t/TagHaplotype...............ok t/Taxonomy...................ok 7/8 skipped: to avoid blocking t/Tempfile...................ok t/Term.......................ok t/tinyseq....................ok t/Tools......................ok t/Tree.......................ok t/TreeBuild..................ok t/TreeIO.....................ok 2/50 skipped: SVG::Graph output, SVG::Graph not installed t/trim.......................ok t/tRNAscanSE.................ok t/tutorial...................ok 18/21Use of uninitialized value in print at /home/febo/DownLoad/bioperl-1.5.0-RC2/blib/lib/bptutorial.pl line 4039, line 934. t/tutorial...................ok t/UCSCParsers................ok t/Unflattener................ok t/Unflattener2...............ok t/UniGene....................ok t/Variation_IO...............ok t/WABA.......................ok t/XEMBL_DB...................ok All tests successful, 116 subtests skipped. Files=193, Tests=8942, 298 wallclock secs (114.66 cusr + 6.80 csys = 121.46 CPU) by RJP From raoul.bonnal at itb.cnr.it Fri Jan 21 10:45:14 2005 From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal) Date: Fri Jan 21 10:41:33 2005 Subject: [Bioperl-l] gff -> match/hsp in gbrowse Message-ID: <1106322314.7583.27.camel@localhost> Dear Community, today I have upgraded my bioperl installation to 1.5.0-rc2. How can I configure my gbrose db.conf to display match/hsp from myfile.gff ( default bioperl 1.5.0-rc2 format ) ? 
The GBrowse tutorial describes the configuration for the previous format, and it doesn't work for GFF3. Is it possible to filter HSPs for every match by rank or score from the GBrowse db.conf file? Can you post a working example, please? Thanks in advance.

by RJP

From jdw at ou.edu Fri Jan 21 11:47:22 2005
From: jdw at ou.edu (James D. White)
Date: Fri Jan 21 11:43:24 2005
Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 21, Issue 12
References: <200501161451.j0GEpNKr028052@portal.open-bio.org>
Message-ID: <41F1321A.72FB2289@ou.edu>

Starting with:

$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;

The slashes in tr/// confused the Perl parser. You need to use different delimiters for the m// operator (the m is implied by //) and the tr/// operator. Also the tr/// operator does not use the i flag, so lower case needs to be handled explicitly. So let's try the following:

$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCGatcg/TAGCtagc/);})\1.*:i;

This gives the error:
Can't modify constant item in transliteration (tr///) at (re_eval 1) line 1, near "tr/ATCGatcg/TAGCtagc/)"

Inside the (??{ CODE }) sequence, use $1, $2, ..., instead of \1, \2, ... (See Programming Perl, 3rd Edition, "Match-time pattern interpolation", p. 213) Inside the evaluated CODE, \2 is a constant, not the value of the second captured substring. Also I'm not sure what modifying $2 would do, so let's try:

$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1.*:i;

This works, but I would get rid of the leading "\S+" and trailing ".*". The ".*" adds nothing useful, so just drop it. You probably don't need the leading "\S+", because the pattern is not anchored to the beginning of the string with "^". The leading "\S+" gobbles up the entire string, forcing the match to backtrack character by character from the end. It also forces the substring match saved in $1 to occur after the first character.
Unless you never want $1 to consider the first character, just drop the leading "\S+". If you don't want to search the first character, then just use "\S". This results in: $regex =~ m:(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i; Finally I would probably change the remaining ".*" to ".*?". If you search with ".*" on a long sequence which could contain multiple sequences of interest, the ".*" pattern will match the rest of the sequence and force backtracking to match the first occurrence of "$1$2" with the last occurrence of "revcomp($2)$1". If you use ".*?", you match the first occurrence of "$1$2" with the nearest occurrence of "revcomp($2)$1". This results in the final regular expression: $regex =~ m:(\S+)(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i; > Date: Fri, 14 Jan 2005 14:12:46 -0500 > From: Guojun Yang > Subject: [Bioperl-l] regular expression help! > To: bioperl-l@portal.open-bio.org > Message-ID: <20050114141246.94c7cb46@dogwood.plantbio.uga.edu> > Content-Type: text/plain; charset="us-ascii" > > Hi, Everybody, > I was trying to use a regex recognizing a patter of inverted repeat DNA seq flanked by direct repeats (see below), it returns errors saying "(?{...}) not terminated or {...} not balanced. Can anybody help me sorting this out? > The regex I have is: > $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i; > Thank you, > Yang > -- James D. White (jdw@ou.edu) Director of Bioinformatics Department of Chemistry and Biochemistry/ACGT University of Oklahoma 101 David L. Boren Blvd., SRTC 2100 Norman, OK 73019 Phone: (405) 325-4912, FAX: (405) 325-7762 From brian_osborne at cognia.com Fri Jan 21 11:48:52 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Jan 21 11:45:18 2005 Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in SwissProtfile In-Reply-To: Message-ID: Kenny, Did you take a look at Bio/Index/Swissprot.pm? 
What's important for you will be building the index using the keys you're interested in as opposed to the default key, using the id_parser method. See the Bio::Index section in the bptutorial for an example. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Daily, Kenneth Michael Sent: Wednesday, January 19, 2005 11:49 AM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in SwissProtfile I want to work with a local copy of the SwissProt database, and need to search through all of the entries. I only see methods to return sequences by accession. However, I cannot use just FASTA format of the SwissProt records, as I need to use the feature fields. What I need to learn is how to do a DB search on the features field of the SwissProt records, if its possible. Would there be any advantage do doing it with the DB instead of just using SeqIO as an input stream? I think it might, since every time I want to do a search I must read in the entire file again, which is very costly. Thank you. Kenny Daily Indiana University School of Informatics kmdaily [at] indiana [dot] edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From jdw at ou.edu Fri Jan 21 11:54:37 2005 From: jdw at ou.edu (James D. White) Date: Fri Jan 21 11:50:38 2005 Subject: [Bioperl-l] regular expression help! References: <200501161451.j0GEpNKr028052@portal.open-bio.org> <41F1321A.72FB2289@ou.edu> Message-ID: <41F133CD.3BDCA957@ou.edu> Sorry about double posting, but I forgot to change the subject before sending the first message. > Starting with: > > $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i; > > The slashes in tr/// confused the Perl parser. 
You need to use > different delimiters for the m// operator (the m is implied by //) > and the tr/// operator. Also the tr/// operator does not use the > i flag, so lower case needs to be handled explicitly. So let's > try the following: > > $regex =~ m:\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCGatcg/TAGCtagc/);})\1.*:i; > > This gives the error: > Can't modify constant item in transliteration (tr///) at (re_eval 1) > line 1, near "tr/ATCGatcg/TAGCtagc/)" > > Inside the (??{ CODE }) sequence, use $1, $2, ..., instead of > \1, \2, ... (See Programming Perl, 3rd Edition, "Match-time pattern > interpolation", p. 213) Inside the evaluated CODE, \2 is a > constant, not the value of the second captured substring. Also I'm > not sure what modifying $2 would do, so let's try: > > $regex =~ m:\S+(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1.*:i; > > This works, but I would get rid of the leading "\S+" and trailing > ".*". The ".*" adds nothing useful, so just drop it. You > probably don't need the leading "\S+", because the pattern is not > anchored to the beginning of the string with "^". The leading > "\S+" gobbles up the entire string, forcing the match to backtrack > character by character from the end. It also forces the substring > match saved in $1 to occur after the first character. Unless you > never want $1 to consider the first character, just drop the > leading "\S+". If you don't want to search the first character, > then just use "\S". This results in: > > $regex =~ m:(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i; > > Finally I would probably change the remaining ".*" to ".*?". If > you search with ".*" on a long sequence which could contain > multiple sequences of interest, the ".*" pattern will match the rest > of the sequence and force backtracking to match the first occurrence > of "$1$2" with the last occurrence of "revcomp($2)$1". 
If you use > ".*?", you match the first occurrence of "$1$2" with the nearest > occurrence of "revcomp($2)$1". This results in the final regular > expression: > > $regex =~ m:(\S+)(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i; > > > Date: Fri, 14 Jan 2005 14:12:46 -0500 > > From: Guojun Yang > > Subject: [Bioperl-l] regular expression help! > > To: bioperl-l@portal.open-bio.org > > Message-ID: <20050114141246.94c7cb46@dogwood.plantbio.uga.edu> > > Content-Type: text/plain; charset="us-ascii" > > > > Hi, Everybody, > > I was trying to use a regex recognizing a patter of inverted repeat DNA seq flanked by direct repeats (see below), it returns errors saying "(?{...}) not terminated or {...} not balanced. Can anybody help me sorting this out? > > The regex I have is: > > $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i; > > Thank you, > > Yang > > > > -- > James D. White (jdw@ou.edu) > Director of Bioinformatics > Department of Chemistry and Biochemistry/ACGT > University of Oklahoma > 101 David L. Boren Blvd., SRTC 2100 > Norman, OK 73019 > Phone: (405) 325-4912, FAX: (405) 325-7762 -- James D. White (jdw@ou.edu) Director of Bioinformatics Department of Chemistry and Biochemistry/ACGT University of Oklahoma 101 David L. Boren Blvd., SRTC 2100 Norman, OK 73019 Phone: (405) 325-4912, FAX: (405) 325-7762 From ed at compbio.berkeley.edu Fri Jan 21 12:09:04 2005 From: ed at compbio.berkeley.edu (Ed Green) Date: Fri Jan 21 12:06:05 2005 Subject: [Bioperl-l] dssp script In-Reply-To: <5F7CE35370B6CF429AA3CA960ECC278001638D65@EXCHANGE2.charite.de> References: <5F7CE35370B6CF429AA3CA960ECC278001638D65@EXCHANGE2.charite.de> Message-ID: <41F13730.1080203@compbio.berkeley.edu> Dear Peter, These two are in fact bugs that I will fix. The first results because of the presence of 'termination residues' that don't have residue numbers. Their residue numbers, then, can't be compared numerically. 
Fortunately, this bug won't produce wrong results, as we want this comparison to always be false anyway. The solution is to first check whether either of the termination residue signals is set and, if so, skip the numerical comparison.

The second, the blank line(s) at the end of the file, will also be fixed.

Beware that there is, I think, a bug in your script. It appears that you are attempting to iterate over all residues. However, iterating A:1 .. A:max doesn't get it done because of the crazy way residues can be numbered in PDB files: you'll miss all the residues with insertion codes (A:27A, A:27B, A:27C, for example). To make this easy an iterator is called for. It will just return all 'real' residues for the PDB file or for a specified chain. I'll try to get that done this weekend.

Regards,
Ed Green

Robinson, Peter wrote:
> Dear BioPerlers,
>
> I am writing a script to use the BioPerl DSSP module to print out a list of phi and psi angles for all applicable residues of all chains. Although the results are correct, I get the following error message at the end of each chain:
>
> Argument "" isn't numeric in numeric eq (==) at /usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line 1168.
>
> and I am not quite sure where it is coming from. Perhaps I am using the wrong part of the API, but I am trying to get a list of all residues for each chain as follows:
>
> foreach my $ch (@chains) {
>     my $ss_elements_pts = $dssp->secBounds($ch);
>     print "Chain $ch:\n";
>     my $pos = 0;
>     my $max = 0;
>     foreach my $stretch (@{$ss_elements_pts}) {
>         my $start = $stretch->[0];
>         my $end = $stretch->[1];
>         if ($end =~ m/(\d+)/) { $end = $1; }
>
>         if ($end > $max) { $max = $end; }
>     }
>     ## END is now the last residue in this chain
>     for my $res (1..$max) {
>         my $residueID = $res . ":" . $ch;
>         my ($phi,$psi,$SS,$SSsum,$AA);
>         eval { $phi = $dssp->resPhi($residueID);};
>         etc.
>
> The full script is appended to the bottom of this mail.
>
>
> I also noticed what might be a minor bug in the module DSSP/Res.pm; when I use dsspcmbi to analyze a PDB file, it produces a results file with an empty last line. This causes a crash:
>
> Use of uninitialized value in chomp at /usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line 1284, line 955.
>
>
> If I manually remove this last empty line, there is no error. By adding the following line at Res.pm l.1284, you can fix the problem:
>
>
> while ( chomp( $cur = <$file> ) ) {
>     next if ($cur =~ m/^\s*$/); *********************************************
>     $res_num = substr( $cur, 0, 5 );
>     $res_num =~ s/\s//g;
>     $self->{ 'Res' }->[ $res_num ] = &_parseResLine( $cur );
> }
> }
>
>
>
>
> Thanks in advance for any tips! Peter
> Peter N. Robinson, M.D.
> Institute of Medical Genetics
> Charité University Hospital
> Augustenburger Platz 1
> 13353 Berlin
> Germany
> ++49-30-450 569124
> peter.robinson@charite.de
> http://www.charite.de/ch/medgen/robinson
> Beware of bugs in the above code; I have only proved it correct, not tried it. -Donald Knuth, computer scientist (1938- )
>
> ########################
>
> #!/usr/bin/perl -w
> use IO::File;
> use Bio::Structure::SecStr::DSSP::Res;
> use Data::Dumper;
>
>
> =pod
> parseDSSP.pl
> Script to parse the output of DSSP using the BioPerl module
> Bio::Structure::SecStr::DSSP::Res. To use it, process a PDB
> file with dssp or dsspcmbi, and pass the resulting file to
> this script.
For more information on dssp and BioPerl see the > module documentation at http://bioperl.org > > @email peter.robinson@charite.de > 21 January, 2005 > > =cut > > > > my $file = "pdb43ca.dssp"; > my $dssp = new Bio::Structure::SecStr::DSSP::Res('-file'=> "$file"); > > my $pdbID = $dssp->pdbID(); > my $auth = $dssp->pdbAuthor(); > my $cmpd = $dssp->pdbCompound(); > my $pdb_date = $dssp->pdbDate(); > my $header = $dssp->pdbHeader(); > my $pdbSource = $dssp->pdbSource(); > > print "PDB entry $pdbID \n\tauthor:\t$auth", > "\n\tCompound:\t$cmpd", > "\n\tDate:\t$pdb_date", > "\n\tHeader:\t$header", > "\n\tsource:\t$pdbSource\n\n"; > > my $totalRes = $dssp->numResidues(); > print "Total residue count (all chains):$totalRes\n"; > > > my $surArea= $dssp->totSurfArea(); > print "Total accessible surface area:\t$surArea (square Ang)\n"; > > > my $chainRef = $dssp->chains(); > my @chains = sort @{$chainRef}; > print "Chain[s]:\n"; > foreach my $ch (@chains) { > print "\t$ch"; > } > print "\n"; > > my $hb = $dssp->hBonds(); > print "H BONDS.\n"; > print "TYPE O(I)-->H-N(J): $hb->[0]\n", > "IN PARALLEL BRIDGES: $hb->[1]\n", > "IN ANTIPARALLEL BRIDGES $hb->[2]\n", > "TYPE O(I)-->H-N(I-5) $hb->[3]\n", > "TYPE O(I)-->H-N(I-4) $hb->[4]\n", > "TYPE O(I)-->H-N(I-3) $hb->[5]\n", > "TYPE O(I)-->H-N(I-2) $hb->[6]\n", > "TYPE O(I)-->H-N(I-1) $hb->[7]\n", > "TYPE O(I)-->H-N(I+0) $hb->[8]\n", > "TYPE O(I)-->H-N(I+1) $hb->[9]\n", > "TYPE O(I)-->H-N(I+2) $hb->[10]\n", > "TYPE O(I)-->H-N(I+3) $hb->[11]\n", > "TYPE O(I)-->H-N(I+4) $hb->[12]\n", > "TYPE O(I)-->H-N(I+5) $hb->[13]\n", > "\n"; > > > > foreach my $ch (@chains) { > my $ss_elements_pts = $dssp->secBounds($ch); > print "Chain $ch:\n"; > my $pos = 0; > my $max = 0; > foreach my $stretch (@{$ss_elements_pts}) { > my $start = $stretch->[0]; > my $end = $stretch->[1]; > if ($end =~ m/(\d+)/) { $end = $1; } > > if ($end > $max) { $max = $end; } > } > ## END is now the last residue in this chain > for my $res (1..$max) { > my $residueID = 
$res . ":" . $ch; > my ($phi,$psi,$SS,$SSsum,$AA); > eval { $phi = $dssp->resPhi($residueID);}; > eval { $psi = $dssp->resPsi($residueID);}; > eval { $SS = $dssp->resSecStr($residueID);}; > eval { $SSsum = $dssp->resSecStrSum($residueID);}; > $AA = $dssp->resAA($residueID); > $phi = $phi || "n/a"; > $psi = $psi || "n/a"; > $SS = $SS || "-"; > my $SSclass; > if ($SSsum eq "H") { $SSclass = "helix"; } > elsif ($SSsum eq "T") { $SSclass = "turn"; } > elsif ($SSsum eq "B") { $SSclass = "beta"; } > else { $SSclass = $SSsum; } > print "$residueID) [$AA] phi:$phi psi:$psi SecStruct: $SS ($SSclass) \n"; > } > } > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From MAG at Stowers-Institute.org Fri Jan 21 12:14:47 2005 From: MAG at Stowers-Institute.org (Goel, Manisha) Date: Fri Jan 21 12:11:28 2005 Subject: [Bioperl-l] Amino acid frequency counter Message-ID: <200501211711.j0LHBCKr023115@portal.open-bio.org> Hi All, I have recently started using Bio-perl to analyse and manipulate my protein sequence alignments. I need to calculate aminoacid frequencies at each column of the alignment. Which module could be of help ? Thanks for guiding, -Manisha From cjm at fruitfly.org Fri Jan 21 12:32:33 2005 From: cjm at fruitfly.org (Chris Mungall) Date: Fri Jan 21 12:29:32 2005 Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in SwissProtfile In-Reply-To: References: Message-ID: Brian, Unfortunately the id_parser method isn't supported in Bio::Index::Swissprot Even if it was I don't think it would be sufficient here - Kenny needs to index using the feature fields. This implies that the search key wouldn't be unique. Bio::Index::Abstract requires a unique key for the index. 
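
To illustrate the non-unique key problem, here is a minimal sketch in plain Perl; it is not part of Bioperl. It builds an in-memory inverted index that maps each feature key to every accession carrying it, which is exactly the one-to-many lookup a unique-key index cannot hold. The record layout, the made-up accessions and the name build_feature_index are assumptions for illustration only. In practice the records could probably be filled in a single pass with Bio::SeqIO (-format => 'swiss'), taking the accession from accession_number and the keys from primary_tag of each feature.

```perl
use strict;
use warnings;

# Hypothetical helper: build an inverted index from feature key to accessions.
# Each record is a hash ref: { acc => 'ACC0001', features => [ 'DOMAIN', ... ] }.
sub build_feature_index {
    my (@records) = @_;
    my %index;
    for my $rec (@records) {
        my %seen;    # list each accession only once per feature key
        for my $key ( @{ $rec->{features} } ) {
            next if $seen{$key}++;
            push @{ $index{$key} }, $rec->{acc};
        }
    }
    return \%index;
}

# Made-up records, for illustration only.
my @records = (
    { acc => 'ACC0001', features => [ 'SIGNAL', 'DOMAIN', 'DOMAIN' ] },
    { acc => 'ACC0002', features => [ 'DOMAIN', 'TRANSMEM' ] },
);

# One key can map to several entries, unlike a unique-key index.
my $index = build_feature_index(@records);
for my $key ( sort keys %$index ) {
    print "$key\t", join( ',', @{ $index->{$key} } ), "\n";
}
```

A hash like this could be saved with the core Storable module so that later searches do not have to re-read the whole flat file, although for a database the size of SwissProt one of the generic solutions is likely to scale better.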

Flexible indexing and retrieval such as this is best handled by a generic, non-Bioperl-specific solution: an RDB, XMLDB, SRS, Lucene, LuceGene, etc.

I forgot to mention Don Gilbert's LuceGene in my original reply. It's a fairly sane open-source alternative to SRS. It handles lots of bioinformatics file formats (not sure about Swissprot, but I'm sure it could be added).

See: http://www.gmod.org/lucegene/index.shtml

Cheers
Chris

On Fri, 21 Jan 2005, Brian Osborne wrote:

> Kenny,
>
> Did you take a look at Bio/Index/Swissprot.pm? What's important for you will
> be building the index using the keys you're interested in as opposed to the
> default key, using the id_parser method. See the Bio::Index section in the
> bptutorial for an example.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Daily,
> Kenneth Michael
> Sent: Wednesday, January 19, 2005 11:49 AM
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in
> SwissProtfile
>
>
> I want to work with a local copy of the SwissProt database, and need to
> search through all of the entries. I only see methods to return sequences by
> accession. However, I cannot use just FASTA format of the SwissProt records,
> as I need to use the feature fields. What I need to learn is how to do a DB
> search on the features field of the SwissProt records, if its possible.
> Would there be any advantage do doing it with the DB instead of just using
> SeqIO as an input stream? I think it might, since every time I want to do a
> search I must read in the entire file again, which is very costly. Thank
> you.
>
> Kenny Daily
> Indiana University
> School of Informatics
> kmdaily [at] indiana [dot] edu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From yguo at vbi.vt.edu Fri Jan 21 12:39:27 2005
From: yguo at vbi.vt.edu (yguo@vbi.vt.edu)
Date: Fri Jan 21 12:35:31 2005
Subject: [Bioperl-l] Code for retrieving PDF file using a Pubmed link.
In-Reply-To: <200501211711.j0LHBCKr023115@portal.open-bio.org>
References: <200501211711.j0LHBCKr023115@portal.open-bio.org>
Message-ID: <34464.128.173.99.81.1106329167.squirrel@webmail.vbi.vt.edu>

[It seems that attachments are not supported, so I am re-sending it...]

Hi,

Here I have attached the code mentioned earlier. I do not know if the mailing list system supports attachments, so I have also pasted the code at the end of this email. I have put detailed instructions in the comments. If you have any problems using it, please contact me.

The module will do its best to find the PDF link, but it can fail at some publisher sites. You can have the module write the processing result to a log file. The flag "NOT_FOUND_OR_ALLOWED" means that it failed to download the PDF file. It is possible that the PDF location is too complicated for the parser, or that your institute does not have the right to view the full text. Of the roughly 360 publications (with full text links) required in our project, the module could get the PDF for around 330. As our project goes on, I will update this module to make it more robust.

I hope the module can ultimately become a part of Bioperl. But before that, you can help me test it.
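
One check that helps when testing is whether a saved file really is a PDF rather than an HTML login or error page. The following is a minimal sketch, assuming only that a PDF file begins with the "%PDF" marker. The name looks_like_pdf is hypothetical and is not a function of the module; the check is also stricter than the one in the module, which scans every line of the file for a line starting with "%PDF".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: true if the file starts with the PDF marker "%PDF".
sub looks_like_pdf {
    my ($path) = @_;
    open( my $fh, '<', $path ) or return 0;    # unreadable: treat as not a PDF
    binmode $fh;
    my $got = read( $fh, my $head, 4 );        # read only the first four bytes
    close $fh;
    return ( defined $got && $got == 4 && $head eq '%PDF' ) ? 1 : 0;
}

# Demonstration on two temporary files with made-up content.
my ( $pdf_fh, $pdf_name ) = tempfile( UNLINK => 1 );
print {$pdf_fh} "%PDF-1.4\n";
close $pdf_fh;

my ( $html_fh, $html_name ) = tempfile( UNLINK => 1 );
print {$html_fh} "<html>Access denied</html>\n";
close $html_fh;

print looks_like_pdf($pdf_name)  ? "pdf\n" : "not pdf\n";
print looks_like_pdf($html_name) ? "pdf\n" : "not pdf\n";
```

Reading only the first four bytes keeps the check cheap for large files, at the cost of rejecting files that carry junk before the marker.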

Good weekend,

Yongjian Guo at Virginia Bioinformatics Institute

-----------------------------------------------------------------------
# $Id: PDFDownloader.pm 2005/1/20$
# Version 0.1
#
# Cared for by Yongjian Guo
# For copyright and disclaimer see below.

# POD documentation - main docs before the code

=head1 NAME

PDFDownloader - Download full text PDF file using a Pubmed entry.

=head1 SYNOPSIS

  use PDFDownloader;

  #build the object,
  $worker = PDFDownloader->new({logFile=>$logFile, link=>$link, dir=>$dirName, fileName=>$fileName});

  #start to download.
  $worker->start();

The log information can be saved in the log file or shown on screen. The following information will be given:

  DONE : Successfully finished downloading.
  NOT_OPEN_MED : Can not open the Medline page.
  NOT_OPEN_PUB : Can not open the publisher site.
  NO_LINK : The given link does not have a full text link out.
  NOT_FOUND_OR_ALLOWED : PDF entry can not be found or user does not have the right to view full text.

=head1 DESCRIPTION

This module will download the full text PDF file from the publisher website using a Pubmed entry, if there is full text available.

=head1 Attributes

link: The Pubmed link for an article.

logFile: The assigned log file name. If it is empty, the information will be shown on screen.

dir: The directory where the PDF file is to be saved.

fileName: The name prefix of the target PDF file to be saved. The downloaded file has the name fileName.pdf

=head1 FEEDBACK

=head2 Reporting Bugs

Report bugs to yguo@vbi.vt.edu.

=head1 AUTHORS

Yongjian Guo @ Virginia Bioinformatics Institute.

=head1 COPYRIGHT

Copyright (c) 2004 Virginia Bioinformatics Institute. All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

=head1 DISCLAIMER

This software is provided "as is" without warranty of any kind.

=cut

package PDFDownloader;

use strict;
use LWP::UserAgent;
use HTTP::Cookies;

#Function to create the PDFDownloader object.
#the parameter is a hash and its required entry
#is "link", which is a Pubmed article entry, like:
#http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8931319
sub new{
    my $self=shift;
    my $para=shift;
    my %op=();
    $op{keep_alive}=1;
    $op{agent}="Mozilla/5.0";
    $op{timeout}=20;
    $op{cookie_jar}=HTTP::Cookies->new(file => "cookies.txt");
    my $class=bless{
        logFile=>$para->{logFile} || "",
        link=>$para->{link},                    #what is given is a link.
        dir=>$para->{dir} || ".",
        fileName=>$para->{fileName} || rand(),
        base=>"",                               #used to save the base url of the publisher site.
        fp=>LWP::UserAgent->new(%op),
    }, $self;
    if(!defined $class) {
        die "can not create object $class\n";
    }
    return $class;
}

#function to start the search and download process.
sub start{
    my $self=shift;
    my $ncbiBase="http://www.ncbi.nlm.nih.gov";
    my $data=$self->_getLinkContent($self->{link});
    if(length($data)==0){
        $self->_log("NOT_OPEN_MED\t".$self->{link});
        return; #nothing to analyze.
    }
    #ok we got the link content, we analyze it.
    if($data=~/href=\".*?db=pubmed\&url=(.*?)\"\s+/){
        #if we can get this pattern. direct
        $self->_parsePubSite($1);
    }elsif($data=~/href=\"(.*?articlerender.*?)\"\s+/){
        #if is the direct deposit, not direct
        $self->_parsePubSite($1);
    }else{
        $self->_log("NO_LINK\t".$self->{link});
    }
}

#function to parse the first page on the publisher site.
sub _parsePubSite{
    my($self, $link)=@_;
    my $data=$self->_getLinkContent($link);
    my ($pos, $pos2, $tmpString, $result, @array);
    $result=-1; #initial is negative.
    if(length($data)==0){
        $self->_log("NOT_OPEN_PUB\t".$link);
        return;
    }
    #find the PDF string,
    $data=~s/\n//g;
    $data=~s/&nbsp;/ /g;
    #first we try if this is a direct link,
    $tmpString=$self->_getPDFLink("", $data);
    if(length($tmpString)!=0){
        #found the link,
        $result=$self->_tryGetPDF($tmpString);
        if($result==1){
            return;
        }
    }
    if($data=~/([\s|(|>]pdf[\)|\s|<])/ || $data=~/([\s|(|>]PDF[\)|\s|<|"])/ ){
        #two possibilities,
        # 1. a link,
        # 2. a javascript.
        $pos=index($data, $1);
        $pos2=rindex(substr($data, 0, $pos), "href="); #find the nearest href before the match.
        $tmpString=substr($data, $pos2, $pos-$pos2);
        if($tmpString=~/\"(.*?)\"/){
            $tmpString=$1;
        }
        #further extraction
        if($tmpString=~/\'(.*?)\'/){
            $tmpString=$1;
        }
        #ok, here we got the $tmpString for a next link,
        #use a try mechanism,
        $result=$self->_tryGetPDF($tmpString);
        if($result==-1){ #no success,
            @array=$self->_getSubLinks($tmpString);
            foreach my $entry (@array){
                if($self->_tryGetPDF($self->_getPDFLink($entry))==1){
                    $result=1; #success,
                    last;
                }
            }
        }
    }
    #further try,
    if($result!=1){
        #it is possible that the direct link is a frame,
        @array=$self->_getSubLinks($link);
        foreach my $entry (@array){
            if($self->_tryGetPDF($self->_getPDFLink($entry))==1){
                $result=1; #success,
                last;
            }
        }
        if($result!=1){
            $self->_log("NOT_FOUND_OR_ALLOWED\t".$self->{link});
        }
    }
}

sub _tryGetPDF{
    my($self, $link)=@_;
    my $result=$self->_getPDFFile($link);
    if($result==1){
        $self->_log("DONE\t".$self->{link});
    }
    return $result;
}

#given a web page, use this one to get all of the links in that page.
sub _getSubLinks{
    my ($self, $link)=@_;
    my $data=$self->_getLinkContent($link);
    my @array=();
    my $pos=0;
    my $pos2=0;
    my $tmp="";
    my $count=0;
    while(1){
        $pos=index($data, "\"", $pos);
        if($pos==-1 || $count>50){
            #it is possible the page does not have a link. we use the number to control.
            last;
        }
        $pos2=index($data, "\"", $pos+1);
        $tmp=substr($data, $pos+1, $pos2-$pos-1);
        #keep only strings that start with http or /
        if($tmp=~/^(?:http|\/)/){
            push(@array, $tmp);
        }
        $pos=$pos2+1;
        $count++;
    }
    if($count>=50){
        @array=();
    }
    return @array;
}

#function to return a pdf file link from a webpage.
sub _getPDFLink{
    my($self, $link, $data)=@_;
    if(length($link)!=0){
        $data=$self->_getLinkContent($link);
        $data=~s/\n//g;
    }
    if($data=~/.*[\"|\'](.{5,}\.pdf)[\"|\']/ || $data=~/.*[\"|\'](.{5,}\.PDF)[\"|\']/ ){
        #ok, there is a pdf file.
        return $1;
    }
    return ""; #not found.
}

#function to get the homepage. redirection is taken care of.
sub _getLinkContent{
    my ($self, $link)=@_;
    if($link!~/http:\/\//){
        #some links have the format of http:/www..
        my $pos=index($link, "/");
        $link=substr($link, $pos);
    }
    if($link!~/^http/){
        $link=$self->_buildURL($link);
    }
    $link=~s/&amp;/&/g;
    my $response=$self->{fp}->get($link);
    my $rHeader="";
    #ok, we need to analyze the header to see if there is a refresh,
    #if yes, we will refresh the link,
    if($response->is_success()){
        $rHeader=$response->header("Refresh");
        if(length($rHeader)>0){
            if($rHeader=~/URL\=(.*)/){
                return $self->_getLinkContent($1);
            }
        }else{
            #update the base url.
            $self->{base}=$response->base();
            return $response->content;
        }
    }
    return "";
}

#get the real pdf file. redirection is taken care of.
sub _getPDFFile{
    my($self, $link)=@_;
    if($link!~/^http/){
        $link=$self->_buildURL($link);
    }
    $link=~s/&amp;/&/g;
    my $done=0;
    my $fileName=$self->{dir}."/".$self->{fileName}.".pdf";
    #try to see if there is a refresh,
    my $response=$self->{fp}->get($link);
    if($response->is_success()){
        my $rHeader=$response->header("Refresh");
        if(length($rHeader)>0 && $rHeader=~/URL\=(.*)/){
            return $self->_getPDFFile($1);
        }
    }
    $self->{fp}->get($link, ":content_file"=>$fileName);
    #ok now, we test if this file is the pdf file, if yes,
    #we are done, if not, return a failure code.
    open(PDFIN, $fileName) or return -1; #nothing was saved.
    while(<PDFIN>){
        if($_=~/^%PDF/){
            $done=1;
            last;
        }
    }
    close PDFIN;
    if($done==0){
        unlink $fileName;
        return -1;
    }
    return 1; #everything ok,
}

#function to record the log
sub _log{
    my($self, $data)=@_;
    if(length($self->{logFile})==0){
        print $data,"\n";
        return;
    }
    open LOGOUT, ">>".$self->{logFile} or die "can not open the log file to write\n";
    print LOGOUT $data, "\n";
    close LOGOUT;
    return;
}

#function to build the full url.
sub _buildURL{
    my($self, $target)=@_;
    if($target=~/^\//){
        if($self->{base}=~/(http:\/\/.*?)\//){
            return $1.$target;
        }
    }else{
        if($self->{base}=~/(http:\/\/.*)\//){
            return $1."/".$target;
        }
    }
    return $target;
}

1;

From barry.moore at genetics.utah.edu Fri Jan 21 13:51:38 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri Jan 21 13:47:44 2005
Subject: [Bioperl-l] regular expression help!
In-Reply-To: <41F133CD.3BDCA957@ou.edu>
References: <200501161451.j0GEpNKr028052@portal.open-bio.org> <41F1321A.72FB2289@ou.edu> <41F133CD.3BDCA957@ou.edu>
Message-ID: <41F14F3A.2010604@genetics.utah.edu>

Excellent reply. I think we all learned something from that one.

Barry

James D. White wrote:

>Sorry about double posting, but I forgot to change the subject before
>sending the first message.
>
>
>
>>Starting with:
>>
>>$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
>>
>>The slashes in tr/// confused the Perl parser. You need to use
>>different delimiters for the m// operator (the m is implied by //)
>>and the tr/// operator. Also the tr/// operator does not use the
>>i flag, so lower case needs to be handled explicitly. So let's
>>try the following:
>>
>>$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCGatcg/TAGCtagc/);})\1.*:i;
>>
>>This gives the error:
>>Can't modify constant item in transliteration (tr///) at (re_eval 1)
>>line 1, near "tr/ATCGatcg/TAGCtagc/)"
>>
>>Inside the (??{ CODE }) sequence, use $1, $2, ..., instead of
>>\1, \2, ... (See Programming Perl, 3rd Edition, "Match-time pattern
>>interpolation", p. 213) Inside the evaluated CODE, \2 is a
>>constant, not the value of the second captured substring. Also I'm
>>not sure what modifying $2 would do, so let's try:
>>
>>$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1.*:i;
>>
>>This works, but I would get rid of the leading "\S+" and trailing
>>".*". The ".*" adds nothing useful, so just drop it.
You >>probably don't need the leading "\S+", because the pattern is not >>anchored to the beginning of the string with "^". The leading >>"\S+" gobbles up the entire string, forcing the match to backtrack >>character by character from the end. It also forces the substring >>match saved in $1 to occur after the first character. Unless you >>never want $1 to consider the first character, just drop the >>leading "\S+". If you don't want to search the first character, >>then just use "\S". This results in: >> >>$regex =~ m:(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i; >> >>Finally I would probably change the remaining ".*" to ".*?". If >>you search with ".*" on a long sequence which could contain >>multiple sequences of interest, the ".*" pattern will match the rest >>of the sequence and force backtracking to match the first occurrence >>of "$1$2" with the last occurrence of "revcomp($2)$1". If you use >>".*?", you match the first occurrence of "$1$2" with the nearest >>occurrence of "revcomp($2)$1". This results in the final regular >>expression: >> >>$regex =~ m:(\S+)(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i; >> >> >> >>>Date: Fri, 14 Jan 2005 14:12:46 -0500 >>>From: Guojun Yang >>>Subject: [Bioperl-l] regular expression help! >>>To: bioperl-l@portal.open-bio.org >>>Message-ID: <20050114141246.94c7cb46@dogwood.plantbio.uga.edu> >>>Content-Type: text/plain; charset="us-ascii" >>> >>>Hi, Everybody, >>>I was trying to use a regex recognizing a patter of inverted repeat DNA seq flanked by direct repeats (see below), it returns errors saying "(?{...}) not terminated or {...} not balanced. Can anybody help me sorting this out? >>>The regex I have is: >>>$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i; >>>Thank you, >>>Yang >>> >>> >>> >>-- >>James D. 
White (jdw@ou.edu) >>Director of Bioinformatics >>Department of Chemistry and Biochemistry/ACGT >>University of Oklahoma >>101 David L. Boren Blvd., SRTC 2100 >>Norman, OK 73019 >>Phone: (405) 325-4912, FAX: (405) 325-7762 >> >> > >-- >James D. White (jdw@ou.edu) >Director of Bioinformatics >Department of Chemistry and Biochemistry/ACGT >University of Oklahoma >101 David L. Boren Blvd., SRTC 2100 >Norman, OK 73019 >Phone: (405) 325-4912, FAX: (405) 325-7762 > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From akozik at atgc.org Fri Jan 21 19:21:15 2005 From: akozik at atgc.org (Alexander Kozik) Date: Fri Jan 21 18:17:07 2005 Subject: [Bioperl-l] GenBank gene field Message-ID: <41F19C7B.3000101@atgc.org> Please take a look on two sample records from GenBank files (Arabidopsis and C.elegans) C.elegans file has "/gene" entries for both "gene" and "CDS" fields. Arabidopsis file has no "/gene" entries at all. Previous version of Arabidopsis GenBank file was with "/gene" entries. Could you help to understand why it happens and what entry you suggest to extract if user is interested in extraction of corresponding gene names. Do I use terms "entry" and "field" properly? Thanks a lot in advance, Alexander Kozik Bioinformatics Specialist Genome and Biomedical Sciences Facility 451 East Health Sciences Drive University of California Davis, CA 95616-8816 Phone: (530) 754-9127 email: akozik@atgc.org web: http://www.atgc.org/ ---- Arabidopsis GenBank file NC_003070.gbk: gene complement(38753..40944) /locus_tag="At1g01070" /note="synonym: T25K16.7; nodulin MtN21 family protein" /db_xref="GeneID:839550" ... 
CDS complement(join(38898..39054,39136..39287,39409..39814, 40213..40329,40473..40535,40675..40877)) /locus_tag="At1g01070" /note="similar to MtN21 GI:2598575 (root nodule development) from [Medicago truncatula]" /codon_start=1 /protein_id="NP_563617.1" /db_xref="GI:18378792" /db_xref="GeneID:839550" /translation="MAG... ---- C.elegans GenBank file NC_003279.gbk: gene 43733..44677 /gene="1A519" /locus_tag="1A519" /synonym="Y74C9A.1" /note="Title: Caenorhabditis elegans expressed gene 1A519." ... CDS join(43733..43961,44030..44234,44281..44328,44521..44677) /gene="1A519" /locus_tag="1A519" /codon_start=1 /product="putative protein (1A519)" /protein_id="17510627" /db_xref="GI:17510627" ... From jason.stajich at duke.edu Fri Jan 21 22:22:52 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Jan 21 22:19:06 2005 Subject: [Bioperl-l] Amino acid frequency counter In-Reply-To: <200501211711.j0LHBCKr023115@portal.open-bio.org> References: <200501211711.j0LHBCKr023115@portal.open-bio.org> Message-ID: Bio::AlignIO for reading in sequence alignments produces Bio::SimpleAlign objects. On Jan 21, 2005, at 12:14 PM, Goel, Manisha wrote: > Hi All, > I have recently started using Bio-perl to analyse and manipulate my > protein sequence alignments. > I need to calculate aminoacid frequencies at each column of the > alignment. Which module could be of help ? 
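For the per-column counting itself, here is a minimal plain-Perl sketch. It assumes the aligned rows are already available as equal-length strings (with a Bio::SimpleAlign object they could be collected with each_seq and seq); the function name column_frequencies and the decision to skip gap characters are assumptions made for illustration, not part of Bioperl.

```perl
use strict;
use warnings;

# Count residues column by column for a set of aligned, equal-length rows.
# Returns one hash ref per column: residue => relative frequency.
# Gap characters ('-' and '.') are skipped; that choice is an assumption.
sub column_frequencies {
    my @rows = @_;
    return () unless @rows;
    my $length = length $rows[0];
    my @freqs;
    for my $col (0 .. $length - 1) {
        my %count;
        my $total = 0;
        for my $row (@rows) {
            my $res = uc substr($row, $col, 1);
            next if $res eq '-' || $res eq '.';
            $count{$res}++;
            $total++;
        }
        # An all-gap column has no keys, so no division by zero happens here.
        my %freq = map { $_ => $count{$_} / $total } keys %count;
        push @freqs, \%freq;
    }
    return @freqs;
}

# Made-up toy alignment, three rows of four columns.
my @freqs = column_frequencies('MKV-', 'MRVL', 'MKIL');
for my $i (0 .. $#freqs) {
    my $f = $freqs[$i];
    print join("\t", $i + 1, map { sprintf "%s=%.2f", $_, $f->{$_} } sort keys %$f), "\n";
}
```

Each output line is the column number followed by the residues seen in that column and their frequencies among the non-gap rows.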
> Thanks for guiding, > -Manisha > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Fri Jan 21 22:23:51 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Jan 21 22:19:55 2005 Subject: [Bioperl-l] GenBank gene field In-Reply-To: <41F19C7B.3000101@atgc.org> References: <41F19C7B.3000101@atgc.org> Message-ID: <09DD5061-6C25-11D9-A728-000393C44276@duke.edu> You should probably ask the data providers.... On Jan 21, 2005, at 7:21 PM, Alexander Kozik wrote: > Please take a look on two sample records from GenBank files > (Arabidopsis and C.elegans) > C.elegans file has "/gene" entries for both "gene" and "CDS" fields. > Arabidopsis file has no "/gene" entries at all. > Previous version of Arabidopsis GenBank file was with "/gene" entries. > Could you help to understand why it happens and what entry you suggest > to extract if user is interested in extraction of corresponding gene > names. > Do I use terms "entry" and "field" properly? > > Thanks a lot in advance, > > Alexander Kozik > Bioinformatics Specialist > Genome and Biomedical Sciences Facility > 451 East Health Sciences Drive > University of California > Davis, CA 95616-8816 > Phone: (530) 754-9127 > email: akozik@atgc.org > web: http://www.atgc.org/ > > ---- > > Arabidopsis GenBank file NC_003070.gbk: > > gene complement(38753..40944) > /locus_tag="At1g01070" > /note="synonym: T25K16.7; nodulin MtN21 family > protein" > /db_xref="GeneID:839550" > ... 
> CDS > complement(join(38898..39054,39136..39287,39409..39814, > 40213..40329,40473..40535,40675..40877)) > /locus_tag="At1g01070" > /note="similar to MtN21 GI:2598575 (root nodule > development) from [Medicago truncatula]" > /codon_start=1 > /protein_id="NP_563617.1" > /db_xref="GI:18378792" > /db_xref="GeneID:839550" > /translation="MAG... > ---- > > C.elegans GenBank file NC_003279.gbk: > > gene 43733..44677 > /gene="1A519" > /locus_tag="1A519" > /synonym="Y74C9A.1" > /note="Title: Caenorhabditis elegans expressed gene > 1A519." > ... > CDS > join(43733..43961,44030..44234,44281..44328,44521..44677) > /gene="1A519" > /locus_tag="1A519" > /codon_start=1 > /product="putative protein (1A519)" > /protein_id="17510627" > /db_xref="GI:17510627" > ... > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From gyang at plantbio.uga.edu Fri Jan 21 22:31:55 2005 From: gyang at plantbio.uga.edu (Guojun Yang) Date: Fri Jan 21 22:28:08 2005 Subject: what about the speed on longer seq? Re: [Bioperl-l] regular expression help! In-Reply-To: <41F133CD.3BDCA957@ou.edu> Message-ID: <20050121223155.bd16abb4@dogwood.plantbio.uga.edu> Thank you James for your detailed info. An earlier solution given is to use =~ /(\S{4,})(\S{10,}).+(??{sub($2)})\1/i; the sub is to do the transliteration and reversion of $2. It works greatly on ~80 bp seq. However, on a seq ~500 bp, it takes forever to do. Is there any similarity in processing time for the regex? I will definitely try it. Have a great one, Yang ----- Original Message ----- From: James D. White To: bioperl-l@portal.open-bio.org Sent: Fri, 21 Jan 2005 11:54:37 -0500 Subject: Re: [Bioperl-l] regular expression help! > Sorry about double posting, but I forgot to change the subject before > sending the first message. 
> > > Starting with: > > > > $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i; > > > > The slashes in tr/// confused the Perl parser. You need to use > > different delimiters for the m// operator (the m is implied by //) > > and the tr/// operator. Also the tr/// operator does not use the > > i flag, so lower case needs to be handled explicitly. So let's > > try the following: > > > > $regex =~ m:\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ > tr/ATCGatcg/TAGCtagc/);})\1.*:i; > > > > This gives the error: > > Can't modify constant item in transliteration (tr///) at (re_eval 1) > > line 1, near "tr/ATCGatcg/TAGCtagc/)" > > > > Inside the (??{ CODE }) sequence, use $1, $2, ..., instead of > > \1, \2, ... (See Programming Perl, 3rd Edition, "Match-time pattern > > interpolation", p. 213) Inside the evaluated CODE, \2 is a > > constant, not the value of the second captured substring. Also I'm > > not sure what modifying $2 would do, so let's try: > > > > $regex =~ m:\S+(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; > reverse($rev);})\1.*:i; > > > > This works, but I would get rid of the leading "\S+" and trailing > > ".*". The ".*" adds nothing useful, so just drop it. You > > probably don't need the leading "\S+", because the pattern is not > > anchored to the beginning of the string with "^". The leading > > "\S+" gobbles up the entire string, forcing the match to backtrack > > character by character from the end. It also forces the substring > > match saved in $1 to occur after the first character. Unless you > > never want $1 to consider the first character, just drop the > > leading "\S+". If you don't want to search the first character, > > then just use "\S". This results in: > > > > $regex =~ m:(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; > reverse($rev);})\1:i; > > > > Finally I would probably change the remaining ".*" to ".*?". 
If > > you search with ".*" on a long sequence which could contain > > multiple sequences of interest, the ".*" pattern will match the rest > > of the sequence and force backtracking to match the first occurrence > > of "$1$2" with the last occurrence of "revcomp($2)$1". If you use > > ".*?", you match the first occurrence of "$1$2" with the nearest > > occurrence of "revcomp($2)$1". This results in the final regular > > expression: > > > > $regex =~ m:(\S+)(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; > reverse($rev);})\1:i; > > > > > Date: Fri, 14 Jan 2005 14:12:46 -0500 > > > From: Guojun Yang > > > Subject: [Bioperl-l] regular expression help! > > > To: bioperl-l@portal.open-bio.org > > > Message-ID: <20050114141246.94c7cb46@dogwood.plantbio.uga.edu> > > > Content-Type: text/plain; charset="us-ascii" > > > > > > Hi, Everybody, > > > I was trying to use a regex recognizing a patter of inverted repeat DNA seq > flanked by direct repeats (see below), it returns errors saying "(?{...}) not > terminated or {...} not balanced. Can anybody help me sorting this out? > > > The regex I have is: > > > $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ > tr/ATCG/TAGC/i);})\1.*/i; > > > Thank you, > > > Yang > > > > > > > -- > > James D. White (jdw@ou.edu) > > Director of Bioinformatics > > Department of Chemistry and Biochemistry/ACGT > > University of Oklahoma > > 101 David L. Boren Blvd., SRTC 2100 > > Norman, OK 73019 > > Phone: (405) 325-4912, FAX: (405) 325-7762 > > -- > James D. White (jdw@ou.edu) > Director of Bioinformatics > Department of Chemistry and Biochemistry/ACGT > University of Oklahoma > 101 David L. 
Boren Blvd., SRTC 2100 > Norman, OK 73019 > Phone: (405) 325-4912, FAX: (405) 325-7762 > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l >

From davhum at garvan.unsw.edu.au Thu Jan 20 21:11:26 2005
From: davhum at garvan.unsw.edu.au (davhum@garvan.unsw.edu.au)
Date: Sat Jan 22 11:17:29 2005
Subject: [Bioperl-l] WebDBSeqI Request error (Bad protocol 'tcp')????
Message-ID: <4248.129.94.225.7.1106273486.squirrel@gimr.garvan.unsw.edu.au>

Hi bioperl-groovers,

Has anyone ever had to deal with the following error message?

MSG: WebDBSeqI Request error: 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (Bad protocol 'tcp')

I have traced previous threads from the archives but they appear to be slightly different. What confuses me most is that the script that produces this error worked perfectly on my machine but not on my colleague's new machine (running Win XP). I don't understand why the "Bad protocol 'tcp'" comment is there, but the error cascades out from the bioperl modules. Is it possible I did not install perl or bioperl correctly?

Any ideas or suggestions would be most appreciated.

thanks in advance
David Humphreys

From yguo at vbi.vt.edu Fri Jan 21 12:35:47 2005
From: yguo at vbi.vt.edu (yguo@vbi.vt.edu)
Date: Sat Jan 22 11:17:41 2005
Subject: [Bioperl-l] Code for automatic retrieving pdf file from the publisherwebsite.
In-Reply-To: <001c01c4fe9d$08c64a70$7d75f345@WATSON>
References: <1109.151.199.12.38.1106187167.squirrel@webmail.vbi.vt.edu> <001c01c4fe9d$08c64a70$7d75f345@WATSON>
Message-ID: <34461.128.173.99.81.1106328947.squirrel@webmail.vbi.vt.edu>

Hi,

I have attached the code mentioned earlier. I do not know if the mailing list system supports attachments, so I have also pasted the code at the end of this email. I have put detailed instructions in the comments. If you have any problems using it, please contact me.
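Once a batch has been run with a log file (the flags are described further down), the outcome can be tallied with a few lines of Perl. This is only a sketch: it assumes each log line has the form FLAG, a tab, then the link, which is what the module's _log calls write. The function name tally_log and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Tally the status flags in PDFDownloader-style log lines ("FLAG<TAB>link").
sub tally_log {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;          # skip empty lines
        my ($flag) = split /\t/, $line, 2; # keep only the part before the tab
        $count{$flag}++;
    }
    return %count;
}

# Hypothetical sample lines; in practice read them from the log file.
my @sample = (
    "DONE\thttp://example.org/a\n",
    "DONE\thttp://example.org/b\n",
    "NOT_FOUND_OR_ALLOWED\thttp://example.org/c\n",
);
my %count = tally_log(@sample);
print "$_\t$count{$_}\n" for sort keys %count;
```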
The module will do its best to find the PDF link, but it can fail at some publisher sites. You can have the module write its processing results to a log file. The flag "NOT_FOUND_OR_ALLOWED" means that it failed to download the PDF file: either the PDF location is too complicated for the parser, or your institute does not have the right to view the full text. Of around 360 publications (with full-text links) required in our project, the module got the PDF for around 330. As our project goes on, I will update this module to make it more robust. I hope the module can ultimately become part of Bioperl. Before that, you guys can help me test it.

Have a good weekend,
Yongjian Guo at Virginia Bioinformatics Institute

-----------------------------------------------------------------------
# $Id: PDFDownloader.pm 2005/1/20$
# Version 0.1
#
# Cared for by Yongjian Guo
# For copyright and disclaimer see below.

# POD documentation - main docs before the code

=head1 NAME

PDFDownloader - Download full text PDF file using a Pubmed entry.

=head1 SYNOPSIS

  use PDFDownloader;

  #build the object,
  $worker = new PDFDownloader({logFile=>$logFile, link=>$link, dir=>$dirName, fileName=>$fileName});

  #start to download.
  $worker->start();

The log information can be saved in the log file or shown on screen.
The following information will be given:

  DONE                 : Successfully finished downloading.
  NOT_OPEN_MED         : Can not open the Medline page.
  NOT_OPEN_PUB         : Can not open the publisher site.
  NO_LINK              : The given link does not have a full text link out.
  NOT_FOUND_OR_ALLOWED : PDF entry can not be found or user does not have
                         the right to view the full text.

=head1 DESCRIPTION

This module will download the full text PDF file from the publisher
website using a Pubmed entry, if there is full text available.

=head1 Attributes

  link:     The Pubmed link for an article.
  logFile:  The assigned log file name. If it is empty, the information
            will be shown on screen.
  dir:      The directory where the PDF file is to be saved.
  fileName: The name prefix of the target PDF file to be saved. The
            downloaded file has the name fileName.pdf

=head1 FEEDBACK

=head2 Reporting Bugs

Report bugs to yguo@vbi.vt.edu.

=head1 AUTHORS

Yongjian Guo @ Virginia Bioinformatics Institute.

=head1 COPYRIGHT

Copyright (c) 2004 Virginia Bioinformatics Institute. All Rights Reserved.
This module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

=head1 DISCLAIMER

This software is provided "as is" without warranty of any kind.

=cut

package PDFDownloader;

use strict;
use LWP::UserAgent;
use HTTP::Cookies;

#Function to create the PDFDownloader object.
#the parameter is a hash and its required entry
#is "link", which is a Pubmed article entry, like:
#http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8931319
sub new{
    my $self=shift;
    my $para=shift;
    my %op=();
    $op{keep_alive}=1;
    $op{agent}="Mozilla/5.0";
    $op{timeout}=20;
    $op{cookie_jar}=HTTP::Cookies->new(file => "cookies.txt");
    my $class=bless{
        logFile=>$para->{logFile} || "",
        link=>$para->{link}, #what given is a link.
        dir=>$para->{dir} || ".",
        fileName=>$para->{fileName} || rand(),
        base=>"", #use to save the base url of the publisher site.
        fp=>LWP::UserAgent->new(%op),
    }, $self;
    if(!defined $class) {
        die "can not create object $class\n";
    }
    return $class;
}

#function to start searching and download process.
sub start{
    my $self=shift;
    my $ncbiBase="http://www.ncbi.nlm.nih.gov";
    my $data=$self->_getLinkContent($self->{link});
    if(length($data)==0){
        $self->_log("NOT_OPEN_MED\t".$self->{link});
        return; #nothing to parse.
    }
    #ok we get the link content, we analyse.
    if($data=~/href=\".*?db=pubmed\&url=(.*?)\"\s+/){ #if we can get this pattern. direct
        $self->_parsePubSite($1);
    }elsif($data=~/href=\"(.*?articlerender.*?)\"\s+/){ #if is the direct deposit, not direct
        $self->_parsePubSite($1);
    }else{
        $self->_log("NO_LINK\t".$self->{link});
    }
}

#function to parse the first page on the publisher site.
sub _parsePubSite{
    my($self, $link)=@_;
    my $data=$self->_getLinkContent($link);
    my ($pos, $pos2, $tmpString, $result, @array);
    $result=-1; #initial is negative.
    if(length($data)==0){
        $self->_log("NOT_OPEN_PUB\t".$link);
        return;
    }
    #find the PDF string,
    $data=~s/\n//g;
    $data=~s/&nbsp;/ /g;
    #first we try if this is a direct link,
    $tmpString=$self->_getPDFLink("", $data);
    if(length($tmpString)!=0){ #found the link,
        $result=$self->_tryGetPDF($tmpString);
        if($result==1){
            return;
        }
    }
    if($data=~/([\s|(|>]pdf[\)|\s|<])/ || $data=~/([\s|(|>]PDF[\)|\s|<|"])/ ){
        #two possibilities,
        # 1. a link,
        # 2. a javascript.
        $pos=index($data, $1);
        $pos2=rindex(substr($data, 0, $pos), "href="); #found the nearest preceding href.
        $tmpString=substr($data, $pos2, $pos-$pos2);
        if($tmpString=~/\"(.*?)\"/){
            $tmpString=$1;
        }
        #further extraction
        if($tmpString=~/\'(.*?)\'/){
            $tmpString=$1;
        }
        #ok, here we got the $tmpString for a next link,
        #use a try mechanism,
        $result=$self->_tryGetPDF($tmpString);
        if($result==-1){ #no success,
            @array=$self->_getSubLinks($tmpString);
            foreach my $entry (@array){
                if($self->_tryGetPDF($self->_getPDFLink($entry))==1){
                    $result=1; #success,
                    last;
                }
            }
        }
    }
    #further try,
    if($result!=1){ #it is possible that the direct link is a frame,
        @array=$self->_getSubLinks($link);
        foreach my $entry (@array){
            if($self->_tryGetPDF($self->_getPDFLink($entry))==1){
                $result=1; #success,
                last;
            }
        }
        if($result!=1){
            $self->_log("NOT_FOUND_OR_ALLOWED\t".$self->{link});
        }
    }
}

sub _tryGetPDF{
    my($self, $link)=@_;
    my $result=$self->_getPDFFile($link);
    if($result==1){
        $self->_log("DONE\t".$self->{link});
    }
    return $result;
}

#given a web page, use this one to get all of the links in that page.
sub _getSubLinks{
    my ($self, $link)=@_;
    my $data=$self->_getLinkContent($link);
    my @array=();
    my $pos=0;
    my $pos2=0;
    my $tmp="";
    my $count=0;
    while(1){
        $pos=index($data, "\"", $pos);
        if($pos==-1 || $count>50){
            #it is possible the page does not have link. we use the number to control.
            last;
        }
        $pos2=index($data, "\"", $pos+1);
        last if $pos2==-1; #unbalanced quote, stop scanning.
        $tmp=substr($data, $pos+1, $pos2-$pos-1);
        if($tmp=~/^(http)|\//){
            push(@array, $tmp);
        }
        $pos=$pos2+1;
        $count++;
    }
    if($count>=50){
        @array=();
    }
    return @array;
}

#function to return a pdf file link from a webpage.
sub _getPDFLink{
    my($self, $link, $data)=@_;
    if(length($link)!=0){
        $data=$self->_getLinkContent($link);
        $data=~s/\n//g;
    }
    if($data=~/.*[\"|\'](.{5,}\.pdf)[\"|\']/ || $data=~/.*[\"|\'](.{5,}\.PDF)[\"|\']/ ){
        #ok, there is pdf file.
        return $1;
    }
    return ""; #not found.
}

#function to get the homepage. redirection is taken care of.
sub _getLinkContent{
    my ($self, $link)=@_;
    if($link!~/http:\/\//){ #some link has the format of http:/www..
        my $pos=index($link, "/");
        $link=substr($link, $pos);
    }
    if($link!~/^http/){
        $link=$self->_buildURL($link);
    }
    $link=~s/&amp;/&/g;
    my $response=$self->{fp}->get($link);
    my $rHeader="";
    #ok, we need to analyse the header to see if there is a refresh,
    #if yes, we will refresh the link,
    if($response->is_success()){
        $rHeader=$response->header("Refresh") || "";
        if(length($rHeader)>0){
            if($rHeader=~/URL\=(.*)/){
                return $self->_getLinkContent($1);
            }
        }else{
            #update the base url.
            $self->{base}=$response->base();
            return $response->content;
        }
    }
    return "";
}

#get the real pdf file. redirection is taken care of.
sub _getPDFFile{
    my($self, $link)=@_;
    if($link!~/^http/){
        $link=$self->_buildURL($link);
    }
    $link=~s/&amp;/&/g;
    my $done=0;
    my $fileName=$self->{dir}."/".$self->{fileName}.".pdf";
    #try to see if there is a refresh,
    my $response=$self->{fp}->get($link);
    if($response->is_success()){
        my $rHeader=$response->header("Refresh") || "";
        if(length($rHeader)>0 && $rHeader=~/URL\=(.*)/){
            return $self->_getPDFFile($1);
        }
    }
    $self->{fp}->get($link, ":content_file"=>$fileName);
    #ok now, we test if this file is the pdf file, if yes,
    #we are done, if not, return -1.
    open(PDFIN, "<", $fileName) or return -1; #nothing was saved.
    while(<PDFIN>){
        if($_=~/^%PDF/){
            $done=1;
            last;
        }
    }
    close PDFIN;
    if($done==0){
        unlink $fileName;
        return -1;
    }
    return 1; #everything ok,
}

#function to record the log
sub _log{
    my($self, $data)=@_;
    if(length($self->{logFile})==0){
        print $data,"\n";
        return;
    }
    open LOGOUT, ">>".$self->{logFile} or die "can not open the log file to write\n";
    print LOGOUT $data, "\n";
    close LOGOUT;
    return;
}

#function to build the full url.
sub _buildURL{
    my($self, $target)=@_;
    if($target=~/^\//){
        if($self->{base}=~/(http:\/\/.*?)\//){
            return $1.$target;
        }
    }else{
        if($self->{base}=~/(http:\/\/.*)\//){
            return $1."/".$target;
        }
    }
    return $target;
}

1;

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PDFDownloader.pm
Type: application/octet-stream
Size: 8962 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050121/a5eb3594/PDFDownloader.obj

From lstein at cshl.edu Fri Jan 21 18:17:26 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Sat Jan 22 11:17:44 2005
Subject: [Bioperl-l] gff -> match/hsp in gbrowse
In-Reply-To: <1106322314.7583.27.camel@localhost>
References: <1106322314.7583.27.camel@localhost>
Message-ID: <200501211817.27073.lstein@cshl.edu>

You can continue to work in gff2, even when using bioperl 1.5.
Alternatively the GFF3 version of HSP alignments is a simple matter of replacing the target coordinates with the Target=XXXXXXX attribute using the format described in the GFF3 spec. Lincoln On Friday 21 January 2005 10:45 am, Raoul Jean Pierre Bonnal wrote: > Dear Community, > today I have upgraded my bioperl installation to 1.5.0-rc2. > How can I configure my gbrose db.conf to display match/hsp from > myfile.gff ( default bioperl 1.5.0-rc2 format ) ? > Gbrowse's tutorial describe the configuration of the previous > format and it doesn't work for gff3. > > Is it possible to filter hsp for every match by rank or score from > gbrowser db.conf file ? Can you post a working example, plez? > > > tnx in advance. > > by RJP > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050121/b89071dd/attachment.bin From colemanm at MIT.EDU Sat Jan 22 16:07:12 2005 From: colemanm at MIT.EDU (Maureen L Coleman) Date: Sat Jan 22 16:03:40 2005 Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign Message-ID: <9628A3F4-6CB9-11D9-8EF9-000A95E515DC@mit.edu> Hi. I'm trying to use the protal2dna script (downloaded from Pasteur site) to convert protein alignments back to DNA alignments. It works in some cases but not in others. In the cases where it doesn't work, it pulls out the same sequence twice instead of pulling out seq1 and seq2 from my protein alignment. 
Then when it tries to match it up with the corresponding DNA sequence, it doesn't work - it matches prot1 with dna1 (correctly) and prot1 with dna2 (incorrectly).

I suspect this might be related to the name,start,end (nse) method in Bio::SimpleAlign. Any suggestions?

Thanks,
Maureen

From talcon at iastate.edu Sat Jan 22 19:57:43 2005
From: talcon at iastate.edu (Tim Alcon)
Date: Sat Jan 22 19:53:54 2005
Subject: [Bioperl-l] bioperl-run for windows?
Message-ID: <41F2F687.4040303@iastate.edu>

Does a Windows version of bioperl-run exist? If so, how do I get it?

Tim

From mlemieux at bioinfo.ca Sat Jan 22 23:04:11 2005
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Sat Jan 22 23:00:23 2005
Subject: what about the speed on longer seq? Re: [Bioperl-l] regular
Message-ID:

Below is a test string seeded with 3 instances of inverted repeats flanked by direct repeats and some code to find all such patterns. It's not as flexible as the EMBOSS palindrome finder nor is it a one-liner but it finds perfect inverted repeats fast.
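For comparison, the single-regex approach from earlier in this thread can be tried on a small seeded string. The sketch below keeps the shape of that final expression but moves the reverse complement into a helper; the names revcomp and find_flanked_inverted_repeat and the short test sequence are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string, handling both cases explicitly
# because tr/// has no case-insensitive flag.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;   # scalar assignment, so the string is reversed
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Return (direct repeat, left arm of the inverted repeat) for the first
# match, or an empty list if nothing matches.
sub find_flanked_inverted_repeat {
    my ($seq) = @_;
    if ($seq =~ m:(\S+)(\S{10}).*?(??{ revcomp($2) })\1:i) {
        return ($1, $2);
    }
    return;
}

# Made-up test sequence: CCAT + 10-base arm + spacer + revcomp(arm) + CCAT
my $seq = 'CCAT' . 'AAAAACCCCC' . 'GAGAGA' . 'GGGGGTTTTT' . 'CCAT';
my ($direct, $arm) = find_flanked_inverted_repeat($seq);
print defined $direct ? "direct=$direct arm=$arm\n" : "no match\n";
```

On long sequences the greedy (\S+) still forces a lot of backtracking, which is why the step-wise script that follows is so much faster.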
HTH,
Madeleine

---------------
#!/usr/bin/perl -w

my $test_string = "GAAAATGGTTTAATCGGAAATTGAGTAGGAGGATAAAAGTCGCATGCTATTATAAATGAGATGCACTTTC
GACACCTCGCGGAAGTATATAAATGAAAGAAGCCCTCAGAAAACTTTAAATTGGAAATAGAGGGAAAATT
ACTGATGGTTGAAATCAGACCAAAATGGGATTGAAAGAGCCTTTCAGCCCTAGTGTGAGTGTCAGGTTTA
acgtgggtttatctcaaacccacgtCTCTTGTTGAAATCAGACCAAAATGGGATTGAAAGGTTTGTTAAGGG
CTTTGATTTGCTCCTCGGTGGCT
CTGGTTGAAATCAGACCAAAATGGGATTGAAAGTAAAGCAGTTCACCCCTGTTACTGGTTTAACTGCCTT
GTTGAAATCAGACCAAAATGGGATTGAAAGGTATTTGAATCAATGAAAAGAAATCTTACCTCGTCGTTGA
AATCAGACCAAAATGGGATTGAAAGAGTCTTCTGGATGGGTCACAAGGGAGACATCGAGGCGTTGAAATC
AGACCAAAATGGGATTGAAAGTCAGCAAGGTTACGTCGGAGATCCTCGAAGAGGGTATCAGTTGAAATCA
GACCAAAATGGGATTGAAAGCGAGGATTGCTGCCAAAGAGAGCGCCTCGTTCTTCGGTTGAAATCAGACC
AAAATGGGATTGAAAGAAAGTGAACATGCTTAAAGAAATGCTGACAGAAATTGAGTTGAAATCAGACCAA
AATGGGATTGAAAGAGCGAGGAAGAGCTTGACGAATTCTTCAAAAGCGGAGTTGAAATCAGACCAAAATG
GGATTGAAAGTTGCATTTACATCGGCAGAATTGGTCTCGTCGGAAGGCATGTTGAAATCAGACCAAAATG
tttaatatcaaAGCATgggaaaggatattCCAAaatatcctttcccGCATacatataccataGGATTGAAAG
CGGTTCTCTTACGTACTCATGCGAGAAGTGAGACTCGCGTTGGTTGAAATCAGACCAAAA
TGGGATTGAAAGAGCAAGTCGTGAAACTGAGCAGTCAAAACAGATCGTTAGTTGAAATCAGACCAAAATG
GGATTGAAAGTTTTCCCATACAATTACGACTTCGCCGGAAAAAAAGTTGAAATCAGACCAAAATGGGATT
GAAAGAGCGAGTTCGACCACGTCGTAGGTCTGCTGTCGGCAAGTTGAAATCAGACCAAAATGGGATTGAA
AGTGTTTGAAGTAGTTGAATACACCGTTGTGCTGTTTGTTGTTGAAATCAGACCAAAATGGGATTGAAAG
AGAGGGAGTATTAGGGCCATACTGGCCGGAGTTGTGGTTGTTGAAATCAGACCAAAATGGGATTGAAAGA
TTCCAAATTGCGGAAAAAGATTCGAGGGCAGTTACTTCCCGTTGAAATCAGACCAAAATGGGATTGAAAG
ccttgtgtacacccttACGTCGTTTATTGCCGTAACGCTAACACCATACTCAAGAGTTGAAATCAGACCAAA
ATGGGATTGAAAGA
AAGCCGTCCAGCGATTGTTTTCATCCGCACCGATAATAGGTTGAAATCAGACCAAAATGGGATTGAAAGG
GTTTAGACTTCCAGCAGGTAAGACATTCAAGGTTCGTTGAAATCAGACCAAAATGGGATTGAAAGGAGGT
AATAGCTGCGAGGGTCAAGCAGGTTTACGAGAAGTTGAAATCAGACCAAAATGGGATTGAAAGGAGCAAT";

# arbitrarily insist on direct and inverted repeats of at least 4 bases long
while ( (length $test_string) > 15 ) {
    $seq = lc $test_string;

    # find direct repeats and work on the sequence between them
    $seq =~ m/([acgt]{4,})(?=([acgtn]+)\1)/;
    my $direct = $1;
    my $middle_stuff = my $reverse_complement = $2;

    if ($direct && $middle_stuff) {
        $reverse_complement = reverse $reverse_complement;
        $reverse_complement =~ tr/acgtn/tgcan/;
        my $inverted = "";
        my $char = "";

        # starting from the position next to the direct repeat, build up a string
        # from the matching characters of the original sequence and its rev_compl
        # don't bother looking past mid_point of string
        my $mid_point = (length $middle_stuff) / 2;
        while ( ((length $middle_stuff) > $mid_point) &&
                (($char = chop $middle_stuff) eq (chop $reverse_complement)) ) {
            $inverted = $inverted . $char;
        }

        if ( (length $inverted) > 3) {
            if ($inverted =~ m/n/) {
                print "possible inverted repeat found: $inverted\nbetween $direct\n";
            } else {
                print "inverted repeat found: $inverted\nbetween $direct\n";
            }
            print "substring length = ", length $test_string, "\n\n";
            # last;
        }

        # step through the original string from the 2nd position of the
        # current direct repeat
        $seq =~ m/$direct/g;
        my $newstart = pos($seq) - (length $direct) + 1;
        $test_string = substr $test_string, $newstart;
    } else {
        last;
    }
}

From ch01ph14 at uohyd.ernet.in Sat Jan 22 23:11:51 2005
From: ch01ph14 at uohyd.ernet.in (Sunil Kumar Panigahi)
Date: Sat Jan 22 23:07:57 2005
Subject: [Bioperl-l] Perl Script for Hydrogen Bonding
Message-ID: <1049.202.41.85.161.1106453511.squirrel@uohmail.uohyd.ernet.in>

Hi,

Can anybody provide me with a script for hydrogen bonding? I want to calculate the hydrogen bonds in a PDB (Protein Data Bank) file.

Thanks in advance
Sunil

-----------------------------------------
This email was sent using UOH MAIL SERVER.
" Confidential Information!"
http://www.uohyd.ernet.in/

From rob at salmonella.org Sat Jan 22 23:22:12 2005
From: rob at salmonella.org (Rob Edwards)
Date: Sat Jan 22 23:19:40 2005
Subject: what about the speed on longer seq?
Re: [Bioperl-l] regular In-Reply-To: References: Message-ID: <5AFC029A-6CF6-11D9-A47D-000A959E1622@salmonella.org> There is also a module I wrote a while back to go into Bio::Tools that will find direct and indirect, exact and some imperfect repeats. I have not benchmarked this against other sequences, it does what I need and slow is not always bad (it gives you time for coffee...) You can pass in a sequence object and get back a sequence object with the repeats annotated in (so that you can just write them out or get their sequences or other bioperly kinds of things). There is one dependency on Tie::RefHash. YMMV, but take a look: http://salmonella.org/bioperl/RepeatFinder.pm Rob From talcon at iastate.edu Sun Jan 23 15:02:18 2005 From: talcon at iastate.edu (Tim Alcon) Date: Sun Jan 23 15:44:40 2005 Subject: [Bioperl-l] bioperl-run for windows? In-Reply-To: References: <41F2F687.4040303@iastate.edu> Message-ID: <41F402CA.3020905@iastate.edu> If I just grab it off CPAN, will it work on Windows, or does it use Unix system calls? Tim Jason Stajich wrote: > Is there a PPM on the bioperl site? > No > > Can you install bioperl-run on windows? > Yes - but you'll have to do it manually, or learn how to build PPMs > (quite simple really), or encourage someone to produce a PPM for > bioperl-run. > > -jason > On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote: > >> Does a Windows version of bioperl-run exitst? If so, how do I get it? >> >> Tim >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > From jason.stajich at duke.edu Sun Jan 23 09:20:43 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun Jan 23 16:13:19 2005 Subject: [Bioperl-l] bioperl-run for windows? 
In-Reply-To: <41F2F687.4040303@iastate.edu> References: <41F2F687.4040303@iastate.edu> Message-ID: Is there a PPM on the bioperl site? No Can you install bioperl-run on windows? Yes - but you'll have to do it manually, or learn how to build PPMs (quite simple really), or encourage someone to produce a PPM for bioperl-run. -jason On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote: > Does a Windows version of bioperl-run exitst? If so, how do I get it? > > Tim > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Sun Jan 23 09:19:13 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun Jan 23 16:13:26 2005 Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign In-Reply-To: <9628A3F4-6CB9-11D9-8EF9-000A95E515DC@mit.edu> References: <9628A3F4-6CB9-11D9-8EF9-000A95E515DC@mit.edu> Message-ID: I'm not familiar with the script. Bio::Align::Utilities does protein to DNA mapping for an alignment with the aa_to_dna_aln function. -jason On Jan 22, 2005, at 4:07 PM, Maureen L Coleman wrote: > Hi. > I'm trying to use the protal2dna script (downloaded from Pasteur site) > to convert protein alignments back to DNA alignments. It works in some > cases but not in others. In the cases where it doesn't work, it pulls > out the same sequence twice instead of pulling out seq1 and seq2 from > my protein alignment. Then when it tries to match it up with the > corresponding DNA sequence, it doesn't work - it matches prot1 with > dna1 (correctly) and prot1 with dna2 (incorrectly). > > I suspect this might be related to the name,start,end (nse) method in > Bio::SimpleAlign. Any suggestions? 
> > Thanks, > Maureen > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Sun Jan 23 15:11:07 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun Jan 23 16:14:21 2005 Subject: [Bioperl-l] bioperl-run for windows? In-Reply-To: <41F402CA.3020905@iastate.edu> References: <41F2F687.4040303@iastate.edu> <41F402CA.3020905@iastate.edu> Message-ID: It uses perl system calls to execute a program - usually just the backticks approach. Honestly I have no idea how much success you will have. If you use Cygwin it will probably do okay for most programs, but I have never made any attempt to run any of it myself under windows. You'll have to see if any other list members have experiences. We tried to make the code flexible (adding .exe to the names of executables, etc) when the program is being run on windows. You'll just have to give it a try and report in with problems. -jason On Jan 23, 2005, at 3:02 PM, Tim Alcon wrote: > If I just grab it off CPAN, will it work on Windows, or does it use > Unix system calls? > > Tim > > > > Jason Stajich wrote: > >> Is there a PPM on the bioperl site? >> No >> >> Can you install bioperl-run on windows? >> Yes - but you'll have to do it manually, or learn how to build PPMs >> (quite simple really), or encourage someone to produce a PPM for >> bioperl-run. >> >> -jason >> On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote: >> >>> Does a Windows version of bioperl-run exitst? If so, how do I get >>> it? 
>>> >>> Tim >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> >> > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From nathanhaigh at ukonline.co.uk Sun Jan 23 16:43:55 2005 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Sun Jan 23 16:40:43 2005 Subject: [Bioperl-l] bioperl-run for windows? In-Reply-To: <41F2F687.4040303@iastate.edu> Message-ID: If you mean does a ppm version of bioperl-run exist, I don't think it does. When bioperl 1.5 is released I plan to make a ppd file for both bioperl-1.5 and bioperl-run available so people can install them easily under windows. You can however, get the bioperl-run 1.4 file from the bioperl website: http://www.bioperl.org/Core/Latest/index.shtml OR get the latest CVS version from the bioperl website: http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-run/?cvsroot=bioperl To install it, you will need nmake 1.5: http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe Unpack the downloaded bioperl file, and run from within that directory: "perl Makefile.PL" "nmake test" "nmake install" If all this sounds to difficult, wait a couple of weeks for the ppm version of v1.5 to become available Nathan > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Tim Alcon > Sent: 23 January 2005 00:58 > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] bioperl-run for windows? > > Does a Windows version of bioperl-run exitst? If so, how do I get it? > > Tim > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > --- > avast! Antivirus: Inbound message clean. 
> Virus Database (VPS): 0503-2, 21/01/2005 > Tested on: 23/01/2005 21:34:15 > avast! is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0503-2, 21/01/2005 Tested on: 23/01/2005 21:43:36 avast! is copyright (c) 2000-2003 ALWIL Software. http://www.avast.com From jason.stajich at duke.edu Sun Jan 23 21:55:47 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun Jan 23 21:51:50 2005 Subject: [Bioperl-l] 1.5.0 release Message-ID: <72D911D2-6DB3-11D9-9F52-000393C44276@duke.edu> If there are some more outstanding commits before we roll 1.5.0 out, please let me know. Otherwise I'll tag and release the 1.5.0 tarball Monday. This is a developer's release so it does not have to be completely perfect. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From Marc.Logghe at devgen.com Mon Jan 24 05:05:12 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Mon Jan 24 05:01:31 2005 Subject: [Bioperl-l] struggling with Bio::FeatureIO and Bio::SeqFeature::Annotated Message-ID: Hi all, I have some problems with Bio::FeatureIO and Bio::SeqFeature::Annotated. But maybe these modules are not designed for the things I had in mind. My initial goal seemed pretty straightforward. It turned out differently. I have a gff file containing features of bunch of bioentries sitting in BioSQL. I wanted to turn the gff into feature objects, add them to the bioentries, and save them back into the database. As a test I fetch a genbank record, strip the features and convert them to gff. The gff is again converted to features and added to the stripped seq object. 
The test script looks like this:

========================================================
#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use Bio::Tools::GFF;
use Bio::FeatureIO;
use IO::String;
use Bio::DB::GenBank;
use Data::Dumper;

*Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;

my $gff;
my $gffio = IO::String->new($gff);
my $db = Bio::DB::GenBank->new;
my $sout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank');
my $seq = $db->get_Seq_by_acc('Z50755');
my @feat = $seq->remove_SeqFeatures;

# writing option 1
my $fout = Bio::Tools::GFF->new(-fh => $gffio, -gff_version => 3);
# writing option 2
my $fout = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
$fout->write_feature(@feat);

$gffio = IO::String->new($gff);
my $fin = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
while (my $feat = $fin->next_feature) {
    $seq->add_SeqFeature($feat);
}
print Data::Dumper->Dump([$seq],['seq']);
$sout->write_seq($seq);
========================================================

First, I had an issue when writing the features to gff using
Bio::FeatureIO (writing option 2):

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: only Bio::SeqFeature::Annotated objects are writeable
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/marcl/src/bioperl/bioperl-live/Bio/Root/Root.pm:328
STACK: Bio::FeatureIO::gff::write_feature /home/marcl/src/bioperl/bioperl-live/Bio/FeatureIO/gff.pm:259
STACK: ./test.pl:25
-----------------------------------------------------------

Therefore, I used Bio::Tools::GFF to write (writing option 1). But then,
I run into troubles when it comes to dumping the sequence into genbank
format:

Can't locate object method "all_tags" via package "Bio::SeqFeature::Annotated" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqIO/FTHelper.pm line 212, line 52.
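
For readers unfamiliar with the symbol-table trick used at the top of the test script: the sketch below shows, in plain Perl, how a missing method name can be aliased to an existing method. It is only an illustration under stated assumptions. The package My::Feature and its data are hypothetical and are not part of Bioperl, and such an alias can only help when the method it points to works for the object in question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class standing in for a feature object; NOT a Bioperl class.
package My::Feature;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# The only tag accessor this class defines.
sub get_all_tags {
    my $self = shift;
    return sort keys %{ $self->{tags} };
}

package main;

# Install "all_tags" as a second name for the same subroutine by
# assigning a code reference to the typeglob of the missing name.
{
    no warnings 'once';
    *My::Feature::all_tags = \&My::Feature::get_all_tags;
}

my $feat = My::Feature->new(tags => { gene => 'abc', note => 'x' });

# Callers that expect the old name now reach get_all_tags.
print join(',', $feat->all_tags), "\n";
```

The alias only changes which name resolves. If the target method itself fails, callers of the new name fail in the same way.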
I tried to fix this by adding the line *Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags; But in vain: Can't locate object method "get_all_tags" via package "Bio::Annotation::Collection" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm line 547, line 52. Regards, Marc From grassi.e at virgilio.it Mon Jan 24 06:55:15 2005 From: grassi.e at virgilio.it (grassi.e@virgilio.it) Date: Mon Jan 24 06:51:20 2005 Subject: [Bioperl-l] Nearly OT question(s) - across databases Message-ID: <415382EC00123E04@ims5c.cp.tin.it> Hello everybody, first of all I'd like to apologize for my poor english and the not very "bioperlic" question. I've got a list of ests from the stanford database and I need to obtain their unigene cluster and possibly gene-id (the stanford database doesn't supply this informations for all the ests). My question is: is there a quick way to do this using bioperl? I'd prefer to download the databases that are needed rather than connecting to them remotely, because it would be too time-consuming. As long as I usually use plain perl I'm looking around the entrez gene databases to understand the better way to gain the data that I need; but I was wondering if using bioperl would help me. Thank you, Elena Grassi From sdavis2 at mail.nih.gov Mon Jan 24 08:57:02 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon Jan 24 08:53:07 2005 Subject: [Bioperl-l] Nearly OT question(s) - across databases References: <415382EC00123E04@ims5c.cp.tin.it> Message-ID: <001001c5021c$96ef4450$7d75f345@WATSON> Elena, By Stanford database, I assume you mean the Stanford SOURCE batch query web page? If that is so, then you have already used the data available via Entrez. SOURCE uses the unigene build from NCBI to map clones or genbank accessions to unigene and entrez gene. In other words, using the entrez database will not help you get more information. 
Unfortunately, it is not at all uncommon to have ESTs that do not map to a gene_id or unigene cluster, so those have to remain orphans. This sounds like a microarray-type project, and if it is, what I tend to do is to find the ESTs that are "interesting" for followup and that are not annotated via other means and blast those against transcript libraries like refseq and ensembl transcripts to find the "best" match. In some cases, this "best" match will not be very good, but in others it will be perfectly adequate to tell you what you are looking at. So, in short, there are not other databases at NCBI that are likely to be helpful and your best bet is to blast the remaining ESTs against refseq for your genome of interest. Sean ----- Original Message ----- From: To: "Bioperl (E-mail)" Sent: Monday, January 24, 2005 6:55 AM Subject: [Bioperl-l] Nearly OT question(s) - across databases > Hello everybody, > > first of all I'd like to apologize for my poor english and the not very > "bioperlic" question. > I've got a list of ests from the stanford database and I need to obtain > their unigene cluster and possibly gene-id (the stanford database doesn't > supply this informations for all the ests). > My question is: is there a quick way to do this using bioperl? > I'd prefer to download the databases that are needed rather than > connecting > to them remotely, because it would be too time-consuming. > As long as I usually use plain perl I'm looking around the entrez gene > databases > to understand the better way to gain the data that I need; but I was > wondering > if using bioperl would help me. 
> > Thank you, > Elena Grassi > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Mon Jan 24 09:54:35 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Jan 24 09:50:41 2005 Subject: [Bioperl-l] 1.5.0 release In-Reply-To: <00d701c5021f$23f57de0$7d75f345@WATSON> References: <72D911D2-6DB3-11D9-9F52-000393C44276@duke.edu> <00d701c5021f$23f57de0$7d75f345@WATSON> Message-ID: I'll make this on-list to hopefully clear some of this up. Aside, I don't have a lot of time to debate the philosophy here. I really never intended to be release weenie for this release so if folks have a strong opinion about how it should be done, please also consider being the next release-master. 1.5.0 (and all the release-candidates 1.5.0-RC1 and 1.5.0-RC2) are made off the HEAD of the CVS tree. Remember HEAD is the default branch you get when you do a CVS checkout. You can type 'cvs log SOME-FILE' to see all the tags that have been applied to a file. The webcvs - http://cvs.open-bio.org/ - also shows this graphically if you prefer. So 1.5.0 will have all the changes made on the HEAD - this means all the changes since we branched for 1.4 sometime last year. If we release another developer release before doing 1.6 I would think we'd still name it 1.5.1 (scenario B below) just organize things, but it might have a separate tag. Scenario A is what we do with stable releases where we make a branch tag and then all releases on that branch are derived from the 1.4 timepoint. A) --------------------------HEAD----> \ 1.4 tag (branch) \ 1.4.1 tag B) --------------------------HEAD----> \ \ 1.5 1.5.1 WRT specifics of the next releases. The idea is that 1.5.0 goes out and people play with it, but we haven't christened it as a 'stable' release, meaning not all the functionality in releases after 1.5.0 will behave the same. 
While we pledge that when 1.6.0 is ready, it will be stable and around for a while, and IDEALLY not break any code you wrote against the 1.4.0 library. Some recent commits to the HEAD of bioperl negate this so we will probably have them backed out of the stable release.

The developer release is just one step easier than CVS checking out so if you already run your code from a CVS checked out directory this isn't going to make your life better. If you are unable to run things in-house unless they have an official "RELEASE" stamp or if you can't use CVS, then 1.5.0 is for you. If you want the latest functionality in order to use the latest GBrowse release, 1.5.0 will be for you. If you are happy with your current Gbrowse setup, don't run this on your production server until you have tested things.

If you are very conservative, run an important pipeline of analysis in-house that relies on Bioperl, and it is currently working great - you DON'T need to update to 1.5.0. In fact, don't do it in a production environment, but give the developer release a spin in a, well, development environment.

Most importantly we want people to try out the release and really use it like they would in a production environment, then tell us what breaks. Doing this lets the 1.6.0 release really be what you hope and yearn for.... =)

A policy. What I would like to do in the future when preparing for a stable release is start to branch early - say 1 to 2 months before - and require changes on the branch to be only bugfixing or Well-Thought Out, and in general no API changes that remove functionality, GainOfFunction (GOF) probably allowed. Also remember the even numbered releases are stable releases, odd numbered are developer releases and don't go out to CPAN. We don't make API changes on the branch so that 1.4.1 is completely compatible with 1.4.0. I care less about GOF on the branch as long as it doesn't break anything.

Instructions and Mechanics of how to do all of this with CVS.
See this page: http://bioperl.org/UserInfo/CVShelp.shtml

How do you get things on the different branches?

Let's see what branches are around:

[jason@lugano core]$ cvs log README
RCS file: /home/repository/bioperl/bioperl-live/README,v
Working file: README
head: 1.36
branch:
locks: strict
access list:
symbolic names:
        bioperl-release-1-5-0-rc2: 1.36
        bioperl-release-1-5-0-rc1: 1.36
        branch-1-4: 1.34.0.2
        bioperl-release-1-4-0: 1.34
        bioperl-devel-1-3-04: 1.34
        bioperl-devel-1-3-03: 1.34
        bioperl-devel-1-3-02: 1.33
        bioperl-devel-1-3-01: 1.33
        bioperl-release-1-2-3: 1.30.2.4
        bioperl-release-1-2-2: 1.30.2.3
        bioperl-run-release-1-2-0: 1.32
        bioperl-release-1-2-1: 1.30
        bioperl-1-2-1-rc1: 1.30
        branch-1-2-collection: 1.30.0.6
        bioperl-release-1-2-0: 1.30
        branch-1-2: 1.30.0.2
        bioperl-devel-1-1-1: 1.27
        bioperl-release-1-1-0: 1.23
        bioperl-release-1-0-2: 1.20.2.7
        bioperl-release-1-0-1: 1.20.2.7
        bioperl-release-1-0-0: 1.20.2.6
        bioperl-1-0-alpha2-rc: 1.20.2.1

We name branches with a 'branch-' prefix. The releases have the word 'release' in them. Hopefully that is clear!

So if you check out from a branch it means you get the most up-to-date code from the branch (if there were additional commits after the point in time when the branch tag was made, you'll get them) while a release-tag gets code at a particular fixed point in time.

Try to get the 1.4 release (with a CVS account). The cmd line options we are using after 'checkout' are:
  -r (BRANCH-NAME OR TAG-NAME) -d DIRECTORY-NAME REPOSITORY-NAME

% cvs -d:ext:YOURNAME@pub.open-bio.org:/home/repository/bioperl checkout -r branch-1-4 -d bioperl-1.4 bioperl-live

If you want to do this via anonymous CVS (no read-write access):

% cvs -d:pserver:cvs@cvs.open-bio.org:/home/repository/bioperl checkout -r branch-1-4 -d branch-1.4 bioperl-live

Now if you want to make a change on the branch you HAVE to make those changes in that directory we checked out: "bioperl-1.4"

When you check them in you do the normal CVS commit.
If you want to merge your changes back onto the main trunk after you've made changes on the branch (or vice-versa, flip-flop the directory names)

1. check in your changes on the branch

2. Go to the OTHER directory you have where the HEAD code is checked out (called 'bioperl-live' in this example)
   % cvs -d:ext:YOURNAME@pub.open-bio.org:/home/repository/bioperl checkout bioperl-live

3. do an update to merge the changes from the branch, let's merge changes in Bio/SeqIO/swiss.pm to the HEAD from the branch
   % cd bioperl-live
   % cvs update -j branch-1-4 Bio/SeqIO/swiss.pm
   RUN THE TESTS
   % perl -I. -w t/SeqIO.t
   .... all tests pass ...
   % cvs commit -m "merged changes from 1.4 branch regarding this-and-that" Bio/SeqIO/swiss.pm

Done.

Reverse the directory and branch names to merge from the HEAD to the BRANCH

% cd bioperl-1.4
% cvs update -j HEAD Bio/SeqIO/swiss.pm
% cvs commit -m "merged changes from HEAD to branch regarding this-and-that" Bio/SeqIO/swiss.pm

Hope that helps some.

-jason

On Jan 24, 2005, at 9:15 AM, Sean Davis wrote:

> Jason,
>
> I'm sorry to bother, but what is the current CVS tag system with
> regard to bioperl? For example, if I do a CO on Monday of
> bioperl-live, what do I get? I have always worked from bioperl CVS
> code, so just wanted to make sure that the tags weren't going to
> change and, if so, what is going to be what. I wasn't sure if this
> should go to the list....
>
> Thanks,
> Sean
>
> ----- Original Message ----- From: "Jason Stajich"
> To:
> Sent: Sunday, January 23, 2005 9:55 PM
> Subject: [Bioperl-l] 1.5.0 release
>
>
>> If there are some more outstanding commits before we roll 1.5.0 out,
>> please let me know. Otherwise I'll tag and release the 1.5.0 tarball
>> Monday. This is a developer's release so it does not have to be
>> completely perfect.
>> >> -jason >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From glim at mycybernet.net Mon Jan 24 00:08:16 2005 From: glim at mycybernet.net (Gerard Lim) Date: Mon Jan 24 10:02:14 2005 Subject: [Bioperl-l] Yet Another Perl Conference North America 2005 announces call-for-papers Message-ID: <200501240008.16906.glim@mycybernet.net> YAPC::NA 2005 (Yet Another Perl Conference, North America) has just released its call-for-papers; potential and aspiring speakers can submit a presentation proposal via: http://yapc.org/America/cfp-2005.shtml The dates of the conference are Monday - Wednesday 27-29 June 2005. The location will be in downtown Toronto, Ontario, Canada. (Note that a different date block was previously announced, but has been moved to accomodate venue availability.) The close of the call-for-papers is April 18, 2005 at 11:59 pm. If you have any questions regarding the call-for-papers or speaking at YAPC::NA 2005 please email na-author@yapc.org We would love to hear from potential sponsors. Please contact the organizers at na-sponsor@yapc.org to learn about the benefits of sponsorship. Other information regarding the conference (e.g. venue, registration specifics) will be announced soon. We look forward to your submissions and a great conference! From letondal at pasteur.fr Mon Jan 24 10:28:27 2005 From: letondal at pasteur.fr (Catherine Letondal) Date: Mon Jan 24 10:21:39 2005 Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign In-Reply-To: References: <9628A3F4-6CB9-11D9-8EF9-000A95E515DC@mit.edu> Message-ID: <98709246-6E1C-11D9-894E-000D93B0BD32@pasteur.fr> On Jan 23, 2005, at 3:19 PM, Jason Stajich wrote: > I'm not familiar with the script. 
Web: http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html Man: http://bioweb.pasteur.fr/docs/man/man/protal2dna.1.html Ftp: ftp://ftp.pasteur.fr/pub/GenSoft/unix/alignment/protal2dna > > Bio::Align::Utilities does protein to DNA mapping for an alignment > with the aa_to_dna_aln function. The problem with this function aa_to_dna_aln is that is restricted to frame 1 and to the standard genetic code, right? aa_to_dna_aln Title : aa_to_dna_aln Usage : my $dnaaln = aa_to_dna_aln($aa_aln, \%seqs); Function: Will convert an AA alignment to DNA space given the corresponding DNA sequences. Note that this method expects the DNA sequences to be in frame +1 (GFF frame 0) as it will start to project into coordinates starting at the first base of the DNA sequence, if this alignment represents a different frame for the cDNA you will need to edit the DNA sequences to remove the 1st or 2nd bases (and revcom if things should be). Returns : Bio::Align::AlignI object Args : 2 arguments, the alignment and a hashref. Alignment is a Bio::Align::AlignI of amino acid sequences. The hash reference should have keys which are the display_ids for the aa sequences in the alignment and the values are a Bio::PrimarySeqI object for the corresponding spliced cDNA sequence. The other problem when using tools offering several genetic code (these sequences need a bacterial genetic code), is that the start codon of this code is not the right one. These sequences need: GTG=M (and not V). > > -jason > On Jan 22, 2005, at 4:07 PM, Maureen L Coleman wrote: > >> Hi. >> I'm trying to use the protal2dna script (downloaded from Pasteur >> site) to convert protein alignments back to DNA alignments. It works >> in some cases but not in others. In the cases where it doesn't work, >> it pulls out the same sequence twice instead of pulling out seq1 and >> seq2 from my protein alignment. 
Then when it tries to match it up >> with the corresponding DNA sequence, it doesn't work - it matches >> prot1 with dna1 (correctly) and prot1 with dna2 (incorrectly). >> >> I suspect this might be related to the name,start,end (nse) method in >> Bio::SimpleAlign. Any suggestions? >> >> Thanks, >> Maureen >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Mon Jan 24 10:41:44 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Jan 24 10:38:03 2005 Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign In-Reply-To: <98709246-6E1C-11D9-894E-000D93B0BD32@pasteur.fr> References: <9628A3F4-6CB9-11D9-8EF9-000A95E515DC@mit.edu> <98709246-6E1C-11D9-894E-000D93B0BD32@pasteur.fr> Message-ID: <72EB0F5C-6E1E-11D9-ABB8-000393C44276@duke.edu> On Jan 24, 2005, at 10:28 AM, Catherine Letondal wrote: > > On Jan 23, 2005, at 3:19 PM, Jason Stajich wrote: > >> I'm not familiar with the script. > > Web: > http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html > Man: > http://bioweb.pasteur.fr/docs/man/man/protal2dna.1.html > Ftp: > ftp://ftp.pasteur.fr/pub/GenSoft/unix/alignment/protal2dna > >> >> Bio::Align::Utilities does protein to DNA mapping for an alignment >> with the aa_to_dna_aln function. > > The problem with this function aa_to_dna_aln is that is restricted to > frame 1 and to the standard genetic code, right? > aa_to_dna_aln > This is an alignment mapper routine not an alignment routine itsself. So I think I was just being stupid and not looking at what protal2dna really was doing. 
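
As a rough illustration of what an alignment mapper of this kind does, here is a minimal plain-Perl sketch that projects the gaps of an aligned protein row onto its coding sequence. It is not the Bioperl implementation of aa_to_dna_aln. The helper name project_gaps_onto_cds and the toy sequences are made up for this example, and the sketch assumes each coding sequence is already spliced, starts in frame +1, and has exactly one codon per residue.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of Bioperl: walk along one aligned protein
# row and emit the matching codon for each residue, or '---' for each gap.
sub project_gaps_onto_cds {
    my ($aligned_protein, $cds) = @_;
    my $aligned_cds = '';
    my $offset      = 0;    # current position in the ungapped CDS

    for my $residue (split //, $aligned_protein) {
        if ($residue eq '-') {
            $aligned_cds .= '---';    # one protein gap column = three DNA columns
        }
        else {
            $aligned_cds .= substr($cds, $offset, 3);    # next in-frame codon
            $offset += 3;
        }
    }
    return $aligned_cds;
}

# Toy data (made up): two aligned protein rows and their coding sequences,
# keyed by the same identifiers.
my %aligned_protein = (seq1 => 'MK-L',      seq2 => 'MKAL');
my %cds             = (seq1 => 'ATGAAACTG', seq2 => 'ATGAAAGCTCTG');

for my $id (sort keys %aligned_protein) {
    print "$id\t", project_gaps_onto_cds($aligned_protein{$id}, $cds{$id}), "\n";
}
# prints:
# seq1	ATGAAA---CTG
# seq2	ATGAAAGCTCTG
```

Because the codons are copied from the supplied coding sequence rather than back-translated, a walk like this does not depend on the genetic code. It does depend on the protein and DNA identifiers being paired correctly and on the frame assumption above.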
You provide it the protein multiple sequence alignment alignment and the coding sequence which gave rise to it. It maps the gaps back in so you have a CDS alignment. Very basic iterating through the alignment. So it has to all be in-frame and already spliced, it should have been called aa_to_cds_aln. The method is intended for getting ready to do Ka/Ks type stuff so that you have aligned the sequences on codon boundaries and with knowledge about conservative aa replacements. apologies for inciting confusion... -j > Title : aa_to_dna_aln > Usage : my $dnaaln = aa_to_dna_aln($aa_aln, \%seqs); > Function: Will convert an AA alignment to DNA space given the > corresponding DNA sequences. Note that this method > expects > the DNA sequences to be in frame +1 (GFF frame 0) as > it will > start to project into coordinates starting at the > first base of > the DNA sequence, if this alignment represents a > different > frame for the cDNA you will need to edit the DNA > sequences > to remove the 1st or 2nd bases (and revcom if things > should be). > Returns : Bio::Align::AlignI object > Args : 2 arguments, the alignment and a hashref. > Alignment is a Bio::Align::AlignI of amino acid > sequences. > The hash reference should have keys which are > the display_ids for the aa > sequences in the alignment and the values are a > Bio::PrimarySeqI object for the corresponding > spliced cDNA sequence. > > > The other problem when using tools offering several genetic code > (these sequences need a bacterial genetic code), is that the start > codon of this code is not the right one. These sequences need: GTG=M > (and not V). > >> >> -jason >> On Jan 22, 2005, at 4:07 PM, Maureen L Coleman wrote: >> >>> Hi. >>> I'm trying to use the protal2dna script (downloaded from Pasteur >>> site) to convert protein alignments back to DNA alignments. It works >>> in some cases but not in others. 
In the cases where it doesn't >>> work, it pulls out the same sequence twice instead of pulling out >>> seq1 and seq2 from my protein alignment. Then when it tries to >>> match it up with the corresponding DNA sequence, it doesn't work - >>> it matches prot1 with dna1 (correctly) and prot1 with dna2 >>> (incorrectly). >>> >>> I suspect this might be related to the name,start,end (nse) method >>> in Bio::SimpleAlign. Any suggestions? >>> >>> Thanks, >>> Maureen >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From Marc.Logghe at devgen.com Mon Jan 24 10:46:44 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Mon Jan 24 10:43:00 2005 Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign Message-ID: Guess, this is the bioperl implementation of EMBOSS tranalign ? http://www.rfcgr.mrc.ac.uk/Software/EMBOSS/Apps/tranalign.html ML > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of > Jason Stajich > Sent: Monday, January 24, 2005 4:42 PM > To: Catherine Letondal > Cc: bioperl-l@portal.open-bio.org; Maureen L Coleman > Subject: Re: [Bioperl-l] protal2dna and Bio::SimpleAlign > > > > On Jan 24, 2005, at 10:28 AM, Catherine Letondal wrote: > > > > > On Jan 23, 2005, at 3:19 PM, Jason Stajich wrote: > > > >> I'm not familiar with the script. 
> > > > Web: > > http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html > > Man: > > http://bioweb.pasteur.fr/docs/man/man/protal2dna.1.html > > Ftp: > > ftp://ftp.pasteur.fr/pub/GenSoft/unix/alignment/protal2dna > > > >> > >> Bio::Align::Utilities does protein to DNA mapping for an alignment > >> with the aa_to_dna_aln function. > > > > The problem with this function aa_to_dna_aln is that is > restricted to > > frame 1 and to the standard genetic code, right? > > aa_to_dna_aln > > > This is an alignment mapper routine not an alignment routine > itsself. > So I think I was just being stupid and not looking at what protal2dna > really was doing. > > You provide it the protein multiple sequence alignment alignment and > the coding sequence which gave rise to it. It maps the gaps > back in so > you have a CDS alignment. Very basic iterating through the alignment. > > So it has to all be in-frame and already spliced, it should have been > called aa_to_cds_aln. > > The method is intended for getting ready to do Ka/Ks type > stuff so that > you have aligned the sequences on codon boundaries and with > knowledge > about conservative aa replacements. > > apologies for inciting confusion... > -j > > > Title : aa_to_dna_aln > > Usage : my $dnaaln = aa_to_dna_aln($aa_aln, \%seqs); > > Function: Will convert an AA alignment to DNA space > given the > > corresponding DNA sequences. Note that > this method > > expects > > the DNA sequences to be in frame +1 (GFF > frame 0) as > > it will > > start to project into coordinates starting at the > > first base of > > the DNA sequence, if this alignment represents a > > different > > frame for the cDNA you will need to edit the DNA > > sequences > > to remove the 1st or 2nd bases (and > revcom if things > > should be). > > Returns : Bio::Align::AlignI object > > Args : 2 arguments, the alignment and a hashref. > > Alignment is a Bio::Align::AlignI of amino acid > > sequences. 
> > The hash reference should have keys which are > > the display_ids for the aa > > sequences in the alignment and the values are a > > Bio::PrimarySeqI object for the corresponding > > spliced cDNA sequence. > > > > > > The other problem when using tools offering several genetic code > > (these sequences need a bacterial genetic code), is that the start > > codon of this code is not the right one. These sequences > need: GTG=M > > (and not V). > > > >> > >> -jason > >> On Jan 22, 2005, at 4:07 PM, Maureen L Coleman wrote: > >> > >>> Hi. > >>> I'm trying to use the protal2dna script (downloaded from Pasteur > >>> site) to convert protein alignments back to DNA > alignments. It works > >>> in some cases but not in others. In the cases where it doesn't > >>> work, it pulls out the same sequence twice instead of pulling out > >>> seq1 and seq2 from my protein alignment. Then when it tries to > >>> match it up with the corresponding DNA sequence, it > doesn't work - > >>> it matches prot1 with dna1 (correctly) and prot1 with dna2 > >>> (incorrectly). > >>> > >>> I suspect this might be related to the name,start,end > (nse) method > >>> in Bio::SimpleAlign. Any suggestions? 
> >>> > >>> Thanks, > >>> Maureen > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l@portal.open-bio.org > >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> -- > >> Jason Stajich > >> jason.stajich at duke.edu > >> http://www.duke.edu/~jes12/ > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From sdavis2 at mail.nih.gov Mon Jan 24 10:59:59 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon Jan 24 10:55:54 2005 Subject: [Bioperl-l] 1.5.0 release References: <72D911D2-6DB3-11D9-9F52-000393C44276@duke.edu> <00d701c5021f$23f57de0$7d75f345@WATSON> Message-ID: <001a01c5022d$c205c8b0$7d75f345@WATSON> Jason, As usual, thanks for the fantastic, extensive reply and the brief philosophy lesson. It was more than I imagined could have been said on the subject--it helps immensely to be reminded of some of these details and the roadmap. Sean >> Jason, >> >> I'm sorry to bother, but what is the current CVS tag system with regard >> to bioperl? For example, if I do a CO on Monday of bioperl-live, what do >> I get? I have always worked from bioperl CVS code, so just wanted to >> make sure that the tags weren't going to change and, if so, what is going >> to be what. I wasn't sure if this should go to the list.... 
>
From colemanm at MIT.EDU Mon Jan 24 11:01:59 2005
From: colemanm at MIT.EDU (Maureen L Coleman)
Date: Mon Jan 24 10:57:43 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
In-Reply-To: <72EB0F5C-6E1E-11D9-ABB8-000393C44276@duke.edu>
Message-ID: <4766FF3C-6E21-11D9-B1D1-000A95E515DC@mit.edu>

Thanks for the responses. The problem (with both protal2dna and tranalign), as Catherine recognized, is that even when I specify Bacterial translation, it doesn't recognize my alternative start codons (gtg,ctg,ttg can all be Met).

As the quickest route, I went through and changed all my alternative start codons in the alignments to their "normal" translation. Then protal2dna and tranalign seem to work fine. aa_to_dna_aln should work for me too, since I already have the coding DNA sequences pulled out.

thanks again,
maureen

From jason.stajich at duke.edu Mon Jan 24 11:08:50 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 24 11:05:35 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
In-Reply-To: <4766FF3C-6E21-11D9-B1D1-000A95E515DC@mit.edu>
References: <4766FF3C-6E21-11D9-B1D1-000A95E515DC@mit.edu>
Message-ID: <3C3349C2-6E22-11D9-ABB8-000393C44276@duke.edu>

cool - I assume you know you can change the translation table used when you call the 'translate' function in bioperl. So if you start the whole thing from a set of CDS sequences, you shouldn't have to do much messing around.

The aa_to_dna_aln doesn't do any fancy checking to ensure that your codon actually can translate into the protein you specified. That might be a good sanity check to put in.

  Title   : translate
  Usage   : $protein_seq_obj = $dna_seq_obj->translate
            #if full CDS expected:
            $protein_seq_obj = $cds_seq_obj->translate(undef,undef,undef,undef,1);
  Function: Provides the translation of the DNA sequence using full IUPAC ambiguities in DNA/RNA and amino acid codes. The full CDS translation is identical to EMBL/TREMBL database translation.
            Note that the trailing terminator character is removed before returning the translation object.
            Note: if you set $dna_seq_obj->verbose(1) you will get a warning if the first codon is not a valid initiator.
            Added way of translating using a custom codon table. This has to be the final addition to this overloaded interface!
  Returns : A Bio::PrimarySeqI implementing object
  Args    : character for terminator (optional) defaults to '*'
            character for unknown amino acid (optional) defaults to 'X'
            frame (optional) valid values 0, 1, 2, defaults to 0
            codon table id (optional) defaults to 1
            complete coding sequence expected, defaults to 0 (false)
            boolean, throw exception if not complete CDS (true) or defaults to warning (false)
            codontable, a custom Bio::Tools::CodonTable object, optional

-jason

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From letondal at pasteur.fr Mon Jan 24 11:13:00 2005
From: letondal at pasteur.fr (Catherine Letondal)
Date: Mon Jan 24 11:05:56 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
In-Reply-To:
References:
Message-ID:

On Jan 24, 2005, at 4:46 PM, Marc Logghe wrote:

> Guess, this is the bioperl implementation of EMBOSS tranalign ?
> http://www.rfcgr.mrc.ac.uk/Software/EMBOSS/Apps/tranalign.html

This old script is indeed very similar to tranalign, except that it offers some quite useful features:
- you can specify a different genetic code for each DNA sequence (-G option)
- you can ask for a mapping of prot/dna sequences by their names instead of their position in the file (-i option)

What is now missing is a feature to specify alternate start codons.

BTW, I forgot to mention that the script uses the bioperl translate method, to which the code is being passed:

  my $trans = $dna->translate(undef, undef, $frame, $code);

and of course, $dna is a bioperl sequence loaded with the standard SeqIO methods:

  $in_dna_seqs = Bio::SeqIO->newFh (-file => $dna_file, -format => $dna_file_format);

From Marc.Logghe at devgen.com Mon Jan 24 11:14:03 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jan 24 11:10:14 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
Message-ID:

> yep - except you aren't required to have sequences in the same order -
> but require the sequence names to be the same in both (or you do the
> mapping of names up-front in the hash you give to the routine).

Hope people don't mind going a little off topic here ;-)
The order used to be a problem because most multiple alignment applications, like clustalw, don't preserve the order of the aligned sequences.
However, this is now possible with more recent versions of clustalw, where you can pass the option -outorder=input. Peter Rice taught me how to cheat emboss' emma:

  setenv EMBOSS_CLUSTALW "clustalw -outorder=input"

Marc

From Marc.Logghe at devgen.com Mon Jan 24 11:20:07 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jan 24 11:19:47 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
Message-ID:

> This old script is indeed very similar to tranalign, except that it
> offers some quite useful features:
> - you can specify a different genetic code for each DNA

You can also do that in tranalign with the -table option. BTW, table 1 is standard but with alternative initiation codons. Maureen, does it help when you use that option?
HTH,
Marc

From gyang at plantbio.uga.edu Mon Jan 24 14:02:33 2005
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon Jan 24 13:58:47 2005
Subject: [Bioperl-l] help on large sequence with Bio::Index::Fasta!
In-Reply-To: <5AFC029A-6CF6-11D9-A47D-000A959E1622@salmonella.org>
Message-ID: <20050124140233.49ac48b6@dogwood.plantbio.uga.edu>

Hi, everybody,
I have another difficult situation: I am running a local blast and sequence retrieval. The following sub works OK for my local DB1, but not for my local DB2. DB1 contains sequences of PACs and BACs (I believe the average size is ~100 or 200 kb), but DB2 contains contig entries as large as 30 Mb. The error says the $seq object is undefined! I believe the problem is the size of the large entries in DB2. Can we use LargeSeq when we do retrieval? Can anybody help me with how to use it with Bio::Index::Fasta?
Thank you in advance for your comments!
Yang

sub getseq {
    my $id        = $_[0];
    my $file_name = $_[1];
    my $inx = Bio::Index::Fasta->new(-filename   => $file_name.".idx",
                                     -write_flag => 1);
    $inx->id_parser(\&get_id);
    $inx->make_index($file_name);
    my $seq = $inx->fetch($id);
    return $seq;
}

From jdw at ou.edu Mon Jan 24 14:48:40 2005
From: jdw at ou.edu (James D.
White)
Date: Mon Jan 24 14:44:41 2005
Subject: what about the speed on longer seq? Re: [Bioperl-l] regular expression help!
In-Reply-To: <20050121223155.bd16abb4@dogwood.plantbio.uga.edu>
References: <20050121223155.bd16abb4@dogwood.plantbio.uga.edu>
Message-ID: <41F55118.4010800@ou.edu>

Your original regex would have required O(n**4) attempts to match, because the unanchored starting position, "\S+", "(\S+)", and ".*" each involve O(n) possibilities. The unanchored starting position and each occurrence of ".*", "\S+", or a repeat range with no upper limit (e.g., "\S{4,}") can multiply the number of possible matches to be tested by O(n). My regex is O(n**3) for the unanchored starting position, "(\S+)", and ".*". Adding the minimum 4 bases for the first repeat, my regex becomes:

  =~ m:(\S{4,})(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;

Your new regex is also O(n**4) for the unanchored starting position, "(\S{4,})", "(\S{10,})", and ".+", but the new regex does recognize longer inverted repeats. The initial uncaptured \S+ is no longer there, but "(\S{10,})", which matches longer inverted repeats, has replaced it and brings the number of matches to be tested back up. I might change ".+" to ".+?" to reduce the need for backtracking, resulting in:

  =~ /(\S{4,})(\S{10,}).+?(??{sub($2)})\1/i;

but this is still O(n**4). If my version is modified to find longer inverted repeats by using (\S{10,}), then it would also become O(n**4). I suspect that not calling the sub() avoids some overhead that is present in your version, but I have not tried to examine any generated code nor run any tests to be sure.

Knowledge of upper limits for each repeat length can greatly reduce the number of unproductive matches by changing "*" to "{0,max}", "+" to "{1,max}", and "{min,}" to "{min,max}". But I do not know if any reasonable limits are known for your data.
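
The upper-limit idea above can be sketched as follows. This is only an illustration under stated assumptions: the bounds 20 (longest direct repeat) and 500 (longest spacer) are made-up placeholders rather than values known for any real data, the names revcomp and find_element are hypothetical helpers, and the sequence is upper-cased once instead of relying on the /i flag.

```perl
use strict;
use warnings;

# Reverse complement of an upper-case DNA string (hypothetical helper).
sub revcomp {
    my ($dna) = @_;
    $dna =~ tr/ATCG/TAGC/;
    return scalar reverse $dna;
}

# Look for: direct repeat, 10-base inverted repeat, spacer,
# reverse complement of the inverted repeat, direct repeat again.
# The bounds {4,20} and {0,500} are assumed placeholders; every bounded
# quantifier removes one open-ended factor from the search.
sub find_element {
    my ($seq) = @_;
    $seq = uc $seq;    # upper-case once, so no /i is needed
    if ($seq =~ m/(\S{4,20})(\S{10}).{0,500}?(??{ revcomp($2) })\1/) {
        return ($1, $2);    # (direct repeat, inverted repeat)
    }
    return;                 # empty list when nothing matches
}

# Toy sequence: gg + ACGT + AAACCCGGGT + TTTTT + ACCCGGGTTT + ACGT + cc
my ($direct, $inverted) =
    find_element('ggACGTAAACCCGGGTTTTTTACCCGGGTTTACGTcc');
print "direct=$direct inverted=$inverted\n" if defined $direct;
```

With real data the bounds would have to come from what is known about the elements being searched for; bounds that are too tight will silently miss longer repeats.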
In order to bring the order back down to O(n**3) and still find the longer inverted repeats, let's break the problem up into finding the original 10 base inverted repeat and then extending it. Using \1 and \2 to represent the repeated substrings and revcomp() as the reverse complement of its argument, the matched sequence is "\1\2.*revcomp(\2)\1". If the full \2 is longer than the minimum 10 bases, let's call the first 10 bases \2a and the rest \2b. The matched sequence is now "\1\2a\2b.*revcomp(\2b)revcomp(\2a)\1". Now searching for only a 10 base inverted repeat simplifies to "\1\2a.*revcomp(\2a)\1", which is an O(n**3) operation using my regex. Extending the inverted repeat is O(n), but the combined process is not O(n**3)*O(n), but O(n**3)+O(n) which is still O(n**3).

Unfortunately the same process does not work for \1, because "\1a\1b\2.*revcomp(\2)\1a\1b" becomes either "\1a.*\2.*revcomp(\2)\1a" or "\1b\2.*revcomp(\2).*\1b", which is still O(n**4) in either case. So the best I can come up with is (not tested):

  # add parens to capture .*?
  $string =~ m:(\S{4,})(\S{10})(.*?)(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;

  # Save repeats and middle. You can also look at @- and @+
  # to find the positions of the matching substrings.
  $direct = $1;
  $inverted = $2;
  $middle = $3;

  # find the length of the extension for the inverted repeat
  $low = 0;
  $high = length($middle) - 1;
  while ($low < $high) {
      $rc = lc(substr($middle, $high, 1));
      $rc =~ tr/atcg/tagc/;
      last if lc(substr($middle, $low, 1)) ne $rc;
      $low++;
      $high--;
  }

  # extend the repeat, if necessary
  if ($low) {
      $inverted .= substr($middle, 0, $low, '');
      substr($middle, -$low) = '';
  }

I hope this is helpful.
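
The search-then-extend steps above can be wrapped in a subroutine and tried on a toy sequence. This is a sketch under assumptions, not a tested production routine: find_extended and revcomp are hypothetical names, the input is upper-cased instead of using /i, and the toy sequence was built by hand so that the inverted repeat is 12 bases long (2 more than the 10 the regex captures).

```perl
use strict;
use warnings;

# Reverse complement of an upper-case DNA string (hypothetical helper).
sub revcomp {
    my ($dna) = @_;
    $dna =~ tr/ATCG/TAGC/;
    return scalar reverse $dna;
}

# Step 1: find direct repeat + 10-base inverted repeat with the regex.
# Step 2: extend the inverted repeat into the spacer, one base from each
#         end at a time, for as long as the two ends stay complementary.
sub find_extended {
    my ($seq) = @_;
    $seq = uc $seq;
    return unless $seq =~ m/(\S{4,})(\S{10})(.*?)(??{ revcomp($2) })\1/;
    my ($direct, $inverted, $middle) = ($1, $2, $3);

    my ($low, $high) = (0, length($middle) - 1);
    while ($low < $high) {
        last if substr($middle, $low, 1) ne revcomp(substr($middle, $high, 1));
        $low++;
        $high--;
    }
    if ($low) {
        $inverted .= substr($middle, 0, $low, '');  # move bases from the front
        substr($middle, -$low) = '';                # and drop their partners
    }
    return ($direct, $inverted, $middle);
}

# Toy sequence: ACGT + AAACCCGGGTCA + TTTTT + TGACCCGGGTTT + ACGT
my ($direct, $inverted, $middle) =
    find_extended('ACGTAAACCCGGGTCATTTTTTGACCCGGGTTTACGT');
print "direct=$direct inverted=$inverted middle=$middle\n" if defined $direct;
```

The regex only ever captures the first 10 bases of the inverted repeat; the loop is what recovers the remaining 2 bases here, which is why the combined cost stays additive rather than multiplicative.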
If the process is still too slow, then you can extend these ideas by finding fixed length direct and inverted repeats separately using O(n**2) regexes:

  =~ m/(\S{4})(.*?)\1/i;   # to find direct repeats
  =~ m:(\S{10})(.*?)(??{$rev = $1; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);}):i;   # to find inverted repeats

Then extend them using techniques similar to the above. Once you have the separate direct and inverted repeat locations, then you have to match neighboring repeats to find what you want. This technique is more general and allows you, for example, to find sequences which have a few mismatched bases between the repeat pairs without exploding the complexity of the search. There is however some increase in programming effort.

Good luck,
Jim White

Guojun Yang wrote:
> Thank you James for your detailed info. An earlier solution given is to use
> =~ /(\S{4,})(\S{10,}).+(??{sub($2)})\1/i; the sub is to do the transliteration and reversion of $2. It works greatly on ~80 bp seq. However, on a seq ~500 bp, it takes forever to do. Is there any similarity in processing time for the regex? I will definitely try it.
> Have a great one,
> Yang
> ----- Original Message -----
> From: James D. White
> To: bioperl-l@portal.open-bio.org
> Sent: Fri, 21 Jan 2005 11:54:37 -0500
> Subject: Re: [Bioperl-l] regular expression help!
>
>>Sorry about double posting, but I forgot to change the subject before
>>sending the first message.
>>
>>>Starting with:
>>>
>>>$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
>>>
>>>The slashes in tr/// confused the Perl parser. You need to use
>>>different delimiters for the m// operator (the m is implied by //)
>>>and the tr/// operator. Also the tr/// operator does not use the
>>>i flag, so lower case needs to be handled explicitly. So let's
So let's >>>try the following: >>> >>>$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ >> >>tr/ATCGatcg/TAGCtagc/);})\1.*:i; >> >>>This gives the error: >>>Can't modify constant item in transliteration (tr///) at (re_eval 1) >>>line 1, near "tr/ATCGatcg/TAGCtagc/)" >>> >>>Inside the (??{ CODE }) sequence, use $1, $2, ..., instead of >>>\1, \2, ... (See Programming Perl, 3rd Edition, "Match-time pattern >>>interpolation", p. 213) Inside the evaluated CODE, \2 is a >>>constant, not the value of the second captured substring. Also I'm >>>not sure what modifying $2 would do, so let's try: >>> >>>$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; >> >>reverse($rev);})\1.*:i; >> >>>This works, but I would get rid of the leading "\S+" and trailing >>>".*". The ".*" adds nothing useful, so just drop it. You >>>probably don't need the leading "\S+", because the pattern is not >>>anchored to the beginning of the string with "^". The leading >>>"\S+" gobbles up the entire string, forcing the match to backtrack >>>character by character from the end. It also forces the substring >>>match saved in $1 to occur after the first character. Unless you >>>never want $1 to consider the first character, just drop the >>>leading "\S+". If you don't want to search the first character, >>>then just use "\S". This results in: >>> >>>$regex =~ m:(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; >> >>reverse($rev);})\1:i; >> >>>Finally I would probably change the remaining ".*" to ".*?". If >>>you search with ".*" on a long sequence which could contain >>>multiple sequences of interest, the ".*" pattern will match the rest >>>of the sequence and force backtracking to match the first occurrence >>>of "$1$2" with the last occurrence of "revcomp($2)$1". If you use >>>".*?", you match the first occurrence of "$1$2" with the nearest >>>occurrence of "revcomp($2)$1". 
This results in the final regular
>>>expression:
>>>
>>>$regex =~ m:(\S+)(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;
>>>
>>>>Date: Fri, 14 Jan 2005 14:12:46 -0500
>>>>From: Guojun Yang
>>>>Subject: [Bioperl-l] regular expression help!
>>>>To: bioperl-l@portal.open-bio.org
>>>>Message-ID: <20050114141246.94c7cb46@dogwood.plantbio.uga.edu>
>>>>Content-Type: text/plain; charset="us-ascii"
>>>>
>>>>Hi, Everybody,
>>>>I was trying to use a regex recognizing a patter of inverted repeat DNA seq flanked by direct repeats (see below), it returns errors saying "(?{...}) not terminated or {...} not balanced. Can anybody help me sorting this out?
>>>>The regex I have is:
>>>>$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
>>>>Thank you,
>>>>Yang
>>>>
>>>
>>>--
>>>James D. White (jdw@ou.edu)
>>>Director of Bioinformatics
>>>Department of Chemistry and Biochemistry/ACGT
>>>University of Oklahoma
>>>101 David L. Boren Blvd., SRTC 2100
>>>Norman, OK 73019
>>>Phone: (405) 325-4912, FAX: (405) 325-7762
>>
>>--
>>James D. White (jdw@ou.edu)
>>Director of Bioinformatics
>>Department of Chemistry and Biochemistry/ACGT
>>University of Oklahoma
>>101 David L. Boren Blvd., SRTC 2100
>>Norman, OK 73019
>>Phone: (405) 325-4912, FAX: (405) 325-7762
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l@portal.open-bio.org
>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>

From barry.moore at genetics.utah.edu Mon Jan 24 17:18:39 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon Jan 24 17:15:15 2005
Subject: [Fwd: Re: [Bioperl-l] bioperl-run for windows?]
Message-ID: <41F5743F.10201@genetics.utah.edu>

Some will work, and some won't. I've installed it on Windows, and used it a bit there.
One problem you'll run into is that you can't use bioperl to run a program that can't be installed on Windows (EMBOSS for example), so you'll be limited that way, but check out the Pise interface for any of that software. You should be able to get access to a lot of non-Windows software via bioperl by using the Pise interface (Bio::Tools::Run::AnalysisFactory::Pise, http://www.pasteur.fr/recherche/unites/sis/Pise/).

Barry

Tim Alcon wrote:

> If I just grab it off CPAN, will it work on Windows, or does it use
> Unix system calls?
>
> Tim
>
> Jason Stajich wrote:
>
>> Is there a PPM on the bioperl site?
>> No
>>
>> Can you install bioperl-run on windows?
>> Yes - but you'll have to do it manually, or learn how to build PPMs
>> (quite simple really), or encourage someone to produce a PPM for
>> bioperl-run.
>>
>> -jason
>> On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote:
>>
>>> Does a Windows version of bioperl-run exist? If so, how do I get it?
>>>
>>> Tim
>>>
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From allenday at ucla.edu Mon Jan 24 18:54:42 2005
From: allenday at ucla.edu (Allen Day)
Date: Mon Jan 24 18:50:38 2005
Subject: [Bioperl-l] struggling with Bio::FeatureIO and Bio::SeqFeature::Annotated
In-Reply-To:
References:
Message-ID:

Marc,

The problem was that Bio::SeqIO::FTHelper was making calls assuming it had a Bio::SeqFeature::Generic instance.
I've updated it to make calls compliant with the Bio::SeqFeatureI
interface, and the script below now at least runs using "option 1".

"option 2" will not work, at least for now, because Bio::DB::GenBank is
creating a SeqIO that holds Bio::SeqFeature::Generic objects, and these
are difficult to deal with because their internal data structures are
different from those of a Bio::SeqFeature::Annotated. I like the
technique used below to bridge to Bio::FeatureIO via a Bio::Tools::GFF
intermediary -- very clever.

You'll also notice that the GenBank-formatted file output by the script
doesn't look quite right. The FEATURES section looks kind of like:

FEATURES             Location/Qualifiers
     Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)1..20975
                     /source="Bio::Annotation::SimpleValue=HASH(0x9bcdbe0)"
                     /mol_type="Bio::Annotation::SimpleValue=HASH(0xa3dab1c)"
                     /seq_id="Bio::Annotation::SimpleValue=HASH(0xa214de0)"
                     /score="Bio::Annotation::SimpleValue=HASH(0xa3d92cc)"
                     /frame="Bio::Annotation::SimpleValue=HASH(0xa439b04)"
                     /chad="Bio::Annotation::Comment=HASH(0xa3da9b4)"
                     /note="score=Bio::Annotation::SimpleValue=HASH(0xa3d92cc)"
                     /note="frame=Bio::Annotation::SimpleValue=HASH(0xa439b04)"
                     /db_xref="Bio::Annotation::SimpleValue=HASH(0xa3daaf8)"
                     /clone="Bio::Annotation::SimpleValue=HASH(0xa3dab28)"
                     /strain="Bio::Annotation::SimpleValue=HASH(0xa3dabb8)"
                     /phase="Bio::Annotation::SimpleValue=HASH(0xa3d935c)"
                     /chromosome="Bio::Annotation::SimpleValue=HASH(0xa3dac00)"
                     /type="Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)"
                     /organism="Bio::Annotation::SimpleValue=HASH(0xa3dac48)"

because Bio::SeqFeature::Annotated holds annotations as object pointers
rather than strings. We can fix this with a stringification overload, but
I noticed that the code to do this exists in the Bio::Annotation::*
classes but is commented out, and I'm not sure why. Maybe Hilmar can shed
some light on this.

-Allen

On Mon, 24 Jan 2005, Marc Logghe wrote:

> Hi all,
> I have some problems with Bio::FeatureIO and Bio::SeqFeature::Annotated.
> But maybe these modules are not designed for the things I had in mind.
> My initial goal seemed pretty straightforward. It turned out differently.
> I have a gff file containing features of bunch of bioentries sitting in BioSQL.
> I wanted to turn the gff into feature objects, add them to the bioentries, and save them back into the database.
> As a test I fetch a genbank record, strip the features and convert them to gff. The gff is again converted to features and added to the stripped seq object.
> The test script looks like this:
> ========================================================
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Tools::GFF;
> use Bio::FeatureIO;
> use IO::String;
> use Bio::DB::GenBank;
>
> use Data::Dumper;
>
> *Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;
>
> my $gff;
> my $gffio = IO::String->new($gff);
>
> my $db = Bio::DB::GenBank->new;
> my $sout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank');
> my $seq = $db->get_Seq_by_acc('Z50755');
>
> my @feat = $seq->remove_SeqFeatures;
>
> # writing option 1
> my $fout = Bio::Tools::GFF->new(-fh => $gffio, -gff_version => 3);
> # writing option 2
> my $fout = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
>
> $fout->write_feature(@feat);
>
> $gffio = IO::String->new($gff);
>
> my $fin = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
>
> while (my $feat = $fin->next_feature)
> {
>     $seq->add_SeqFeature($feat);
> }
> print Data::Dumper->Dump([$seq],['seq']);
>
> $sout->write_seq($seq);
> ========================================================
>
> First, I had an issue when writing the features to gff using Bio::FeatureIO (writing option 2):
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: only Bio::SeqFeature::Annotated objects are writeable
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /home/marcl/src/bioperl/bioperl-live/Bio/Root/Root.pm:328
> STACK:
Bio::FeatureIO::gff::write_feature /home/marcl/src/bioperl/bioperl-live/Bio/FeatureIO/gff.pm:259
> STACK: ./test.pl:25
> -----------------------------------------------------------
>
> Therefore, I used Bio::Tools::GFF to write (writing option 1). But then, I run into troubles when it comes to dumping the sequence into genbank format:
> Can't locate object method "all_tags" via package "Bio::SeqFeature::Annotated" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqIO/FTHelper.pm line 212, line 52.
>
> I tried to fix this by adding the line
> *Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;
>
> But in vain:
> Can't locate object method "get_all_tags" via package "Bio::Annotation::Collection" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm line 547, line 52.
>
> Regards,
> Marc
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From jason.stajich at duke.edu Mon Jan 24 21:36:39 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 24 21:33:36 2005
Subject: [Bioperl-l] bioperl-1.5.0 released
Message-ID:

Bioperl 1.5.0 Developer's release is available for download.
===============================================

http://bioperl.org/DIST/bioperl-1.5.0.tar.bz2
  425ac55ecbb4339b7b532ba6d429bb40
http://bioperl.org/DIST/bioperl-1.5.0.tar.gz
  172472f0675de9a583432e21c9b1b5fc
http://bioperl.org/DIST/bioperl-1.5.0.zip
  3febcd2445a7393c65981a6f9f13a9ed

We'll update the website to reflect this new release.

The odd-numbered releases are called developer releases and are not
deposited on CPAN. Please note that the API in 1.5.0 may change before
the 1.6.0 release, which will be considered a stable API. We may do
another developer release before 1.6.0 goes out.

Lots of people have contributed to this release, I apologize for not
naming them all.
I'll try to cover some: thanks to Aaron Mackey for getting this release
started, Brian Osborne for extensive documentation improvements, Nathan
Haigh for volunteering to make a PPM of the release, Barry Moore and
Nathan for answering many of the Windows-related questions, Allen Day &
Scott Cain & Steffen Grossmann for the work on FeatureIO, GFF3, and
SeqFeature::Annotated, and Chris Mungall for the work with Unflattener
to merge GenBank annotations into GFF3 objects. Please see the AUTHORS
file for a complete list of contributors.

Jason Stajich on behalf of the Bioperl developers.

Here is the info from the Changes file.

1.5 Developer release

o Bio::Align::DNAStatistics and Bio::Align::ProteinStatistics provide
  Jukes-Cantor and Kimura pairwise distance methods, respectively.

o Bio::AlignIO support for "po" format of POA, and "maf";
  Bio::AlignIO::largemultifasta is a new alternative to
  Bio::AlignIO::fasta for temporary file-based manipulation of
  particularly large multiple sequence alignments.

o Bio::Assembly::Singlet allows orphan, unassembled sequences to be
  treated similarly to an assembled contig.

o Bio::CodonUsage provides new rare_codon() and probable_codons()
  methods for identifying particular codons that encode a given amino
  acid.

o Bio::Coordinate::Utils provides new from_align() method to build a
  Bio::Coordinate pair directly from a Bio::Align::AlignI-conforming
  object.

o Bio::DB::Biblio::eutils is a class for querying NCBI's Eutils. Send a
  Pubmed, Pubmed Central, Entrez, or other query to NCBI's web service
  using standard Pubmed query syntax, and retrieve results as XML.

o Bio::DB::GFF has various sundry bug fixes.

o Bio::FeatureIO is a new SeqIO-style subsystem for writing/reading
  genomic features to/from files. I/O classes exist for BED, GTF (aka
  GFF v2.5), and GFF v3. Bio::FeatureIO classes only read/write
  Bio::SeqFeature::Annotated objects. Notably, the GFF v3 class
  requires features to be typed into the Sequence Ontology.

o Bio::Graph namespace contains new modules for manipulation and
  analysis of protein interaction graphs.

o Bio::Graphics has many bug fixes and shiny new glyphs.

o Bio::Index::Hmmer and Bio::Index::Qual provide multiple-file indexing
  for HMMER reports and FASTA qual files, respectively.

o Bio::Map::Clone, Bio::Map::Contig, and Bio::Map::FPCMarker are new
  objects that can be placed within a Bio::Map::MapI-compliant
  genetic/physical map; Bio::Map::Physical provides a new physical map
  type; Bio::MapIO::fpc provides finger-printed clone mapping import.

o Bio::Matrix::PSM provides new support for position-specific (scoring)
  matrices (e.g. profiles, or "possums").

o Bio::Ontology::Ontology and Bio::Ontology::Term objects can now be
  instantiated without explicitly using Bio::OntologyIO. This is
  possible through changes to Bio::Ontology::OntologyStore to download
  ontology files from the web as necessary. Locations of ontology files
  are hard-coded into Bio::Ontology::DocumentRegistry.

o Bio::PopGen includes many new methods and data types for population
  genetics analyses.

o New constructor to Bio::Range, unions(). Given a list of ranges,
  returns another list of "flattened" ranges -- overlapping ranges are
  merged into a single range with the minimum and maximum coordinates
  of the entire overlapping group.

o Bio::Root::IO now supports -url, in addition to -file and -fh. The
  new -url argument allows one to specify the network address of a file
  for input. -url currently only works for GET requests, and thus is
  read-only.

o Bio::SearchIO::hmmer now returns individual Hit objects for each
  domain alignment (thus containing only one HSP); previously separate
  alignments would be merged into one hit if the domain involved in the
  alignments was the same, but this only worked when the repeated
  domain occurred without interruption by any other domain, leading to
  a confusing mixture of Hit and HSP objects.

o Bio::Search::Result::ResultI-compliant report objects now implement
  the "get_statistics" method to access Bio::Search::StatisticsI
  objects that encapsulate any statistical parameters associated with
  the search (e.g. Karlin's lambda for BLAST/FASTA).

o Bio::Seq::LargeLocatableSeq combines the functionality already found
  in Bio::Seq::LargeSeq and Bio::LocatableSeq.

o Bio::SeqFeature::Annotated is a replacement for
  Bio::SeqFeature::Generic. It breaks compliance with the
  Bio::SeqFeatureI interface because the author was sick of dealing
  with untyped annotation tags. All Bio::SeqFeature::Annotated
  annotations are Bio::AnnotationI compliant, and accessible through
  Bio::Annotation::Collection.

o Bio::SeqFeature::Primer implements a Tm() method for primer melting
  point predictions.

o Bio::SeqIO now supports AGAVE, BSML (via SAX), CHAOS-XML,
  InterProScan-XML, TIGR-XML, and NCBI TinySeq formats.

o Bio::Taxonomy::Node now implements the methods necessary for
  Bio::Species interoperability.

o Bio::Tools::CodonTable has new reverse_translate_all() and
  make_iupac_string() methods.

o Bio::Tools::dpAlign now provides sequence profile alignments.

o Bio::Tools::GFF now parses GFF version 2.5 (a.k.a. GTF).

o Bio::Tools::Fgenesh and Bio::Tools::tRNAscanSE are new report
  parsers.

o Bio::Tools::SiRNA includes two new rulesets (Saigo and Tuschl) for
  designing small inhibitory RNA.

o Bio::Tree::DistanceFactory provides NJ and UPGMA tree-building
  methods based on a distance matrix.

o Bio::Tree::Statistics provides an assess_bootstrap() method to
  calculate bootstrap support values on a guide tree topology, based on
  provided bootstrap tree topologies.

o Bio::TreeIO now supports the Pagel (PAG) tree format.

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 186 bytes
Desc: This is a digitally signed message part
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050124/502b810f/PGP.bin

From Marc.Logghe at devgen.com Tue Jan 25 04:17:29 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Jan 25 04:15:24 2005
Subject: [Bioperl-l] struggling with Bio::FeatureIO and Bio::SeqFeature::Annotated
Message-ID:

Hi Allen,
Thanks for the fixes!
As you suggested, I got the tag values when using the stringification
overload, so that is solved (I don't want to commit that myself though,
seems too tricky to me ;-).
What is not so nice is that I lose my split features:

     gene            join(8311..8422,8852..8887,8940..9090,9142..9233,
                     9721..9848,10296..10714,10835..10934,11584..11706)
                     /gene="R12H7.1"
     CDS             join(8311..8422,8852..8887,8940..9090,9142..9233,
                     9721..9848,10296..10714,10835..10934,11584..11706)

becomes now:

     gene            8311..8422
                     /note="frame=."
                     /gene="R12H7.1"
     CDS             8311..8422

I tried to solve this issue by using the unflattener, but that did not
work out well either :-(
My actual question is now: is there a way, using whatever system, to
preserve the split feature structure? That was actually what I was
trying to do in the first place: reconstruct the original feature object
starting from gff. Any ideas on that?

Also, do you think it will be possible to convert the
Bio::SeqFeature::Annotated features into persistent ones so that these
can be stored in BioSQL? I'll try to test that out today.
Cheers,
Marc

> -----Original Message-----
> From: Allen Day [mailto:allenday@ucla.edu]
> Sent: Tuesday, January 25, 2005 12:55 AM
> To: Marc Logghe
> Cc: Bioperl (E-mail)
> Subject: Re: [Bioperl-l] struggling with Bio::FeatureIO and
> Bio::SeqFeature::Annotated
>
>
> Marc,
>
> The problem was that Bio::SeqIO::FTHelper was making calls
> assuming it had
> a Bio::SeqFeature::Generic instance.
I've updated it to make calls > compliant with the Bio::SeqFeatureI interface, and the script > below now > at least runs using "option 1". > > "option 2" will not work, at least for now, because > Bio::DB::GenBank is > creating a SeqIO that holds Bio::SeqFeature::Generic objects, > and these > difficult to deal with because the internal data structures > are different > than a Bio::SeqFeature::Annotated. I like the technique used below to > bridge to Bio::FeatureIO via a Bio::Tools::GFF intermediary -- very > clever. > > You'll also notice that the GenBank-formatted file output by > the script > doesn't look quite right, the FEATURES section looks kind of like: > > FEATURES Location/Qualifiers > Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)1..20975 > > /source="Bio::Annotation::SimpleValue=HASH(0x9bcdbe0)" > > /mol_type="Bio::Annotation::SimpleValue=HASH(0xa3dab1c)" > > /seq_id="Bio::Annotation::SimpleValue=HASH(0xa214de0)" > > /score="Bio::Annotation::SimpleValue=HASH(0xa3d92cc)" > > /frame="Bio::Annotation::SimpleValue=HASH(0xa439b04)" > /chad="Bio::Annotation::Comment=HASH(0xa3da9b4)" > > /note="score=Bio::Annotation::SimpleValue=HASH(0xa3d92cc)" > > /note="frame=Bio::Annotation::SimpleValue=HASH(0xa439b04)" > > /db_xref="Bio::Annotation::SimpleValue=HASH(0xa3daaf8)" > > /clone="Bio::Annotation::SimpleValue=HASH(0xa3dab28)" > > /strain="Bio::Annotation::SimpleValue=HASH(0xa3dabb8)" > > /phase="Bio::Annotation::SimpleValue=HASH(0xa3d935c)" > > /chromosome="Bio::Annotation::SimpleValue=HASH(0xa3dac00)" > > /type="Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)" > > /organism="Bio::Annotation::SimpleValue=HASH(0xa3dac48)" > > because Bio::SeqFeautre::Annotated holds annotations as > objects pointers > rather than strings. We can fix this with a stringification > overload, but > I noticed that the code exists to do this in the Bio::Annotation::* > classes but is commented out, and I'm not sure why. Maybe > Hilmar can shed > some light on this. 
> > -Allen > > > > On Mon, 24 Jan 2005, Marc Logghe wrote: > > > Hi all, > > I have some problems with Bio::FeatureIO and > Bio::SeqFeature::Annotated. But maybe these modules are not > designed for the things I had in mind. > > My initial goal seemed pretty straightforward. It turned > out differently. > > I have a gff file containing features of bunch of > bioentries sitting in BioSQL. > > I wanted to turn the gff into feature objects, add them to > the bioentries, and save them back into the database. > > As a test I fetch a genbank record, strip the features and > convert them to gff. The gff is again converted to features > and added to the stripped seq object. > > The test script looks like this: > > ======================================================== > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Tools::GFF; > > use Bio::FeatureIO; > > use IO::String; > > use Bio::DB::GenBank; > > > > use Data::Dumper; > > > > *Bio::SeqFeature::Annotated::all_tags = > \*Bio::SeqFeature::Annotated::get_all_tags; > > > > my $gff; > > my $gffio = IO::String->new($gff); > > > > my $db = Bio::DB::GenBank->new; > > my $sout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank'); > > my $seq = $db->get_Seq_by_acc('Z50755'); > > > > my @feat = $seq->remove_SeqFeatures; > > > > # writing option 1 > > my $fout = Bio::Tools::GFF->new(-fh => $gffio, -gff_version => 3); > > # writing option 2 > > my $fout = Bio::FeatureIO->new(-fh => $gffio, -format => > 'gff', -version => 3); > > > > $fout->write_feature(@feat); > > > > $gffio = IO::String->new($gff); > > > > my $fin = Bio::FeatureIO->new(-fh => $gffio, -format => > 'gff', -version => 3); > > > > while (my $feat = $fin->next_feature) > > { > > $seq->add_SeqFeature($feat); > > } > > print Data::Dumper->Dump([$seq],['seq']); > > > > $sout->write_seq($seq); > > ======================================================== > > > > First, I had an issue when writing the features to gff > using Bio::FeatureIO (writing 
option 2): > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: only Bio::SeqFeature::Annotated objects are writeable > > STACK: Error::throw > > STACK: Bio::Root::Root::throw > /home/marcl/src/bioperl/bioperl-live/Bio/Root/Root.pm:328 > > STACK: Bio::FeatureIO::gff::write_feature > /home/marcl/src/bioperl/bioperl-live/Bio/FeatureIO/gff.pm:259 > > STACK: ./test.pl:25 > > ----------------------------------------------------------- > > > > Therefore, I used Bio::Tools::GFF to write (writing option > 1). But then, I run into troubles when it comes to dumping > the sequence into genbank format: > > Can't locate object method "all_tags" via package > "Bio::SeqFeature::Annotated" at > /home/marcl/src/bioperl/bioperl-live/Bio/SeqIO/FTHelper.pm > line 212, line 52. > > > > I tried to fix this by adding the line > > *Bio::SeqFeature::Annotated::all_tags = > \*Bio::SeqFeature::Annotated::get_all_tags; > > > > But in vain: > > Can't locate object method "get_all_tags" via package > "Bio::Annotation::Collection" at > /home/marcl/src/bioperl/bioperl-live/Bio/SeqFeature/Annotated. > pm line 547, line 52. > > > > Regards, > > Marc > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From allenday at ucla.edu Tue Jan 25 04:45:28 2005 From: allenday at ucla.edu (Allen Day) Date: Tue Jan 25 04:41:30 2005 Subject: [Bioperl-l] struggling with Bio::FeatureIO and Bio::SeqFeature::Annotated In-Reply-To: References: Message-ID: On Tue, 25 Jan 2005, Marc Logghe wrote: > Hi Allen, > Thanks for the fixes ! no problem. let me know if you find more stuff like this, i'm trying to clean up all the calls to SeqFeatureI inheritors to use the interface methods rather than subclass-specific methods. 
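
A minimal sketch of that idea in plain Perl, not taken from the BioPerl source. Toy::StringFeature and Toy::PairFeature are hypothetical stand-ins invented for illustration (they are not BioPerl classes), and qualifier_lines is an assumed helper name. The helper calls only get_all_tags and get_tag_values, the tag accessors that Bio::SeqFeatureI declares, so it keeps working when the object behind it stores its tags differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy class: tag values are kept in a hash of array refs.
package Toy::StringFeature;
sub new {
    my ($class, %tags) = @_;
    return bless { tags => {%tags} }, $class;
}
sub get_all_tags {
    my ($self) = @_;
    my @tags = sort keys %{ $self->{tags} };
    return @tags;
}
sub get_tag_values {
    my ($self, $tag) = @_;
    return @{ $self->{tags}{$tag} || [] };
}

# Hypothetical toy class with a different internal layout:
# a list of [tag, value] pairs.
package Toy::PairFeature;
sub new {
    my ($class, @pairs) = @_;
    return bless { pairs => [@pairs] }, $class;
}
sub get_all_tags {
    my ($self) = @_;
    my %seen;
    $seen{ $_->[0] } = 1 for @{ $self->{pairs} };
    my @tags = sort keys %seen;
    return @tags;
}
sub get_tag_values {
    my ($self, $tag) = @_;
    return map { $_->[1] } grep { $_->[0] eq $tag } @{ $self->{pairs} };
}

package main;

# Uses only the two interface methods, never a class's internals.
sub qualifier_lines {
    my ($feat) = @_;
    my @lines;
    for my $tag ($feat->get_all_tags) {
        push @lines, qq{/$tag="$_"} for $feat->get_tag_values($tag);
    }
    return @lines;
}

my $string_feat = Toy::StringFeature->new(gene => ['R12H7.1']);
my $pair_feat   = Toy::PairFeature->new([gene => 'R12H7.1']);

# Both objects yield the same qualifier line despite different storage.
print "$_\n" for qualifier_lines($string_feat), qualifier_lines($pair_feat);
```

A caller written this way would not have tripped over the missing all_tags method reported earlier in this thread, since it never asks for anything outside the shared interface.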
> Like you suggested, I got the tag values when using stringification overload, so that is solved (I don't want to commit that myself though, seems too tricky to me ;-). > What is not so nice is that I loose my splitted features: > gene join(8311..8422,8852..8887,8940..9090,9142..9233, > 9721..9848,10296..10714,10835..10934,11584..11706) > /gene="R12H7.1" > CDS join(8311..8422,8852..8887,8940..9090,9142..9233, > 9721..9848,10296..10714,10835..10934,11584..11706) > > > becomes now: > > gene 8311..8422 > /note="frame=." > /gene="R12H7.1" > CDS 8311..8422 > > I tried to solve this issue by using the unflattener, but that did not work out quite well neither :-( > My actual question is now: is there a way, using whatever system, to preserve the split feature structure ? That was actually what I was trying to do in the first place: reconstruct the original feature object starting from gff. Any ideas on that ? oh. i don't know anything about this. never had to deal with split locations before. is this concept equivalent to a GFF3 Target attribute? maybe Scott Cain or Chris Mungall have something to say here. i think Scott is back from vacation tomorrow. > > Also, do you think it will be possible to convert the Bio::SeqFeature::Annotated features into persistent ones so that these can be stored in BioSQL ? I'll try to test that out today. no idea. my guess is not without substantial effort. -allen > Cheers, > Marc > > > > > > -----Original Message----- > > From: Allen Day [mailto:allenday@ucla.edu] > > Sent: Tuesday, January 25, 2005 12:55 AM > > To: Marc Logghe > > Cc: Bioperl (E-mail) > > Subject: Re: [Bioperl-l] struggling with Bio::FeatureIO and > > Bio::SeqFeature::Annotated > > > > > > Marc, > > > > The problem was that Bio::SeqIO::FTHelper was making calls > > assuming it had > > a Bio::SeqFeature::Generic instance. I've updated it to make calls > > compliant with the Bio::SeqFeatureI interface, and the script > > below now > > at least runs using "option 1". 
> > > > "option 2" will not work, at least for now, because > > Bio::DB::GenBank is > > creating a SeqIO that holds Bio::SeqFeature::Generic objects, > > and these > > difficult to deal with because the internal data structures > > are different > > than a Bio::SeqFeature::Annotated. I like the technique used below to > > bridge to Bio::FeatureIO via a Bio::Tools::GFF intermediary -- very > > clever. > > > > You'll also notice that the GenBank-formatted file output by > > the script > > doesn't look quite right, the FEATURES section looks kind of like: > > > > FEATURES Location/Qualifiers > > Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)1..20975 > > > > /source="Bio::Annotation::SimpleValue=HASH(0x9bcdbe0)" > > > > /mol_type="Bio::Annotation::SimpleValue=HASH(0xa3dab1c)" > > > > /seq_id="Bio::Annotation::SimpleValue=HASH(0xa214de0)" > > > > /score="Bio::Annotation::SimpleValue=HASH(0xa3d92cc)" > > > > /frame="Bio::Annotation::SimpleValue=HASH(0xa439b04)" > > /chad="Bio::Annotation::Comment=HASH(0xa3da9b4)" > > > > /note="score=Bio::Annotation::SimpleValue=HASH(0xa3d92cc)" > > > > /note="frame=Bio::Annotation::SimpleValue=HASH(0xa439b04)" > > > > /db_xref="Bio::Annotation::SimpleValue=HASH(0xa3daaf8)" > > > > /clone="Bio::Annotation::SimpleValue=HASH(0xa3dab28)" > > > > /strain="Bio::Annotation::SimpleValue=HASH(0xa3dabb8)" > > > > /phase="Bio::Annotation::SimpleValue=HASH(0xa3d935c)" > > > > /chromosome="Bio::Annotation::SimpleValue=HASH(0xa3dac00)" > > > > /type="Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)" > > > > /organism="Bio::Annotation::SimpleValue=HASH(0xa3dac48)" > > > > because Bio::SeqFeautre::Annotated holds annotations as > > objects pointers > > rather than strings. We can fix this with a stringification > > overload, but > > I noticed that the code exists to do this in the Bio::Annotation::* > > classes but is commented out, and I'm not sure why. Maybe > > Hilmar can shed > > some light on this. 
> > > > -Allen > > > > > > > > On Mon, 24 Jan 2005, Marc Logghe wrote: > > > > > Hi all, > > > I have some problems with Bio::FeatureIO and > > Bio::SeqFeature::Annotated. But maybe these modules are not > > designed for the things I had in mind. > > > My initial goal seemed pretty straightforward. It turned > > out differently. > > > I have a gff file containing features of bunch of > > bioentries sitting in BioSQL. > > > I wanted to turn the gff into feature objects, add them to > > the bioentries, and save them back into the database. > > > As a test I fetch a genbank record, strip the features and > > convert them to gff. The gff is again converted to features > > and added to the stripped seq object. > > > The test script looks like this: > > > ======================================================== > > > #!/usr/bin/perl > > > use strict; > > > use Bio::SeqIO; > > > use Bio::Tools::GFF; > > > use Bio::FeatureIO; > > > use IO::String; > > > use Bio::DB::GenBank; > > > > > > use Data::Dumper; > > > > > > *Bio::SeqFeature::Annotated::all_tags = > > \*Bio::SeqFeature::Annotated::get_all_tags; > > > > > > my $gff; > > > my $gffio = IO::String->new($gff); > > > > > > my $db = Bio::DB::GenBank->new; > > > my $sout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank'); > > > my $seq = $db->get_Seq_by_acc('Z50755'); > > > > > > my @feat = $seq->remove_SeqFeatures; > > > > > > # writing option 1 > > > my $fout = Bio::Tools::GFF->new(-fh => $gffio, -gff_version => 3); > > > # writing option 2 > > > my $fout = Bio::FeatureIO->new(-fh => $gffio, -format => > > 'gff', -version => 3); > > > > > > $fout->write_feature(@feat); > > > > > > $gffio = IO::String->new($gff); > > > > > > my $fin = Bio::FeatureIO->new(-fh => $gffio, -format => > > 'gff', -version => 3); > > > > > > while (my $feat = $fin->next_feature) > > > { > > > $seq->add_SeqFeature($feat); > > > } > > > print Data::Dumper->Dump([$seq],['seq']); > > > > > > $sout->write_seq($seq); > > > 
======================================================== > > > > > > First, I had an issue when writing the features to gff > > using Bio::FeatureIO (writing option 2): > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > MSG: only Bio::SeqFeature::Annotated objects are writeable > > > STACK: Error::throw > > > STACK: Bio::Root::Root::throw > > /home/marcl/src/bioperl/bioperl-live/Bio/Root/Root.pm:328 > > > STACK: Bio::FeatureIO::gff::write_feature > > /home/marcl/src/bioperl/bioperl-live/Bio/FeatureIO/gff.pm:259 > > > STACK: ./test.pl:25 > > > ----------------------------------------------------------- > > > > > > Therefore, I used Bio::Tools::GFF to write (writing option > > 1). But then, I run into troubles when it comes to dumping > > the sequence into genbank format: > > > Can't locate object method "all_tags" via package > > "Bio::SeqFeature::Annotated" at > > /home/marcl/src/bioperl/bioperl-live/Bio/SeqIO/FTHelper.pm > > line 212, line 52. > > > > > > I tried to fix this by adding the line > > > *Bio::SeqFeature::Annotated::all_tags = > > \*Bio::SeqFeature::Annotated::get_all_tags; > > > > > > But in vain: > > > Can't locate object method "get_all_tags" via package > > "Bio::Annotation::Collection" at > > /home/marcl/src/bioperl/bioperl-live/Bio/SeqFeature/Annotated. > > pm line 547, line 52. > > > > > > Regards, > > > Marc > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > From jrm at compbio.dundee.ac.uk Tue Jan 25 06:52:58 2005 From: jrm at compbio.dundee.ac.uk (Jon manning) Date: Tue Jan 25 06:52:55 2005 Subject: [Bioperl-l] Bio::Seq objects from BLAST hits Message-ID: <1106653979.3777.8.camel@tick.compbio.dundee.ac.uk> Hi all, I have previously used Bio::DB::SwissProt to retrieve sequence objects using accession numbers derived from a BLAST search, which I subsequently aligned. 
Now I'm using BLAST to search the PDB (though I'm only interested in sequences), so don't have Bio::DB module for that. What would be the best way to derive a Bio::Seq object from the Bio::Hit::BlastHit objects? Thanks, Jon From nathanhaigh at ukonline.co.uk Tue Jan 25 07:42:19 2005 From: nathanhaigh at ukonline.co.uk (Nathan Spencer Haigh) Date: Tue Jan 25 07:38:24 2005 Subject: [Bioperl-l] Bioperl CVS release differences In-Reply-To: <000d01c4b76c$5bb3fd90$3cf4cdd9@Desktop> References: <000d01c4b76c$5bb3fd90$3cf4cdd9@Desktop> Message-ID: <1106656939.41f63eab97590@webmail.ukonline.net> I was wondering if it is at all possible to do the following with cvs: I would like to obtain a copy of all the files that are new/changed in bioperl-1.5 compared to the 1.4 release. The reason i want to do this is that i'd like to package up a perl program with (some of) these files so i only need to request that bioperl-1.4 be installed on the clients computer. Thanks Nathan ---------------------------------------------- This mail sent through http://www.ukonline.net From cjfields at uiuc.edu Tue Jan 25 09:40:17 2005 From: cjfields at uiuc.edu (Chris Fields) Date: Tue Jan 25 09:38:17 2005 Subject: [Fwd: Re: [Bioperl-l] bioperl-run for windows?] In-Reply-To: <41F5743F.10201@genetics.utah.edu> References: <41F5743F.10201@genetics.utah.edu> Message-ID: <6.1.1.1.2.20050125083723.01a6d358@express.cites.uiuc.edu> There is an EMBOSS release for Windows, believe it or not. It is currently at v. 2.7.1 and can be found at: http://perso.wanadoo.fr/ablavier/embosswin/embosswin.html I have no idea if it will work with Bioperl, though. Might be interesting to try at some point. Chris At 04:18 PM 1/24/2005, you wrote: >Some will work, and some won't. I've installed it on Windows, and used it >a bit there. 
One problem you'll run into is that you can't use bioperl >to run a program that can't be installed on Windows (EMBOSS for example) >so you'll >be limited that way, but check out the Pise interface for any of that >software. You should be able to get access to alot of non-windows >software via bioperl by using the Pise interface ( > >Bio::Tools::Run::AnalysisFactory::Pise > >http://www.pasteur.fr/recherche/unites/sis/Pise/ > > >Barry > >Tim Alcon wrote: > > > If I just grab it off CPAN, will it work on Windows, or does it use > > Unix system calls? > > > > Tim > > > > > > > > Jason Stajich wrote: > > > >> Is there a PPM on the bioperl site? > >> No > >> > >> Can you install bioperl-run on windows? > >> Yes - but you'll have to do it manually, or learn how to build PPMs > >> (quite simple really), or encourage someone to produce a PPM for > >> bioperl-run. > >> > >> -jason > >> On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote: > >> > >>> Does a Windows version of bioperl-run exitst? If so, how do I get it? > >>> > >>> Tim > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l@portal.open-bio.org > >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> -- > >> Jason Stajich > >> jason.stajich at duke.edu > >> http://www.duke.edu/~jes12/ > >> > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > >-- >Barry Moore >Dept. of Human Genetics >University of Utah >Salt Lake City, UT > > > >-- >Barry Moore >Dept. of Human Genetics >University of Utah >Salt Lake City, UT > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l __________________________________ Chris Fields - Postdoctoral Researcher Lab of Dr. 
Robert Switzer Address: University of Illinois at Urbana-Champaign Dept. of Biochemistry - 323 RAL 600 S. Mathews Ave. Urbana, IL 61801 Phone : (217) 333-7098 Fax : (217) 244-5858 From palmeida at igc.gulbenkian.pt Tue Jan 25 12:20:14 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Tue Jan 25 12:17:30 2005 Subject: [Bioperl-l] Bioperl CVS release differences In-Reply-To: <1106656939.41f63eab97590@webmail.ukonline.net> References: <000d01c4b76c$5bb3fd90$3cf4cdd9@Desktop> <1106656939.41f63eab97590@webmail.ukonline.net> Message-ID: <20050125172014.GA6071@bioinf.igc.gulbenkian.pt> I don't know if it's possible with CVS, but you could do something like: diff -rq ~/Test/bioperl-1.5.0-RC2/Bio /usr/share/perl5/Bio (where those directories are the location of BioPerl 1.4 and 1.5) and feed the output to a script that copies the files that are new, or different, to a new directory. -Paulo On Tue, Jan 25, 2005 at 12:42:19PM +0000, Nathan Spencer Haigh wrote: > I was wondering if it is at all possible to do the following with cvs: > I would like to obtain a copy of all the files that are new/changed in > bioperl-1.5 compared to the 1.4 release. The reason i want to do this is that > i'd like to package up a perl program with (some of) these files so i only need > to request that bioperl-1.4 be installed on the clients computer. 
> > Thanks > Nathan -- Paulo Almeida Instituto Gulbenkian de Ciencia Apartado 14, 2781-901, Oeiras, PORTUGAL tel +351 21 446 46 35 fax +351 21 440 79 70 http://www.igc.gulbenkian.pt From garrettsorensen at gmail.com Tue Jan 25 12:45:35 2005 From: garrettsorensen at gmail.com (Garrett Sorensen) Date: Tue Jan 25 12:41:29 2005 Subject: [Bioperl-l] Restriction::Analysis strange error - please help Message-ID: Hello, I'm new to the mailing list. Thanks in advance for any help. I'm really stumped by the following error when running restriction analysis on large numbers of seq objects.
This only occurs sometimes when dealing with large numbers of sequences.

------------- EXCEPTION -------------
MSG: Bad start,end parameters. Start [2002] has to be less than end [2001]
STACK Bio::PrimarySeq::subseq /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362
STACK Bio::Seq::subseq /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636
STACK Bio::Restriction::Analysis::fragment_maps /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:552
STACK toplevel Restriction_analyser_multi_CpG_a.pl:182

In case it helps, here is what my program is doing:
- Reads in multiple fasta sequences (~7kb average size) one at a time and creates a SeqIO object for each.
- Restriction sites for a particular enzyme are determined for each SeqIO object and then a fragment 2kb in size is created around that site.
- A new Seq object is created from the above fragment using "$upStreamSeqobj = Bio::Seq->new (-seq => $upStreamSeq);"
- This new Seq object is fed into Restriction Analysis to generate fragments for another enzyme.

Thanks so much for any help, best regards,
Garrett

From jason.stajich at duke.edu Tue Jan 25 15:21:01 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jan 25 15:17:11 2005
Subject: [Bioperl-l] bioperl-1.5.0 released
In-Reply-To:
References:
Message-ID:

I just don't have the time to do this right now. It has not really been tested to confirm that all tests pass. If someone else wants to volunteer to work on validating it for release that would be great.

-jason

On Jan 25, 2005, at 12:59 PM, Nathan Haigh wrote:
> Will there also be 1.5 releases for bioperl-run etc?
> > Nathan > >> -----Original Message----- >> From: bioperl-l-bounces@portal.open-bio.org >> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason >> Stajich >> Sent: 25 January 2005 02:37 >> To: 'bioperl-l@bioperl.org' List; bioperl-announce-l@bioperl.org >> Subject: [Bioperl-l] bioperl-1.5.0 released >> >> Bioperl 1.5.0 Developer's release is available for download. >> =============================================== >> >> http://bioperl.org/DIST/bioperl-1.5.0.tar.bz2 >> 425ac55ecbb4339b7b532ba6d429bb40 >> http://bioperl.org/DIST/bioperl-1.5.0.tar.gz >> 172472f0675de9a583432e21c9b1b5fc >> http://bioperl.org/DIST/bioperl-1.5.0.zip >> 3febcd2445a7393c65981a6f9f13a9ed >> >> We'll update the website to reflect this new release. >> >> The odd-numbered releases are called developer releases and are not >> deposited on CPAN. Please note that the API in 1.5.0 may change >> before >> the 1.6.0 release. which will be consider a stable API. We may do >> another developer release before 1.6.0 goes out. >> >> Lots of people have contributed to this release, I apologize for not >> naming them all. I'll try to cover some: thanks to Aaron Mackey for >> getting this release started, Brian Osborne for extensive >> documentation >> improvements, Nathan Haigh for volunteering to make a PPM of the >> release and Barry Moore and Nathan answering many of the windows >> related questions, Allen Day & Scott Cain & Steffen Grossmann for the >> work on FeatureIO, GFF3, and SeqFeature::Annotated, Chris Mungall for >> the work with Unflattener to merge GenBank annotations into GFF3 >> objects. >> >> Please see the AUTHORS file for a complete list of contributors. >> >> Jason Stajich on behalf of the Bioperl developers. >> >> >> Here is the info from the Changes file. >> 1.5 Developer release >> >> o Bio::Align::DNAStatistics and Bio::Align::ProteinStatistics >> provide Jukes-Cantor and Kimura pairwise distance methods, >> respectively. 
>> >> o Bio::AlignIO support for "po" format of POA, and "maf"; >> Bio::AlignIO::largemultifasta is a new alternative to >> Bio::AlignIO::fasta for temporary file-based manipulation of >> particularly large multiple sequence alignments. >> >> o Bio::Assembly::Singlet allows orphan, unassembled sequences to >> be treated similarly as an assembled contig. >> >> o Bio::CodonUsage provides new rare_codon() and probable_codons() >> methods for identifying particular codons that encode a given >> amino acid. >> >> o Bio::Coordinate::Utils provides new from_align() method to >> build >> a Bio::Coordinate pair directly from a >> Bio::Align::AlignI-conforming object. >> >> o Bio::DB::Biblio::eutils is a class for querying NCBI's Eutils. >> Send a Pubmed, Pubmed Central, Entrez, or other query to NCBI's >> web service using standard Pubmed query syntax, and retrieve >> results as XML. >> >> o Bio::DB::GFF has various sundry bug fixes. >> >> o Bio::FeatureIO is a new SeqIO-style subsystem for >> writing/reading genomic features to/from files. I/O classes >> exist for BED, GTF (aka GFF v2.5), and GFF v3. Bio::FeatureIO >> classes only read/write Bio::SeqFeature::Annotated objects. >> Notably, the GFF v3 class requires features to be typed into >> the >> Sequence Ontology. >> >> o Bio::Graph namespace contains new modules for manipulation and >> analysis of protein interaction graphs. >> >> o Bio::Graphics has many bug fixes and shiny new glyphs. >> >> o Bio::Index::Hmmer and Bio::Index::Qual provide multiple-file >> indexing for HMMER reports and FASTA qual files, respectively. >> >> o Bio::Map::Clone, Bio::Map::Contig, and Bio::Map::FPCMarker are >> new objects that can be placed within a >> Bio::Map::MapI-compliant >> genetic/physical map; Bio::Map::Physical provides a new >> physical >> map type; Bio::MapIO::fpc provides finger-printed clone mapping >> import. >> >> o Bio::Matrix::PSM provide new support for postion-specific >> (scoring) matrices (e.g. 
profiles, or "possums"). >> >> o Bio::Ontology::Ontology and Bio::Ontology::Term objects can now >> be instantiated without explicitly using Bio::OntologyIO. This >> is possible through changes to Bio::Ontology::OntologyStore to >> download ontology files from the web as necessary. Locations >> of >> ontology files are hard-coded into >> Bio::Ontology::DocumentRegistry. >> >> o Bio::PopGen includes many new methods and data types for >> population genetics analyses. >> >> o New constructor to Bio::Range, unions(). Given a list of >> ranges, returns another list of "flattened" ranges -- >> overlapping ranges are merged into a single range with the >> mininum and maximum coordinates of the entire overlapping >> group. >> >> o Bio::Root::IO now supports -url, in addition to -file and -fh. >> The new -url argument allows one to specify the network address >> of a file for input. -url currently only works for GET >> requests, and thus is read-only. >> >> o Bio::SearchIO::hmmer now returns individual Hit objects for >> each >> domain alignment (thus containing only one HSP); previously >> separate alignments would be merged into one hit if the domain >> involved in the alignments was the same, but this only worked >> when the repeated domain occured without interruption by any >> other domain, leading to a confusing mixture of Hit and HSP >> objects. >> >> o Bio::Search::Result::ResultI-compliant report objects now >> implement the "get_statistics" method to access >> Bio::Search::StatisticsI objects that encapsulate any >> statistical parameters associated with the search (e.g. >> Karlin's >> lambda for BLAST/FASTA). >> >> o Bio::Seq::LargeLocatableSeq combines the functionality already >> found in Bio::Seq::LargeSeq and Bio::LocatableSeq. >> >> o Bio::SeqFeature::Annotated is a replacement for >> Bio::SeqFeature::Generic. It breaks compliance with the >> Bio::SeqFeatureI interface because the author was sick of >> dealing with untyped annotation tags. 
All >> Bio::SeqFeature::Annotated annotations are Bio::AnnotationI >> compliant, and accessible through Bio::Annotation::Collection. >> >> o Bio::SeqFeature::Primer implements a Tm() method for primer >> melting point predictions. >> >> o Bio::SeqIO now supports AGAVE, BSML (via SAX), CHAOS-XML, >> InterProScan-XML, TIGR-XML, and NCBI TinySeq formats. >> >> o Bio::Taxonomy::Node now implements the methods necessary for >> Bio::Species interoperability. >> >> o Bio::Tools::CodonTable has new reverse_translate_all() and >> make_iupac_string() methods. >> >> o Bio::Tools::dpAlign now provides sequence profile alignments. >> >> o Bio::Tools::GFF now parses GFF version 2.5 (a.k.a. GTF). >> >> o Bio::Tools::Fgenesh, Bio::Tools::tRNAscanSE are new report >> parsers. >> >> o Bio::Tools::SiRNA includes two new rulesets (Saigo and Tuschl) >> for designing small inhibitory RNA. >> >> o Bio::Tree::DistanceFactory provides NJ and UPGMA tree-building >> methods based on a distance matrix. >> >> o Bio::Tree::Statistics provides an assess_bootstrap() method to >> calculate bootstrap support values on a guide tree topology, >> based on provided bootstrap tree topologies. >> >> o Bio::TreeIO now supports the Pagel (PAG) tree format. >> >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> --- >> avast! Antivirus: Inbound message clean. >> Virus Database (VPS): 0503-2, 21/01/2005 >> Tested on: 25/01/2005 17:41:57 >> avast! is copyright (c) 2000-2003 ALWIL Software. >> http://www.avast.com >> >> >> >> > --- > avast! Antivirus: Outbound message clean. > Virus Database (VPS): 0504-0, 25/01/2005 > Tested on: 25/01/2005 17:59:00 > avast! is copyright (c) 2000-2003 ALWIL Software. 
> http://www.avast.com > > > > >
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu Tue Jan 25 16:09:50 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jan 25 16:06:55 2005
Subject: [Bioperl-l] Bioperl CVS release differences
In-Reply-To: <1106656939.41f63eab97590@webmail.ukonline.net>
References: <000d01c4b76c$5bb3fd90$3cf4cdd9@Desktop> <1106656939.41f63eab97590@webmail.ukonline.net>
Message-ID: <7379A361-6F15-11D9-90EA-000393C44276@duke.edu>

$ cvs -dYADDAYADDA co -r bioperl-release-1-5-0 -d bioperl-1.5.0 bioperl-live
$ cd bioperl-1.5.0

-- see what is different from the 1.4 branch (this includes bug fixes that were made on that branch when we thought we were going to release a 1.4.1)
$ cvs diff -r branch-1.4

-- see what is different since the 1.4.0 release
$ cvs diff -r bioperl-release-1-4-0

FeatureIO and SeqFeature::Annotated are a BIG difference between 1.4 and 1.5, and may not necessarily be part of the stable 1.6.0, depending on backwards compatibility and different views on how to develop.

-jason

On Jan 25, 2005, at 7:42 AM, Nathan Spencer Haigh wrote:
> I was wondering if it is at all possible to do the following with cvs:
> I would like to obtain a copy of all the files that are new/changed in
> bioperl-1.5 compared to the 1.4 release. The reason I want to do this is that
> I'd like to package up a perl program with (some of) these files so I only need
> to request that bioperl-1.4 be installed on the client's computer.
> > Thanks > Nathan > > ---------------------------------------------- > This mail sent through http://www.ukonline.net > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From e.mugerwa at mtc.com.bh Tue Jan 25 09:37:23 2005 From: e.mugerwa at mtc.com.bh (Edward Mugerwa Buyondo) Date: Tue Jan 25 17:34:41 2005 Subject: [Bioperl-l] Volunteers needed !! Message-ID: <41F659A3.5060308@mtc.com.bh> Dear bperl; Iam interested in volunteering in testing and otherwise. Edward Mugerwa Manama, Bahrain
----------------------------------------------------------------------

From florian.iragne at labri.fr Tue Jan 25 13:17:04 2005
From: florian.iragne at labri.fr (Florian)
Date: Tue Jan 25 17:35:17 2005
Subject: [Bioperl-l] bug in bl2seq parser?
Message-ID: <41F68D20.8010708@labri.fr>

Hello everybody,

I've searched the archives for a solution to my problem and couldn't find one, so I'm posting.

OK, here is the part of my code that doesn't work:

######################################################################
my $bl2temp = "/tmp/bl2seq.$$.out";

use Bio::Tools::Run::StandAloneBlast;
use Bio::AlignIO;

my $factory = Bio::Tools::Run::StandAloneBlast->new(
    'outfile'     => "$bl2temp",
    'program'     => 'blastp',
    'REPORT_TYPE' => 'BLASTP'
);
my $bl2 = $factory->bl2seq( $seqaa1, $seqaa2 );
my $str = Bio::AlignIO->new(
    '-file'        => "$bl2temp",
    '-format'      => 'bl2seq',
    '-report_type' => 'blastp'
);
my $aln = $str->next_aln();
#######################################################################

The program crashes on the line "my $aln = $str->next_aln();" with the following message:

Can't call method "querySeq" on an undefined value at /usr/lib/perl5/site_perl/5.8.0/Bio/AlignIO/bl2seq.pm line 137

This error happens each time no alignment between my 2 sequences is possible. I can't figure out how to test for this case, since the script crashes in the very method that is supposed to return the alignment. I expected that this method (next_aln()) would return an empty object if there is no alignment, but that does not seem to be the case.

Does anybody have a solution for this kind of problem?

Thanks
Florian

From lstein at cshl.edu Tue Jan 25 16:18:11 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Jan 25 17:35:32 2005
Subject: [Bioperl-l] help on large sequence with Bio::Index::Fasta!
In-Reply-To: <20050124140233.49ac48b6@dogwood.plantbio.uga.edu>
References: <20050124140233.49ac48b6@dogwood.plantbio.uga.edu>
Message-ID: <200501251618.12231.lstein@cshl.edu>

As far as I know Bio::Index::Fasta works fine with large sequences. I've used it with worm chromosomes up to 20 MB. You might try Bio::DB::Fasta in a pinch, since it stores the data differently.

Lincoln

On Monday 24 January 2005 02:02 pm, Guojun Yang wrote:
> Hi, everybody,
> I got another difficult situation:
> I am running a local blast and sequence retrieval. The following
> sub works OK for one of my local DB1, but not for my local DB2. DB1
> contains sequences of PACs and BACs (I believe the average size is
> ~100 or 200 kb), but DB2 contains entries of contigs as large as
> 30Mb. The error says the $seq object is undefined! I believe the
> problem is the size of the large entries in DB2. Can we use
> LargeSeq when we do retrieval? Can anybody help me on how we can
> use it with Bio::Index::Fasta?. Thank you for your comments in
> advance! Yang
>
>
>
> sub getseq {
>     my $id=$_[0];
>     my $file_name = $_[1];
>     my $inx=Bio::Index::Fasta->new (-filename => $file_name.".idx",
>                                     -write_flag => 1);
>     $inx->id_parser(\&get_id);
>     $inx->make_index($file_name);
>     $seq = $inx->fetch($id);
>     return $seq;
> }
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050125/b53329ac/attachment.bin

From barry.moore at genetics.utah.edu Tue Jan 25 18:15:52 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 25 18:13:15 2005
Subject: [Bioperl-l] load_seqdatabase.pl running SLOW!
Message-ID: <41F6D328.7090402@genetics.utah.edu>

Hilmar (or others)-

I've set up a biosql based database using PostgreSQL 7.2 on a PC with an Intel Pentium 4 3.0 GHz processor, 800 MHz system bus, 1 GB of RAM, and Linux (2.2 kernel - Debian woody distro). Onto that I am loading ~352,000 sequences from the RefSeq complete rna collection using load_seqdatabase.pl. It's running kind of slow - loading on average about 1 sequence every 2-5 seconds. In the archives I've read your comments to a previous question like this, suggesting two fast processors, a couple gigs of memory and 2-3 drives to really make things fly, and while my system isn't that good, it seems like I should be doing better.

I got to experimenting on another (slower) system while waiting for things to load, and found that running the same script to load the same file goes about 3X faster on a 266 MHz Intel processor with 192 MB RAM. Same installation of PostgreSQL (both installed from deb package with defaults), and same installation of Debian Linux (except that the kernel on the older slow machine has been updated to 2.4). Another difference I noticed between the two is that the old 266 MHz machine is using about 75% CPU resources for perl and about 25% for postmaster, whereas the faster 3 GHz machine (but slower running load_seqdatabase.pl) is using 95% of its CPU resources for postmaster and about 3% for perl. Both systems are using up most of their memory, but little to no swap.

Could the kernel upgrade really be making the difference? Any thoughts?
As it's going now I can wait over a week for all these sequences to load, or build the database on our dinosaur server in a couple of days and dump it across to our sexy new 3 GHz server. Talk about bass ackwards! Barry -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From smarkel at scitegic.com Tue Jan 25 18:25:44 2005 From: smarkel at scitegic.com (Scott Markel) Date: Tue Jan 25 18:22:27 2005 Subject: [Fwd: Re: [Bioperl-l] bioperl-run for windows?] In-Reply-To: <6.1.1.1.2.20050125083723.01a6d358@express.cites.uiuc.edu> References: <41F5743F.10201@genetics.utah.edu> <6.1.1.1.2.20050125083723.01a6d358@express.cites.uiuc.edu> Message-ID: <41F6D578.4010206@scitegic.com> Chris, I use EMBOSSwin with BioPerl. Mostly it runs fine. Two things to watch out for. The first is the use of /dev/null for stderr. The second is that Bio::Factory::EMBOSS::_program_list specifically fails if the OS is MSWin or Mac. Scott Chris Fields wrote: > There is an EMBOSS release for Windows, believe it or not. It is > currently at v. 2.7.1 and can be found at: > > http://perso.wanadoo.fr/ablavier/embosswin/embosswin.html > > I have no idea if it will work with Bioperl, though. Might be > interesting to try at some point. > > Chris > > At 04:18 PM 1/24/2005, you wrote: > >> Some will work, and some won't. I've installed it on Windows, and used it >> a bit there. One problem you'll run into is that you can't use bioperl >> to run a program that can't be installed on Windows (EMBOSS for >> example) so you'll >> be limited that way, but check out the Pise interface for any of that >> software. You should be able to get access to alot of non-windows >> software via bioperl by using the Pise interface ( >> >> Bio::Tools::Run::AnalysisFactory::Pise >> >> http://www.pasteur.fr/recherche/unites/sis/Pise/ >> >> >> Barry >> >> Tim Alcon wrote: >> >> > If I just grab it off CPAN, will it work on Windows, or does it use >> > Unix system calls? 
>> > >> > Tim >> > >> > >> > >> > Jason Stajich wrote: >> > >> >> Is there a PPM on the bioperl site? >> >> No >> >> >> >> Can you install bioperl-run on windows? >> >> Yes - but you'll have to do it manually, or learn how to build PPMs >> >> (quite simple really), or encourage someone to produce a PPM for >> >> bioperl-run. >> >> >> >> -jason >> >> On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote: >> >> >> >>> Does a Windows version of bioperl-run exitst? If so, how do I get it? >> >>> >> >>> Tim >> >>> >> >>> >> >>> _______________________________________________ >> >>> Bioperl-l mailing list >> >>> Bioperl-l@portal.open-bio.org >> >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >>> >> >>> >> >> -- >> >> Jason Stajich >> >> jason.stajich at duke.edu >> >> http://www.duke.edu/~jes12/ >> >> >> >> >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l@portal.open-bio.org >> > http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >> -- >> Barry Moore >> Dept. of Human Genetics >> University of Utah >> Salt Lake City, UT >> >> >> >> -- >> Barry Moore >> Dept. of Human Genetics >> University of Utah >> Salt Lake City, UT >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > __________________________________ > > Chris Fields - Postdoctoral Researcher > Lab of Dr. Robert Switzer > > Address: > > University of Illinois at Urbana-Champaign > Dept. of Biochemistry - 323 RAL > 600 S. Mathews Ave. > Urbana, IL 61801 > > Phone : (217) 333-7098 > Fax : (217) 244-5858 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. 
mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com

From hlapp at gnf.org Tue Jan 25 20:04:49 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jan 25 20:00:54 2005
Subject: [Bioperl-l] RE: load_seqdatabase.pl running SLOW!
Message-ID:

To be honest I've never loaded a large file into a Pg installation. The problem that I'd expect you to run into is that, if you started with a fresh database, the lookup queries will become slower and slower unless the stats are recomputed on a frequent basis through vacuum (which the load script won't do). I believe in more recent releases you can actually vacuum the database concurrently with write access; not sure whether 7.2.x will allow this already. You should strongly consider upgrading to at least 7.3 if not 7.4 or even 8.x. The Pg developers may not even answer questions about 7.2 anymore ...

Your observation that the slower machine with the later kernel would be faster leaves me puzzled. If blind-tested I would have suggested that the machine appearing faster has had the database vacuumed.

Not sure this is very helpful ...

-hilmar

-----Original Message-----
From: Barry Moore [mailto:barry.moore@genetics.utah.edu]
Sent: Tue 1/25/2005 3:15 PM
To: Bioperl list; Hilmar Lapp
Cc:
Subject: load_seqdatabase.pl running SLOW!

Hilmar (or others)-

I've set up a biosql based database using PostgreSQL 7.2 on a PC with an Intel Pentium 4 3.0 GHz processor, 800 MHz system bus, 1 GB of RAM, and Linux (2.2 kernel - Debian woody distro). Onto that I am loading ~352,000 sequences from the RefSeq complete rna collection using load_seqdatabase.pl. It's running kind of slow - loading on average about 1 sequence every 2-5 seconds.
In the archives I've read your comments to a previous question like this, suggesting two fast processors, a couple gigs of memory and 2-3 drives to really make things fly, and while my system isn't that good, it seems like I should be doing better.

I got to experimenting on another (slower) system while waiting for things to load, and found that running the same script to load the same file goes about 3X faster on a 266 MHz Intel processor with 192 MB RAM. Same installation of PostgreSQL (both installed from deb package with defaults), and same installation of Debian Linux (except that the kernel on the older slow machine has been updated to 2.4). Another difference I noticed between the two is that the old 266 MHz machine is using about 75% CPU resources for perl and about 25% for postmaster, whereas the faster 3 GHz machine (but slower running load_seqdatabase.pl) is using 95% of its CPU resources for postmaster and about 3% for perl. Both systems are using up most of their memory, but little to no swap.

Could the kernel upgrade really be making the difference? Any thoughts? As it's going now I can wait over a week for all these sequences to load, or build the database on our dinosaur server in a couple of days and dump it across to our sexy new 3 GHz server. Talk about bass ackwards!

Barry

--
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From rob at salmonella.org Tue Jan 25 20:20:47 2005
From: rob at salmonella.org (Rob Edwards)
Date: Tue Jan 25 20:17:53 2005
Subject: [Bioperl-l] Restriction::Analysis strange error - please help
In-Reply-To:
References:
Message-ID: <820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org>

It is hard to locate the exact error without more information. The error is caused because at some point you are trying to get a sequence that starts at position 2002, but the sequence is only 2001 nt long (hence the error that 2002 must be < 2001).
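A minimal sketch of the kind of bounds check that can help narrow this down, assuming the window coordinates are computed by hand before being passed to subseq(). The helper name clamp_window and the 1000 nt flank are made up for illustration and are not part of BioPerl, and the sketch says nothing about what Restriction::Analysis does internally.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: given a 1-based site position,
# the sequence length and a flank size, return a (start, end) pair that lies
# inside the sequence, or an empty list when the site itself is out of range.
sub clamp_window {
    my ($site, $seq_len, $flank) = @_;
    return () if $seq_len < 1 || $site < 1 || $site > $seq_len;

    my $start = $site - $flank;
    my $end   = $site + $flank;
    $start = 1        if $start < 1;         # do not run off the 5' end
    $end   = $seq_len if $end   > $seq_len;  # do not run off the 3' end
    return ($start, $end);
}

# Report the window that would be handed to subseq() for a few sites on a
# 2001 nt sequence, the length mentioned in the error message.
for my $site (1000, 2001, 2002) {
    my @window = clamp_window($site, 2001, 1000);
    if (@window) {
        print "site $site: subseq($window[0], $window[1])\n";
    }
    else {
        print "site $site: outside the sequence, skipped\n";
    }
}
```

Logging the start, the end and the sequence length this way just before each subseq() call should show which sequence and which site produce the out-of-range request.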
I would suggest that the error is at some point where you are taking the 2kb fragment around the site. The most obvious thing to start with is what are the start/end coordinates that are called immediately before the error, and do they make sense given the length of the sequence? Rob On Jan 25, 2005, at 9:45 AM, Garrett Sorensen wrote: > Hello, I'm new to the mailing list. Thanks in advance for any help. > > I'm really stumped by the following error when running restriction > analysis on large numbers of seq objects. This only occurs sometimes > when dealing with large numbers of sequences. > > > ------------- EXCEPTION ------------- > MSG: Bad start,end parameters. Start [2002] has to be less than end > [2001] > STACK Bio::PrimarySeq::subseq > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362 > STACK Bio::Seq::subseq > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636 > STACK Bio::Restriction::Analysis::fragment_maps > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/ > Analysis.pm:552 > STACK toplevel Restriction_analyser_multi_CpG_a.pl:182 > > > Incase it helps here is what my program is doing: > -Reads in multiple fasta sequences (~7kb average size) on at a time > and creates a SeqIO object for each. > -Restriction sites for a particular enzyme are determined for each > SeqIO object and then a fragment 2kb in size is created around that > site. > -A new Seq object is created using the above fragment using > "$upStreamSeqobj = Bio::Seq->new (-seq => $upStreamSeq);" > -This new Seq object is fed into Restriction Analysis to generate > fragments for another enzyme. 
> > Thanks so much for any help, best regards, > Garrett > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l >

From billk at iinet.net.au Tue Jan 25 20:40:50 2005
From: billk at iinet.net.au (William Kenworthy)
Date: Tue Jan 25 20:36:47 2005
Subject: [Bioperl-l] Something I am confused about and have not seen explained in the docs:
Message-ID: <1106703650.19499.2.camel@rattus.Localdomain>

Something I am confused about and have not seen explained in the docs:

Is bioperl-run complementary to, a subset of, or a self-contained package of separate functions compared to bioperl? And what is bioperl-live, and how does it fit into this picture?

BillK

--
William Kenworthy
Home!

From jiangs at mail.nih.gov Tue Jan 25 20:57:09 2005
From: jiangs at mail.nih.gov (Jiang, Shan (NIH/NCI))
Date: Tue Jan 25 20:53:02 2005
Subject: [Bioperl-l] BIOperl release plan for 2005?
Message-ID: <16A0583FB1644E4DB8C0A0265028B6FDFC9196@nihexchange13.nih.gov>

Hi!

Is there a release schedule for BIOperl for 2005? Can someone also give me an overview of how far ahead I should have the code completed in order for it to make a certain release?

I am planning on contributing code to BIOperl to integrate with the Perl version of caBIO, a cancer bioinformatics application here at the National Cancer Institute Center for Bioinformatics (NCICB) at NIH (http://ncicb.nci.nih.gov/core/caBIO).

Thanks a lot for your help!

Shan Jiang (Contractor)

From garrettsorensen at gmail.com Tue Jan 25 21:12:37 2005
From: garrettsorensen at gmail.com (Garrett Sorensen)
Date: Tue Jan 25 21:08:49 2005
Subject: [Bioperl-l] Restriction::Analysis strange error - please help
In-Reply-To: <820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org>
References: <820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org>
Message-ID:

Thanks for the suggestion, Rob, but I'm still having issues.
I've boiled the code down to just reading in fasta sequences and digesting the whole sequences, as opposed to digesting a subsequence as I did initially. It digests a few hundred sequences without issue and then runs into the same error, but reports different coordinates of course. If it is only reading in a fasta sequence and digesting it, how is it calculating wrong start/end coordinates for itself? And the fact that it will work great on many sequences but then calculates wrong coordinates for one seems strange...

Any ideas? I've tried feeding Restriction::Analysis both a SeqIO object and a PrimarySeq with the same result. Here is the code:

use strict;
use Bio::SeqIO;
use Bio::PrimarySeq;
use Bio::Restriction::Analysis;
use Data::Dumper;

# Assumption: the FASTA file name comes from the command line;
# the original snippet did not show where $file was set.
my $file = shift @ARGV;

my $in = Bio::SeqIO->new(-file => "$file", -format => 'fasta');
while ( my $seq = $in->next_seq ) {
    # Digest the whole sequence with NlaIII and print the fragments
    my $ra = Bio::Restriction::Analysis->new(-seq => $seq);
    my @fragments = $ra->fragments('NlaIII');
    print join ("\n\n", @fragments);
}
exit;

Thanks for any help or suggestions, best regards,
Garrett

On Tue, 25 Jan 2005 17:20:47 -0800, Rob Edwards wrote:
> It is hard to locate the exact error without more information. The
> error is caused because at some point you are trying to get a sequence
> that starts at position 2002, but the sequence is only 2001 nt long
> (hence the error that 2002 must be < 2001). I would suggest that the
> error is at some point where you are taking the 2kb fragment around the
> site. The most obvious thing to start with is what are the start/end
> coordinates that are called immediately before the error, and do they
> make sense given the length of the sequence?
>
> Rob
>
> On Jan 25, 2005, at 9:45 AM, Garrett Sorensen wrote:
>
> > Hello, I'm new to the mailing list. Thanks in advance for any help.
> >
> > I'm really stumped by the following error when running restriction
> > analysis on large numbers of seq objects. This only occurs sometimes
> > when dealing with large numbers of sequences.
> >
> > ------------- EXCEPTION -------------
> > MSG: Bad start,end parameters. Start [2002] has to be less than end
> > [2001]
> > STACK Bio::PrimarySeq::subseq
> > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362
> > STACK Bio::Seq::subseq
> > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636
> > STACK Bio::Restriction::Analysis::fragment_maps
> > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:552
> > STACK toplevel Restriction_analyser_multi_CpG_a.pl:182
> >
> > In case it helps, here is what my program is doing:
> > -Reads in multiple fasta sequences (~7kb average size) one at a time
> > and creates a SeqIO object for each.
> > -Restriction sites for a particular enzyme are determined for each
> > SeqIO object and then a fragment 2kb in size is created around that
> > site.
> > -A new Seq object is created using the above fragment using
> > "$upStreamSeqobj = Bio::Seq->new (-seq => $upStreamSeq);"
> > -This new Seq object is fed into Restriction Analysis to generate
> > fragments for another enzyme.
> >
> > Thanks so much for any help, best regards,
> > Garrett

From allenday at ucla.edu Wed Jan 26 01:27:00 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 26 01:22:56 2005
Subject: [Bioperl-l] RPMs for bioperl
Message-ID:

Hi,

I've put together a set of RPMs for Bioperl, Bioperl-DB, Bioperl-Run,
and GBrowse. It's still a work in progress, but you can see the current
state here:

http://sumo.genetics.ucla.edu/~allenday/flute/bioperl-1.5/i686/

There are some related directories rooted here:

http://sumo.genetics.ucla.edu/~allenday/flute/

The RPMs don't install cleanly. This is because I'm using an automated
tool to build the RPMs, and it looks through each downloaded tarball
from CPAN to see what that tarball depends on.
Sometimes there are dependencies on libraries that don't exist on CPAN,
or might be altogether non-existent. These are the problem libraries
and binaries:

% rpm -Uvh --test *.rpm
error: Failed dependencies:
    perl(Ace::Browser::LocalSiteDefs) is needed by perl-AcePerl-1.87-allenday
    perl(Bio::Das::ProServer::SourceHydra) is needed by perl-Bio-Das-0.99-allenday
    perl(IndexSupport) is needed by perl-Bio-Das-0.99-allenday
    perl(srsperl) is needed by perl-bioperl-1.5.0-allenday
    perl(Bio::DB::BioDB) is needed by perl-Generic-Genome-Browser-1.62-allenday
    perl(Bio::DB::Query::BioQuery) is needed by perl-Generic-Genome-Browser-1.62-allenday
    perl(GuessDirectories) is needed by perl-Generic-Genome-Browser-1.62-allenday
    perl(MOBY::Client::Central) is needed by perl-Generic-Genome-Browser-1.62-allenday
    perl(MOBY::Client::Service) is needed by perl-Generic-Genome-Browser-1.62-allenday
    perl(MOBY::CommonSubs) is needed by perl-Generic-Genome-Browser-1.62-allenday
    perl(MOBY::MobyXMLConstants) is needed by perl-Generic-Genome-Browser-1.62-allenday
    perl(PPM::Archive) is needed by perl-Generic-Genome-Browser-1.62-allenday
    perl(MQClient::MQSeries) is needed by perl-SOAP-Lite-0.60-allenday
    perl(MQSeries) is needed by perl-SOAP-Lite-0.60-allenday
    perl(MQSeries::Message) is needed by perl-SOAP-Lite-0.60-allenday
    perl(MQSeries::Queue) is needed by perl-SOAP-Lite-0.60-allenday
    perl(MQSeries::QueueManager) is needed by perl-SOAP-Lite-0.60-allenday
    /bin/perl is needed by perl-Tk-804.027-allenday
    /usr/local/bin/perl is needed by perl-Tk-804.027-allenday
    perl(Tk::LabRadio) is needed by perl-Tk-804.027-allenday
    perl(Tk::TextReindex) is needed by perl-Tk-804.027-allenday
    perl(XML::LibXML) >= 1.57 is needed by perl-XML-LibXSLT-1.57-allenday
    perl(XML::SAX::PurePerl::DTDDecls) is needed by perl-XML-SAX-0.12-allenday
    perl(XML::SAX::PurePerl::DocType) is needed by perl-XML-SAX-0.12-allenday
    perl(XML::SAX::PurePerl::EncodingDetect) is needed by perl-XML-SAX-0.12-allenday
    perl(XML::SAX::PurePerl::XMLDecl) is needed by perl-XML-SAX-0.12-allenday

Lincoln, I'm guessing you can help me with:

  * Ace::Browser::LocalSiteDefs
  * Bio::Das::ProServer::SourceHydra
  * GuessDirectories
  * IndexSupport
  * MOBY::*

Hilmar, do you know about:

  * Bio::DB::BioDB
  * Bio::DB::Query::BioQuery

I'm sure someone on this list knows where to get

  * srsperl
  * PPM::Archive

If anyone can shed light on where any of these libraries can be found,
I'd appreciate it.

Thanks.

-Allen

From nathanhaigh at ukonline.co.uk Wed Jan 26 03:58:06 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 26 03:54:05 2005
Subject: [Bioperl-l] Something I am confused about and have not seenexplained in the docs:
In-Reply-To: <1106703650.19499.2.camel@rattus.Localdomain>
Message-ID:

The bioperl core distributions (bioperl-1.4, bioperl-1.5 etc.) consist
of the core modules. If additional functionality is required, you can
install one or more of the following bioperl packages: the run package
(bioperl-run), Ext (bioperl-ext), microarray (bioperl-microarray) etc.
They all depend on the core bioperl package being installed, but add
additional functionality. They can be seen at:
http://www.bioperl.org/Core/Latest/index.shtml

bioperl-live is the name given to the cutting-edge versions of all the
bioperl files, available via CVS. Bioperl is open source, and many
different people contribute to its development, from just reporting
bugs/errors to writing entirely new modules that extend bioperl's
functionality. As a result, the Concurrent Versions System (CVS) is
used to track all the modifications of all the bioperl files, so a
developer can make a bugfix etc. to a file and commit it to CVS. This
results in the continual evolution of bioperl even after an official
release: v1.4 and 1.5, for example, do not change once released (EVER),
but updates to files are recorded using CVS and would be included in
future releases, i.e. 1.5.1 or 1.6.
Access to this cutting-edge code is available for those who want it,
either using CVS or by navigating the links at
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/?cvsroot=bioperl
and selecting Download Tarball on the appropriate page.

Hope this helps
Nathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of William Kenworthy
> Sent: 26 January 2005 01:41
> To: BioPerl List
> Subject: [Bioperl-l] Something I am confused about and have not seenexplained in the docs:
>
> Something I am confused about and have not seen explained in the docs:
>
> Is bioperl-run complementary, a subset of or a self-contained package of
> separate functions compared to bioperl? And what is, and how does
> bioperl-live fit into this picture?
>
> BillK
>
> --
> William Kenworthy
> Home!

From nathanhaigh at ukonline.co.uk Wed Jan 26 03:59:54 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 26 03:55:53 2005
Subject: [Bioperl-l] bioperl-1.5.0 released
In-Reply-To:
Message-ID:

Would it be helpful for me to make the ppd file for bioperl-run 1.4 so
people can install it easily using PPM?

Nathan

> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 25 January 2005 20:21
> To: nathanhaigh@ukonline.co.uk
> Cc: Bioperl list
> Subject: Re: [Bioperl-l] bioperl-1.5.0 released
>
> I just don't have the time to do this right now. It has not really been
> tested for all tests passing.
>
> If someone else wants to volunteer to work on validating it for release
> that would be great.
>
> -jason
> On Jan 25, 2005, at 12:59 PM, Nathan Haigh wrote:
>
> > Will there also be 1.5 releases for bioperl-run etc?
> >
> > Nathan
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces@portal.open-bio.org
> >> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason
> >> Stajich
> >> Sent: 25 January 2005 02:37
> >> To: 'bioperl-l@bioperl.org' List; bioperl-announce-l@bioperl.org
> >> Subject: [Bioperl-l] bioperl-1.5.0 released
> >>
> >> Bioperl 1.5.0 Developer's release is available for download.
> >> ===============================================
> >>
> >> http://bioperl.org/DIST/bioperl-1.5.0.tar.bz2
> >>   425ac55ecbb4339b7b532ba6d429bb40
> >> http://bioperl.org/DIST/bioperl-1.5.0.tar.gz
> >>   172472f0675de9a583432e21c9b1b5fc
> >> http://bioperl.org/DIST/bioperl-1.5.0.zip
> >>   3febcd2445a7393c65981a6f9f13a9ed
> >>
> >> We'll update the website to reflect this new release.
> >>
> >> The odd-numbered releases are called developer releases and are not
> >> deposited on CPAN. Please note that the API in 1.5.0 may change
> >> before the 1.6.0 release, which will be considered a stable API. We
> >> may do another developer release before 1.6.0 goes out.
> >>
> >> Lots of people have contributed to this release, I apologize for not
> >> naming them all. I'll try to cover some: thanks to Aaron Mackey for
> >> getting this release started, Brian Osborne for extensive
> >> documentation improvements, Nathan Haigh for volunteering to make a
> >> PPM of the release and Barry Moore and Nathan answering many of the
> >> windows related questions, Allen Day & Scott Cain & Steffen Grossmann
> >> for the work on FeatureIO, GFF3, and SeqFeature::Annotated, Chris
> >> Mungall for the work with Unflattener to merge GenBank annotations
> >> into GFF3 objects.
> >>
> >> Please see the AUTHORS file for a complete list of contributors.
> >>
> >> Jason Stajich on behalf of the Bioperl developers.
> >>
> >>
> >> Here is the info from the Changes file.
> >> 1.5 Developer release
> >>
> >>  o Bio::Align::DNAStatistics and Bio::Align::ProteinStatistics
> >>    provide Jukes-Cantor and Kimura pairwise distance methods,
> >>    respectively.
> >>
> >>  o Bio::AlignIO support for "po" format of POA, and "maf";
> >>    Bio::AlignIO::largemultifasta is a new alternative to
> >>    Bio::AlignIO::fasta for temporary file-based manipulation of
> >>    particularly large multiple sequence alignments.
> >>
> >>  o Bio::Assembly::Singlet allows orphan, unassembled sequences to
> >>    be treated similarly as an assembled contig.
> >>
> >>  o Bio::CodonUsage provides new rare_codon() and probable_codons()
> >>    methods for identifying particular codons that encode a given
> >>    amino acid.
> >>
> >>  o Bio::Coordinate::Utils provides new from_align() method to
> >>    build a Bio::Coordinate pair directly from a
> >>    Bio::Align::AlignI-conforming object.
> >>
> >>  o Bio::DB::Biblio::eutils is a class for querying NCBI's Eutils.
> >>    Send a Pubmed, Pubmed Central, Entrez, or other query to NCBI's
> >>    web service using standard Pubmed query syntax, and retrieve
> >>    results as XML.
> >>
> >>  o Bio::DB::GFF has various sundry bug fixes.
> >>
> >>  o Bio::FeatureIO is a new SeqIO-style subsystem for
> >>    writing/reading genomic features to/from files. I/O classes
> >>    exist for BED, GTF (aka GFF v2.5), and GFF v3. Bio::FeatureIO
> >>    classes only read/write Bio::SeqFeature::Annotated objects.
> >>    Notably, the GFF v3 class requires features to be typed into
> >>    the Sequence Ontology.
> >>
> >>  o Bio::Graph namespace contains new modules for manipulation and
> >>    analysis of protein interaction graphs.
> >>
> >>  o Bio::Graphics has many bug fixes and shiny new glyphs.
> >>
> >>  o Bio::Index::Hmmer and Bio::Index::Qual provide multiple-file
> >>    indexing for HMMER reports and FASTA qual files, respectively.
> >>
> >>  o Bio::Map::Clone, Bio::Map::Contig, and Bio::Map::FPCMarker are
> >>    new objects that can be placed within a
> >>    Bio::Map::MapI-compliant genetic/physical map;
> >>    Bio::Map::Physical provides a new physical map type;
> >>    Bio::MapIO::fpc provides finger-printed clone mapping import.
> >>
> >>  o Bio::Matrix::PSM provides new support for position-specific
> >>    (scoring) matrices (e.g. profiles, or "possums").
> >>
> >>  o Bio::Ontology::Ontology and Bio::Ontology::Term objects can now
> >>    be instantiated without explicitly using Bio::OntologyIO. This
> >>    is possible through changes to Bio::Ontology::OntologyStore to
> >>    download ontology files from the web as necessary. Locations
> >>    of ontology files are hard-coded into
> >>    Bio::Ontology::DocumentRegistry.
> >>
> >>  o Bio::PopGen includes many new methods and data types for
> >>    population genetics analyses.
> >>
> >>  o New constructor to Bio::Range, unions(). Given a list of
> >>    ranges, returns another list of "flattened" ranges --
> >>    overlapping ranges are merged into a single range with the
> >>    minimum and maximum coordinates of the entire overlapping
> >>    group.
> >>
> >>  o Bio::Root::IO now supports -url, in addition to -file and -fh.
> >>    The new -url argument allows one to specify the network address
> >>    of a file for input. -url currently only works for GET
> >>    requests, and thus is read-only.
> >>
> >>  o Bio::SearchIO::hmmer now returns individual Hit objects for
> >>    each domain alignment (thus containing only one HSP); previously
> >>    separate alignments would be merged into one hit if the domain
> >>    involved in the alignments was the same, but this only worked
> >>    when the repeated domain occurred without interruption by any
> >>    other domain, leading to a confusing mixture of Hit and HSP
> >>    objects.
> >>
> >>  o Bio::Search::Result::ResultI-compliant report objects now
> >>    implement the "get_statistics" method to access
> >>    Bio::Search::StatisticsI objects that encapsulate any
> >>    statistical parameters associated with the search (e.g.
> >>    Karlin's lambda for BLAST/FASTA).
> >>
> >>  o Bio::Seq::LargeLocatableSeq combines the functionality already
> >>    found in Bio::Seq::LargeSeq and Bio::LocatableSeq.
> >>
> >>  o Bio::SeqFeature::Annotated is a replacement for
> >>    Bio::SeqFeature::Generic. It breaks compliance with the
> >>    Bio::SeqFeatureI interface because the author was sick of
> >>    dealing with untyped annotation tags. All
> >>    Bio::SeqFeature::Annotated annotations are Bio::AnnotationI
> >>    compliant, and accessible through Bio::Annotation::Collection.
> >>
> >>  o Bio::SeqFeature::Primer implements a Tm() method for primer
> >>    melting point predictions.
> >>
> >>  o Bio::SeqIO now supports AGAVE, BSML (via SAX), CHAOS-XML,
> >>    InterProScan-XML, TIGR-XML, and NCBI TinySeq formats.
> >>
> >>  o Bio::Taxonomy::Node now implements the methods necessary for
> >>    Bio::Species interoperability.
> >>
> >>  o Bio::Tools::CodonTable has new reverse_translate_all() and
> >>    make_iupac_string() methods.
> >>
> >>  o Bio::Tools::dpAlign now provides sequence profile alignments.
> >>
> >>  o Bio::Tools::GFF now parses GFF version 2.5 (a.k.a. GTF).
> >>
> >>  o Bio::Tools::Fgenesh, Bio::Tools::tRNAscanSE are new report
> >>    parsers.
> >>
> >>  o Bio::Tools::SiRNA includes two new rulesets (Saigo and Tuschl)
> >>    for designing small inhibitory RNA.
> >>
> >>  o Bio::Tree::DistanceFactory provides NJ and UPGMA tree-building
> >>    methods based on a distance matrix.
> >>
> >>  o Bio::Tree::Statistics provides an assess_bootstrap() method to
> >>    calculate bootstrap support values on a guide tree topology,
> >>    based on provided bootstrap tree topologies.
> >>
> >>  o Bio::TreeIO now supports the Pagel (PAG) tree format.
> >>
> >> --
> >> Jason Stajich
> >> jason.stajich at duke.edu
> >> http://www.duke.edu/~jes12/
>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/

From billk at iinet.net.au Wed Jan 26 04:34:15 2005
From: billk at iinet.net.au (William Kenworthy)
Date: Wed Jan 26 04:32:47 2005
Subject: [Bioperl-l] Something I am confused about and have not seenexplained in the docs:
In-Reply-To:
References:
Message-ID: <1106732055.19499.56.camel@rattus.Localdomain>

Thanks, none of this seems to be documented anywhere and perhaps should
be.

Perhaps the best way to get a relatively bug-free version (as I am
suffering from some 1.4 bugs that apparently have been long fixed) is to
go cvs for all the packages at the same time and at least be at the
working edge, rather than have a mismatched hodge podge of "stable, but
buggy and way too old" versions.
BillK

From palmeida at igc.gulbenkian.pt Wed Jan 26 06:25:06 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Wed Jan 26 06:24:20 2005
Subject: [Bioperl-l] Get tag value into a variable
Message-ID: <20050126112506.GD6071@bioinf.igc.gulbenkian.pt>

Hi,

This is probably a perl problem, rather than a bioperl problem, but I'm
having trouble storing a tag from a feature in a variable. I do this:

print $feat->get_tag_values($tag) , "\n" if $tag eq 'coded_by';
my $coded = $feat->get_tag_values($tag);
print $coded , "\n" if $tag eq 'coded_by';

and the output is this:

AK021294.1:<1..381
1

The first line is correct, and I suppose the second takes the value '1'
because $feat->get_tag_values($tag) was successful, but how can I put
the actual tag in a variable, for later use? (my current solution is to
print the tag to a file and then read it from there, which is less than
elegant, to say the least).

I'm attaching the full code, in case someone wants to test it.
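
The '1' above is consistent with the method call being evaluated in
scalar context, where a returned array yields its element count rather
than its contents. A minimal sketch in plain Perl, assuming
get_tag_values hands back its values as an array; tag_values_standin is
a hypothetical stand-in for that method, not a BioPerl function:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $feat->get_tag_values($tag). Assumption:
# the real method returns an array holding every value stored for the tag.
sub tag_values_standin {
    my @values = ('AK021294.1:<1..381');
    return @values;
}

# Scalar context: the returned array is reduced to its element count.
my $count = tag_values_standin();

# List context: the parentheses make this a list assignment, so the
# first returned value is stored instead of the count.
my ($first) = tag_values_standin();

# Keeping every value, for tags that may occur more than once.
my @all = tag_values_standin();

print "scalar context: $count\n";
print "list context:   $first\n";
print "all values:     @all\n";
```

With a real feature object the same three forms would apply to
$feat->get_tag_values($tag) directly.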
Thanks,
Paulo

--
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel +351 21 446 46 35
fax +351 21 440 79 70
http://www.igc.gulbenkian.pt

From Marc.Logghe at devgen.com Wed Jan 26 06:30:12 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Jan 26 06:26:22 2005
Subject: [Bioperl-l] Get tag value into a variable
Message-ID:

> This is probably a perl problem, rather than a bioperl
> problem, but I'm
> having trouble storing a tag from a feature in a variable. I do this:
>
> print $feat->get_tag_values($tag) , "\n" if $tag eq 'coded_by';
> my $coded = $feat->get_tag_values($tag);
> print $coded , "\n" if $tag eq 'coded_by';
>
You have to do that in list context, because there might be multiple
values for the same key (e.g. multiple note tags), so it should read (if
you are only interested in the first one, or when there is only 1):

my ($coded) = $feat->get_tag_values($tag);

HTH,
Marc

From sanges at biogem.it Wed Jan 26 06:34:20 2005
From: sanges at biogem.it (Remo Sanges)
Date: Wed Jan 26 06:30:37 2005
Subject: [Bioperl-l] Restriction::Analysis strange error - please help
In-Reply-To:
References: <820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org>
Message-ID: <8f683cdabefdee9bdf6630d37c008f79@biogem.it>

On Jan 26, 2005, at 3:12 AM, Garrett Sorensen wrote:

> Thanks for the suggestion Rob but still having issues. I've boiled
> the code down to just reading in fasta sequences and digesting the
> whole sequences - opposed to digesting a subsequence as done
> initially. It digests a few hundred sequences without issue and then
> runs into the same error, but reports different coordinates of course.
>
> If it is only reading in a fasta sequence and digesting it how is it
> calculating wrong start/end coordinates for itself? And the fact that
> it will work great on many sequences but then calculates wrong
> coordinates for one seems strange... Any ideas?
>
> I've tried feeding Restriction::Analysis both a SeqIO object and a
> PrimarySeq with the same result.
>
> Here is the code:
>
> use strict;
> use Bio::SeqIO;
> use Bio::PrimarySeq;
> use Bio::Restriction::Analysis;
> use Data::Dumper;
>
> my $in = Bio::SeqIO->new(-file => "$file", -format => 'fasta');
> while ( my $seq = $in->next_seq ) {
>     my $ra = Bio::Restriction::Analysis->new(-seq => $seq);
>     my @fragments = $ra->fragments('NlaIII');
>     print join ("\n\n", @fragments);
> }
> exit
>

Garrett,

this code that you posted isn't really very helpful...
You should have a problem at a certain point in your code
where you ask for a ->subseq with the start bigger than the end...
The problem should come from your calculations, not from
the module's errors....

HTH

Remo

From nathanhaigh at ukonline.co.uk Wed Jan 26 06:47:38 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 26 06:43:34 2005
Subject: [Bioperl-l] Bioperl CVS release differences
In-Reply-To: <7379A361-6F15-11D9-90EA-000393C44276@duke.edu>
Message-ID:

Thanks Jason

With your help and some internet searching I found the following worked
brilliantly, and thought I'd share it with people in case it is of some
use!

Login to CVS:
$ cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl login
PASSWD: cvs

Checkout Bioperl-1.5 into the local directory bioperl-1.5.0:
$ cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl co -r bioperl-release-1-5-0 -d bioperl-1.5.0 bioperl-live
$ cd bioperl-1.5.0

Get the list of changed/new files in bioperl-1.5.0 compared with release 1.4:
$ cvs -q diff --brief -N -r bioperl-release-1-4-0 | grep "^RCS"

However, this doesn't specify whether the file was modified/added/removed.
Something like the following works on WinXP:
$ cvs -q diff --brief -r bioperl-release-1-4-0 2>&1 | grep "^\(RCS\|cvs diff\)" | sort > files.log

Example line from files.log of a modified file between 1.4 and 1.5 releases:
RCS file: /home/repository/bioperl/bioperl-live/Bio/Species.pm,v

Example line from files.log of a new file added since the 1.4 release:
cvs diff: tag bioperl-release-1-4-0 is not in file Bio/FeatureIO.pm

Example line from files.log of a file that was deleted since the 1.4 release:
cvs diff: doc/howto/html/e-novative.css no longer exists, no comparison available

Nathan

> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 25 January 2005 21:10
> To: Nathan Spencer Haigh
> Cc: 'Bioperl'
> Subject: Re: [Bioperl-l] Bioperl CVS release differences
>
> $ cvs -dYADDAYADDA co -r bioperl-release-1-5-0 -d bioperl-1.5.0 bioperl-live
> $ cd bioperl-1.5.0
>
> -- see what is different from the 1.4 branch (this includes bug fixes
> that were made on that branch when we thought we were going to release
> a 1.4.1)
> $ cvs diff -r branch-1.4
>
> -- see what is different since the 1.4.0 release
> $ cvs diff -r bioperl-release-1-4-0
>
> The FeatureIO and SeqFeature::Annotated is a BIG difference between 1.4
> and may not necessarily be part of the stable 1.6.0 depending on the
> backwards compatibility and different views on how to develop.
>
> -jason
>
> On Jan 25, 2005, at 7:42 AM, Nathan Spencer Haigh wrote:
>
> > I was wondering if it is at all possible to do the following with cvs:
> > I would like to obtain a copy of all the files that are new/changed in
> > bioperl-1.5 compared to the 1.4 release. The reason I want to do this
> > is that I'd like to package up a perl program with (some of) these
> > files so I only need to request that bioperl-1.4 be installed on the
> > client's computer.
> >
> > Thanks
> > Nathan
>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/

From palmeida at igc.gulbenkian.pt Wed Jan 26 07:51:56 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Wed Jan 26 07:46:53 2005
Subject: [Bioperl-l] Get tag value into a variable
In-Reply-To:
References:
Message-ID: <20050126125156.GF6071@bioinf.igc.gulbenkian.pt>

Thanks! That did it.

-Paulo

From garrettsorensen at gmail.com Wed Jan 26 09:57:51 2005
From: garrettsorensen at gmail.com (Garrett Sorensen)
Date: Wed Jan 26 09:53:56 2005
Subject: [Bioperl-l] Restriction::Analysis strange error - please help
In-Reply-To: <8f683cdabefdee9bdf6630d37c008f79@biogem.it>
References: <820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org> <8f683cdabefdee9bdf6630d37c008f79@biogem.it>
Message-ID:

Thanks Remo. To test this module, that is the only code I'm using
right now... I'm no longer grabbing a subsequence, so it can't be a
calculation error. To test, all I'm trying to do is read in sequences
from a fasta file and digest them. It runs fine for a few hundred
sequences, generating fragments as it should, then out of nowhere it
will run into the same error, but with different coordinates. Possibly
this module isn't working properly for me?

------------- EXCEPTION -------------
MSG: Bad start,end parameters. Start [2002] has to be less than end
[2001]
STACK Bio::PrimarySeq::subseq
/home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362
STACK Bio::Seq::subseq
/home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636
STACK Bio::Restriction::Analysis::fragment_maps
/home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:552
STACK toplevel Restriction_analyser_multi_CpG_a.pl:182

From nathanhaigh at ukonline.co.uk Wed Jan 26 10:51:43 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 26 10:47:45 2005
Subject: [Bioperl-l] bioperl development
Message-ID:

I was wondering who plans the development of bioperl and how this is
organised?

The reason I ask is that I've recently opened a project at
sourceforge.net and was surprised by the number of tools that are
available for organising project development. For example, you can
organise project tasks, and it has CVS support, mailing lists,
discussion forums (public and private), a tracker system for bugs,
support requests, patches and feature requests, and web space.

It seems to me that some of these features could benefit bioperl. I'm
not sure about the setup of the bioperl servers and websites, but would
it be possible to implement some of these development tools for
bioperl?

Nathan

From hollandr at gis.a-star.edu.sg Wed Jan 26 02:54:34 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Wed Jan 26 11:18:56 2005
Subject: [Bioperl-l] PatternHunter parsing
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56015B2E28@BIONIC.biopolis.one-north.com>

Are there any BioJava or BioPerl modules for parsing PatternHunter
output? It's very similar to Blast output, so if there isn't one
already, would other people be interested in using one if I wrote one?
cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- From palmeida at igc.gulbenkian.pt Wed Jan 26 06:26:15 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Wed Jan 26 11:19:10 2005 Subject: [Bioperl-l] Get feature tag - forgotten attachment Message-ID: <20050126112615.GE6071@bioinf.igc.gulbenkian.pt> -- Paulo Almeida Instituto Gulbenkian de Ciencia Apartado 14, 2781-901, Oeiras, PORTUGAL tel +351 21 446 46 35 fax +351 21 440 79 70 http://www.igc.gulbenkian.pt -------------- next part -------------- A non-text attachment was scrubbed... Name: testget.pl Type: text/x-perl Size: 395 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050126/2335d3d6/testget.bin From ak at ebi.ac.uk Wed Jan 26 11:32:57 2005 From: ak at ebi.ac.uk (Andreas Kahari) Date: Wed Jan 26 11:30:24 2005 Subject: [Bioperl-l] bioperl development In-Reply-To: References: Message-ID: <20050126163257.GB2193@ebi.ac.uk> Looking at the bioperl.org site I can see references to CVS, Bugzilla, mailing lists, FAQ & HOWTOs and a lot of other things. Do you think a move into sourceforge, away from the open-bio foundation resources (which also hosts biopython and biojava etc.), would be worth it and be beneficial to the development of the project? Sorry, but I don't think so. Andreas On Wed, Jan 26, 2005 at 03:51:43PM -0000, Nathan Haigh wrote: > I was wondering who plans the development of bioperl and how is this organised? > > > > The reason I ask, is that I've recently opened a project at sourceforge.net and was surprised by the amount of tools that are > available for organising project development. 
For example you are able to organise project tasks, has CVS support, mailing lists, > discussion forum (public and private), tracker system for bugs, support requests, patches and feature requests and web space. It > seems to me that some of these features could benefit bioperl. I'm not sure about the setup of bioperl servers and websites, but > would it be possible to implement some of these development tools for bioperl? -- Andreas Kähäri EMBL-EBI/ensembl 1024D/C2E163CB From nathanhaigh at ukonline.co.uk Wed Jan 26 11:54:47 2005 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Wed Jan 26 11:50:49 2005 Subject: [Bioperl-l] bioperl development In-Reply-To: <20050126163257.GB2193@ebi.ac.uk> Message-ID: Sorry, I wasn't suggesting any sort of move, I was thinking more of just implementing similar tools that would allow developers to better organise and delegate tasks to other developers/users. This way, people who would like to help could view a list of tasks that they would like to take part in. I just thought that the example of sourceforge towards the development of open-source software by organising and coordinating effort from project admin/developers/users was a good one, and that certain aspects could be useful for coordinating efforts for bioperl. Nathan > -----Original Message----- > From: Andreas Kahari [mailto:ak@ebi.ac.uk] > Sent: 26 January 2005 16:33 > To: Nathan Haigh > Cc: 'Bioperl' > Subject: Re: [Bioperl-l] bioperl development > > Looking at the bioperl.org site I can see references to CVS, > Bugzilla, mailing lists, FAQ & HOWTOs and a lot of other things. > Do you think a move into sourceforge, away from the open-bio > foundation resources (which also hosts biopython and biojava > etc.), would be worth it and be beneficial to the development of > the project? > > Sorry, but I don't think so.
> > > Andreas > > On Wed, Jan 26, 2005 at 03:51:43PM -0000, Nathan Haigh wrote: > > I was wondering who plans the development of bioperl and how is this organised? > > > > > > > > The reason I ask, is that I've recently opened a project at sourceforge.net and was surprised by the amount of tools that are > > available for organising project development. For example you are able to organise project tasks, has CVS support, mailing lists, > > discussion forum (public and private), tracker system for bugs, support requests, patches and feature requests and web space. It > > seems to me that some of these features could benefit bioperl. I'm not sure about the setup of bioperl servers and websites, but > > would it be possible to implement some of these development tools for bioperl? > > -- > Andreas K?h?ri > EMBL-EBI/ensembl > > 1024D/C2E163CB --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0504-0, 25/01/2005 Tested on: 26/01/2005 16:54:44 avast! is copyright (c) 2000-2003 ALWIL Software. http://www.avast.com From jason.stajich at duke.edu Wed Jan 26 12:56:11 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jan 26 12:52:14 2005 Subject: [Bioperl-l] bioperl development In-Reply-To: References: Message-ID: <9053D49D-6FC3-11D9-BF87-000393C44276@duke.edu> On Jan 26, 2005, at 11:54 AM, Nathan Haigh wrote: > Sorry, I wasn't suggesting any sort of move, I was thinking more of > just implementing similar tools that would allow developers to > better organise and delegate tasks to other developers/users. This > way, people who would like to help could view a list of tasks > that they would like to take part in. > Sure - even if we did a similar thing to the Mozilla site with their first bugs page http://www.mozilla.org/contribute/hacking/first-bugs/ & http://www.mozilla.org/developer/ that would be good. 
If we can deal with a content management system which made updating these pages really easy then it will be actually be used (and therefore useful). If we could get content-management and RSS feeds to be easy to update and edit that might make sense. If we moved a majority of the web site over to something like moveable-type. This is what in fact the biopython.org site is now done with and how the news.open-bio.org site is run. We used to have a wikiweb setup for bioperl but it was really buggy and just didn't get used that much. A new wiki would be nice I expect. The hard part is always have TOO many places for documentation and keeping it all organized. We already have a hard enough time keeping the modules organized and documented. > I just thought that the example of sourceforge towards the development > of open-source software by organising and coordinating effort > from project admin/developers/users was a good one, and that certain > aspects could be useful for coordinating efforts for bioperl. > These are good thoughts - as always it take some energy and time to put into place a new system. We really welcome anyone trying to make this a better system. At some level it is hard for the core developers to be project managers, developers, and system administrators. So any help is really much appreciated. > Nathan > > >> -----Original Message----- >> From: Andreas Kahari [mailto:ak@ebi.ac.uk] >> Sent: 26 January 2005 16:33 >> To: Nathan Haigh >> Cc: 'Bioperl' >> Subject: Re: [Bioperl-l] bioperl development >> >> Looking at the bioperl.org site I can see references to CVS, >> Bugzilla, mailing lists, FAQ & HOWTOs and a lot of other things. >> Do you think a move into sourceforge, away from the open-bio >> foundation resources (which also hosts biopython and biojava >> etc.), would be worth it and be beneficial to the development of >> the project? >> >> Sorry, but I don't think so. 
>> >> >> Andreas >> >> On Wed, Jan 26, 2005 at 03:51:43PM -0000, Nathan Haigh wrote: >>> I was wondering who plans the development of bioperl and how is this >>> organised? >>> >>> >>> >>> The reason I ask, is that I've recently opened a project at >>> sourceforge.net and was surprised by the amount of tools that are >>> available for organising project development. For example you are >>> able to organise project tasks, has CVS support, mailing > lists, >>> discussion forum (public and private), tracker system for bugs, >>> support requests, patches and feature requests and web space. It >>> seems to me that some of these features could benefit bioperl. I'm >>> not sure about the setup of bioperl servers and websites, but >>> would it be possible to implement some of these development tools >>> for bioperl? >> >> -- >> Andreas K?h?ri >> EMBL-EBI/ensembl >> >> 1024D/C2E163CB > --- > avast! Antivirus: Outbound message clean. > Virus Database (VPS): 0504-0, 25/01/2005 > Tested on: 26/01/2005 16:54:44 > avast! is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From sanges at biogem.it Wed Jan 26 13:05:52 2005 From: sanges at biogem.it (Remo Sanges) Date: Wed Jan 26 13:02:04 2005 Subject: [Bioperl-l] Restriction::Analysis strange error - please help In-Reply-To: References: <820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org> <8f683cdabefdee9bdf6630d37c008f79@biogem.it> Message-ID: <145446a61b72364ba730f2f89b075d99@biogem.it> On Jan 26, 2005, at 3:57 PM, Garrett Sorensen wrote: > Thanks Remo.. To test this module that is the only code I'm using > right now... I'm no longer grabbing a subsequence so it can't be > calculation error. 
To test all I'm trying to do is read in sequences > from a fasta file and digest them. It runs fine for a few hundred > sequences generating fragments as it should, then out of nowhere it > will run into the same error, but with different coordinates. > > Possibly this module isn't working properly for me? > > ------------- EXCEPTION ------------- > MSG: Bad start,end parameters. Start [2002] has to be less than end > [2001] > STACK Bio::PrimarySeq::subseq > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362 > STACK Bio::Seq::subseq > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636 > STACK Bio::Restriction::Analysis::fragment_maps > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/ > Analysis.pm:552 > STACK toplevel Restriction_analyser_multi_CpG_a.pl:182 OK, it seems to be a bug, you should submit it to http://bugzilla.bioperl.org/enter_bug.cgi?product=Bioperl Basically it happens when you have a site for a blunt-end cutter at the end of your sequence. I think you are not interested in that site because is a cut at the end of a non-circular sequence that don' t produce fragments.... If so you can simply change line 552 in your Analysis.pm module from this: $seq{$start}=$self->{'_seq'}->subseq($start, $stop); to this: $seq{$start}=$self->{'_seq'}->subseq($start, $stop) unless $start > $stop; HTH Remo From smarkel at scitegic.com Wed Jan 26 13:50:35 2005 From: smarkel at scitegic.com (Scott Markel) Date: Wed Jan 26 13:48:24 2005 Subject: [Bioperl-l] Bio::SearchIO::blast parsing problem with long hit scores Message-ID: <41F7E67B.9090804@scitegic.com> I have a blastn result with a hit score that's in exponential notation. Bio::SearchIO::blast truncates "2.741e+004" to "004". I get the same result in 1.4 and 1.5. Sequences producing significant alignments: (bits) Value emb|AJ010957.1|HAAJ10957 Hippopotamus amphibius complete mitocho... 2.741e+004 0.0 gb|U31048.1|PRU31048 Pronolagus rupestris, Donkerpoort, South Af... 
305 1e-080 Scott -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From palmeida at igc.gulbenkian.pt Wed Jan 26 14:23:11 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Wed Jan 26 14:20:59 2005 Subject: [Bioperl-l] bioperl development In-Reply-To: <9053D49D-6FC3-11D9-BF87-000393C44276@duke.edu> References: <9053D49D-6FC3-11D9-BF87-000393C44276@duke.edu> Message-ID: <20050126192311.GA13534@bioinf.igc.gulbenkian.pt> You might want to consider gforge (http://gforge.org), which was created as a branch of the Sourceforge code, when that ceased to be Open Source. That could bring SourceForge's features to the existing infrastructure, giving you the best of both worlds. I never used it, but I wouldn't mind finding out how it works, if you would be interested. -Paulo On Wed, Jan 26, 2005 at 12:56:11PM -0500, Jason Stajich wrote: > > If we could get content-management and RSS feeds to be easy to update > and edit that might make sense. If we moved a majority of the web site > over to something like moveable-type. This is what in fact the > biopython.org site is now done with and how the news.open-bio.org site > is run. > > These are good thoughts - as always it take some energy and time to put > into place a new system. We really welcome anyone trying to make this > a better system. At some level it is hard for the core developers to > be project managers, developers, and system administrators. So any > help is really much appreciated. 
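Scott Markel's report above shows a bit score written as "2.741e+004" being cut down to "004". What follows is only a minimal sketch of a hit-table line pattern that tolerates exponential notation, using the two table lines from that report as input. It is not the code in Bio::SearchIO::blast and not the fix that was eventually committed; the helper name parse_hit_line and the $NUM pattern are hypothetical, introduced here for illustration.

```perl
use strict;
use warnings;

# Hypothetical number pattern (an assumption, not taken from Bio::SearchIO::blast).
# It accepts plain integers ("305"), decimals ("0.0") and exponential
# notation ("2.741e+004", "1e-080").
my $NUM = qr/\d*\.?\d+(?:[eE][+-]?\d+)?/;

# Hypothetical helper: split one line of the "Sequences producing significant
# alignments" table into description, bit score and E-value.
# Returns an empty list when the line does not end in two numbers.
sub parse_hit_line {
    my ($line) = @_;
    return unless $line =~ /^(.+?)\s+($NUM)\s+($NUM)\s*$/;
    my ($desc, $bits, $evalue) = ($1, $2, $3);
    # Adding 0 forces numeric context, so "2.741e+004" is stored as 27410.
    return ($desc, $bits + 0, $evalue + 0);
}

my @lines = (
    'emb|AJ010957.1|HAAJ10957 Hippopotamus amphibius complete mitocho... 2.741e+004 0.0',
    'gb|U31048.1|PRU31048 Pronolagus rupestris, Donkerpoort, South Af... 305 1e-080',
);

for my $line (@lines) {
    my ($desc, $bits, $evalue) = parse_hit_line($line);
    print "$bits\t$evalue\t$desc\n";
}
```

Matching the whole number, exponent included, before converting it is the point of the sketch; a pattern that only looks for the trailing run of digits would see "004" in the first line.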
From allenday at ucla.edu Wed Jan 26 14:51:38 2005 From: allenday at ucla.edu (Allen Day) Date: Wed Jan 26 14:47:29 2005 Subject: [Bioperl-l] bioperl development In-Reply-To: <20050126192311.GA13534@bioinf.igc.gulbenkian.pt> References: <9053D49D-6FC3-11D9-BF87-000393C44276@duke.edu> <20050126192311.GA13534@bioinf.igc.gulbenkian.pt> Message-ID: i really like having projects on sourceforge for all the reasons mentioned in this thread. i'd try to use a sourceforge or gforge site, if it was available.
-allen On Wed, 26 Jan 2005, Paulo Almeida wrote: > You might want to consider gforge (http://gforge.org), which was created > as a branch of the Sourceforge code, when that ceased to be Open Source. > That could bring SourceForge's features to the existing infrastructure, > giving you the best of both worlds. I never used it, but I wouldn't mind > finding out how it works, if you would be interested. > > -Paulo > > On Wed, Jan 26, 2005 at 12:56:11PM -0500, Jason Stajich wrote: > > > > If we could get content-management and RSS feeds to be easy to update > > and edit that might make sense. If we moved a majority of the web site > > over to something like moveable-type. This is what in fact the > > biopython.org site is now done with and how the news.open-bio.org site > > is run. > > > > These are good thoughts - as always it take some energy and time to put > > into place a new system. We really welcome anyone trying to make this > > a better system. At some level it is hard for the core developers to > > be project managers, developers, and system administrators. So any > > help is really much appreciated. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l >
From hlapp at gnf.org Wed Jan 26 15:10:34 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Wed Jan 26 15:09:28 2005 Subject: [Bioperl-l] Re: RPMs for bioperl In-Reply-To: References: Message-ID: <56220C54-6FD6-11D9-9E4D-000A95AE92B0@gnf.org> On Jan 25, 2005, at 10:27 PM, Allen Day wrote: > Hilmar, do you know about: > > * Bio::DB::BioDB > * Bio::DB::Query::BioQuery These come with (are modules in) bioperl-db. If you have bioperl-db the dependency should be satisfied. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jason.stajich at duke.edu Wed Jan 26 15:54:19 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jan 26 15:50:14 2005 Subject: [Bioperl-l] Re: [GMOD-devel] RPMs for bioperl In-Reply-To: References: Message-ID: <84383400f40db30e628067a6867d4c75@duke.edu> srsperl is only for people with srs you should have it be ignored. if you are using cpan2rpm I would tell it to ignore certain dependancies. When I built rpms for our internal machines I just had it ignore the non-essential ones like Ace, etc. -jason On Jan 26, 2005, at 1:27 AM, Allen Day wrote: > Hi, > > I've put together a set of RPMs for Bioperl, Bioperl-DB, Bioperl-Run, > and > GBrowse. It's still a work in progress, but you can see the current > state > here: > http://sumo.genetics.ucla.edu/~allenday/flute/bioperl-1.5/i686/ > There are some related directories rooted here: > http://sumo.genetics.ucla.edu/~allenday/flute/ > > The RPMs don't install clean. This is because I'm using an automated > tool > to build the RPMs, and it looks through each downloaded tarball from > CPAN > to see what that tarball depends on. Sometimes there are dependencies > on > libraries that don't exist on CPAN, or might be altogether > non-existent. 
> These are the problem libraries and binaries: > > % rpm -Uvh --test *.rpm > error: Failed dependencies: > perl(Ace::Browser::LocalSiteDefs) is needed by > perl-AcePerl-1.87-allenday > perl(Bio::Das::ProServer::SourceHydra) is needed by > perl-Bio-Das-0.99-allenday > perl(IndexSupport) is needed by perl-Bio-Das-0.99-allenday > perl(srsperl) is needed by perl-bioperl-1.5.0-allenday > perl(Bio::DB::BioDB) is needed by > perl-Generic-Genome-Browser-1.62-allenday > perl(Bio::DB::Query::BioQuery) is needed by > perl-Generic-Genome-Browser-1.62-allenday > perl(GuessDirectories) is needed by > perl-Generic-Genome-Browser-1.62-allenday > perl(MOBY::Client::Central) is needed by > perl-Generic-Genome-Browser-1.62-allenday > perl(MOBY::Client::Service) is needed by > perl-Generic-Genome-Browser-1.62-allenday > perl(MOBY::CommonSubs) is needed by > perl-Generic-Genome-Browser-1.62-allenday > perl(MOBY::MobyXMLConstants) is needed by > perl-Generic-Genome-Browser-1.62-allenday > perl(PPM::Archive) is needed by > perl-Generic-Genome-Browser-1.62-allenday > perl(MQClient::MQSeries) is needed by > perl-SOAP-Lite-0.60-allenday > perl(MQSeries) is needed by perl-SOAP-Lite-0.60-allenday > perl(MQSeries::Message) is needed by > perl-SOAP-Lite-0.60-allenday > perl(MQSeries::Queue) is needed by perl-SOAP-Lite-0.60-allenday > perl(MQSeries::QueueManager) is needed by > perl-SOAP-Lite-0.60-allenday > /bin/perl is needed by perl-Tk-804.027-allenday > /usr/local/bin/perl is needed by perl-Tk-804.027-allenday > perl(Tk::LabRadio) is needed by perl-Tk-804.027-allenday > perl(Tk::TextReindex) is needed by perl-Tk-804.027-allenday > perl(XML::LibXML) >= 1.57 is needed by > perl-XML-LibXSLT-1.57-allenday > perl(XML::SAX::PurePerl::DTDDecls) is needed by > perl-XML-SAX-0.12-allenday > perl(XML::SAX::PurePerl::DocType) is needed by > perl-XML-SAX-0.12-allenday > perl(XML::SAX::PurePerl::EncodingDetect) is needed by > perl-XML-SAX-0.12-allenday > perl(XML::SAX::PurePerl::XMLDecl) is needed by > 
perl-XML-SAX-0.12-allenday > > Lincoln, I'm guessing you can help me with: > > * Ace::Browser::LocalSiteDefs > * Bio::Das::ProServer::SourceHydra > * GuessDirectories > * IndexSupport > * MOBY::* > > Hilmar, do you know about: > > * Bio::DB::BioDB > * Bio::DB::Query::BioQuery > > I'm sure someone on this list knows where to get > > * srsperl > * PPM::Archive > > If anyone can shed light on where any of these libraries can be found, > I'd > appreciate it. Thanks. > > -Allen > > > ------------------------------------------------------- > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > Tool for open source databases. Create drag-&-drop reports. Save time > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. > Download a FREE copy at http://www.intelliview.com/go/osdn_nl > _______________________________________________ > Gmod-devel mailing list > Gmod-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-devel > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Wed Jan 26 15:55:06 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jan 26 15:50:58 2005 Subject: [Bioperl-l] Bio::SearchIO::blast parsing problem with long hit scores In-Reply-To: <41F7E67B.9090804@scitegic.com> References: <41F7E67B.9090804@scitegic.com> Message-ID: <28d24e7f297bd7a1c6aaf2a14fa7ce8c@duke.edu> Can you put an example report with the bug report on bugzilla? I think I have a fix but want to test it out on the real data. -jason On Jan 26, 2005, at 1:50 PM, Scott Markel wrote: > I have a blastn result with a hit score that's in exponential > notation. Bio::SearchIO::blast truncates "2.741e+004" to "004". > I get the same result in 1.4 and 1.5. > > Sequences producing significant alignments: > (bits) Value > > emb|AJ010957.1|HAAJ10957 Hippopotamus amphibius complete mitocho... > 2.741e+004 0.0 > gb|U31048.1|PRU31048 Pronolagus rupestris, Donkerpoort, South Af... 
> 305 1e-080 > > Scott > > -- > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel@scitegic.com > SciTegic Inc. mobile: +1 858 205 3653 > 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 > San Diego, CA 92123 fax: +1 858 279 8804 > USA web: http://www.scitegic.com > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From garrettsorensen at gmail.com Wed Jan 26 16:12:36 2005 From: garrettsorensen at gmail.com (Garrett Sorensen) Date: Wed Jan 26 16:09:36 2005 Subject: [Bioperl-l] Restriction::Analysis strange error - please help In-Reply-To: <145446a61b72364ba730f2f89b075d99@biogem.it> References: <820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org> <8f683cdabefdee9bdf6630d37c008f79@biogem.it> <145446a61b72364ba730f2f89b075d99@biogem.it> Message-ID: Thanks so much Remo, the modification to Analysis.pm worked beautifully. I will submit the bug report as you suggested. Many thanks, Garrett On Wed, 26 Jan 2005 19:05:52 +0100, Remo Sanges wrote: > > On Jan 26, 2005, at 3:57 PM, Garrett Sorensen wrote: > > > Thanks Remo.. To test this module that is the only code I'm using > > right now... I'm no longer grabbing a subsequence so it can't be > > calculation error. To test all I'm trying to do is read in sequences > > from a fasta file and digest them. It runs fine for a few hundred > > sequences generating fragments as it should, then out of nowhere it > > will run into the same error, but with different coordinates. > > > > Possibly this module isn't working properly for me? > > > > ------------- EXCEPTION ------------- > > MSG: Bad start,end parameters. 
Start [2002] has to be less than end > > [2001] > > STACK Bio::PrimarySeq::subseq > > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362 > > STACK Bio::Seq::subseq > > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636 > > STACK Bio::Restriction::Analysis::fragment_maps > > /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/ > > Analysis.pm:552 > > STACK toplevel Restriction_analyser_multi_CpG_a.pl:182 > > OK, > > it seems to be a bug, you should submit it to > http://bugzilla.bioperl.org/enter_bug.cgi?product=Bioperl > > Basically it happens when you have a site for a blunt-end cutter > at the end of your sequence. > > I think you are not interested in that site because > is a cut at the end of a non-circular sequence that > don' t produce fragments.... > > If so you can simply change line 552 in your Analysis.pm > module from this: > > $seq{$start}=$self->{'_seq'}->subseq($start, $stop); > > to this: > > $seq{$start}=$self->{'_seq'}->subseq($start, $stop) unless $start > > $stop; > > HTH > > Remo > > From allenday at ucla.edu Wed Jan 26 18:01:28 2005 From: allenday at ucla.edu (Allen Day) Date: Wed Jan 26 17:57:22 2005 Subject: [Bioperl-l] Re: RPMs for bioperl In-Reply-To: <56220C54-6FD6-11D9-9E4D-000A95AE92B0@gnf.org> References: <56220C54-6FD6-11D9-9E4D-000A95AE92B0@gnf.org> Message-ID: Hilmar, I see that these are only in the bioperl-db cvs, not in the 0.1 tarball here: http://www.bioperl.org/Core/Latest/index.shtml . Can we increment the version in the bioperl-db repository to 0.2 so I can RPM this? -Allen and it did not contain either of these packages On Wed, 26 Jan 2005, Hilmar Lapp wrote: > > On Jan 25, 2005, at 10:27 PM, Allen Day wrote: > > > Hilmar, do you know about: > > > > * Bio::DB::BioDB > > * Bio::DB::Query::BioQuery > > These come with (are modules in) bioperl-db. If you have bioperl-db the > dependency should be satisfied. 
> > -hilmar > From allenday at ucla.edu Wed Jan 26 18:03:25 2005 From: allenday at ucla.edu (Allen Day) Date: Wed Jan 26 17:59:18 2005 Subject: [Bioperl-l] Re: [GMOD-devel] RPMs for bioperl In-Reply-To: <84383400f40db30e628067a6867d4c75@duke.edu> References: <84383400f40db30e628067a6867d4c75@duke.edu> Message-ID: i can do that, but i'd rather just include all optional modules are prerequisites rather than making a custom specfile. any idea where i can find srsperl? a google search didn't turn anything up. -allen On Wed, 26 Jan 2005, Jason Stajich wrote: > srsperl is only for people with srs you should have it be ignored. > > if you are using cpan2rpm I would tell it to ignore certain > dependancies. When I built rpms for our internal machines I just had it > ignore the non-essential ones like Ace, etc. > > -jason > On Jan 26, 2005, at 1:27 AM, Allen Day wrote: > > > Hi, > > > > I've put together a set of RPMs for Bioperl, Bioperl-DB, Bioperl-Run, > > and > > GBrowse. It's still a work in progress, but you can see the current > > state > > here: > > http://sumo.genetics.ucla.edu/~allenday/flute/bioperl-1.5/i686/ > > There are some related directories rooted here: > > http://sumo.genetics.ucla.edu/~allenday/flute/ > > > > The RPMs don't install clean. This is because I'm using an automated > > tool > > to build the RPMs, and it looks through each downloaded tarball from > > CPAN > > to see what that tarball depends on. Sometimes there are dependencies > > on > > libraries that don't exist on CPAN, or might be altogether > > non-existent. 
> > These are the problem libraries and binaries: > > > > % rpm -Uvh --test *.rpm > > error: Failed dependencies: > > perl(Ace::Browser::LocalSiteDefs) is needed by > > perl-AcePerl-1.87-allenday > > perl(Bio::Das::ProServer::SourceHydra) is needed by > > perl-Bio-Das-0.99-allenday > > perl(IndexSupport) is needed by perl-Bio-Das-0.99-allenday > > perl(srsperl) is needed by perl-bioperl-1.5.0-allenday > > perl(Bio::DB::BioDB) is needed by > > perl-Generic-Genome-Browser-1.62-allenday > > perl(Bio::DB::Query::BioQuery) is needed by > > perl-Generic-Genome-Browser-1.62-allenday > > perl(GuessDirectories) is needed by > > perl-Generic-Genome-Browser-1.62-allenday > > perl(MOBY::Client::Central) is needed by > > perl-Generic-Genome-Browser-1.62-allenday > > perl(MOBY::Client::Service) is needed by > > perl-Generic-Genome-Browser-1.62-allenday > > perl(MOBY::CommonSubs) is needed by > > perl-Generic-Genome-Browser-1.62-allenday > > perl(MOBY::MobyXMLConstants) is needed by > > perl-Generic-Genome-Browser-1.62-allenday > > perl(PPM::Archive) is needed by > > perl-Generic-Genome-Browser-1.62-allenday > > perl(MQClient::MQSeries) is needed by > > perl-SOAP-Lite-0.60-allenday > > perl(MQSeries) is needed by perl-SOAP-Lite-0.60-allenday > > perl(MQSeries::Message) is needed by > > perl-SOAP-Lite-0.60-allenday > > perl(MQSeries::Queue) is needed by perl-SOAP-Lite-0.60-allenday > > perl(MQSeries::QueueManager) is needed by > > perl-SOAP-Lite-0.60-allenday > > /bin/perl is needed by perl-Tk-804.027-allenday > > /usr/local/bin/perl is needed by perl-Tk-804.027-allenday > > perl(Tk::LabRadio) is needed by perl-Tk-804.027-allenday > > perl(Tk::TextReindex) is needed by perl-Tk-804.027-allenday > > perl(XML::LibXML) >= 1.57 is needed by > > perl-XML-LibXSLT-1.57-allenday > > perl(XML::SAX::PurePerl::DTDDecls) is needed by > > perl-XML-SAX-0.12-allenday > > perl(XML::SAX::PurePerl::DocType) is needed by > > perl-XML-SAX-0.12-allenday > > perl(XML::SAX::PurePerl::EncodingDetect) is 
needed by > > perl-XML-SAX-0.12-allenday > > perl(XML::SAX::PurePerl::XMLDecl) is needed by > > perl-XML-SAX-0.12-allenday > > > > Lincoln, I'm guessing you can help me with: > > > > * Ace::Browser::LocalSiteDefs > > * Bio::Das::ProServer::SourceHydra > > * GuessDirectories > > * IndexSupport > > * MOBY::* > > > > Hilmar, do you know about: > > > > * Bio::DB::BioDB > > * Bio::DB::Query::BioQuery > > > > I'm sure someone on this list knows where to get > > > > * srsperl > > * PPM::Archive > > > > If anyone can shed light on where any of these libraries can be found, > > I'd > > appreciate it. Thanks. > > > > -Allen > > > > > > ------------------------------------------------------- > > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > > Tool for open source databases. Create drag-&-drop reports. Save time > > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. > > Download a FREE copy at http://www.intelliview.com/go/osdn_nl > > _______________________________________________ > > Gmod-devel mailing list > > Gmod-devel@lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/gmod-devel > > > > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > From hlapp at gnf.org Wed Jan 26 18:16:54 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Wed Jan 26 18:13:01 2005 Subject: [Bioperl-l] Re: RPMs for bioperl In-Reply-To: Message-ID: <5DE987EC-6FF0-11D9-834C-000A959EB4C4@gnf.org> The 0.1 tarball is out-dated since a long time. Do you want me to introduce a specific tag? -hilmar On Wednesday, January 26, 2005, at 03:01 PM, Allen Day wrote: > Hilmar, > > I see that these are only in the bioperl-db cvs, not in the 0.1 tarball > here: http://www.bioperl.org/Core/Latest/index.shtml . > > Can we increment the version in the bioperl-db repository to 0.2 so I > can > RPM this? 
> > -Allen > > > and it did not contain either of these packages > > On Wed, 26 Jan 2005, Hilmar Lapp wrote: > >> >> On Jan 25, 2005, at 10:27 PM, Allen Day wrote: >> >>> Hilmar, do you know about: >>> >>> * Bio::DB::BioDB >>> * Bio::DB::Query::BioQuery >> >> These come with (are modules in) bioperl-db. If you have bioperl-db >> the >> dependency should be satisfied. >> >> -hilmar >> >> -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From allenday at ucla.edu Wed Jan 26 18:53:10 2005 From: allenday at ucla.edu (Allen Day) Date: Wed Jan 26 18:49:06 2005 Subject: [Bioperl-l] RPMs for bioperl (... and GBrowse, and lsid-perl, and biomoby) In-Reply-To: <200501261643.10657.lstein@cshl.edu> References: <200501261643.10657.lstein@cshl.edu> Message-ID: On Wed, 26 Jan 2005, Lincoln Stein wrote: > I'm glad you're doing this. > > Can you simply turn off the warning messages about > Ace::Browser::LocalSiteDefs, Bio::Das::ProServer::SourceHydra, and > MOBY::*? They are all optional and won't do anything useful without > a lot of extra configuration. I could do that by creating the specfiles by hand, but I'd rather not do this. Is it difficult to just add the Bio::* and Ace::* modules to their respective CPAN modules? 
==========
% grep -r 'Hydra' ./Bio-Das-0.99/*
./Bio-Das-0.99/Das/ProServer/Config.pm:use Bio::Das::ProServer::SourceHydra;
./Bio-Das-0.99/Das/ProServer/Config.pm:# build all known SourceAdaptors (including those Hydra-based)
./Bio-Das-0.99/Das/ProServer/Config.pm:# build SourceHydra for a given dsn/hydraname
./Bio-Das-0.99/Das/ProServer/Config.pm: my $hydraimpl = "Bio::Das::ProServer::SourceHydra::".$self->{'adaptors'}->{$hydraname}->{'hydra'};
==========

Missing SourceHydra.pm

==========
% grep -r 'Ace::Browser::LocalSiteDefs' ./AcePerl-1.87/*
./AcePerl-1.87/Ace/Browser/SiteDefs.pm:use Ace::Browser::LocalSiteDefs '$SITE_DEFS';
./AcePerl-1.87/acebrowser/conf/moviedb.pm:use Ace::Browser::LocalSiteDefs '$HTML_PATH';
./AcePerl-1.87/acebrowser/conf/default.pm:use Ace::Browser::LocalSiteDefs '$HTML_PATH';
./AcePerl-1.87/acebrowser/conf/simple.pm:use Ace::Browser::LocalSiteDefs '$HTML_PATH';
./AcePerl-1.87/Makefile.PL: eval 'use Ace::Browser::LocalSiteDefs qw($SITE_DEFS $CGI_PATH $HTML_PATH)';
./AcePerl-1.87/Makefile.PL:package Ace::Browser::LocalSiteDefs;
./AcePerl-1.87/Makefile.PL:Ace::Browser::LocalSiteDefs - Master Configuration file for AceBrowser
./AcePerl-1.87/Makefile.PL: use Ace::Browser::LocalSiteDefs qw($SITE_DEFS $HTML_PATH $CGI_PATH);
./AcePerl-1.87/README.ACEBROWSER:Ace::Browser::LocalSiteDefs, typically somewhere inside the
./AcePerl-1.87/README.ACEBROWSER: perl -MAce::Browser::LocalSiteDefs \
./AcePerl-1.87/README.ACEBROWSER: -e 'print $Ace::Browser::LocalSiteDefs::SITE_DEFS,"\n"'
==========

LocalSiteDefs.pm doesn't exist, because package Ace::Browser::LocalSiteDefs is, interestingly, defined in Makefile.PL and not installed. Can we make this a separate file, or at least define it in a file that is installed?

MOBY::*

If I fetch the biomoby and lsid-perl tarballs here: http://biomoby.org/releases/, and here: http://www-124.ibm.com/developerworks/oss/lsid/reference/tutorials/100/ I can resolve most of the MOBY::* requirements with this file.
These are what's left:

perl(MOBY::lsid::authority::dbConfigure) is needed by perl-biomoby-0.8.1-allenday
perl(MOBY::MobyXMLConstants) is needed by perl-Generic-Genome-Browser-1.62-allenday

Is it possible to get releases onto CPAN of both biomoby and lsid-perl, and to add these two MOBY files to MOBY?

Win32::Registry

This was introduced by lsid-perl by way of a Net::DNS requirement. I'm not sure what to do here. Looks like I may need to handcode a specfile here...

PPM::Archive

Where can I get this? Looks like I may need to handcode a specfile here...

> GuessDirectories.pm is a GBrowse install utility that is part of the
> package. I don't know why the RPM tool is complaining about it.

i think it has to do with the way the rpm build process identifies what modules are "use"d. from what i surmise, two lists are included in the RPM, (1) a list of modules the package depends on, and (2) a list of modules the package provides. So in the case of Gbrowse's dependency on GuessDirectories, my guess is that it isn't correctly detecting GuessDirectories is provided by the package. A quick look in the Generic-Genome-Browser checkout shows the module is referenced in a few places, but not actually included in the distribution:

==========
% grep -r 'GuessDirectories' Generic-Genome-Browser/*
Generic-Genome-Browser/install_util/CVS/Entries:/GuessDirectories.pm/1.1/Sun Jun 8 23:29:48 2003//
Generic-Genome-Browser/Makefile.PL:use GuessDirectories;
Generic-Genome-Browser/Makefile.PL: $OPTIONS{CONF} = GuessDirectories->conf || "$OPTIONS{APACHE}/conf";
Generic-Genome-Browser/Makefile.PL: $OPTIONS{HTDOCS} = GuessDirectories->htdocs || "$OPTIONS{APACHE}/htdocs";
Generic-Genome-Browser/Makefile.PL: $OPTIONS{CGIBIN} = GuessDirectories->cgibin || "$OPTIONS{APACHE}/cgi-bin";
Generic-Genome-Browser/MANIFEST:install_util/GuessDirectories.pm
==========

notice that there isn't a file containing 'package GuessDirectories;'. can you please add install_util/GuessDirectories.pm to the repository?
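
A rough sketch of the requires/provides comparison described above, assuming the build tool does little more than scan each file for "use" and "package" lines. The real rpm dependency scripts are more involved; the helper name unmet_requires, the pragma skip list, and the sample file names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: given { filename => source text }, return the sorted
# list of modules that are "use"d somewhere but never declared via "package".
sub unmet_requires {
    my ($files) = @_;
    my (%provides, %requires);
    for my $src (values %$files) {
        for my $line (split /\n/, $src) {
            # "package Foo::Bar;" marks a module the distribution provides
            $provides{$1} = 1
                if $line =~ /^\s*package\s+([A-Za-z_][\w:]*)\s*;/;
            # "use Foo::Bar ...;" marks a dependency; "use 5.006;" never
            # matches because a module name cannot start with a digit
            if ($line =~ /^\s*use\s+([A-Za-z_][\w:]*)/) {
                my $mod = $1;
                # skip a few common pragmas (assumed list, not exhaustive)
                next if $mod =~ /^(?:strict|warnings|vars|lib|constant|base)$/;
                $requires{$mod} = 1;
            }
        }
    }
    my @unmet = grep { !$provides{$_} } sort keys %requires;
    return @unmet;
}

# Toy distribution: Makefile.PL uses GuessDirectories, but no file declares it.
my %dist = (
    'Makefile.PL'         => "use strict;\nuse GuessDirectories;\nuse Local::Helper;\n",
    'lib/Local/Helper.pm' => "package Local::Helper;\nuse strict;\n1;\n",
);
print "$_\n" for unmet_requires(\%dist);   # prints: GuessDirectories
```

With the toy input, only GuessDirectories is reported, which matches the situation where a module is referenced in Makefile.PL but no shipped file declares the package.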
> I don't know about IndexSupport. What is it? A download of Bio::Das 0.99 from: http://search.cpan.org/~lds/Bio-Das-0.99/ reveals: % grep -r IndexSupport Bio-Das-0.99 Bio-Das-0.99/Das/ProServer/SourceAdaptor/haplotype.pm:use IndexSupport; Bio-Das-0.99/Das/ProServer/SourceAdaptor/haplotype.pm: my $conf = IndexSupport->new("$root/conf",'','Homo_sapiens'); Bio-Das-0.99/Das/ProServer/SourceAdaptor/snp.pm:use IndexSupport;Bio-Das-0.99/Das/ProServer/SourceAdaptor/snp.pm: my $conf = IndexSupport->new("$root/conf",'','Homo_sapiens'); Bio-Das-0.99/Das/ProServer/SourceAdaptor/sts.pm:use IndexSupport; Bio-Das-0.99/Das/ProServer/SourceAdaptor/sts.pm: my $conf = IndexSupport->new("$root/conf",'','Homo_sapiens'); Bio-Das-0.99/Das/ProServer/SourceAdaptor/trace.pm:use IndexSupport; Bio-Das-0.99/Das/ProServer/SourceAdaptor/trace.pm: my $conf = IndexSupport->new("$root/conf",'','Homo_sapiens'); notice that there isn't a file containing 'package IndexSupport;'. can you please add IndexSupport.pm to the CPAN released module? interestingly, there is no reference to IndexSupport in the open-bio das repository. -allen > > Lincoln > > On Wednesday 26 January 2005 01:27 am, Allen Day wrote: > > Hi, > > > > I've put together a set of RPMs for Bioperl, Bioperl-DB, > > Bioperl-Run, and GBrowse. It's still a work in progress, but you > > can see the current state here: > > http://sumo.genetics.ucla.edu/~allenday/flute/bioperl-1.5/i686/ > > There are some related directories rooted here: > > http://sumo.genetics.ucla.edu/~allenday/flute/ > > > > The RPMs don't install clean. This is because I'm using an > > automated tool to build the RPMs, and it looks through each > > downloaded tarball from CPAN to see what that tarball depends on. > > Sometimes there are dependencies on libraries that don't exist on > > CPAN, or might be altogether non-existent. 
These are the problem > > libraries and binaries: > > > > % rpm -Uvh --test *.rpm > > error: Failed dependencies: > > perl(Ace::Browser::LocalSiteDefs) is needed by > > perl-AcePerl-1.87-allenday perl(Bio::Das::ProServer::SourceHydra) > > is needed by perl-Bio-Das-0.99-allenday perl(IndexSupport) is > > needed by perl-Bio-Das-0.99-allenday perl(srsperl) is needed by > > perl-bioperl-1.5.0-allenday perl(Bio::DB::BioDB) is needed by > > perl-Generic-Genome-Browser-1.62-allenday > > perl(Bio::DB::Query::BioQuery) is needed by > > perl-Generic-Genome-Browser-1.62-allenday perl(GuessDirectories) is > > needed by perl-Generic-Genome-Browser-1.62-allenday > > perl(MOBY::Client::Central) is needed by > > perl-Generic-Genome-Browser-1.62-allenday > > perl(MOBY::Client::Service) is needed by > > perl-Generic-Genome-Browser-1.62-allenday perl(MOBY::CommonSubs) is > > needed by perl-Generic-Genome-Browser-1.62-allenday > > perl(MOBY::MobyXMLConstants) is needed by > > perl-Generic-Genome-Browser-1.62-allenday perl(PPM::Archive) is > > needed by perl-Generic-Genome-Browser-1.62-allenday > > perl(MQClient::MQSeries) is needed by perl-SOAP-Lite-0.60-allenday > > perl(MQSeries) is needed by perl-SOAP-Lite-0.60-allenday > > perl(MQSeries::Message) is needed by perl-SOAP-Lite-0.60-allenday > > perl(MQSeries::Queue) is needed by perl-SOAP-Lite-0.60-allenday > > perl(MQSeries::QueueManager) is needed by > > perl-SOAP-Lite-0.60-allenday /bin/perl is needed by > > perl-Tk-804.027-allenday > > /usr/local/bin/perl is needed by perl-Tk-804.027-allenday > > perl(Tk::LabRadio) is needed by perl-Tk-804.027-allenday > > perl(Tk::TextReindex) is needed by perl-Tk-804.027-allenday > > perl(XML::LibXML) >= 1.57 is needed by > > perl-XML-LibXSLT-1.57-allenday perl(XML::SAX::PurePerl::DTDDecls) > > is needed by perl-XML-SAX-0.12-allenday > > perl(XML::SAX::PurePerl::DocType) is needed by > > perl-XML-SAX-0.12-allenday perl(XML::SAX::PurePerl::EncodingDetect) > > is needed by 
perl-XML-SAX-0.12-allenday > > perl(XML::SAX::PurePerl::XMLDecl) is needed by > > perl-XML-SAX-0.12-allenday > > > > Lincoln, I'm guessing you can help me with: > > > > * Ace::Browser::LocalSiteDefs > > * Bio::Das::ProServer::SourceHydra > > * GuessDirectories > > * IndexSupport > > * MOBY::* > > > > Hilmar, do you know about: > > > > * Bio::DB::BioDB > > * Bio::DB::Query::BioQuery > > > > I'm sure someone on this list knows where to get > > > > * srsperl > > * PPM::Archive > > > > If anyone can shed light on where any of these libraries can be > > found, I'd appreciate it. Thanks. > > > > -Allen > > > > > > ------------------------------------------------------- > > This SF.Net email is sponsored by: IntelliVIEW -- Interactive > > Reporting Tool for open source databases. Create drag-&-drop > > reports. Save time by over 75%! Publish reports on the web. Export > > to DOC, XLS, RTF, etc. Download a FREE copy at > > http://www.intelliview.com/go/osdn_nl > > _______________________________________________ > > Gmod-devel mailing list > > Gmod-devel@lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/gmod-devel > > From allenday at ucla.edu Wed Jan 26 18:57:25 2005 From: allenday at ucla.edu (Allen Day) Date: Wed Jan 26 18:53:15 2005 Subject: [Bioperl-l] Re: RPMs for bioperl In-Reply-To: <5DE987EC-6FF0-11D9-834C-000A959EB4C4@gnf.org> References: <5DE987EC-6FF0-11D9-834C-000A959EB4C4@gnf.org> Message-ID: Whatever works for you. My main aim is to have a tarball that contains the missing packages. Ideally this will be available in the form of a bioperl-db release on CPAN to make the dependency automatically resolveable, but I'd settle for a tarball at http://www.bioperl.org The version number and/or tag don't really matter to me, as long as I have something downloadable. Thanks. -Allen On Wed, 26 Jan 2005, Hilmar Lapp wrote: > The 0.1 tarball is out-dated since a long time. > > Do you want me to introduce a specific tag? 
> > -hilmar > > On Wednesday, January 26, 2005, at 03:01 PM, Allen Day wrote: > > > Hilmar, > > > > I see that these are only in the bioperl-db cvs, not in the 0.1 tarball > > here: http://www.bioperl.org/Core/Latest/index.shtml . > > > > Can we increment the version in the bioperl-db repository to 0.2 so I > > can > > RPM this? > > > > -Allen > > > > > > and it did not contain either of these packages > > > > On Wed, 26 Jan 2005, Hilmar Lapp wrote: > > > >> > >> On Jan 25, 2005, at 10:27 PM, Allen Day wrote: > >> > >>> Hilmar, do you know about: > >>> > >>> * Bio::DB::BioDB > >>> * Bio::DB::Query::BioQuery > >> > >> These come with (are modules in) bioperl-db. If you have bioperl-db > >> the > >> dependency should be satisfied. > >> > >> -hilmar > >> > >> > From allenday at ucla.edu Wed Jan 26 18:58:42 2005 From: allenday at ucla.edu (Allen Day) Date: Wed Jan 26 18:54:34 2005 Subject: [Bioperl-l] bioperl development In-Reply-To: References: <9053D49D-6FC3-11D9-BF87-000393C44276@duke.edu> <20050126192311.GA13534@bioinf.igc.gulbenkian.pt> Message-ID: maybe this is worth considering for the bioperl-(ng/noveau/2.0) project, which doesn't yet have a home? On Wed, 26 Jan 2005, Allen Day wrote: > i really like having projects on sourceforge for all the reasons mentioned > in this thread. i'd try to use a sourceforge or gforge site, if it was > available. > > -allen > > > On Wed, 26 Jan 2005, Paulo Almeida wrote: > > > You might want to consider gforge (http://gforge.org), which was created > > as a branch of the Sourceforge code, when that ceased to be Open Source. > > That could bring SourceForge's features to the existing infrastructure, > > giving you the best of both worlds. I never used it, but I wouldn't mind > > finding out how it works, if you would be interested. 
> > > > -Paulo > > > > On Wed, Jan 26, 2005 at 12:56:11PM -0500, Jason Stajich wrote: > > > > > > If we could get content-management and RSS feeds to be easy to update > > > and edit that might make sense. If we moved a majority of the web site > > > over to something like moveable-type. This is what in fact the > > > biopython.org site is now done with and how the news.open-bio.org site > > > is run. > > > > > > These are good thoughts - as always it take some energy and time to put > > > into place a new system. We really welcome anyone trying to make this > > > a better system. At some level it is hard for the core developers to > > > be project managers, developers, and system administrators. So any > > > help is really much appreciated. > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l >
From perlguy at hotmail.com Wed Jan 26 19:47:11 2005 From: perlguy at hotmail.com (Philip Parker) Date: Wed Jan 26 19:43:59 2005 Subject: [Bioperl-l] Interested in helping... Message-ID: My name is Philip and I'm interested in doing work with BioPerl but I am not a bioinformatician or biologist. I *am* interested in helping and learning in the process. I do have past professional experience with Perl. Philip Parker - perlguy ~at- hotmail.com From farid at vt.edu Wed Jan 26 20:03:28 2005 From: farid at vt.edu (Merchant Farid) Date: Wed Jan 26 19:59:26 2005 Subject: [Bioperl-l] Automate Fasta34.exe In-Reply-To: Message-ID: <000001c5040c$02c71f90$23af52c6@Merchant> Hi guys.
I am trying to find the exhaustive homologous match of a given sequence against a given library. I run fasta34.exe from my Perl script and input the sequence and library file names (both stored in the perl folder) and the other input values. I get output in the FASTA format, which I parse to get the best sequence matches above a given threshold value, storing each match, one at a time, in a file.

Now I want to use this extracted sequence file to run fasta34 again against the same library with the same inputs, and repeat the same procedure until I get the homologous match. If I find any new sequence compared to the original one, it is appended to the original FASTA file.

Using my code I am able to extract the sequences from the original file that fit the criteria above a threshold, but calling fasta34.exe means that I have to enter the sequence file name manually each time fasta34.exe is called. Can anybody please help me solve this problem?

Following is my code:

#!/usr/bin/perl -w

print "\n********Running Fasta34********** \n \n";
print "\n Please enter the file sequence and library present in your path \n\n";
$result = system("c:/perl/sam/fasta34.exe");

print "\n\n*****enter the output file name you have given***\n";
$output = <STDIN>;
chomp $output;    # drop the trailing newline so the file name is valid
open(FASTA, "c:/perl/sam/$output") or die "cant open the output file \n";

my $seqname;
my $iteration;

print "Enter your cut off percentage";
$cut = <STDIN>;
chomp $cut;

while (<FASTA>) {
    #compare the line of matching sequence
    if (m/>>(.{4,6})(.*)/) {
        $seqname   = $1;
        $laterhalf = $2;
    }
    #print "SKIP A LINE iF A MATCH \n";
    next if /^ini/;
    if (m/(\d+\.\d+)% identity/) {
        $per = $1;
        #check if match is above the cutoff percentage
        if ($per > $cut) {
            #print "\n\n$seqname$laterhalf \n";
            #print "identity match $per % \n";
            #store the first line of the input file
            open(ORG, ">>c://perl/sam/orginal.fasta");
            open(OUT2IN, ">c:/perl/sam/output1.aa");
            print OUT2IN "$seqname$laterhalf \n";
            print ORG "$seqname$laterhalf \n";
            while (<FASTA>) {
                if (m/^($seqname)(.*)/) {
                    #store the match sequence
                    print OUT2IN "$2 \n";
                    print ORG "$2 \n";
                }
                if (m/>>/) {
                    seek(FASTA, -100, 1);
                    last;
                }
            }
            close(OUT2IN);
            close(ORG);
        }
    }
}
close(FASTA);
print $result;

From brian_osborne at cognia.com Wed Jan 26 21:20:53 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Jan 26 21:20:24 2005
Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat inSwissProtfile
In-Reply-To:
Message-ID:

Chris and Kenny,

Bio::Index::Swissprot has an id_parser() method now but the uniqueness of the key will be a concern, yes.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Chris Mungall
Sent: Friday, January 21, 2005 12:33 PM
To: Brian Osborne
Cc: Daily, Kenneth Michael; bioperl-l@portal.open-bio.org
Subject: RE: [Bioperl-l] Reading all sequences using Bio::DB::Flat inSwissProtfile

Brian,

Unfortunately the id_parser method isn't supported in Bio::Index::Swissprot.

Even if it was I don't think it would be sufficient here - Kenny needs to index using the feature fields. This implies that the search key wouldn't be unique. Bio::Index::Abstract requires a unique key for the index.

Flexible indexing and retrieval such as this is best handled using some generic non-bioperl specific solution - RDB, XMLDB, SRS, Lucene, LuceGene etc.

I forgot to mention Don Gilbert's LuceGene in my original reply - it's a fairly sane open-source alternative to SRS. It handles lots of bioinformatics file formats (not sure about swissprot but I'm sure it could be added). See: http://www.gmod.org/lucegene/index.shtml

Cheers
Chris

On Fri, 21 Jan 2005, Brian Osborne wrote:

> Kenny,
>
> Did you take a look at Bio/Index/Swissprot.pm? What's important for you will
> be building the index using the keys you're interested in as opposed to the
> default key, using the id_parser method. See the Bio::Index section in the
> bptutorial for an example.
>
> Brian O.
> > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Daily, > Kenneth Michael > Sent: Wednesday, January 19, 2005 11:49 AM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in > SwissProtfile > > > I want to work with a local copy of the SwissProt database, and need to > search through all of the entries. I only see methods to return sequences by > accession. However, I cannot use just FASTA format of the SwissProt records, > as I need to use the feature fields. What I need to learn is how to do a DB > search on the features field of the SwissProt records, if its possible. > Would there be any advantage do doing it with the DB instead of just using > SeqIO as an input stream? I think it might, since every time I want to do a > search I must read in the entire file again, which is very costly. Thank > you. > > Kenny Daily > Indiana University > School of Informatics > kmdaily [at] indiana [dot] edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From d.humphreys at victorchang.unsw.edu.au Wed Jan 26 20:55:12 2005 From: d.humphreys at victorchang.unsw.edu.au (David Humphreys) Date: Wed Jan 26 23:58:05 2005 Subject: [Bioperl-l] Help can't solve an internal 500 error Message-ID: Hi bioperl-groovers, Has anyone ever seen the following error message? 
MSG: WebDBSeqI Request error: 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (Bad protocol 'tcp') I have traced previous threads from the archives but they appear to be slightly different internal 500 error. What is confusing me the most is that the script that leaves this error worked perfectly on my older machine (running win XP) but not on my colleagues machine (also running XP). The errors cascade from deep within bioperl and I have a feeling it has something to do with the way the machine is setup rather than the scripts themselves. Any ideas? thanks in advance Dave From nathanhaigh at ukonline.co.uk Thu Jan 27 03:36:22 2005 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Thu Jan 27 03:32:44 2005 Subject: [Bioperl-l] Help can't solve an internal 500 error In-Reply-To: Message-ID: Would you be able to supply the script that produces this error, so that we may be able to reproduce the error. Nathan > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of David Humphreys > Sent: 27 January 2005 01:55 > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] Help can't solve an internal 500 error > > Hi bioperl-groovers, > > Has anyone ever seen the following error message? > > > MSG: WebDBSeqI Request error: > 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (Bad protocol 'tcp') > > I have traced previous threads from the archives but they appear to > be slightly different internal 500 error. What is confusing me the > most is that the script that leaves this error worked perfectly on my > older machine (running win XP) but not on my colleagues machine (also > running XP). The errors cascade from deep within bioperl and I have a > feeling it has something to do with the way the machine is setup > rather than the scripts themselves. Any ideas? 
> > thanks in advance > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0504-1, 27/01/2005 Tested on: 27/01/2005 08:36:21 avast! is copyright (c) 2000-2003 ALWIL Software. http://www.avast.com From palmeida at igc.gulbenkian.pt Thu Jan 27 05:12:45 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Thu Jan 27 05:07:44 2005 Subject: [Bioperl-l] Help can't solve an internal 500 error In-Reply-To: References: Message-ID: <20050127101245.GB13534@bioinf.igc.gulbenkian.pt> It might be a problem with the tcp protocol in that computer. If that is the case, you can try re-installing it as this page explains: http://www.petri.co.il/reinstall_tcp_ip_on_windows_xp.htm -Paulo On Thu, Jan 27, 2005 at 12:55:12PM +1100, David Humphreys wrote: > Hi bioperl-groovers, > > Has anyone ever seen the following error message? > > > MSG: WebDBSeqI Request error: > 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (Bad protocol 'tcp') > > I have traced previous threads from the archives but they appear to > be slightly different internal 500 error. What is confusing me the > most is that the script that leaves this error worked perfectly on my > older machine (running win XP) but not on my colleagues machine (also > running XP). The errors cascade from deep within bioperl and I have a > feeling it has something to do with the way the machine is setup > rather than the scripts themselves. Any ideas? > > thanks in advance > > Dave -- Paulo Almeida Instituto Gulbenkian de Ciencia Apartado 14, 2781-901, Oeiras, PORTUGAL tel +351 21 446 46 35 fax +351 21 440 79 70 http://www.igc.gulbenkian.pt From davidg at lsi.upc.edu Thu Jan 27 09:21:27 2005 From: davidg at lsi.upc.edu (=?iso-8859-1?Q?David_Garc=EDa_Cort=E9s?=) Date: Thu Jan 27 09:18:14 2005 Subject: [Bioperl-l] BPpsilite possible bug? 
Message-ID: <00c401c5047b$8058f810$fb1e5393@Davidg>

Hello.

I'm using BPpsilite to parse a PsiBlast results file, and I've noticed something strange that seems to be a bug. The thing is: it doesn't get the HSP length in some concrete cases, while it works correctly in others.

I obtain the HSP length this way:

while ( (my $sbjct = $last_iteration->nextSbjct) ) {
    while (my $hsp = $sbjct->nextHSP) {
        my $hlength = $hsp->length;
        print "$hlength";
    }
}

And it works fine for many cases, but in other ones it doesn't. I've seen that, when parsing result files where there is more than one sequence producing significant alignments versus the query sequence, everything works OK. But when there's only one sequence producing significant alignments, then $hsp->length doesn't get the HSP size correctly. For example, when parsing the results file I include at the end of this mail, the HSP lengths are wrong.

Is it a bug or am I doing something wrong?

Thanks in advance.

--
David García Cortés
Instituto Nacional de Bioinformática (INB)
Nodo Computacional GNHC-2 UPC-CIRI
c/. Jordi Girona 1-3 Modul C6-E201   Tel. : 934 011 650
E-08034 Barcelona                    Fax : 934 017 014
Catalunya (Spain)                    e-mail: davidg@lsi.upc.edu

RESULTS FILE:
***********************

BLASTP 2.2.6 [Apr-09-2003]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
Query= gi|18676612|dbj|BAB84958.1| (359 letters) Database: nr-0.fa 175 sequences; 100,812 total letters Searching.........done Results from round 1 Score E Sequences producing significant alignments: (bits) Value dbj|BAB84958.1| FLJ00205 protein [Homo sapiens] 710 0.0 >dbj|BAB84958.1| FLJ00205 protein [Homo sapiens] Length = 359 Score = 710 bits (1832), Expect = 0.0 Identities = 359/359 (100%), Positives = 359/359 (100%) Query: 1 LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD 60 LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD Sbjct: 1 LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD 60 Query: 61 GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL 120 GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL Sbjct: 61 GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL 120 Query: 121 PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS 180 PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS Sbjct: 121 PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS 180 Query: 181 DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL 240 DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL Sbjct: 181 DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL 240 Query: 241 PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD 300 PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD Sbjct: 241 PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD 300 Query: 301 PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS 359 PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS Sbjct: 301 PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS 359 Searching.........done Results from round 2 Score E Sequences producing significant alignments: (bits) Value Sequences used in model and found again: dbj|BAB84958.1| FLJ00205 protein [Homo sapiens] 758 0.0 Sequences not found previously or not previously below threshold: CONVERGED! 
>dbj|BAB84958.1| FLJ00205 protein [Homo sapiens] Length = 359 Score = 758 bits (1956), Expect = 0.0 Identities = 359/359 (100%), Positives = 359/359 (100%) Query: 1 LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD 60 LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD Sbjct: 1 LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD 60 Query: 61 GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL 120 GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL Sbjct: 61 GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL 120 Query: 121 PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS 180 PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS Sbjct: 121 PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS 180 Query: 181 DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL 240 DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL Sbjct: 181 DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL 240 Query: 241 PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD 300 PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD Sbjct: 241 PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD 300 Query: 301 PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS 359 PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS Sbjct: 301 PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS 359 Database: nr-0.fa Posted date: Jan 13, 2005 6:32 PM Number of letters in database: 100,812 Number of sequences in database: 175 Lambda K H 0.320 0.139 0.427 Lambda K H 0.267 0.0424 0.140 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Number of Hits to DB: 174,609 Number of Sequences: 175 Number of extensions: 8789 Number of successful extensions: 19 Number of sequences better than 1.0: 1 Number of HSP's better than 1.0 without gapping: 2 Number of HSP's successfully gapped in prelim test: 0 Number of HSP's that attempted gapping in prelim test: 17 Number of 
HSP's gapped (non-prelim): 2
length of query: 359
length of database: 100,812
effective HSP length: 69
effective length of query: 290
effective length of database: 88,737
effective search space: 25733730
effective search space used: 25733730
T: 11
A: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.8 bits)
S2: 53 (25.0 bits)
*********************************************

From jason.stajich at duke.edu Thu Jan 27 18:13:51 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Jan 27 18:11:26 2005
Subject: [Bioperl-l] Extraction of Intergenic region
In-Reply-To: <6FCF17FF93748647A202BE44B651321A1297C6@EDENEVS1.asp.ad.uit.no>
References: <6FCF17FF93748647A202BE44B651321A1297C6@EDENEVS1.asp.ad.uit.no>
Message-ID: <45698441630d0e250855f9fc96df7ffc@duke.edu>

[bioperl-l is really the right list to post to]

There isn't a module that does exactly this, but you can write a script for it by parsing the sequence file with Bio::SeqIO and reading the coordinate file. Have you tried writing the simple Perl to do this yet? You can do it pretty basically with the substr function. I have also done it by first masking the coding sequence with 'N's and then using split to go back and extract all the non-N regions.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Jan 27, 2005, at 4:32 AM, Rafi Ahmad wrote:

> Hi everyone,
>
> I am new to BioPerl. I would like to know whether there is BioPerl code
> that helps extract intergenic sequences in a genome, given a
> coordinate file listing the start and stop positions of all the
> genes.
>
> Thanks for the help.
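[Editorial sketch, not part of the original thread.] The two approaches suggested in the reply above (substr on the gaps between genes, or masking genes with 'N' and splitting) could look roughly like the following. This is only a minimal illustration under stated assumptions: the genome is already held as a plain string (in practice it might come from Bio::SeqIO, e.g. $seqio->next_seq->seq), the coordinates are 1-based inclusive [start, end] pairs, and the subroutine names intergenic_by_substr and intergenic_by_masking are made up for this example.

```perl
use strict;
use warnings;

# Approach 1: walk the genes in order and substr() out the gaps between them.
# Assumes 1-based, inclusive coordinates; start > end is treated as a
# minus-strand gene and swapped.
sub intergenic_by_substr {
    my ($seq, @genes) = @_;
    my @sorted = sort { $a->[0] <=> $b->[0] }
                 map  { $_->[0] <= $_->[1] ? [ @$_ ] : [ $_->[1], $_->[0] ] } @genes;
    my @regions;
    my $pos = 1;                            # first base not yet covered by a gene
    for my $gene (@sorted) {
        my ($start, $end) = @$gene;
        push @regions, substr($seq, $pos - 1, $start - $pos) if $start > $pos;
        $pos = $end + 1 if $end + 1 > $pos; # also copes with overlapping genes
    }
    push @regions, substr($seq, $pos - 1) if $pos <= length $seq;
    return @regions;
}

# Approach 2: overwrite every gene with 'N's, then split on runs of 'N'.
# Caveat: genuine N's already present in the genome will also split regions.
sub intergenic_by_masking {
    my ($seq, @genes) = @_;                 # $seq is a copy; the caller's string is untouched
    for my $gene (@genes) {
        my ($start, $end) = sort { $a <=> $b } @$gene;
        my $len = $end - $start + 1;
        substr($seq, $start - 1, $len) = 'N' x $len;
    }
    return grep { length } split /N+/, $seq;
}

# Toy data (hypothetical): two genes, the second given end-first.
my $genome = 'AAACCCGGGTTTAA';
my @genes  = ( [ 4, 6 ], [ 12, 10 ] );

print "$_\n" for intergenic_by_substr($genome, @genes);   # prints AAA, GGG and AA, one per line
```

The substr version keeps track of coordinates and so can easily be extended to report where each intergenic region lies; the masking version is shorter but loses the positions and is thrown off by N's that are already in the sequence.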
>
> Regards
>
> Rafi

From barry.moore at genetics.utah.edu Thu Jan 27 18:22:00 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu Jan 27 18:18:03 2005
Subject: [Bioperl-l] Re: load_seqdatabase.pl running SLOW!
In-Reply-To:
References:
Message-ID: <41F97798.6060203@genetics.utah.edu>

Hilmar-

Thanks for the suggestions. Things are working smoothly now, but I'm not entirely sure why. I stopped the slow running load_seqdatabase.pl process on the fast machine, built an identical biosql database under a different name, and began loading the same file into it. This screamed along at 8-10 seq/sec. I re-ran the load script into the old db - still slow. I vacuumed the old db - still slow. I dropped the old db and rebuilt it - now both load very fast. I dropped both dbs and rebuilt just one, and it is now loading fine. Go figure. I send this to the list simply for the record in case it provides a clue to someone in the future with similar trouble. I haven't got a clue.

Barry

Hilmar Lapp wrote:

>To be honest I've never loaded a large file into a Pg installation. The problem that I'd expect you to run into is that if you started with a fresh database the lookup queries will become slower and slower in the absence of the stats being recomputed on a frequent basis through vacuum (which the load script won't do).
>
>I believe in more recent releases you can actually vacuum the database concurrent to write access; not sure whether 7.2.x will allow this already. You should strongly consider upgrading to at least 7.3 if not 7.4 or even 8.x. The Pg developers may not even answer questions to 7.2 anymore ...
>
>Your observation that the slower machine with the later kernel would be faster leaves me puzzled. If blind-tested I would have suggested that the machine appearing faster has had the database vacuumed.
>
>Not sure this is very helpful ...
>
> -hilmar
>
> -----Original Message-----
> From: Barry Moore [mailto:barry.moore@genetics.utah.edu]
> Sent: Tue 1/25/2005 3:15 PM
> To: Bioperl list; Hilmar Lapp
> Cc:
> Subject: load_seqdatabase.pl running SLOW!
>
> Hilmar (or others)-
>
> I've set up a biosql based database using PostgreSQL 7.2 on a PC with an
> Intel Pentium 4 3.0 GHz processor, 800 MHz system bus, 1 GB of RAM, and
> Linux (2.2 kernel - Debian woody distro). Onto that I am loading
> ~352,000 sequences from RefSeq complete rna collection using
> load_seqdatabase.pl. It's running kind of slow - loading on average
> about 1 sequence every 2-5 seconds. In the archives I've read your
> comments to a previous question like this suggesting two fast
> processors, a couple gigs of memory and 2-3 drives to really make things
> fly and while my system isn't that good, it seems like I should be doing
> better. I got to experimenting on another (slower) system while waiting
> for things to load, and found that running the same script to load the
> same file goes about 3X faster on a 266MHz Intel processor with 192 MB
> RAM. Same installation of PostgreSQL (both installed from deb package
> with defaults), and same installation of Debian Linux (except that the
> kernel on the older slow machine has been updated to 2.4). Another
> difference I noticed between the two is that the old 266 MHz machine is
> using about 75% CPU resources for perl and about 25% for postmaster
> whereas the faster 3 GHz machine (but slower running
> load_seqdatabase.pl) is using 95% of its CPU resources for postmaster
> and about 3% for perl. Both systems are using up most of their memory,
> but little to no swap. Could the kernel upgrade really be making the
> difference? Any thoughts?
> As it's going now I can wait over a week for
> all these sequences to load, or build the database on our dinosaur
> server in a couple of days and dump it across to our sexy new 3 GHz
> server. Talk about bass ackwards!
>
> Barry
>
> --
> Barry Moore
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT

--
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From hlapp at gnf.org Thu Jan 27 20:10:43 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Jan 27 20:07:44 2005
Subject: [Bioperl-l] RE: load_seqdatabase.pl running SLOW!
Message-ID:

Thanks for the update Barry. Great to hear it's working now for you.

What you're looking at may really be a Postgres version issue. 7.2 has known problems and after the time 7.3 came out everybody was strongly urged to migrate to 7.3. All I can say.

BTW if there's no package with higher version don't be scared of compiling from scratch. I built Pg on different platforms from Linux to MacOSX and it compiled like a charm on all of them.

 -hilmar

From allenday at ucla.edu Thu Jan 27 23:48:51 2005
From: allenday at ucla.edu (Allen Day)
Date: Thu Jan 27 23:45:02 2005
Subject: [Bioperl-l] RPMs for Bioperl and GMOD
In-Reply-To:
References:
Message-ID:

SUMMARY:
========

I've successfully built RPMs for the GBrowse, Bioperl, and Bioperl-DB dependency trees. These are for CVS HEAD, not the most recent releases. I've only tested on Fedora Core 2 and RedHat 9.
Fedora Core 2 packages are available here:

  http://sumo.genetics.ucla.edu/~allenday/flute-fc2/i386/
  http://sumo.genetics.ucla.edu/~allenday/flute-fc2/noarch/

The source RPMs are available as well:

  http://sumo.genetics.ucla.edu/~allenday/flute-fc2/SRPMS

A few notes on the non-obvious steps that were needed to make this work:

  * pruned Bioperl-DB to remove Oracle dependencies
  * pruned Gbrowse to remove AcePerl dependencies
  * pruned Bioperl to remove AcePerl dependencies
  * pruned SOAP::Lite to remove MQSeries dependencies
  * pruned Gbrowse to remove Mac OS X dependencies
  * pruned a few modules (Net::DNS, Mail::Sender, etc) to remove
    Win32::* dependencies.
  * piggybacked on existing RPMs for rrdtool, Template Toolkit,
    Module-Build, and CPANPLUS. Thanks to Dag Wieers [1] for these.

TODO:
=====

The Bioperl install is fully functional as far as I can tell.

I'd appreciate it if someone with Oracle and DBD::Oracle installed could give Bioperl-DB a spin and verify that it works.

I'd also like someone with Oracle to help me make a DBD::Oracle rpm. Having a DBD::Oracle RPM will allow me to leave the Oracle code in Bioperl-DB.

The Gbrowse install is slightly broken, but this is mainly due to a major rewrite that's taking place right now. I'll make another announcement to the GMOD list when the Gbrowse RPM works out of the box.

-Allen

[1] http://dag.wieers.com

From nathanhaigh at ukonline.co.uk Fri Jan 28 02:40:41 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Fri Jan 28 02:36:50 2005
Subject: [Bioperl-l] bioperl-1.5.0 PPM files
In-Reply-To:
Message-ID:

I've uploaded the ppd files for bioperl-1.5. Unfortunately directory listing isn't enabled, so here are the direct links:

  http://web.ukonline.co.uk/nathanhaigh/bioperl/bioperl.ppd
  http://web.ukonline.co.uk/nathanhaigh/bioperl/bioperl-1.5-ppm.tar.gz

Could someone copy these over to the http://bioperl.org/DIST/ directory, with the existing bioperl.ppd file at http://bioperl.org/DIST/ being renamed to bioperl-1.2.ppd.
I've also uploaded the GD-SVG v0.25 files which aren't available at any of the repositories specified in point 1.3.2 of the INSTALL.WIN file, and should be included in the http://bioperl.org/DIST/ directory to allow successful installation of bioperl-1.5. If however, you think we shouldn't keep non-bioperl modules on the bioperl server, we should see about getting it added to one of the repositories mentioned in point 1.3.2 of the INSTALL.WIN file. Again, the direct links are: http://web.ukonline.co.uk/nathanhaigh/bioperl/GD-SVG.ppd http://web.ukonline.co.uk/nathanhaigh/bioperl/GD-SVG-0.25-ppm.tar.gz The MD5 checksums can be found at: http://web.ukonline.co.uk/nathanhaigh/bioperl/md5.txt If I can be of further assistance, please don't hesitate to ask. Nathan > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason Stajich > Sent: 25 January 2005 02:37 > To: 'bioperl-l@bioperl.org' List; bioperl-announce-l@bioperl.org > Subject: [Bioperl-l] bioperl-1.5.0 released > > Bioperl 1.5.0 Developer's release is available for download. > =============================================== > > http://bioperl.org/DIST/bioperl-1.5.0.tar.bz2 > 425ac55ecbb4339b7b532ba6d429bb40 > http://bioperl.org/DIST/bioperl-1.5.0.tar.gz > 172472f0675de9a583432e21c9b1b5fc > http://bioperl.org/DIST/bioperl-1.5.0.zip > 3febcd2445a7393c65981a6f9f13a9ed > > We'll update the website to reflect this new release. > > The odd-numbered releases are called developer releases and are not > deposited on CPAN. Please note that the API in 1.5.0 may change before > the 1.6.0 release. which will be consider a stable API. We may do > another developer release before 1.6.0 goes out. > > Lots of people have contributed to this release, I apologize for not > naming them all. 
I'll try to cover some: thanks to Aaron Mackey for > getting this release started, Brian Osborne for extensive documentation > improvements, Nathan Haigh for volunteering to make a PPM of the > release and Barry Moore and Nathan answering many of the windows > related questions, Allen Day & Scott Cain & Steffen Grossmann for the > work on FeatureIO, GFF3, and SeqFeature::Annotated, Chris Mungall for > the work with Unflattener to merge GenBank annotations into GFF3 > objects. > > Please see the AUTHORS file for a complete list of contributors. > > Jason Stajich on behalf of the Bioperl developers. > > > Here is the info from the Changes file. > 1.5 Developer release > > o Bio::Align::DNAStatistics and Bio::Align::ProteinStatistics > provide Jukes-Cantor and Kimura pairwise distance methods, > respectively. > > o Bio::AlignIO support for "po" format of POA, and "maf"; > Bio::AlignIO::largemultifasta is a new alternative to > Bio::AlignIO::fasta for temporary file-based manipulation of > particularly large multiple sequence alignments. > > o Bio::Assembly::Singlet allows orphan, unassembled sequences to > be treated similarly as an assembled contig. > > o Bio::CodonUsage provides new rare_codon() and probable_codons() > methods for identifying particular codons that encode a given > amino acid. > > o Bio::Coordinate::Utils provides new from_align() method to build > a Bio::Coordinate pair directly from a > Bio::Align::AlignI-conforming object. > > o Bio::DB::Biblio::eutils is a class for querying NCBI's Eutils. > Send a Pubmed, Pubmed Central, Entrez, or other query to NCBI's > web service using standard Pubmed query syntax, and retrieve > results as XML. > > o Bio::DB::GFF has various sundry bug fixes. > > o Bio::FeatureIO is a new SeqIO-style subsystem for > writing/reading genomic features to/from files. I/O classes > exist for BED, GTF (aka GFF v2.5), and GFF v3. Bio::FeatureIO > classes only read/write Bio::SeqFeature::Annotated objects. 
> Notably, the GFF v3 class requires features to be typed into the > Sequence Ontology. > > o Bio::Graph namespace contains new modules for manipulation and > analysis of protein interaction graphs. > > o Bio::Graphics has many bug fixes and shiny new glyphs. > > o Bio::Index::Hmmer and Bio::Index::Qual provide multiple-file > indexing for HMMER reports and FASTA qual files, respectively. > > o Bio::Map::Clone, Bio::Map::Contig, and Bio::Map::FPCMarker are > new objects that can be placed within a Bio::Map::MapI-compliant > genetic/physical map; Bio::Map::Physical provides a new physical > map type; Bio::MapIO::fpc provides finger-printed clone mapping > import. > > o Bio::Matrix::PSM provide new support for postion-specific > (scoring) matrices (e.g. profiles, or "possums"). > > o Bio::Ontology::Ontology and Bio::Ontology::Term objects can now > be instantiated without explicitly using Bio::OntologyIO. This > is possible through changes to Bio::Ontology::OntologyStore to > download ontology files from the web as necessary. Locations of > ontology files are hard-coded into > Bio::Ontology::DocumentRegistry. > > o Bio::PopGen includes many new methods and data types for > population genetics analyses. > > o New constructor to Bio::Range, unions(). Given a list of > ranges, returns another list of "flattened" ranges -- > overlapping ranges are merged into a single range with the > mininum and maximum coordinates of the entire overlapping group. > > o Bio::Root::IO now supports -url, in addition to -file and -fh. > The new -url argument allows one to specify the network address > of a file for input. -url currently only works for GET > requests, and thus is read-only. 
> > o Bio::SearchIO::hmmer now returns individual Hit objects for each > domain alignment (thus containing only one HSP); previously > separate alignments would be merged into one hit if the domain > involved in the alignments was the same, but this only worked > when the repeated domain occured without interruption by any > other domain, leading to a confusing mixture of Hit and HSP > objects. > > o Bio::Search::Result::ResultI-compliant report objects now > implement the "get_statistics" method to access > Bio::Search::StatisticsI objects that encapsulate any > statistical parameters associated with the search (e.g. Karlin's > lambda for BLAST/FASTA). > > o Bio::Seq::LargeLocatableSeq combines the functionality already > found in Bio::Seq::LargeSeq and Bio::LocatableSeq. > > o Bio::SeqFeature::Annotated is a replacement for > Bio::SeqFeature::Generic. It breaks compliance with the > Bio::SeqFeatureI interface because the author was sick of > dealing with untyped annotation tags. All > Bio::SeqFeature::Annotated annotations are Bio::AnnotationI > compliant, and accessible through Bio::Annotation::Collection. > > o Bio::SeqFeature::Primer implements a Tm() method for primer > melting point predictions. > > o Bio::SeqIO now supports AGAVE, BSML (via SAX), CHAOS-XML, > InterProScan-XML, TIGR-XML, and NCBI TinySeq formats. > > o Bio::Taxonomy::Node now implements the methods necessary for > Bio::Species interoperability. > > o Bio::Tools::CodonTable has new reverse_translate_all() and > make_iupac_string() methods. > > o Bio::Tools::dpAlign now provides sequence profile alignments. > > o Bio::Tools::GFF now parses GFF version 2.5 (a.k.a. GTF). > > o Bio::Tools::Fgenesh, Bio::Tools::tRNAscanSE are new report > parsers. > > o Bio::Tools::SiRNA includes two new rulesets (Saigo and Tuschl) > for designing small inhibitory RNA. > > o Bio::Tree::DistanceFactory provides NJ and UPGMA tree-building > methods based on a distance matrix. 
>
>  o Bio::Tree::Statistics provides an assess_bootstrap() method to
>    calculate bootstrap support values on a guide tree topology,
>    based on provided bootstrap tree topologies.
>
>  o Bio::TreeIO now supports the Pagel (PAG) tree format.
>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/

From hlapp at gnf.org Fri Jan 28 12:58:58 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jan 28 12:54:59 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To:
References:
Message-ID: <48F109E2-7156-11D9-8A2B-000A95AE92B0@gnf.org>

On Jan 27, 2005, at 8:48 PM, Allen Day wrote:

> I'd appreciate it if someone with Oracle and DBD::Oracle installed
> could
> give Bioperl-DB a spin and verify that it works.
>

Do you mean your RPM or bioperl-db on Oracle? I'm running the latter all the time.

> I'd also like someone with Oracle to help me make a DBD::Oracle rpm.
> Having a DBD::Oracle RPM will allow me to leave the Oracle code in
> Bioperl-DB.
If installing the supposed DBD::Oracle is then a prerequisite for being able to install the rest, then you are taking the wrong path. DBD::Oracle itself will depend on the Oracle client libraries being installed, which aren't even available on all platforms, aside from the fact that installing those is beyond your control and involves downloading about 350MB from OTN.

Frankly, I can't believe that there is no way to specify dependencies that are optional. Why would you require all of DBD::mysql, DBD::Pg, and DBD::Oracle if all a person wants is mysql?? All of these will link to compiled runtime libraries and why should a failure to install DBD::Pg be of any concern to someone who wants to use mysql?

BTW DBD::Oracle is on CPAN. I thought that would make it easy to construct an RPM? (There are few if any binaries though - for a reason. Compiling DBD::Oracle may be a charm on some but involve some major tweaking on other platforms. I've been there multiple times, I know what I'm talking about.)

 -hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From allenday at ucla.edu Fri Jan 28 14:50:08 2005
From: allenday at ucla.edu (Allen Day)
Date: Fri Jan 28 14:46:06 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <48F109E2-7156-11D9-8A2B-000A95AE92B0@gnf.org>
References: <48F109E2-7156-11D9-8A2B-000A95AE92B0@gnf.org>
Message-ID:

> Do you mean your RPM or bioperl-db on Oracle? I'm running the latter
> all the time.

i mean the RPM. it is the same as bioperl-db cvs head as of last night.

> > I'd also like someone with Oracle to help me make a DBD::Oracle rpm.
> > Having a DBD::Oracle RPM will allow me to leave the Oracle code in
> > Bioperl-DB.
>
> If installing the supposed DBD::Oracle is then a prerequisite for being
> able to install the rest, then you are taking the wrong path.
> DBD::Oracle itself will depend on the Oracle client libraries being
> installed which aren't even available on all platforms, aside from the
> fact that installing those is beyond your control and involves
> downloading about 350MB from OTN.
>
> Frankly, I can't believe that there is no way to specify dependencies
> that are optional. Why would you require all of DBD::mysql, DBD::Pg, and
> DBD::Oracle if all a person wants is mysql?? All of these will link to
> compiled runtime libraries and why should a failure to install DBD::Pg
> be of any concern to someone who wants to use mysql?

the problem is something internal to the rpm installer -- it determines perl library dependencies at install-time rather than requiring you to explicitly specify perl packages in the rpm metafiles (aka specfile).

so, for instance, if i tried to install perl-Generic-Genome-Browser, i might get an error like:

  requires perl(Bio::Root::Root)

which could be removed by one of:

  (1) installing the perl-bioperl package
  (2) installing bioperl from cvs
  (3) installing bioperl from cpan

there may be a way to code into the metafile to ignore missing perl dependencies detected in the installation process -- i need to look into this.

> BTW DBD::Oracle is on CPAN. I thought that would make it easy to
> construct an RPM? (There are few if any binaries though - for a reason.
> Compiling DBD::Oracle may be a charm on some but involve some major
> tweaking on other platforms. I've been there multiple times, I know
> what I'm talking about.)

given what i've said above, if i had a DBD::Oracle perl module installed, it would prevent rpm from throwing errors about missing dependency "perl(DBD::Oracle)". however, i can't build DBD::Oracle into an rpm because the make process links to the oracle headers and .so files. the DBD::Oracle can be made w/o having explicit dependencies on the oracle binary install, so it would install on a machine that didn't have oracle installed (but wouldn't work).
so as far as a bioperl-db rpm goes, here are the options i'm looking into: (1) get a binary perl-DBD-Oracle rpm built by someone with Oracle, leaving out the binary Oracle file dependency. distribute bioperl-db from cvs as-is (2) patch Oracle classes out of bioperl-db as part of the rpm build process. distribute modified bioperl-db. (3) modify rpm "detection of installed perl modules" functionality to have rpm explicitly ignore missing DBD::Oracle dependency. (1) and (2) will definitely work. i don't yet know the feasibility of (3). -allen From jason.stajich at duke.edu Fri Jan 28 15:05:06 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Jan 28 15:00:55 2005 Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD In-Reply-To: References: <48F109E2-7156-11D9-8A2B-000A95AE92B0@gnf.org> Message-ID: <4e40de988838c77a0768bb96cb4ea1c5@duke.edu> -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Jan 28, 2005, at 2:50 PM, Allen Day wrote: >> Do you mean your RPM or bioperl-db on Oracle? I'm running the latter >> all the time. > > i mean the RPM. it is the same as bioperl-db cvs head as of last > night. > >>> I'd also like someone with Oracle to help me make a DBD::Oracle rpm. >>> Having a DBD::Oracle RPM will allow me to leave the Oracle code in >>> Bioperl-DB. >> >> If installing the supposed DBD::Oracle is then a prerequisite for >> being >> able to install the rest, then you are taking the wrong path. >> DBD::Oracle itself will depend on the Oracle client libraries being >> installed which aren't even available on all platforms, aside from the >> fact that installing those is beyond your control and involves >> downloading about 350MB from OTN. >> >> Frankly, I can't believe that there is no way to specify dependencies >> that are optional. Why would you require all of DBD::mysql, DBD::Pg, >> and >> DBD::Oracle if all a persons wants is mysql?? 
All of these will link >> to >> compiled runtime libraries and why should a failure to install DBD::Pg >> be of any concern to someone who wants to use mysql? > > the problem is something internal to the rpm installer -- it determines > perl library dependencies at install-time rather than requiring you to > explicitly specify perl packages in the rpm metafiles (aka specfile). > What are you using to generate the specfiles in the first place? Are you using cpan2rpm? > so, for instance, if i i tried to install perl-Generic-Genome-Browser, > i > might get an error like: > > requires perl(Bio::Root::Root) > > which could be removed by one of: > > (1) installing the perl-bioperl package > (2) installing bioperl from cvs > (3) installing bioperl from cpan > > there may be a way to code into the metafile to ignore missing perl > dependencies detected in the installation process -- i need to look > into > this. > >> BTW DBD::Oracle is on CPAN. I thought that would make it easy to >> construct an RPM? (There's few if any binaries though - for a reason. >> Compiling DBD::Oracle may be a charm on some but involve some major >> tweaking on other platforms. I've been there multiple times, I know >> what I'm talking about.) > > given what i've said above, if i had a DBD::Oracle perl module > installed, > it would prevent rpm from throwing errors about missing dependency > "perl(DBD::Oracle)". however, i can't build DBD::Oracle into an rpm > because the make process links to the oracle headers and .so files. > the > DBD::Oracle can be made w/o having explicit dependencies on the oracle > binary install, so it would install on a machine that didn't have > oracle > installed (but wouldn't work). so as far as a bioperl-db rpm goes, > here > are the options i'm looking into: > > (1) get a binary perl-DBD-Oracle rpm built by someone with Oracle, > leaving out the binary Oracle file dependency. 
distribute > bioperl-db from cvs as-is > (2) patch Oracle classes out of bioperl-db as part of the rpm build > process. distribute modified bioperl-db. > (3) modify rpm "detection of installed perl modules" functionality > to have rpm explicitly ignore missing DBD::Oracle dependency. > > (1) and (2) will definitely work. i don't yet know the feasibility of > (3). > > -allen
From lstein at cshl.edu Fri Jan 28 10:41:47 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Jan 28 15:01:58 2005 Subject: [Bioperl-l] Re: [GMOD-devel] RPMs for Bioperl and GMOD In-Reply-To: References: Message-ID: <200501281041.48221.lstein@cshl.edu>
Hi Allen, Don't release a gbrowse RPM if it is even slightly broken. Lincoln
On Thursday 27 January 2005 11:48 pm, Allen Day wrote: > SUMMARY: > ======== > > I've successfully built RPMs for the GBrowse, Bioperl, and > Bioperl-DB dependency trees. These are for CVS HEAD, not the most > recent releases. I've only tested on Fedora Core 2 and RedHat 9. > Fedora Core 2 packages are available here: > > http://sumo.genetics.ucla.edu/~allenday/flute-fc2/i386/ > http://sumo.genetics.ucla.edu/~allenday/flute-fc2/noarch/ > > The source RPMs are available as well > > http://sumo.genetics.ucla.edu/~allenday/flute-fc2/SRPMS > > A few notes on what non-obvious steps needed to be taken to make > this work: > > * pruned Bioperl-DB to remove Oracle dependencies > * pruned Gbrowse to remove AcePerl dependencies > * pruned Bioperl to remove AcePerl dependencies > * pruned SOAP::Lite to remove MQSeries dependencies > * pruned Gbrowse to remove Mac OS X dependencies > * pruned a few modules (Net::DNS, Mail::Sender, etc) to remove > Win32::* dependencies. > * piggybacked on existing RPMs for rrdtool, Template Toolkit, > Module-Build, and CPANPLUS. Thanks to Dag Wieers [1] for > these. > > > > TODO: > ===== > > The Bioperl install is fully functional as far as I can tell. > > I'd appreciate it if someone with Oracle and DBD::Oracle installed > could give Bioperl-DB a spin and verify that it works. > > I'd also like someone with Oracle to help me make a DBD::Oracle > rpm. Having a DBD::Oracle RPM will allow me to leave the Oracle > code in Bioperl-DB. > > The Gbrowse install is slightly broken, but this is mainly due to a > major rewrite that's taking place right now. I'll make another > announcement to the GMOD list when the Gbrowse RPM works out of the > box. > > -Allen > > [1] http://dag.wieers.com
-- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics.
From hlapp at gnf.org Fri Jan 28 16:49:55 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Jan 28 16:45:56 2005 Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD In-Reply-To: Message-ID: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>
Like this statement or not, but I think installing all kinds of CPAN packages onto somebody's machine irrespective of whether somebody is ever going to use - or need - them, let alone them working in the first place due to compiled code dependencies being absent, is a really *bad* idea.
It basically defies the concept of modular packaging to begin with, and sounds way too intrusive for my taste.
Unless I misunderstand what Jason is saying, then this is not even necessary and is in no way an inherent shortcoming that inevitably comes with RPMs. So unless I'm missing something here, I understand that Jason is saying you can have RPMs and still not litter your system with DBD::blah or other modules for which you don't even have the client libraries installed, and still be able to install those at a later time because the respective pieces of code have not been pruned (which I think is actually also a bad idea).
-hilmar
-- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------
From allenday at ucla.edu Fri Jan 28 19:49:22 2005 From: allenday at ucla.edu (Allen Day) Date: Fri Jan 28 19:45:19 2005 Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD In-Reply-To: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org> References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org> Message-ID:
okay, i've looked into this. short answer: you cannot specify to omit automatically determined dependencies without "lying" in the rpm specfile and stating that a package provides a perl module that it, in fact, does not.
for example, i can add a statement to the bioperl-db rpm stating that it provides perl(DBD::Oracle), but not actually add DBD/Oracle.pm to the package. there is a thread extensively discussing this aspect of the rpm build system here: http://www.redhat.com/archives/rpm-list/2004-February/msg00083.html
if i'm making a package for private use only, i don't mind doing this, but if this package is to be for public consumption i don't want to lie about what is and is not provided.
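A minimal sketch of the workaround described above, assuming a hypothetical package name (perl-bioperl-db); it is not taken from an actual bioperl-db specfile. The Provides line claims perl(DBD::Oracle) so that rpm treats the automatically detected requirement as satisfied, even though no DBD/Oracle.pm is shipped:

```spec
# Hypothetical excerpt; the package name is illustrative only.
Name:     perl-bioperl-db

# Declares a capability the package does not actually contain.
# rpm's generated "Requires: perl(DBD::Oracle)" is then resolved
# against this line instead of a real DBD::Oracle package.
Provides: perl(DBD::Oracle)
```

The package would then install without DBD::Oracle present, but any code path that loads DBD::Oracle would still fail at run time.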
i take the same stance on all the other perl modules in the bioperl dependency tree, including esoteric modules such as Net::Jabber and GD::Graph3d.
the only viable option i see here is to patch Oracle dependencies out of bioperl-db. that is what i will do until i have working Oracle and perl-DBD-Oracle packages in-hand.
-allen
From hlapp at gnf.org Fri Jan 28 20:00:30 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Jan 28 19:56:24 2005 Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD In-Reply-To: References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org> Message-ID: <2BB48BC3-7191-11D9-8A2B-000A95AE92B0@gnf.org>
Ah - I think I misunderstood Jason - he probably meant when installing the RPM you ignore certain dependencies? So why don't you allow people to ignore dependencies? I'll stick to my guns here that I don't think this is a good approach, and not just due to DBD::Oracle. Why do you want somebody to install DBD::Pg if she doesn't have or intend to use PostgreSQL? What are you going to tell a sysadmin who wants a clean system? Why install all kinds of esoteric packages into somebody's perl installation without even asking, and even without some of them working?? Why does CPAN ask before it gets and installs a dependency? My opinion anyways, and I'll shut up with this. -hilmar
-- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------
From allenday at ucla.edu Fri Jan 28 20:01:39 2005 From: allenday at ucla.edu (Allen Day) Date: Fri Jan 28 19:57:27 2005 Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD In-Reply-To: <4e40de988838c77a0768bb96cb4ea1c5@duke.edu> References: <48F109E2-7156-11D9-8A2B-000A95AE92B0@gnf.org> <4e40de988838c77a0768bb96cb4ea1c5@duke.edu> Message-ID:
no, i'm using RPM::Specfile's cpanflute2. it's similar. i autogenerate, then add patches/tweaks as necessary. -allen
On Fri, 28 Jan 2005, Jason Stajich wrote: > What are you using to generate the specfiles in the first place? Are > you using cpan2rpm?
From hlapp at gnf.org Fri Jan 28 20:03:14 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Jan 28 19:59:07 2005 Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD In-Reply-To: References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org> Message-ID: <8DA2A4A6-7191-11D9-8A2B-000A95AE92B0@gnf.org>
BTW,
%define _use_internal_dependency_generator 0
is not an option?
-hilmar
-- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------
From allenday at ucla.edu Fri Jan 28 20:09:11 2005 From: allenday at ucla.edu (Allen Day) Date: Fri Jan 28 20:05:25 2005 Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD In-Reply-To: <8DA2A4A6-7191-11D9-8A2B-000A95AE92B0@gnf.org> References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org> <8DA2A4A6-7191-11D9-8A2B-000A95AE92B0@gnf.org> Message-ID:
it's no different than lying about what the package provides. you're still going to have components in the package that will not function, because all package dependencies have not been installed. patching out the dependent code is really the most honest and least problematic solution here. -allen
On Fri, 28 Jan 2005, Hilmar Lapp wrote: > BTW, > > %define _use_internal_dependency_generator 0 > > is not an option?
> > -hilmar > > On Jan 28, 2005, at 4:49 PM, Allen Day wrote: > > > okay, i've looked into this. short answer: you cannot specify to omit > > automatically determined dependencies without "lying" in the rpm > > specfile > > and stating that a package provides a perl module that it, in fact, > > does > > not. > > > > for example, i can add a statement to the bioperl-db rpm stating that > > it > > provides perl(DBD::Oracle), but not actually add DBD/Oracle.pm to the > > package. there is a thread extensively discussing this aspect of the > > rpm > > build system here: > > > > http://www.redhat.com/archives/rpm-list/2004-February/msg00083.html > > > > if i'm making a package for private use only, i don't mind doing this, > > but > > if this package is to be for public consumption i don't want to lie > > about > > what is and is not provided. i take the same stance on all the other > > perl > > modules in the bioperl dependency tree, including esoteric modules > > such as > > Net::Jabber and GD::Graph3d. > > > > the only viable option i see here is to patch Oracle dependencies out > > of > > bioperl-db. that is what i will do until i have working Oracle and > > perl-DBD-Oracle packages in-hand. > > > > -allen > > > > > > On Fri, 28 Jan 2005, Hilmar Lapp wrote: > > > >> Like this statement or not, but I think installing all kinds of CPAN > >> packages onto somebody's machine irrespective of whether somebody is > >> ever going to use - or need - them, let alone them working in the > >> first > >> place due to compiled code dependencies being absent, is a really > >> *bad* > >> idea > >> > >> It basically defies the concept of modular packaging to begin with, > >> and > >> sounds way too intrusive for my taste. > >> > >> Unless I misunderstand what Jason is saying then this is not even > >> necessary and is in no way an inherent shortcoming that inevitably > >> comes with RPMs. 
So unless I'm missing something here I understand > >> that > >> Jason is saying you can have RPMs and still not litter your system > >> with > >> DBD::blah or other modules for which you don't even have the client > >> libraries installed, and still be able to install those at a later > >> time > >> because the respective pieces of code have not been pruned (which I > >> think is actually also a bad idea). > >> > >> -hilmar > >> > >> On Friday, January 28, 2005, at 11:50 AM, Allen Day wrote: > >> > >>>> Do you mean your RPM or bioperl-db on Oracle? I'm running the latter > >>>> all the time. > >>> > >>> i mean the RPM. it is the same as bioperl-db cvs head as of last > >>> night. > >>> > >>>>> I'd also like someone with Oracle to help me make a DBD::Oracle > >>>>> rpm. > >>>>> Having a DBD::Oracle RPM will allow me to leave the Oracle code in > >>>>> Bioperl-DB. > >>>> > >>>> If installing the supposed DBD::Oracle is then a prerequisite for > >>>> being > >>>> able to install the rest, then you are taking the wrong path. > >>>> DBD::Oracle itself will depend on the Oracle client libraries being > >>>> installed which aren't even available on all platforms, aside from > >>>> the > >>>> fact that installing those is beyond your control and involves > >>>> downloading about 350MB from OTN. > >>>> > >>>> Frankly, I can't believe that there is no way to specify > >>>> dependencies > >>>> that are optional. Why would you require all of DBD::mysql, DBD::Pg, > >>>> and > >>>> DBD::Oracle if all a person wants is mysql?? All of these will link > >>>> to > >>>> compiled runtime libraries and why should a failure to install > >>>> DBD::Pg > >>>> be of any concern to someone who wants to use mysql? > >>> > >>> the problem is something internal to the rpm installer -- it > >>> determines > >>> perl library dependencies at install-time rather than requiring you > >>> to > >>> explicitly specify perl packages in the rpm metafiles (aka specfile).
> >>> > >>> so, for instance, if i tried to install > >>> perl-Generic-Genome-Browser, > >>> i > >>> might get an error like: > >>> > >>> requires perl(Bio::Root::Root) > >>> > >>> which could be removed by one of: > >>> > >>> (1) installing the perl-bioperl package > >>> (2) installing bioperl from cvs > >>> (3) installing bioperl from cpan > >>> > >>> there may be a way to code into the metafile to ignore missing perl > >>> dependencies detected in the installation process -- i need to look > >>> into > >>> this. > >>> > >>>> BTW DBD::Oracle is on CPAN. I thought that would make it easy to > >>>> construct an RPM? (There's few if any binaries though - for a > >>>> reason. > >>>> Compiling DBD::Oracle may be a charm on some but involve some major > >>>> tweaking on other platforms. I've been there multiple times, I know > >>>> what I'm talking about.) > >>> > >>> given what i've said above, if i had a DBD::Oracle perl module > >>> installed, > >>> it would prevent rpm from throwing errors about missing dependency > >>> "perl(DBD::Oracle)". however, i can't build DBD::Oracle into an rpm > >>> because the make process links to the oracle headers and .so files. > >>> the > >>> DBD::Oracle can be made w/o having explicit dependencies on the > >>> oracle > >>> binary install, so it would install on a machine that didn't have > >>> oracle > >>> installed (but wouldn't work). so as far as a bioperl-db rpm goes, > >>> here > >>> are the options i'm looking into: > >>> > >>> (1) get a binary perl-DBD-Oracle rpm built by someone with Oracle, > >>> leaving out the binary Oracle file dependency. distribute > >>> bioperl-db from cvs as-is > >>> (2) patch Oracle classes out of bioperl-db as part of the rpm build > >>> process. distribute modified bioperl-db. > >>> (3) modify rpm "detection of installed perl modules" functionality > >>> to have rpm explicitly ignore missing DBD::Oracle dependency. > >>> > >>> (1) and (2) will definitely work.
i don't yet know the feasibility > >>> of > >>> (3). > >>> > >>> -allen > >>> > >> > >> > From allenday at ucla.edu Fri Jan 28 20:15:31 2005 From: allenday at ucla.edu (Allen Day) Date: Fri Jan 28 20:11:29 2005 Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD In-Reply-To: <2BB48BC3-7191-11D9-8A2B-000A95AE92B0@gnf.org> References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org> <2BB48BC3-7191-11D9-8A2B-000A95AE92B0@gnf.org> Message-ID: On Fri, 28 Jan 2005, Hilmar Lapp wrote: > Ah - I think I misunderstood Jason - he probably meant when installing > the RPM you ignore certain dependencies? So why don't you allow people > to ignore dependencies? you can force rpms to ignore dependencies and install anyway. if you're trying to make a purely rpm-maintained system though, this leads to missing package dependency problems further down the road... in my experience force installing packages generally causes bigger problems than it's worth. > I'll stick to my guns here that I don't think this is a good approach, > and not just due to DBD::Oracle. Why do you want somebody to install > DBD::Pg if she doesn't have or intend to use PostgreSQL? What are you > going to tell a sysadmin who wants a clean system? Why install all kinds > of esoteric packages into somebody's perl installation without even > asking, and even without some of them working?? Why does CPAN ask before > it gets and installs a dependency? if a user is fortunate enough to be in the position to have a system administrator to do these installation and configuration tasks for them, by all means they should do a source-based install or resolve the dependencies in some other way. but for the graduate student that just wants to get gbrowse up and running to visualize some data he's working with, saving several hours interacting with CPAN, make, gcc, autoconf, tweaking configuration files, etc in exchange for installation of an extra module or two might sound like a good deal. 
-Allen > My opinion anyways, and I'll shut up with this. > > -hilmar From perlguy at hotmail.com Sat Jan 29 16:34:04 2005 From: perlguy at hotmail.com (Philip Parker) Date: Sat Jan 29 16:30:53 2005 Subject: [Bioperl-l] Request for info on volunteering... Message-ID: I'm curious about volunteering for the BioPerl project. I have 4 years of professional Perl programming experience and have an interest in bioinformatics. Philip Parker - perlguy@hotmail.com From hlapp at gmx.net Sat Jan 29 18:19:25 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Jan 29 18:15:19 2005 Subject: [Bioperl-l] struggling with Bio::FeatureIO and Bio::SeqFeature::Annotated In-Reply-To: Message-ID: <374835EA-724C-11D9-A311-000A959EB4C4@gmx.net> On Tuesday, January 25, 2005, at 01:45 AM, Allen Day wrote: >> >> Also, do you think it will be possible to convert the >> Bio::SeqFeature::Annotated features into persistent ones so that >> these can be stored in BioSQL ? I'll try to test that out today. > > no idea. my guess is not without substantial effort. > There shouldn't be a problem to serialize them unless SeqFeature::Annotated does not implement SeqFeatureI. The problem is rather that you will get them out in a slightly different fashion. Provided my understanding of SeqFeature::Annotated is correct (which it may not be!) then all tags be treated (stored) equally as any others, unlike SeqFeature::Generic which has methods primary_tag and source_tag that store their values separately. So, upon retrieval of such a feature you would probably have the primary_tag and source_tag values in the tag/value system as well. This may or may not be an issue. Furthermore, SeqFeature::Annotated does away with tag/value plus annotation bundle and stores everything in the latter.
Bioperl-db uses SeqFeature::AnnotationAdaptor to access a feature's tags and annotations as if there were only an annotation bundle, which is what SeqFeature::Annotated does too, but AnnotationAdaptor assumes that the underlying SeqFeatureI implementation stores them separately. The result is that when you plug a SeqFeature::Annotated into SeqFeature::AnnotationAdaptor, every tag/value may be reported both by the plugged feature's get_tag_values() and annotation->get_Annotations() methods, which may lead to redundant storage (and retrieval). So at worst you may get duplication of all tag/value pairs for a feature. If you retrieve features directly (instead of automatically as those attached to the sequence you retrieved), then you may even be able to circumvent this problem by providing a SeqFeatureI factory that instantiates SeqFeature::Annotated instead of SeqFeature::Generic (which is the default). Bioperl-db will again set the tag/value properties through the AnnotationAdaptor, but if the plugged feature is a SeqFeature::Annotated instance, it may take care of the duplication because redundant set operations will probably overwrite the previous ones (because everything is stored in the annotation bundle). Bottom line is, provided SeqFeature::Annotated implements SeqFeatureI it will be stored - just the result may have some redundancy in the annotation and tags. To know exactly, it would need to be debugged, which I think nobody's done yet. Also, if I'm wrong w.r.t. SeqFeature::Annotated's behaviour, any education from its authors will be welcome ... -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From sutripa at vbi.vt.edu Sat Jan 29 23:17:58 2005 From: sutripa at vbi.vt.edu (Sucheta Tripathy) Date: Sat Jan 29 23:14:04 2005 Subject: [Bioperl-l] GD::Font does not work Message-ID: <1182.199.3.136.4.1107058678.squirrel@webmail.vbi.vt.edu> Hi, I searched the archive but could not find a solution to this problem. My apologies if this problem has already been discussed. Recently some of my CGI scripts threw an error message saying "premature end of script headers". After running the script on the command line, I finally found that it gives a segmentation fault wherever GD::Font is used. Example: my $image=GD::Image->new(); $image->string(gdLargeFont,.....) etc. After commenting out these lines the script runs fine. I tried re-installing GD, but it says GD is up to date. Is there a way to make this work? Many thanks Sucheta -- Sucheta Tripathy Virginia Bioinformatics Institute Phase-I Washington street. Virginia Tech. Blacksburg,VA 24061-0447 phone:(540)231-8138 Fax: (540) 231-2606 From Guillaume.Rousse at inria.fr Sun Jan 30 12:23:26 2005 From: Guillaume.Rousse at inria.fr (Guillaume Rousse) Date: Sun Jan 30 12:19:24 2005 Subject: [Bioperl-l] various problems with mdk bioperl package Message-ID: <41FD180E.4040103@inria.fr> Hello. I'm the maintainer of the mdk bioperl packages. Here are some problems I had with the latest release. First, one test fails as part of the whole rpm build process: t/SeqFeatCollection..........FAILED test 425 Failed 1/432 tests, 99.77% okay However, I'm unable to reproduce it manually by issuing a 'make test' command in the build directory.
perl -Iblib t/SeqFeatCollection.t -> OK Using BIOPERLDEBUG to have more verbose output, other tests fail:

  not ok 421
  # Test 421 got: '0' (t/SeqFeatCollection.t at line 156 fail #406)
  # Expected: '6'
  not ok 423
  # Test 423 got: '0' (t/SeqFeatCollection.t at line 156 fail #408)
  # Expected: '4'

This is perl 5.8.6, without thread support, on mandrake cooker. There is no special environment variable used during package building that could explain the different results. Second, man page generation is disabled by default, using some strange construction in Makefile.PL:

  sub MY::manifypods {
      my $self = shift;
      #print STDERR "In manifypods moment\n";
      if( 1 ) {
          return "\nmanifypods : pure_all\n\t$self->{NOECHO}\$(NOOP)\n"
      } else {
          return $self->SUPER::manifypods(@_);
      }
  }

If the goal is just to avoid man page generation, why not INSTALLMAN3DIR => undef? Third, I'd like to split the package a little bit, to avoid drawing so many dependencies. Here is the whole list of external dependencies of the current package, as automatically computed by rpm:
perl(CGI) perl(CGI::Carp) perl(Cache::FileCache) perl(Carp) perl(Class::AutoClass) perl(Clone) perl(DBI) perl(DB_File) perl(Data::Dumper) perl(Data::Stag) perl(Data::Stag::XMLWriter) perl(Digest::MD5) perl(Dumpvalue) perl(English) perl(Error) perl(Exporter) perl(Fcntl) perl(File::Basename) perl(File::Path) perl(File::Spec) perl(File::Temp) perl(FileHandle) perl(GD) perl(Getopt::Long) perl(Getopt::Std) perl(HTML::Entities) perl(HTML::HeadParser) perl(HTML::Parser) perl(HTTP::Request::Common) perl(HTTP::Response) perl(IO::File) perl(IO::Handle) perl(IO::Socket) perl(IO::String) perl(LWP) perl(LWP::Simple) perl(LWP::UserAgent) perl(Math::BigFloat) perl(POSIX) perl(Pod::Usage) perl(SOAP::Lite) perl(SVG::Graph) perl(SVG::Graph::Data) perl(SVG::Graph::Data::Node) perl(SVG::Graph::Data::Tree) perl(Storable) perl(Symbol) perl(TestInterface) perl(Text::Shellwords) perl(Text::Wrap) perl(Tie::Handle) perl(Tie::RefHash) perl(Tree::DAG_Node)
perl(UNIVERSAL) perl(URI) perl(URI::Escape) perl(XML::DOM) perl(XML::DOM::XPath) perl(XML::Handler::Subs) perl(XML::Parser) perl(XML::Parser::PerlSAX) perl(XML::SAX) perl(XML::SAX::Base) perl(XML::SAX::Writer) perl(XML::Twig) perl(XML::Writer) >= 0.4 Just having the Bio::DB branch in a subpackage would be enough to avoid a mandatory dependency on Ace. What else could I split? Fourth, there are still two scripts in the main bioperl archive relying on bioperl-run: bp_pairwise_kaks.pl and bp_blast2tree.pl. They should really be moved there, to avoid circular dependencies. -- The engine falls out of the car the day after the warranty expires -- Murphy's Driving Laws n°18 From allenday at ucla.edu Sun Jan 30 15:26:57 2005 From: allenday at ucla.edu (Allen Day) Date: Sun Jan 30 15:22:54 2005 Subject: [Bioperl-l] various problems with mdk bioperl package In-Reply-To: <41FD180E.4040103@inria.fr> References: <41FD180E.4040103@inria.fr> Message-ID: I was thinking about this as well. I agree that bioperl-db requiring scripts should be moved out of the bioperl-live repository. We might also think about distributing a bioperl-core package that contains Bio::Root::*, and separate packages for each of the *IO subsystems (SeqIO, SearchIO, FeatureIO, etc). -Allen > Third, I'd like to split the package a little bit, to avoid drawing so > much dependencies.
Here is the whole list of external dependencies of > the current package, as automatically computed by rpm: > [...] From hlapp at gmx.net Sun Jan 30 21:35:36 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun Jan 30 21:31:33 2005 Subject: [Bioperl-l] various problems with mdk bioperl package In-Reply-To: Message-ID: I think you may be confusing Bio::DB::GFF with Bio::DB::BioSQL. AFAIK there hasn't been any test in bioperl-live that would require bioperl-db since a long time. -hilmar On Sunday, January 30, 2005, at 12:26 PM, Allen Day wrote: > I was thinking about this as well. I agree that bioperl-db requiring > scripts should be moved out of the bioperl-live repository.
> > We might also think about distributing a bioperl-core package that > contains Bio::Root::*, and separate packages for each of the *IO > subsystems (SeqIO, SearchIO, FeatureIO, etc). > > -Allen > > >> Third, I'd like to split the package a little bit, to avoid drawing so >> much dependencies. Here is the whole list of external dependencies of >> the current package, as automatically computed by rpm: >> [...] > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > --
------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Sun Jan 30 23:10:49 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun Jan 30 23:07:20 2005 Subject: [Bioperl-l] struggling with Bio::FeatureIO and Bio::SeqFeature::Annotated In-Reply-To: Message-ID: <172B7F31-733E-11D9-BDB0-000A959EB4C4@gmx.net> On Tuesday, January 25, 2005, at 01:45 AM, Allen Day wrote: >>> because Bio::SeqFeature::Annotated holds annotations as >>> object pointers >>> rather than strings. We can fix this with a stringification >>> overload, but I noticed that the code exists to do this in the >>> Bio::Annotation::* >>> classes but is commented out, and I'm not sure why. Maybe >>> Hilmar can shed some light on this. >>> Sorry, I think I missed this. I don't know what pieces of code you're talking about, so I can't shed light either. Where did you see the commented out stringification overload? I checked SimpleValue and couldn't see anything. Generally, I'd comment that if a method is supposed to return an array of strings but in violation returns an array of objects, then adding stringification overload to the returned objects' implementations is the wrong strategy. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From lstein at cshl.edu Mon Jan 31 10:21:15 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Jan 31 13:16:01 2005 Subject: [Bioperl-l] Re: [GMOD-devel] Re: RPMs for Bioperl and GMOD In-Reply-To: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org> References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org> Message-ID: <200501311021.16482.lstein@cshl.edu> I agree with Hilmar's sentiment.
If it is possible, the RPMs should only install what is necessary to bring up the core functionality of the modules in question. Lincoln On Friday 28 January 2005 04:49 pm, Hilmar Lapp wrote: > Like this statement or not, but I think installing all kinds of > CPAN packages onto somebody's machine irrespective of whether > somebody is ever going to use - or need - them, let alone them > working in the first place due to compiled code dependencies being > absent, is a really *bad* idea > > It basically defies the concept of modular packaging to begin with, > and sounds way too intrusive for my taste. > > Unless I misunderstand what Jason is saying then this is not even > necessary and is in no way an inherent shortcoming that inevitably > comes with RPMs. So unless I'm missing something here I understand > that Jason is saying you can have RPMs and still not litter your > system with DBD::blah or other modules for which you don't even > have the client libraries installed, and still be able to install > those at a later time because the respective pieces of code have > not been pruned (which I think is actually also a bad idea). > > -hilmar > > On Friday, January 28, 2005, at 11:50 AM, Allen Day wrote: > >> Do you mean your RPM or bioperl-db on Oracle? I'm running the > >> latter all the time. > > > > i mean the RPM. it is the same as bioperl-db cvs head as of last > > night. > > > >>> I'd also like someone with Oracle to help me make a DBD::Oracle > >>> rpm. Having a DBD::Oracle RPM will allow me to leave the Oracle > >>> code in Bioperl-DB. > >> > >> If installing the supposed DBD::Oracle is then a prerequisite > >> for being > >> able to install the rest, then you are taking the wrong path. > >> DBD::Oracle itself will depend on the Oracle client libraries > >> being installed which aren't even available on all platforms, > >> aside from the fact that installing those is beyond your control > >> and involves downloading about 350MB from OTN. 
> >> > >> Frankly, I can't believe that there is no way to specify > >> dependencies that are optional. Why would you require all of > >> DBD::mysql, DBD::Pg, and > >> DBD::Oracle if all a person wants is mysql?? All of these will > >> link to > >> compiled runtime libraries and why should a failure to install > >> DBD::Pg be of any concern to someone who wants to use mysql? > > > > the problem is something internal to the rpm installer -- it > > determines perl library dependencies at install-time rather than > > requiring you to explicitly specify perl packages in the rpm > > metafiles (aka specfile). > > > > so, for instance, if i tried to install > > perl-Generic-Genome-Browser, i > > might get an error like: > > > > requires perl(Bio::Root::Root) > > > > which could be removed by one of: > > > > (1) installing the perl-bioperl package > > (2) installing bioperl from cvs > > (3) installing bioperl from cpan > > > > there may be a way to code into the metafile to ignore missing > > perl dependencies detected in the installation process -- i need > > to look into > > this. > > > >> BTW DBD::Oracle is on CPAN. I thought that would make it easy to > >> construct an RPM? (There's few if any binaries though - for a > >> reason. Compiling DBD::Oracle may be a charm on some but involve > >> some major tweaking on other platforms. I've been there multiple > >> times, I know what I'm talking about.) > > > > given what i've said above, if i had a DBD::Oracle perl module > > installed, > > it would prevent rpm from throwing errors about missing > > dependency "perl(DBD::Oracle)". however, i can't build > > DBD::Oracle into an rpm because the make process links to the > > oracle headers and .so files. the > > DBD::Oracle can be made w/o having explicit dependencies on the > > oracle binary install, so it would install on a machine that > > didn't have oracle > > installed (but wouldn't work).
so as far as a bioperl-db rpm > > goes, here > > are the options i'm looking into: > > > > (1) get a binary perl-DBD-Oracle rpm built by someone with > > Oracle, leaving out the binary Oracle file dependency. > > distribute bioperl-db from cvs as-is > > (2) patch Oracle classes out of bioperl-db as part of the rpm > > build process. distribute modified bioperl-db. > > (3) modify rpm "detection of installed perl modules" > > functionality to have rpm explicitly ignore missing DBD::Oracle > > dependency. > > > > (1) and (2) will definitely work. i don't yet know the > > feasibility of (3). > > > > -allen -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. From lstein at cshl.edu Mon Jan 31 10:35:40 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Jan 31 13:16:05 2005 Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD In-Reply-To: References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org> Message-ID: <200501311035.40955.lstein@cshl.edu> Perhaps we should split the modules into bioperl-db and bioperl-db-oracle. And so forth. Lincoln On Friday 28 January 2005 07:49 pm, Allen Day wrote: > okay, i've looked into this. short answer: you cannot specify to > omit automatically determined dependencies without "lying" in the > rpm specfile and stating that a package provides a perl module that > it, in fact, does not. > > for example, i can add a statement to the bioperl-db rpm stating > that it provides perl(DBD::Oracle), but not actually add > DBD/Oracle.pm to the package.
there is a thread extensively > discussing this aspect of the rpm build system here: > > http://www.redhat.com/archives/rpm-list/2004-February/msg00083.html > > if i'm making a package for private use only, i don't mind doing > this, but if this package is to be for public consumption i don't > want to lie about what is and is not provided. i take the same > stance on all the other perl modules in the bioperl dependency > tree, including esoteric modules such as Net::Jabber and > GD::Graph3d. > > the only viable option i see here is to patch Oracle dependencies > out of bioperl-db. that is what i will do until i have working > Oracle and perl-DBD-Oracle packages in-hand. > > -allen > > On Fri, 28 Jan 2005, Hilmar Lapp wrote: > > Like this statement or not, but I think installing all kinds of > > CPAN packages onto somebody's machine irrespective of whether > > somebody is ever going to use - or need - them, let alone them > > working in the first place due to compiled code dependencies > > being absent, is a really *bad* idea > > > > It basically defies the concept of modular packaging to begin > > with, and sounds way too intrusive for my taste. > > > > Unless I misunderstand what Jason is saying then this is not even > > necessary and is in no way an inherent shortcoming that > > inevitably comes with RPMs. So unless I'm missing something here > > I understand that Jason is saying you can have RPMs and still not > > litter your system with DBD::blah or other modules for which you > > don't even have the client libraries installed, and still be able > > to install those at a later time because the respective pieces of > > code have not been pruned (which I think is actually also a bad > > idea). > > > > -hilmar > > > > On Friday, January 28, 2005, at 11:50 AM, Allen Day wrote: > > >> Do you mean your RPM or bioperl-db on Oracle? I'm running the > > >> latter all the time. > > > > > > i mean the RPM. it is the same as bioperl-db cvs head as of > > > last night. 
> > > > > >>> I'd also like someone with Oracle to help me make a > > >>> DBD::Oracle rpm. Having a DBD::Oracle RPM will allow me to > > >>> leave the Oracle code in Bioperl-DB. > > >> > > >> If installing the supposed DBD::Oracle is then a prerequisite > > >> for being > > >> able to install the rest, then you are taking the wrong path. > > >> DBD::Oracle itself will depend on the Oracle client libraries > > >> being installed which aren't even available on all platforms, > > >> aside from the fact that installing those is beyond your > > >> control and involves downloading about 350MB from OTN. > > >> > > >> Frankly, I can't believe that there is no way to specify > > >> dependencies that are optional. Why would you require all of > > >> DBD::mysql, DBD::Pg, and > > >> DBD::Oracle if all a persons wants is mysql?? All of these > > >> will link to > > >> compiled runtime libraries and why should a failure to install > > >> DBD::Pg be of any concern to someone who wants to use mysql? > > > > > > the problem is something internal to the rpm installer -- it > > > determines perl library dependencies at install-time rather > > > than requiring you to explicitly specify perl packages in the > > > rpm metafiles (aka specfile). > > > > > > so, for instance, if i i tried to install > > > perl-Generic-Genome-Browser, i > > > might get an error like: > > > > > > requires perl(Bio::Root::Root) > > > > > > which could be removed by one of: > > > > > > (1) installing the perl-bioperl package > > > (2) installing bioperl from cvs > > > (3) installing bioperl from cpan > > > > > > there may be a way to code into the metafile to ignore missing > > > perl dependencies detected in the installation process -- i > > > need to look into > > > this. > > > > > >> BTW DBD::Oracle is on CPAN. I thought that would make it easy > > >> to construct an RPM? (There's few if any binaries though - for > > >> a reason. 
Compiling DBD::Oracle may be a charm on some but > > >> involve some major tweaking on other platforms. I've been > > >> there multiple times, I know what I'm talking about.) > > > > > > given what i've said above, if i had a DBD::Oracle perl module > > > installed, > > > it would prevent rpm from throwing errors about missing > > > dependency "perl(DBD::Oracle)". however, i can't build > > > DBD::Oracle into an rpm because the make process links to the > > > oracle headers and .so files. the > > > DBD::Oracle can be made w/o having explicit dependencies on the > > > oracle binary install, so it would install on a machine that > > > didn't have oracle > > > installed (but wouldn't work). so as far as a bioperl-db rpm > > > goes, here > > > are the options i'm looking into: > > > > > > (1) get a binary perl-DBD-Oracle rpm built by someone with > > > Oracle, leaving out the binary Oracle file dependency. > > > distribute bioperl-db from cvs as-is > > > (2) patch Oracle classes out of bioperl-db as part of the rpm > > > build process. distribute modified bioperl-db. > > > (3) modify rpm "detection of installed perl modules" > > > functionality to have rpm explicitly ignore missing DBD::Oracle > > > dependency. > > > > > > (1) and (2) will definitely work. i don't yet know the > > > feasibility of (3). > > > > > > -allen -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050131/79d67bf9/attachment-0001.bin From lstein at cshl.edu Mon Jan 31 10:52:14 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Jan 31 13:16:09 2005 Subject: [Bioperl-l] GD::Font does not work In-Reply-To: <1182.199.3.136.4.1107058678.squirrel@webmail.vbi.vt.edu> References: <1182.199.3.136.4.1107058678.squirrel@webmail.vbi.vt.edu> Message-ID: <200501311052.14693.lstein@cshl.edu> Try removing old versions of libgd (the C library, not the perl module) and installing libgd 2.0.33 or higher. Lincoln On Saturday 29 January 2005 11:17 pm, Sucheta Tripathy wrote: > Hi, > > I searched the archive but could not find a solution to this > problem. My apologies, if this problem has already been discussed. > > Recently some of my CGI scripts threw error message saying > "premature end of script headers". Finally, I found out after > running the script in commandline that, it gives a segmentation > fault wherever there is a use of: GD::Font. > > example: > my $image=GD::Image->new(); > $image->string(gdLargeFont,.....) etc. > > After commenting these lines the script runs fine. > > I tried re-installing GD, but it's saying GD up to date. > > Is there a way around to make this working? > > Many thanks > > Sucheta -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050131/01c1569a/attachment-0003.bin From lstein at cshl.edu Mon Jan 31 12:00:11 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Jan 31 13:16:15 2005 Subject: Fwd: Re: [Bioperl-l] GD::Font does not work Message-ID: <200501311200.11743.lstein@cshl.edu> This is a followup from Sucheta, who was able to fix the GD::Font issues by updating libgd (the C library, not the perl module). Lincoln ---------- Forwarded Message ---------- Subject: Re: [Bioperl-l] GD::Font does not work Date: Monday 31 January 2005 11:25 am From: Sucheta Tripathy To: Lincoln Stein Thanks, I already did that and it worked fine for me. Sucheta At 10:52 AM 1/31/2005 -0500, you wrote: >Try removing old versions of libgd (the C library, not the perl >module) and installing libgd 2.0.33 or higher. > >Lincoln > >On Saturday 29 January 2005 11:17 pm, Sucheta Tripathy wrote: > > Hi, > > > > I searched the archive but could not find a solution to this > > problem. My apologies, if this problem has already been > > discussed. > > > > Recently some of my CGI scripts threw error message saying > > "premature end of script headers". Finally, I found out after > > running the script in commandline that, it gives a segmentation > > fault wherever there is a use of: GD::Font. > > > > example: > > my $image=GD::Image->new(); > > $image->string(gdLargeFont,.....) etc. > > > > After commenting these lines the script runs fine. > > > > I tried re-installing GD, but it's saying GD up to date. > > > > Is there a way around to make this working? > > > > Many thanks > > > > Sucheta > >-- >Lincoln D. Stein >Cold Spring Harbor Laboratory >1 Bungtown Road >Cold Spring Harbor, NY 11724 > >NOTE: Please copy Sandra Michelsen on >all emails regarding scheduling and other time-critical topics. ------------------------------------------------------- -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050131/62578482/attachment-0001.bin From Guillaume.Rousse at inria.fr Mon Jan 31 15:45:32 2005 From: Guillaume.Rousse at inria.fr (Guillaume Rousse) Date: Mon Jan 31 15:42:23 2005 Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD In-Reply-To: <200501311035.40955.lstein@cshl.edu> References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org> <200501311035.40955.lstein@cshl.edu> Message-ID: <41FE98EC.4010008@inria.fr> I'm taking the discussion in the middle, so I may be wrong... Lincoln Stein wrote: > Perhaps we should split the modules into bioperl-db and > bioperl-db-oracle. This isn't needed. Splitting a package into subpackages is a packager decision that doesn't rely on upstream developpers action. It would just bring everyone additional work. > And so forth. > > Lincoln > > > On Friday 28 January 2005 07:49 pm, Allen Day wrote: > >>okay, i've looked into this. short answer: you cannot specify to >>omit automatically determined dependencies without "lying" in the >>rpm specfile and stating that a package provides a perl module that >>it, in fact, does not. >> >>for example, i can add a statement to the bioperl-db rpm stating >>that it provides perl(DBD::Oracle), but not actually add >>DBD/Oracle.pm to the package. I don't think so. 
Unless this is a specific mdk rpm patch, you can always use exceptions to
automatic requires/provides computing:

%define _requires_exceptions perl(DBD::Oracle)

And if it doesn't work, you can also completely disable automatic
dependency computing:

AutoReqProv: no

BTW, why do you bother dealing with rpm when some distributions such as
Debian or Mandrake already provide official packages, and the biolinux
project provides Red Hat and SuSE packages too?
--
If you improve or tinker with something long enough, eventually it will
break or malfunction
		-- Murphy's In Laws n°8

From hlapp at gnf.org Mon Jan 31 16:28:25 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jan 31 16:24:32 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <200501311035.40955.lstein@cshl.edu>
References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org> <200501311035.40955.lstein@cshl.edu>
Message-ID: <0A944023-73CF-11D9-9995-000A95AE92B0@gnf.org>

On Jan 31, 2005, at 7:35 AM, Lincoln Stein wrote:

> Perhaps we should split the modules into bioperl-db and
> bioperl-db-oracle.
>
> And so forth.

Sure you could ... but where do you draw the line? E.g.,
gbrowse-pgsql-png-no-gif-SVG-no-staden-Ace-berkeleyDB.rpm ...

I mean, applying this to all dependencies will lead to permutations of
several dependencies - which I'm not sure will further the goal of
making it easier on the end user's end ...

Just my two cents ...

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From allenday at ucla.edu Mon Jan 31 17:27:36 2005
From: allenday at ucla.edu (Allen Day)
Date: Mon Jan 31 17:23:42 2005
Subject: [Bioperl-l] [Bioperl-guts-l] [Bug 1742] New: GFF parser messes attributes (fwd)
Message-ID:

Here's the stringification problem being discussed in another thread.
It came up in a 1.5 branch bug report.
Objections to putting the stringification overload back?

-Allen

---------- Forwarded message ----------
Date: Mon, 31 Jan 2005 15:24:19 -0500
From: bugzilla-daemon@portal.open-bio.org
To: bioperl-guts-l@bioperl.org
Subject: [Bioperl-guts-l] [Bug 1742] New: GFF parser messes attributes

http://bugzilla.open-bio.org/show_bug.cgi?id=1742

           Summary: GFF parser messes attributes
           Product: Bioperl
           Version: 1.5 branch
          Platform: Macintosh
        OS/Version: MacOS X
            Status: NEW
          Severity: major
          Priority: P2
         Component: Core Components
        AssignedTo: bioperl-guts-l@bioperl.org
        ReportedBy: jldai@yahoo.com

In BioPerl 1.5.0, use Bio::Tools::GFF and Bio::SeqIO to parse the GFF
string:

8255763 tigrscan final-exon 67 558 56.8 - 2 transgrp "1001";

into a Bio::SeqIO object and later print it out as an embl file,
resulting in:

FT   final-exon      complement(67..558)
FT                   /transgrp="Bio::Annotation::SimpleValue=HASH(0x93a5d8)"
FT                   /note="score=56.8"
FT                   /note="frame=2"

The value of tag "transgrp" should be 1001.

Same script worked fine in BioPerl-1.4

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l

From ed at compbio.berkeley.edu Mon Jan 31 18:07:56 2005
From: ed at compbio.berkeley.edu (Ed Green)
Date: Mon Jan 31 18:05:07 2005
Subject: [Bioperl-l] dssp script
In-Reply-To: <41F13730.1080203@compbio.berkeley.edu>
References: <5F7CE35370B6CF429AA3CA960ECC278001638D65@EXCHANGE2.charite.de> <41F13730.1080203@compbio.berkeley.edu>
Message-ID: <41FEBA4C.3090906@compbio.berkeley.edu>

Peter-

Just checked in these fixes to Bio::Structure::SecStr::DSSP::Res.pm

You may find the new residues() iterator method useful.

Regards,
Ed Green

Ed Green wrote:
> Dear Peter,
>
> These two are in fact bugs that I will fix.
The first results because of > the presence of 'termination residues' that don't have residue numbers. > Their residue numbers, then, can't be compared numerically. Fortunately, > this bug won't result in wrong results as we want this comparison to > always be false anyway. The solution to this is to first check if either > of the termination residue signals are set and if so, don't do this > numerical comparison. > > The second, blank line(s) at end of file will also be fixed. > > Beware that there is, I think, a bug in your script. It appears that you > are attempting to iterate over all residues. However, iterating A:1 .. > A:max doesn't get it done because of the crazy way residues can be > numbered in PDB files: you'll miss all the residues with altloc codes > (A:27A, A:27B, A:27C, e.g.). > > To make this easy an iterator is called for. It will just return all > 'real' residues for the pdb file or for a specified chain - I'll try to > get that done this weekend. > > Regards, > Ed Green > > Robinson, Peter wrote: > >> Dear BioPerlers, >> >> I am writing a script to use the BioPerl DSSP module to print out a >> list of phi and psi angles for all applicable residues of all chains. >> Although the results are correct, I get the following error message at >> the end of each chain: >> >> Argument "" isn't numeric in numeric eq (==) at >> /usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line 1168. >> >> and I am not quite sure where it is coming from. 
Perhaps I am using
>> the wrong part of the API, but I am trying to get a list of all
>> residues for each chain as follows:
>>
>> foreach my $ch (@chains) {
>>     my $ss_elements_pts = $dssp->secBounds($ch);
>>     print "Chain $ch:\n";
>>     my $pos = 0;
>>     my $max = 0;
>>     foreach my $stretch (@{$ss_elements_pts}) {
>>         my $start = $stretch->[0];
>>         my $end = $stretch->[1]; if ($end =~ m/(\d+)/) { $end = $1; }
>>         if ($end > $max) { $max = $end; }
>>     }
>>     ## END is now the last residue in this chain
>>     for my $res (1..$max) {
>>         my $residueID = $res . ":" . $ch;
>>         my ($phi,$psi,$SS,$SSsum,$AA);
>>         eval { $phi = $dssp->resPhi($residueID);};
>>         etc.
>>
>> The full script is appended to the bottom of this mail.
>>
>>
>> I also noticed what might be a minor bug in the module DSSP/Res.pm;
>> when I use dsspcmbi to analyze a PDB file, it produces a results file
>> with an empty last line. This causes a crash:
>>
>> Use of uninitialized value in chomp at
>> /usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line
>> 1284, line 955.
>>
>>
>> If I manually remove this last empty line, there is no error. By
>> adding the following line at Res.pm l.1284, you can fix the problem:
>>
>>
>> while ( chomp( $cur = <$file> ) ) {
>>     next if ($cur =~ m/^\s*$/);
>> *********************************************
>>     $res_num = substr( $cur, 0, 5 );
>>     $res_num =~ s/\s//g;
>>     $self->{ 'Res' }->[ $res_num ] = &_parseResLine( $cur );
>> }
>> }
>>
>>
>>
>>
>> Thanks in advance for any tips! Peter
>> Peter N. Robinson, M.D.
>> Institute of Medical Genetics
>> Charité University Hospital
>> Augustenburger Platz 1
>> 13353 Berlin
>> Germany
>> ++49-30-450 569124
>> peter.robinson@charite.de
>> http://www.charite.de/ch/medgen/robinson
>> Beware of bugs in the above code; I have only proved it correct, not
>> tried it.
-Donald Knuth, computer scientist (1938- ) >> >> ######################## >> >> #!/usr/bin/perl -w >> use IO::File; >> use Bio::Structure::SecStr::DSSP::Res; >> use Data::Dumper; >> >> >> =pod >> parseDSSP.pl >> Script to parse the output of DSSP using the BioPerl module >> Bio::Structure::SecStr::DSSP::Res. To use it, process a PDB >> file with dssp or dsspcmbi, and pass the resulting file to this >> script. For more information on dssp and BioPerl see the >> module documentation at http://bioperl.org >> >> @email peter.robinson@charite.de >> 21 January, 2005 >> >> =cut >> >> >> >> my $file = "pdb43ca.dssp"; >> my $dssp = new Bio::Structure::SecStr::DSSP::Res('-file'=> "$file"); >> >> my $pdbID = $dssp->pdbID(); >> my $auth = $dssp->pdbAuthor(); >> my $cmpd = $dssp->pdbCompound(); >> my $pdb_date = $dssp->pdbDate(); >> my $header = $dssp->pdbHeader(); >> my $pdbSource = $dssp->pdbSource(); >> >> print "PDB entry $pdbID \n\tauthor:\t$auth", >> "\n\tCompound:\t$cmpd", >> "\n\tDate:\t$pdb_date", >> "\n\tHeader:\t$header", >> "\n\tsource:\t$pdbSource\n\n"; >> >> my $totalRes = $dssp->numResidues(); >> print "Total residue count (all chains):$totalRes\n"; >> >> >> my $surArea= $dssp->totSurfArea(); >> print "Total accessible surface area:\t$surArea (square Ang)\n"; >> >> >> my $chainRef = $dssp->chains(); >> my @chains = sort @{$chainRef}; >> print "Chain[s]:\n"; >> foreach my $ch (@chains) { >> print "\t$ch"; >> } >> print "\n"; >> >> my $hb = $dssp->hBonds(); >> print "H BONDS.\n"; >> print "TYPE O(I)-->H-N(J): $hb->[0]\n", >> "IN PARALLEL BRIDGES: $hb->[1]\n", >> "IN ANTIPARALLEL BRIDGES $hb->[2]\n", >> "TYPE O(I)-->H-N(I-5) $hb->[3]\n", >> "TYPE O(I)-->H-N(I-4) $hb->[4]\n", >> "TYPE O(I)-->H-N(I-3) $hb->[5]\n", >> "TYPE O(I)-->H-N(I-2) $hb->[6]\n", >> "TYPE O(I)-->H-N(I-1) $hb->[7]\n", >> "TYPE O(I)-->H-N(I+0) $hb->[8]\n", >> "TYPE O(I)-->H-N(I+1) $hb->[9]\n", >> "TYPE O(I)-->H-N(I+2) $hb->[10]\n", >> "TYPE O(I)-->H-N(I+3) $hb->[11]\n", >> "TYPE O(I)-->H-N(I+4) 
$hb->[12]\n", >> "TYPE O(I)-->H-N(I+5) $hb->[13]\n", >> "\n"; >> >> >> foreach my $ch (@chains) { >> my $ss_elements_pts = $dssp->secBounds($ch); >> print "Chain $ch:\n"; >> my $pos = 0; >> my $max = 0; >> foreach my $stretch (@{$ss_elements_pts}) { >> my $start = $stretch->[0]; >> my $end = $stretch->[1]; if ($end =~ m/(\d+)/) { $end = $1; } >> if ($end > $max) { $max = $end; } >> } >> ## END is now the last residue in this chain >> for my $res (1..$max) { >> my $residueID = $res . ":" . $ch; >> my ($phi,$psi,$SS,$SSsum,$AA); >> eval { $phi = $dssp->resPhi($residueID);}; >> eval { $psi = $dssp->resPsi($residueID);}; >> eval { $SS = $dssp->resSecStr($residueID);}; >> eval { $SSsum = $dssp->resSecStrSum($residueID);}; >> $AA = $dssp->resAA($residueID); >> $phi = $phi || "n/a"; >> $psi = $psi || "n/a"; >> $SS = $SS || "-"; >> my $SSclass; >> if ($SSsum eq "H") { $SSclass = "helix"; } >> elsif ($SSsum eq "T") { $SSclass = "turn"; } >> elsif ($SSsum eq "B") { $SSclass = "beta"; } >> else { $SSclass = $SSsum; } >> print "$residueID) [$AA] phi:$phi psi:$psi SecStruct: $SS >> ($SSclass) \n"; >> } >> } >> >> >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gnf.org Mon Jan 31 18:34:29 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Jan 31 18:30:52 2005 Subject: [Bioperl-l] Re: [Bioperl-guts-l] [Bug 1742] New: GFF parser messes attributes (fwd) In-Reply-To: References: Message-ID: This is coming from SeqFeature::Annotated, right? What's wrong with making SeqFeature::Annotated return a tag's value as a string as demanded by the contract instead of returning an object? 
IMNSHO stringification overload band-aids rather than fixes the problem, and also introduces a trip wire that sooner or later will be triggered by someone unsuspecting. I.e., it makes the code more brittle, not more robust. You won't like me for this, but I do think it's the wrong strategy. -hilmar On Jan 31, 2005, at 2:27 PM, Allen Day wrote: > Here's the stringification problem being discussed in another thread. > It > came up in a 1.5 branch bug report. Objections to putting the > stringification overload back? > > -Allen > > ---------- Forwarded message ---------- > Date: Mon, 31 Jan 2005 15:24:19 -0500 > From: bugzilla-daemon@portal.open-bio.org > To: bioperl-guts-l@bioperl.org > Subject: [Bioperl-guts-l] [Bug 1742] New: GFF parser messes attributes > > http://bugzilla.open-bio.org/show_bug.cgi?id=1742 > > Summary: GFF parser messes attributes > Product: Bioperl > Version: 1.5 branch > Platform: Macintosh > OS/Version: MacOS X > Status: NEW > Severity: major > Priority: P2 > Component: Core Components > AssignedTo: bioperl-guts-l@bioperl.org > ReportedBy: jldai@yahoo.com > > > In BioPerl 1.5.0, use Bio::Tools::GFF and Bio::SeqIO to paser GFF > string: > > 8255763 tigrscan final-exon 67 558 56.8 - 2 transgrp "1001"; > > into an Bio::SeqIO object and later print out as embl file, resulting > in: > > FT final-exon complement(67..558) > FT > /transgrp="Bio::Annotation::SimpleValue=HASH(0x93a5d8)" > FT /note="score=56.8" > FT /note="frame=2" > > The value of tag "transgrp" should be 1001. > > Same script worked fine in BioPerl-1.4 > > > > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From allenday at ucla.edu Mon Jan 31 19:56:21 2005
From: allenday at ucla.edu (Allen Day)
Date: Mon Jan 31 19:52:11 2005
Subject: [Bioperl-l] Re: [Bioperl-guts-l] [Bug 1742] New: GFF parser messes attributes (fwd)
In-Reply-To:
References:
Message-ID:

This bug isn't coming from SeqFeature::Annotated, it's from the refactor
of SeqFeatureI to inherit from AnnotatableI. Meaning get_tag_values() and
similar functions are now getting/setting attributes in a
Bio::AnnotationCollection store. Simple strings passed in are turned into
objects when added to the store, and given back in object form.

It was never specified in Bio::SeqFeatureI, even before refactoring, that
the returned values of annotation tags should be strings. Code using the
interface just assumed this was the case, and it wasn't a bad assumption
given that Bio::SeqFeature::Generic was the only instantiable class and
did use strings as values rather than objects.

-Allen

On Mon, 31 Jan 2005, Hilmar Lapp wrote:

> This is coming from SeqFeature::Annotated, right? What's wrong with
> making SeqFeature::Annotated return a tag's value as a string as
> demanded by the contract instead of returning an object?
>
> IMNSHO stringification overload band-aids rather than fixes the
> problem, and also introduces a trip wire that sooner or later will be
> triggered by someone unsuspecting. I.e., it makes the code more
> brittle, not more robust. You won't like me for this, but I do think
> it's the wrong strategy.
>
> 	-hilmar
>
> On Jan 31, 2005, at 2:27 PM, Allen Day wrote:
>
> > Here's the stringification problem being discussed in another thread.
> > It came up in a 1.5 branch bug report. Objections to putting the
> > stringification overload back?
> > > > -Allen > > > > ---------- Forwarded message ---------- > > Date: Mon, 31 Jan 2005 15:24:19 -0500 > > From: bugzilla-daemon@portal.open-bio.org > > To: bioperl-guts-l@bioperl.org > > Subject: [Bioperl-guts-l] [Bug 1742] New: GFF parser messes attributes > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=1742 > > > > Summary: GFF parser messes attributes > > Product: Bioperl > > Version: 1.5 branch > > Platform: Macintosh > > OS/Version: MacOS X > > Status: NEW > > Severity: major > > Priority: P2 > > Component: Core Components > > AssignedTo: bioperl-guts-l@bioperl.org > > ReportedBy: jldai@yahoo.com > > > > > > In BioPerl 1.5.0, use Bio::Tools::GFF and Bio::SeqIO to paser GFF > > string: > > > > 8255763 tigrscan final-exon 67 558 56.8 - 2 transgrp "1001"; > > > > into an Bio::SeqIO object and later print out as embl file, resulting > > in: > > > > FT final-exon complement(67..558) > > FT > > /transgrp="Bio::Annotation::SimpleValue=HASH(0x93a5d8)" > > FT /note="score=56.8" > > FT /note="frame=2" > > > > The value of tag "transgrp" should be 1001. > > > > Same script worked fine in BioPerl-1.4 > > > > > > > > ------- You are receiving this mail because: ------- > > You are the assignee for the bug, or are watching the assignee. > > _______________________________________________ > > Bioperl-guts-l mailing list > > Bioperl-guts-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > > > From hlapp at gnf.org Mon Jan 31 20:18:23 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Jan 31 20:14:49 2005 Subject: [Bioperl-l] Re: [Bioperl-guts-l] [Bug 1742] New: GFF parser messes attributes (fwd) In-Reply-To: Message-ID: <2A70A6B0-73EF-11D9-860C-000A959EB4C4@gnf.org> On Monday, January 31, 2005, at 04:56 PM, Allen Day wrote: > This bug isn't coming from SeqFeature::Annotated, it's from the > refactor > of SeqFeatureI to inherit from AnnotatableI. 
> Meaning get_tag_values() and similar functions are now getting/setting
> attributes in a Bio::AnnotationCollection store. Simple strings passed
> in are turned into objects when added to the store, and given back in
> object form.
>
> It was never specified in Bio::SeqFeatureI, even before refactoring,
> that the returned values of annotation tags should be strings. Code
> using the interface just assumed this was the case, and it wasn't a bad
> assumption given that Bio::SeqFeature::Generic was the only
> instantiable class and did use strings as values rather than objects.

If it wasn't a bad assumption and if everybody made that assumption,
what's so great about breaking that?

What SeqFeatureI stated was:

 Title   : get_tag_values
 Usage   : @values = $self->get_tag_values('some_tag')
 Function:
 Returns : An array comprising the values of the specified tag.
 Args    : a string

So you might say the term 'values' does not say it must be a string, yet
in the synopsis that's exactly how the method is demonstrated. I think
it's fair to say that, implicitly by usage pattern, the contract has
become that you have to return a string here. Breaking this so as to
return objects may be a great idea and a great change, but only in a
bioperl-2.0.

Otherwise, you demand that all current and future Bio::AnnotationI
implementations are properly stringification-overloaded, and that people
are perfectly aware that $annvalue and "$annvalue" are two very
different things.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
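
For readers following the thread, the difference between $annvalue and
"$annvalue" can be seen in a small self-contained sketch. This is only an
illustration under stated assumptions: My::PlainValue and My::StringyValue
are hypothetical stand-ins invented for this example, not BioPerl classes,
and the sketch does not claim to reproduce how Bio::Annotation::SimpleValue
is implemented. It only shows what Perl's overload pragma does when an
object is interpolated into a string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical value class WITHOUT stringification overload.
package My::PlainValue;
sub new   { my ($class, $v) = @_; return bless { value => $v }, $class }
sub value { return $_[0]->{value} }

# Hypothetical value class WITH stringification overload.
package My::StringyValue;
use overload '""' => sub { $_[0]->{value} }, fallback => 1;
sub new   { my ($class, $v) = @_; return bless { value => $v }, $class }
sub value { return $_[0]->{value} }

package main;

my $plain   = My::PlainValue->new('1001');
my $stringy = My::StringyValue->new('1001');

# Interpolating an object without overloading prints its class and
# address, which is the symptom reported in bug 1742.
print "plain:   $plain\n";      # My::PlainValue=HASH(0x...)

# With the overload, interpolation yields the stored value.
print "stringy: $stringy\n";    # 1001

# The variable is still an object; only the quoted form is a string.
print "object?  ", ( ref($stringy)   ? "yes" : "no" ), "\n";   # yes
print "object?  ", ( ref("$stringy") ? "yes" : "no" ), "\n";   # no
```

The last two lines are the trip wire mentioned above: code that tests
ref() or passes the value on unquoted still sees an object, even though
printing it looks like a plain string.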