From birney@ebi.ac.uk Fri Dec 1 08:21:06 2000
Date: Fri, 1 Dec 2000 08:21:06 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] 0.7 release: tasks & assignments
On Thu, 30 Nov 2000, Hilmar Lapp wrote:

[To second Hilmar's comment - yup - this is perhaps *the* most important job
for this release/branch cycle and it is great to see you taking up the
challenge]

> Ewan probably has rich experience with the questions and problems users
> have, as they also held another bioperl workshop this summer over at
> EBI, where especially documentation issues arose. So, Ewan, what's your
> point of view?

Cookbook-style documentation, with explanation, seems to be the best way for
people to learn the package. Concept documentation (this object inherits
from this and abstracts the idea of a sequence yadadada) seems to be a bad
idea.

Here are some ideas for the quick "cookbook" scripts to write down:

 (a) calculate average length of sequences in a fasta file
 (b) convert EMBL format files to GenBank
 (c) take a file of identifiers, retrieve sequences via genbank
 (d) take sequence, run blast, put sequence features onto sequence, dump
     as genbank
 (e) take sequence, run HMMER... you get the idea ;)
 (f) index a fasta file for lookup
 (g) run gene prediction programs

I think just having some of these written down will really help people get
acquainted with the package. The code developers can also ensure that this
functionality works smoothly (jason/hilmar - notice the sequence features
on blast have to come out in genbank/embl format. Does not work with
Bio::Tools::BLAST - probably does with BPLite)

> And: folks on the list, this is the time to mail your special mad
> documentation encounter in bioperl to Brian.
>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                              email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
> -----------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From birney@ebi.ac.uk Fri Dec 1 08:27:21 2000
Date: Fri, 1 Dec 2000 08:27:21 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Bio::SeqIO::game
On Thu, 30 Nov 2000, Hilmar Lapp wrote:

> Unfortunately not. I also don't know the internals of the perl SAX
> interface, but in general SAX was defined exactly for one-pass
> chunk-by-chunk stream reading. Does the perl SAX parser not adhere to
> this concept, or are there peculiarities of the GAME DTD that prohibit
> this?

It is the GAME DTD which does not provide a "parse unit" tag (I think) and
I think this is a clear bug in GAME. As people have noticed, if we dump
ensembl in GAME (feasible now) then loading the whole thing up in a non
chunk-by-chunk way will be murder (this is about 10GB of data).

I would propose that either <game> </game> becomes an official "parse
unit" tag or that the game people figure out another tag that we can chunk
on...

> Hilmar
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From krbou@pgsgent.be Fri Dec 1 09:54:42 2000
Date: Fri, 1 Dec 2000 10:54:42 +0100
From: Kris Boulez krbou@pgsgent.be
Subject: [Bioperl-l] 0.7 release: tasks & assignments
Quoting Osborne, Brian (Brian.Osborne@osip.com):
> Hilmar et al.,
>
> >>Second, I signed up Brian for creating more and
> >>especially better documentation.
>
> First, I apologize for not responding earlier; as for most of you,
> bioperl is a "labor of love", done outside of the 9 to 5. Regardless,
> I accept the job. Though I won't be making the deadline with my own
> task list I imagine it will look something like this:
>
It looks as if the timeslice from 5 to 9 will not be filled by renovating
our house anymore. So I plan on spending time on this. I have already
started looking at the code fragments from the SYNOPSIS sections to see if
these are still correct and can be used in a biostart.pod or cookbook-like
type of documentation.

Kris,

> - Reread and refamiliarize myself with the modules.
> - Create a prioritized list of modules which could tell me where
>   to start. This list will be based on my experience as user, not
>   programmer. I'm assuming that all modules should be documented,
>   so the list simply tells me where to start, not what to do or not
>   to do. Perhaps we'd only want to focus on the very most stable
>   modules, in addition.
> - Identify some small set of modules whose documentation I can emulate
>   (certainly this list will contain modules whose documentation is a
>   well balanced mixture of description and example) or offer to others
>   as examples.
>
> As I begin to write I'm assuming a few things will happen. One could be
> that authors are asked to document their public methods (should private
> methods all be annotated too?). Another could be that authors could
> be asked to write the "general" documentation of their modules. Certainly
> I will be asking the group for example code, and looking through
> bioperl.org for the same.
>
> Talk to you soon,
>
> Brian O.
>
> -----Original Message-----
> From: Hilmar Lapp
> To: Bioperl
> Sent: 29/11/00 12:35
> Subject: [Bioperl-l] 0.7 release: tasks & assignments
>
> As a reminder, the tentative task list for the 0.7 release will be
> closed tonight PST, which is early morning tomorrow in Europe.
>
> I will add one more, fairly small task, namely setting up a template for
> good-style module creation, which shall also depict the 'right' way of
> object initialization. Right now I'm having problems editing the Wiki
> page.
>
> Hilmar
>
> -------- Original Message --------
> Subject: [Bioperl-l] 0.7 release: tasks & assignments
> Date: Sun, 26 Nov 2000 19:46:43 -0800
> From: Hilmar Lapp <hlapp@gmx.net>
> Organization: Nereis 4
> To: Bioperl <bioperl-l@bioperl.org>
>
> After some days of silence, after curing the headaches the upgrade of
> my system caused me, I suggest that we come to terms regarding the tasks
> we consider necessary in order to get the code 0.7 release ready. Since
> Santa won't accomplish our tasks, this also means agreeing on assignments
> regarding who's going to do what.
>
> I extended the Wiki pages Jason started, check out
> http://www.bioperl.org/wiki/html/BioPerl/BioperlRelease0.7.html. The
> task list can be found at a link near the bottom (for the impatient:
> http://www.bioperl.org/wiki/html/BioPerl/TaskList.html). Please also
> check out/review the programming conventions, I made some additions
> there, too.
>
> Please thoroughly review this list. Since it is Wiki, you can edit it
> directly, but you can also post to the list. Feel free to change
> priorities, add tasks, add comments, etc. I made initial assignments
> based on loose knowledge about who might be interested in what, please
> check whether you find or would like to find yourself in a particular
> cell. If you're there but don't want to be, please remove yourself. A
> first review by myself suggests that I might be taking on too much, so
> *please* feel encouraged to add yourself anywhere you can commit to,
> regardless of whether there's already someone placed there.
>
> There are two points I'd like to mention here because they might escape
> you otherwise: I signed up Shailesh for testing on Win32, is there
> someone out there who would be willing to test on Mac? Second, I signed
> up Brian for creating more and especially better documentation. He needs
> more hands for that task: *please* volunteer if you can spare some time.
>
> I would like to get this list finalized by the middle of the week, say
> Wednesday night, and by Friday night at the latest in case
> some unforeseen delays happen (e.g. a shutdown of the internet,
> bioperl.org goes on fire, etc). The next three days might be the last
> chance to get your wish or concern considered for the 0.7 release, so
> please speak out *now* if there's something itching you.
>
> If you're worrying about tampering with the web-page, I have a backup
> copy. Question to our Wiki-guys: is there a backup or any means of
> restoring a page in the event that someone wants to?
>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                              email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
> -----------------------------------------------------------------

From mrp@sanger.ac.uk Fri Dec 1 11:58:26 2000
Date: Fri, 01 Dec 2000 11:58:26 +0000
From: Matthew Pocock mrp@sanger.ac.uk
Subject: SUCCESS! Re: [Bioperl-l] wincvs
Hi Mark,

I use the same software to connect to biojava from my Windows ME box. Great
to hear that it works for you. The most important thing on my system was to
set the HOME environment variable. SSH gave really strange errors otherwise.

M

Mark Wilkinson wrote:

> Hi group,
> I had some success in the wee hours of this morning in obtaining
> bioperl-live on my Windows98 machine at home. I didn't actually use WinCVS
> as I had attempted to use that package before with no luck (though the
> changes I made this morning might have solved those problems too...??)
>
> I was using the DOS command-line CVS program (simply called CVS.exe)
> available from the CVS website. For the RSH connection I used the Windows
> SSH package, which includes a command-line SSH program called SCP2.exe. I
> then added some settings in the Autoexec.bat as follows:
>
> SET HOME=C:\SSH
> set CVSROOT=:pserver:markw@bioinfo.pbi.nrc.ca/2022:/devel/cvsroot
> set CVS_RSH=c:\ssh\scp2.exe
> set PATH=%PATH%;c:\SSH;C:\cvs    <- my cvs.exe is in the /CVS folder, my
>                                     SCP2.exe is in the /SSH folder
>
> I was then able to do an anonymous cvs checkout on bioperl-live and
> bioperl-gui with no difficulties. I did not check if I was able to login
> with my "proper" username and password...
>
> If any of the Windows testers need help please let me know what I can do
> (and how!).
>
> Cheers all!
>
> Mark
>
> --
> ---
> Dr. Mark Wilkinson
> Bioinformatics Group
> National Research Council of Canada
> Plant Biotechnology Institute
> 110 Gymnasium Place
> Saskatoon, SK
> Canada

From Shailesh L Mistry"
> It is the GAME DTD which does not provide a "parse unit" tag (I think) and
> I think this is a clear bug in GAME. As people have noticed, if we dump
> ensembl in GAME (feasible now) then loading the whole thing up in a non
> chunk by chunk way will be murder (this is about 10GB of data).
>
> I would propose that either <game> </game> becomes an official "parse
> unit" tag or that the game people figure out another tag that we can
> chunk on...
>

Hmm... This may be more difficult than it sounds. The problem is that
features may produce their own sequences. So if you have genomic sequence,
it can have exon features that point to mRNA sequences and aa sequences.
So one <seq_chunk/>, which has a top level genomic sequence, can have
subsequences.

At BDGP we export the XML into files representing one genbank accession
unit of around 300kb. I haven't done performance testing with the sax
parser, but it only takes a few seconds (maybe 3-5) to do one of these
chunks with annotations. I suppose it would be pretty inefficient if we
exported the whole database into one xml file, but I don't think it would
be terrible, either.

Still, we could add a <parse_unit/> tag which could break it down by top
level sequences. Each <parse_unit/> could be fed into memory as a string
and the sax parser could have at them that way. Does this sound like a
good solution to everybody?

Thanks for the feedback, guys.

Brad

From mwilkinson@gene.pbi.nrc.ca Fri Dec 1 21:50:36 2000
Date: Fri, 01 Dec 2000 15:50:36 -0600
From: Mark Wilkinson mwilkinson@gene.pbi.nrc.ca
Subject: SUCCESS! Re: [Bioperl-l] wincvs
Matthew Pocock wrote:

> I use the same software to connect to biojava from my Windows ME box.
> Great to hear that it works for you. The most important thing on my
> system was to set the HOME environment variable. SSH gave really strange
> errors otherwise.

Strangely, I have just discovered that these settings only work for the
*anonymous* CVS under Windows, but not for my "real" login! I get all sorts
of crazy RSH errors such as "bio.perl.org/-l directory doesn't exist"...
very strange. I just double-checked my UN/PW in Linux and it works properly
there, but it doesn't work at all from Windows, even though the anonymous
checkout works just fine (I just tested it again...) ???

Is there a fundamental difference between -d :pserver: and -d :ext: that I
need to understand first?

Any advice appreciated,

Mark

--
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada

From jason@chg.mc.duke.edu Fri Dec 1 21:52:58 2000
Date: Fri, 1 Dec 2000 16:52:58 -0500 (EST)
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] Bio::DB::WebDB
I'm playing with it some, still not sure how to handle the whole stream
issue appropriately with LWP::UserAgent, but I think I'm planning on making
it general enough to work with [most] any WebDB interface that returns
sequence data. I'll try and work on a proposed set of methods over the
weekend so we can discuss early next week.

Something we might consider adding to this bioperl release: NCBI has XML as
a retrieval option in Entrez, so we should consider supporting this. This
is using their DTD - NCBI_Seqset.dtd. I'm GAME for helping out on this
front at the same time.

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.mc.duke.edu/

From bradmars@yahoo.com Fri Dec 1 22:26:15 2000
Date: Fri, 1 Dec 2000 14:26:15 -0800 (PST)
From: Bradley Marshall bradmars@yahoo.com
Subject: [Bioperl-l] Bio::SeqIO::game
How about this as a solution?

We'll add a top level attribute and/or tag describing whether or not the
document is "chunkable". Chris suggested we have a top level <flavor>
element. This can specify whether or not the document is chunkable.
A chunkable document would have this structure:

<game>
  <flavor>chunkable</flavor>
  <seq1/>
  < all features pertaining to seq1 />
  <seq2/>
  < all features pertaining to seq2 />
  <seq3/>
  < all features pertaining to seq3 />
</game>

If a document is chunkable, we will read into memory a string from the
first <seq> to the next <seq> and parse that in one pass when next_seq is
called. Then we'll move on to the next chunk.

If the document is not chunkable, we'll continue to parse it as we have
been. This allows us to keep GAME flexible and yet still be useful in the
SeqIO system.

Brad

From mwilkinson@gene.pbi.nrc.ca Fri Dec 1 22:40:00 2000
Date: Fri, 01 Dec 2000 16:40:00 -0600
From: Mark Wilkinson mwilkinson@gene.pbi.nrc.ca
Subject: [Bioperl-l] somethin' strange about RootI?
Hi Group,

Is there something about the RootI that has changed to prevent these lines
from working properly? (This routine was working until I cvs-updated my
bioperl "live" folder this afternoon...)

my $SelectedSeq = Bio::PrimarySeq->new(-seq => (join '',@SelectedSeqs),
                                       -moltype => 'dna');
my $SelectedTrans = $SelectedSeq->translate()->seq;

I get the following error (in both Windows and Linux):

-------------------- EXCEPTION --------------------
MSG:
CONTEXT: Error in uNKNOWN CONTEXT
SCRIPT: Workbench2.pl
STACK:
Bio::Root::RootI::new(82)
Bio::PrimarySeqI::translate(602)
QueryScreen::_checkFrame(286)
QueryScreen::__ANON__(242)
Tk::MainLoop(329)
QueryScreen::new(581)
main::Workbench2.pl(18)
---------------------------------------------------

From lapp@gnf.org Fri Dec 1 23:17:16 2000
Date: Fri, 01 Dec 2000 15:17:16 -0800
From: Hilmar Lapp lapp@gnf.org
Subject: [Bioperl-l] somethin' strange about RootI?
Mark Wilkinson wrote:
>
> Hi Group,
>
> Is there something about the RootI that has changed to prevent these
> lines from working properly? (This routine was working until I
> cvs-updated my bioperl "live" folder this afternoon...)
>
> my $SelectedSeq = Bio::PrimarySeq->new(-seq => (join '',@SelectedSeqs),
>                                        -moltype => 'dna');
> my $SelectedTrans = $SelectedSeq->translate()->seq;
>
> I get the following error (in both Windows and Linux):
>
> -------------------- EXCEPTION --------------------
> MSG:
> CONTEXT: Error in uNKNOWN CONTEXT
> SCRIPT: Workbench2.pl
> STACK:
> Bio::Root::RootI::new(82)

Hm. The only thing I can say right now is that RootI::new() actually
shouldn't be called, because it is meant to be absent (and eventually it
will be, or it will throw an exception). If this is the reason,
PrimarySeqI::translate seems to create an object from an interface or a
class that does not implement new() itself (which now every class is
supposed to do), or chains back to the inherited one (unlikely).

As a general comment to the list, it is probably not a good idea to check
out the main trunk for those people who aren't involved in the
RootI-transition until we've settled this. It's really a development-only
trunk at the moment. If you're eager to get the newest version for the
latest features, you'd better wait 1-2 weeks or maybe even a bit more.

        Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From birney@ebi.ac.uk Sat Dec 2 11:54:54 2000
Date: Sat, 2 Dec 2000 11:54:54 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Bio::SeqIO::game
On Fri, 1 Dec 2000, Bradley Marshall wrote:

> Still, we could add a <parse_unit/> tag which could
> break it down by top level sequences. Each
> <parse_unit/> could be fed into memory as a string and
> the sax parser could have at them that way. Does this
> sound like a good solution to everybody?
>

This sounds good/ideal to me.

> Thanks for the feed back, guys.
>
> Brad
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From birney@ebi.ac.uk Sat Dec 2 12:00:02 2000
Date: Sat, 2 Dec 2000 12:00:02 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Bio::SeqIO::game
On Fri, 1 Dec 2000, Bradley Marshall wrote:

> How about this as a solution?
>
> We'll add a top level attribute and/or tag describing
> whether or not the document is "chunkable". Chris
> suggested we have a top level <flavor> element. This
> can specify whether or not the document is chunkable.
> A chunkable document would have this structure:
>

;).

I think all useful documents will be chunkable. I'd claim that we're just
letting ourselves in for trouble if we allow badly compacted XML to be
"valid".

This solution is ok, but I would argue that it is better to be strict
about these things, otherwise implementations will either have to throw
exceptions on non-chunkable documents or have other poorly defined
criteria....

> <game>
>   <flavor>chunkable</flavor>
>   <seq1/>
>   < all features pertaining to seq1 />
>   <seq2/>
>   < all features pertaining to seq2 />
>   <seq3/>
>   < all features pertaining to seq3 />
> </game>
>
> If a document is chunkable, we will read into memory a
> string from the first <seq> to the next <seq> and
> parse that in one pass when next_seq is called. Then
> we'll move on to the next chunk.
>
> If the document is not chunkable, we'll continue to
> parse it as we have been. This allows us to keep GAME
> flexible and yet still be useful in the SeqIO system.
>
> Brad
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Sat Dec 2 15:14:26 2000
Date: Sat, 2 Dec 2000 10:14:26 -0500 (EST)
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] somethin' strange about RootI
There are some funky things going on with translate and the can_call_new
routine, I believe. I have not been able to pinpoint what specifically is
happening but I have seen this on a UNIX box as well. I'll make sure some
variant of the below code makes it into the tests.

Over the next few weeks the bioperl code is going to be in transition and
we're going to try really hard to check in code that works and passes all
the tests, but bugs happen. Volunteers for more test writing are definitely
welcome if you don't want to get into the code rewrite fray.

-Jason

On Fri, 1 Dec 2000, Mark Wilkinson wrote:

> Hi Group,
>
> Is there something about the RootI that has changed to prevent these
> lines from working properly? (This routine was working until I
> cvs-updated my bioperl "live" folder this afternoon...)
>
> my $SelectedSeq = Bio::PrimarySeq->new(-seq => (join '',@SelectedSeqs),
>                                        -moltype => 'dna');
> my $SelectedTrans = $SelectedSeq->translate()->seq;
>
> I get the following error (in both Windows and Linux):
>
> -------------------- EXCEPTION --------------------
> MSG:
> CONTEXT: Error in uNKNOWN CONTEXT
> SCRIPT: Workbench2.pl
> STACK:
> Bio::Root::RootI::new(82)
> Bio::PrimarySeqI::translate(602)
> QueryScreen::_checkFrame(286)
> QueryScreen::__ANON__(242)
> Tk::MainLoop(329)
> QueryScreen::new(581)
> main::Workbench2.pl(18)
> ---------------------------------------------------
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.mc.duke.edu/

From birney@ebi.ac.uk Sun Dec 3 13:12:10 2000
Date: Sun, 3 Dec 2000 13:12:10 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Gene Interface?
[apologies for the cross post. I can't see a better way to do this]

For the mini-update to the ensembl web site to address a number of small
issues, it is becoming clear that we need a more "delayed get" approach to
genes, such that getting gene-id and genedblinks (say) does not require a
full trawl across the exon/exontranscript etc table (this makes sense to
ensembl-dev people, will confuse bioperl people).

For the current web site, I will put in some sneaky calls on GeneObj that
work off geneid. Not nice.

Long term I am pretty sure we need to have a full-blown interface
definition of gene and then have (potentially a number of) implementations
behind it, one being the current in-memory implementation.

At the same time, we might as well synchronise with the upcoming
bioperl-0.7 genestructure interface. This is why I have cc'd in bioperl
for this. Hilmar in particular I know has views here and I want us to get
to a sensible solution.

So - I guess I am trying to open discussion of a gene interface, which
probably will have to be cross-posted between ensembl-dev and bioperl-l
(? do you agree hilmar?). Apologies to people who will get two copies of
the email...

In addition for ensembl, I guess there is a question about whether we aim
for this being before or after branching the main trunk. I suspect if
there is a large number of changes it has to be after branching <sigh>.

Let's map out some clear use cases for the generic gene interface:

 - should be able to store transcript information
   (one gene has multiple transcripts)
 - easy to get protein and cDNA sequences
 - should be able to store exons as seqfeatures
 ? should have slots for DBLinks/annotation (or do we want a higher
   collection interface for this? If so, how structured?)
 - should not mandate an in memory implementation

Here are some issues that I think could be difficult to reconcile between
bioperl and ensembl views:

 - Ensembl genes and transcripts are NOT seqfeatures. The placement of
   an ensembl gene on a single coordinate system is held in something
   called "VirtualGene" (not a great name. It is a gene on a
   virtualcontig). Ensembl has a big win by allowing a gene to be built
   "across" coordinate systems, allowing the coordinate system to be
   by-and-large decoupled from the gene structure. Some "magic" is used
   for the places where the gene structure is highly dependent on the
   assembly.

   (NB - the ensembl gene is reminiscent of EMBL/GenBank 'exploded'
   seqfeatures, where it goes
   join(AL000012:122-132,AC000002:1000023-1000015).)

 - Ensembl makes a distinction between alternative transcripts and
   alternative translations (two alternative transcripts can have the
   same translation). This makes the objects one step more complex.

 - Ensembl wants to keep track of DBLinks etc close to the gene.

Here are some issues which should be a simple matter of providing an
extension of a core bioperl interface for ensembl:

 - Ensembl can cope with exons that do not splice correctly (due to
   missing intervening genomic sequence) (needs phase information bound
   to the exon for ensembl)

 - Ensembl needs ensembl identifiers on all objects.

So - I suspect Michele, Arne, Richard and Hilmar (along with other people)
have views on this. Let's kick around some ideas and then see whether we
can get to a strong definition for bioperl which we can extend in ensembl
where necessary.

ewan

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From bradmars@yahoo.com Sun Dec 3 19:52:41 2000
Date: Sun, 3 Dec 2000 11:52:41 -0800 (PST)
From: Bradley Marshall bradmars@yahoo.com
Subject: [Bioperl-l] Bio::SeqIO::game
--- Ewan Birney <birney@ebi.ac.uk> wrote:
> On Fri, 1 Dec 2000, Bradley Marshall wrote:
>
> > How about this as a solution?
> >
> > We'll add a top level attribute and/or tag describing
> > whether or not the document is "chunkable". Chris
> > suggested we have a top level <flavor> element. This
> > can specify whether or not the document is chunkable.
> > A chunkable document would have this structure:
> >
>
> ;).
>
> I think all useful documents will be chunkable.

I agree that this is the case for large data transfer jobs like you're
talking about. A question we have is whether you're planning on
transferring only genomic seqs w/ features or if you're doing mixed files,
with genomic seqs' features forming mRNA and AA sequences. It is this
second case in which keeping things "chunkable" becomes difficult. But
this flexibility is also a major advantage of the GAME format. And even if
a document is NOT chunkable, parsing performance is pretty good for
non-huge documents.

We still need to deal with the file-handle issue....

Brad

> I'd claim that we're just letting ourselves in for trouble if we allow
> badly compacted XML to be "valid".
>
> This solution is ok, but I would argue that it is better to be strict
> about these things, otherwise implementations will either have to throw
> exceptions on non-chunkable documents or have other poorly defined
> criteria....
>
> > <game>
> >   <flavor>chunkable</flavor>
> >   <seq1/>
> >   < all features pertaining to seq1 />
> >   <seq2/>
> >   < all features pertaining to seq2 />
> >   <seq3/>
> >   < all features pertaining to seq3 />
> > </game>
> >
> > If a document is chunkable, we will read into memory a
> > string from the first <seq> to the next <seq> and
> > parse that in one pass when next_seq is called. Then
> > we'll move on to the next chunk.
> >
> > If the document is not chunkable, we'll continue to
> > parse it as we have been. This allows us to keep GAME
> > flexible and yet still be useful in the SeqIO system.
> >
> > Brad
> >
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>.
> -----------------------------------------------------------------
>

From dag@sonsorol.org Sun Dec 3 23:08:18 2000
Date: Sun, 03 Dec 2000 18:08:18 -0500
From: chris dagdigian dag@sonsorol.org
Subject: [Bioperl-l] bioperl/bioxml/biojava now have web browsable CVS access
Hi folks,

After waiting far too long after Ann Loraine's gentle suggestion for web
browsable CVS access :) I have finally sat down and installed the
python-based 'viewcvs' CGI's on our anonymous CVS server.

The anonymous CVS box is separate from our production server due mostly to
the fact that for security reasons I don't want to enable anonymous
:pserver: access to our main system. The repositories on the anon server
are updated hourly via rsync.

URLs for web access to our source code repositories

 Bioperl - http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/?cvsroot=bioperl
 Biojava - http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/?cvsroot=biojava
 BioXML  - http://cvs.bioxml.org/cgi-bin/viewcvs/viewcvs.cgi/?cvsroot=bioxml

URLs for anonymous checkout (via pserver) of our source code repositories

 Bioperl - http://cvs.bioperl.org
 Biojava - http://cvs.biojava.org
 BioXML  - http://cvs.bioxml.org

Please bang on this system if you have a chance and email me with any
comments or problems. Thanks again to Ann for pointing me to 'viewcvs'
which seems much better than 'cvsweb'.

Regards,
Chris

From hlapp@gmx.net Mon Dec 4 08:26:36 2000
Date: Mon, 04 Dec 2000 00:26:36 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Gene Interface?
Ewan Birney wrote:
>
> So - I guess I am trying to open discussion of a gene interface, which
> probably will have to be cross-posted between ensembl-dev and bioperl-l
> (? do you agree hilmar?).

Sure I do.

A primary question triggered by the bioperl SeqFeature::GeneStructure
class was where a Gene and the properties specific to a gene will end up.
I made basically two suggestions at that time. The first was to have Gene
as its own module, utilizing an associated GeneStructure and Bio::Seq, and
of course adding things like transcript(s) etc. The second was to simply
extend GeneStructure by what was missing with respect to the notion of a
gene, which came down to transcripts. We came to some agreement that
extending GeneStructure would be the way to go.

As nothing has been done (code-wise) in this direction so far, this issue
has not matured since. So, comments are very welcome, and maybe people
have a third (or 4th ...) way of doing it.

>
> Let's map out some clear use cases for the generic gene interface:
>
> - should be able to store transcript information
> (one gene has multiple transcripts)

See above. This can be achieved either way quite simply. The only question
is how to model a transcript: simply an array of the right exons in the
right order, or a module inheriting off SeqFeatureI with the right exons
as subfeatures, or as a sequence, or something different. Then there is
also a predicted transcript for gene structures arising from gene
structure predictions. Do we want to treat this separately (which is done
now), or is it essentially a transcript like any other?

> - easy to get protein and cDNA sequences

The only thing missing here right now is annotating every exon with its
frame. This will be fixed.

> - should be able to store exons as seqfeatures
> ? should have slots for DBLinks/annotation (or do we want a higher
> collection interface for this? If so, how structured?)

I'd have derived classes implementing such capabilities.
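[To make the simplest transcript option discussed above concrete ("an array
of the right exons in the right order"), here is a minimal sketch in plain
Perl with no bioperl dependency. Everything in it is an assumption made for
illustration: the hash layout, the spliced_seq helper and the toy
coordinates are hypothetical and are neither the GeneStructure API nor
anything ensembl uses. It ignores strand and frame entirely.]

```perl
use strict;
use warnings;

# Hypothetical sketch only: these data structures and names are invented
# for illustration and are not part of bioperl or ensembl.

# Toy genomic sequence: two exons (positions 3-8 and 20-25, 1-based,
# inclusive) separated by an intron.
my $genomic = 'CCATGGCAGTAAGTTTTAGGTGTAACC';

# Exons are stored once, so several transcripts can share them.
my %exon = (
    e1 => { start => 3,  end => 8  },
    e2 => { start => 20, end => 25 },
);

# One gene, multiple transcripts; each transcript is just an ordered
# array of references to the shared exons.
my %gene = (
    id          => 'toy_gene',
    transcripts => {
        t1 => [ $exon{e1}, $exon{e2} ],
        t2 => [ $exon{e1} ],
    },
);

# Build the spliced (cDNA-like) sequence by concatenating the exon
# subsequences in transcript order.
sub spliced_seq {
    my ($seq, $exons) = @_;
    my $spliced = '';
    for my $e (@$exons) {
        # substr() is 0-based; the coordinates are 1-based and inclusive
        $spliced .= substr($seq, $e->{start} - 1, $e->{end} - $e->{start} + 1);
    }
    return $spliced;
}

for my $name (sort keys %{ $gene{transcripts} }) {
    print "$name: ", spliced_seq($genomic, $gene{transcripts}{$name}), "\n";
}
```

[With a layout like this, getting the cDNA is a one-liner per transcript,
but a protein sequence would still need the per-exon frame annotation
mentioned above, and none of it addresses the SeqFeatureI-based
alternatives.]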
> - should not mandate an in memory implementation Hmm. Which part and why? The reason I can see is using up too much memory, which as far as I can imagine could almost only be caused by an attached sequence being too big. So, it's only the sequence object that should be able to swap itself to e.g. disk. I'm not sure whether I'm missing something. > > Here are some issues that I think could be difficult to reconcile between > bioperl and ensembl views: > > - Ensembl genes and transcripts are NOT seqfeatures. The placement of > an ensembl gene on a single coordinate system is held in something called > "VirtualGene" (not a great name. It is a gene on a virtualcontig). Ensembl > has a big win by allowing a gene to be built "across" coordinate systems, > allowing the coordinate system to be by-and-large decoupled from the gene > structure. Some "magic" is used for the places where the gene structure is > highly dependent on the assembly. > Hmm. I guess they will stay features in bioperl. The question is then whether this is prohibitive for the respective bioperl objects being used in ensembl, and what we can/should do about it. Not sure what's involved in all this, and looking forward to comments from the ensembl guys therefore. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From gert.thijs@esat.kuleuven.ac.be Mon Dec 4 15:16:11 2000 Date: Mon, 04 Dec 2000 16:16:11 +0100 From: gert thijs gert.thijs@esat.kuleuven.ac.be Subject: [Bioperl-l] Fuzzy location parsing
Has anyone done fuzzy location parsing in Genbank entries? A fuzzy location looks like: join(M97814.1:1176..1245,178..356,572..688,1153..1203) Gert -- ========================================================== + Gert Thijs gert.thijs@esat.kuleuven.ac.be + + + + Dept. Elektrotechniek ESAT-SISTA + + Kardinaal Mercierlaan, 94 + + B-3001 HEVERLEE Belgium + + Tel : +32-16-32 18 84 ---- Fax : +32-16-32 19 70 + ==========================================================From mrp@sanger.ac.uk Mon Dec 4 14:58:59 2000 Date: Mon, 04 Dec 2000 14:58:59 +0000 From: Matthew Pocock mrp@sanger.ac.uk Subject: SUCCESS! Re: [Bioperl-l] wincvs
Mark Wilkinson wrote: > Matthew Pocock wrote: > > > I use the same software to connect to biojava from my windows ME box. Great to hear > > that it works for you. The most important thing on my system was to set the HOME > > environment variable. SSH gave realy strange errors otherwise. > > strangely, I have just discovered that these settings only work for the *anonymous* > CVS under windows, but not for my "real" login! I get all sorts of crazy RSH errors > such as "bio.perl.org/-l directory doesn't exist"... very strange. I just > double-checked my UN/PW in Linux and it works properly there, but it doesn't work at > all from Windows, even though the anonymous checkout works just fine (I just tested it > again...) > > ??? is there a fundamental difference between -d :pserver: and -d :ext: that I need > to understand first? > I use it to talk to the BioJava read-write repository using -d :ext - I have to be very careful to explicitly set CVS_RSH to 'ssh', otherwise it complains bitterly. Does that help at all? M > > any advice appreciated, > > Mark > > -- > --- > Dr. Mark Wilkinson > Bioinformatics Group > National Research Council of Canada > Plant Biotechnology Institute > 110 Gymnasium Place > Saskatoon, SK > Canada > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-lFrom mrp@sanger.ac.uk Mon Dec 4 15:02:22 2000 Date: Mon, 04 Dec 2000 15:02:22 +0000 From: Matthew Pocock mrp@sanger.ac.uk Subject: [Bioperl-l] Bio::SeqIO::game
Hi. At the risk of getting flamed, are GAME files in general a serialized form of a sequence database rather than a serialized form of a sequence stream? Maybe we can get around some of these issues by having a sequence db builder using GAME. It doesn't get you over the issues with memory usage, but at least the semantics become clear. Matthew Bradley Marshall wrote: > --- Ewan Birney <birney@ebi.ac.uk> wrote: > > On Fri, 1 Dec 2000, Bradley Marshall wrote: > > > > > > > > How about this as a solution? > > > > > > We'll add a top level attribute and/or tag > > describing > > > whether or not the document is "chunkable". Chris > > > suggested we have a top level <flavor> element. > > This > > > can specify whether or not the document is > > chunkable. > > > A chunkable document would have this structure: > > > > > > > ;). > > > > I think all useful documents will be chunkable. > I agree that this is the case for large data transfer > jobs like you're talking about. A question we have is > whether or not you're planning on transferring only > genomic seqs w/ features or if you're doing mixed > files - with genomic seqs' features forming mRNA and > AA sequences. It is this second case in which keeping > things "chunkable" becomes difficult. > > But this flexibility is also a major advantage of the > GAME format. And even if a document is NOT chunkable, > parsing performance is pretty good for non-huge > documents. We still need to deal with the file-handle > issue.... 
> > Brad > > > I'd > > claim that we're just > > letting ourselves into trouble if we allow badly > > compacted XML to be > > "valid" > > > > This solution is ok, but I would argue that it is > > better to be strict > > about these things otherwise implementations either > > will have to throw > > exceptions on non chunkable documents or have other > > poorly defined > > criteria....
From elia@ebi.ac.uk Mon Dec 4 13:20:11 2000 Date: Mon, 4 Dec 2000 13:20:11 +0000 (GMT) From: Elia Stupka elia@ebi.ac.uk Subject: [Bioperl-l] Re: BPLite
> Since we're using bioperl-06 and Lorenz's BPLite module isn't in that, > I had to copy it (or at least an old version that Matloob had been > using) > into somewhere that is seen by $PERL5LIB - for me that's > ~/perllib/BPLite.pm > > Then all was OK ... Yes, but that makes the tests fail for anybody else who does not know of this trick, and kills portability, am I right? I can: 1)cvs add BPlite, and all its modules to the branch-06 bioperl (I have cc:ed this to bioperl to see if there is any reason not to) 2)modify the Pipeline Blast.pm module accordingly. Then all you would need to do is cvs update the bioperl branch and ensembl pipeline at the same time, how does that sound? Elia ************************** tel: +44 1223 49 44 31 mobile: +44 7971 59 03 69 fax: +44 1223 49 44 68 **************************From birney@ebi.ac.uk Mon Dec 4 14:20:41 2000 Date: Mon, 4 Dec 2000 14:20:41 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Re: BPLite
On Mon, 4 Dec 2000, Elia Stupka wrote: > > Since we're using bioperl-06 and Lorenz's BPLite module isn't in that, > > I had to copy it (or at least an old version that Matloob had been > > using) > > into somewhere that is seen by $PERL5LIB - for me that's > > ~/perllib/BPLite.pm > > > > Then all was OK ... > > Yes, but that makes the tests fail for anybody else who does not know of > this trick, and kills portability, am I right? > > I can: > 1)cvs add BPlite, and all its modules to the branch-06 bioperl (I have > cc:ed this to bioperl to see if there is any reason not to) No. Don't do that. Consider BPLite to be an external dependancy (like the blast executable) for the Ensembl runnable system... > > 2)modify the Pipeline Blast.pm module accordingly. > > Then all you would need to do is cvs update the bioperl branch and ensembl > pipeline at the same time, how does that sound? > > Elia > > > > ************************** > tel: +44 1223 49 44 31 > mobile: +44 7971 59 03 69 > fax: +44 1223 49 44 68 > ************************** > > > > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From jason@chg.mc.duke.edu Mon Dec 4 16:32:10 2000 Date: Mon, 4 Dec 2000 11:32:10 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] proposal Bio::SeqIO::largefasta
I have written a simple SeqIO adaptor for reading in a large sequence in fasta format (ie Jim Kent's golden path representation of a chromosome). This uses Ewan's more memory efficient Bio::Seq::LargePrimarySeq implementation for large (> 100 MB) seqs. Presumably those using this module had their own ways of pushing sequence into this object; hopefully this implementation will be useful.

Before I check it in, is this okay to commit as part of 0.7?

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.mc.duke.edu/
From hlapp@gmx.net Mon Dec 4 17:45:36 2000 Date: Mon, 04 Dec 2000 09:45:36 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Fuzzy location parsing
gert thijs wrote: > > Has anyone done fuzzy location parsing in Genbank entries? > A fuzzy location looks like: > join(M97814.1:1176..1245,178..356,572..688,1153..1203) > Some have probably done, but it's not supported yet through bioperl genbank.pm (and neither embl.pm). According to the finalized tasklist it will be included in the 0.7 release. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From hlapp@gmx.net Mon Dec 4 17:49:18 2000 Date: Mon, 04 Dec 2000 09:49:18 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] proposal Bio::SeqIO::largefasta
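For the location string quoted in the preceding exchange, here is a minimal sketch in plain Perl of how such a string could be split into its parts. This is not the bioperl genbank.pm parser (which, per the reply above, did not support these locations at the time); the function name `parse_join_location` and the hash keys are assumptions made for illustration. It handles only `join(...)` over simple `start..end` ranges with an optional `ACCESSION:` prefix, and none of the other location operators or boundary markers.

```perl
use strict;
use warnings;

# Hypothetical helper: split a join(...) location into its ranges.
# A range may carry a remote accession prefix such as "M97814.1:".
sub parse_join_location {
    my ($loc) = @_;
    my ($inner) = $loc =~ /^join\((.+)\)$/
        or die "not a join() location: $loc\n";
    my @parts;
    for my $piece (split /,/, $inner) {
        my ($acc, $start, $end) = $piece =~ /^(?:([\w.]+):)?(\d+)\.\.(\d+)$/
            or die "cannot parse range: $piece\n";
        # accession stays undef for ranges on the entry itself
        push @parts, { accession => $acc, start => $start, end => $end };
    }
    return @parts;
}

my @parts = parse_join_location(
    'join(M97814.1:1176..1245,178..356,572..688,1153..1203)');
for my $p (@parts) {
    my $where = defined $p->{accession} ? $p->{accession} : 'local';
    printf "%s %d..%d\n", $where, $p->{start}, $p->{end};
}
```

Keeping the accession as a separate field means a caller can decide later whether to fetch the remote entry or skip that piece.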
Jason Stajich wrote: > > I have written a simple SeqIO adaptor for reading in a large sequence > in fasta format (ie Jim Kent's golden path representation of a > chromosome). This uses Ewan's more memory efficient > Bio::Seq::LargePrimarySeq implementation for large > 100 MB seqs. > Assumingly those using this module had their own ways of pushing sequence > into this object, hopefully this implementation will be useful. > > Before I check it in, this okay to commit as part of 0.7? > Sure. I know you'll add a test script for it (code both new and untested is probably inappropriate for the new release). Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From jason@chg.mc.duke.edu Mon Dec 4 20:33:10 2000 Date: Mon, 4 Dec 2000 15:33:10 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Bio::PrimarySeq _guess_type
This is mostly for Ewan - In the _guess_type method where one is trying to guess whether it is RNA, DNA, or PROTEIN sequence - one calculates percentage of ACGT and ACGT+U and compares to see if they are > 85% The line to remove all U's from the seq is written as $str2 =~ s/Uu//g; Should it not be written as $str2 =~ s/[Uu]//g; I can fix it, just wanted to be sure. -Jason Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.mc.duke.edu/From jason@chg.mc.duke.edu Mon Dec 4 21:03:23 2000 Date: Mon, 4 Dec 2000 16:03:23 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] added Bio::SeqIO::largefasta
I have added support for reading in a large fasta file and making it a Bio::Seq::LargePrimarySeq. Some more testing and debugging will need to be done to ensure all the weird fasta cases are handled, since I cannot use the same patterns as are possible in the fasta.pm module: I can only read in one line at a time in order to meet our requirement of not holding the sequence in memory.

Please note that currently next_seq will return a PrimarySeq until I decide if we can have or need a LargeSeq class or just a wrapper as well. Also the Bio::Seq::LargePrimarySeq implementation means that it will make a copy of the fasta file to your tmpdir (as defined by File::Spec->tmpdir) which if overly large could make your machine very unhappy as it could run out of swap space. You can override the location of the tmp file by setting $Bio::Seq::LargePrimarySeq::DEFAULT_TEMP_DIR = 'somedir' BEFORE you instantiate a new LargePrimarySeq object.

The test, largefasta.t, has been added as well and some additional routines were added to LargePrimarySeq to bring it up to PrimarySeqI spec.

One likely use, at least from my perspective, is the ability to read in a large sequence file and chop it into smaller manageable chunks for some specific tasks.

This will likely not be on the 0.7 branch as it is new code so we'll have to omit it from the branch.

Suggestions and Comments are always appreciated.

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.mc.duke.edu/
From mwilkinson@gene.pbi.nrc.ca Mon Dec 4 21:09:22 2000 Date: Mon, 04 Dec 2000 15:09:22 -0600 From: Mark Wilkinson mwilkinson@gene.pbi.nrc.ca Subject: SUCCESS! Re: [Bioperl-l] wincvs
Matthew Pocock wrote:
>
> I use it to talk to the BioJava read-write repository using -d :ext - I have to be very
> careful to explicitly set CVS_RSH to 'ssh', otherwise it complains bitterly. Does that
> help at all?

but... what if my CVS_RSH is *not* ssh? i.e. that is not the name of the Windows program that accepts command-line SSH "stuff". the program is called scp2.exe, so if I explicitly set CVS_RSH to ssh it will never find this program...

and that still doesn't really explain (to my wee brain) what the fundamental difference is between an anonymous CVS and a logged-in CVS, and why one would work but the other not... ??

M

-- ---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada
From hlapp@gmx.net Tue Dec 5 07:37:02 2000 Date: Mon, 04 Dec 2000 23:37:02 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] added Bio::SeqIO::largefasta
Jason Stajich wrote: > will make a copy of the fasta file to your tmpdir (as defined by > File::Spec->tmpdir) which if overly large could make your machine very Hm. I recall someone claiming that tmpdir() is in that module or its submodules even though it's not really documented. However, 'grep -i tmp' on these .pm files doesn't reveal the hidden place ... use File::Spec; print File::Spec->tmpdir, "\n"; gives Can't locate object method "tmpdir" via package "File::Spec" at - line 2. I'm running Perl 5.005_03; do I have to upgrade File::Spec from CPAN to a newer version, or where the hell is tmpdir()? Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From birney@ebi.ac.uk Tue Dec 5 10:00:29 2000 Date: Tue, 5 Dec 2000 10:00:29 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Bio::PrimarySeq _guess_type
On Mon, 4 Dec 2000, Jason Stajich wrote:
> This is mostly for Ewan -
>
> In the _guess_type method where one is trying to guess whether it
> is RNA, DNA, or PROTEIN sequence - one calculates percentage
> of ACGT and ACGT+U and compares to see if they are > 85%
>
> The line to remove all U's from the seq is written as
>
> $str2 =~ s/Uu//g;
>
> Should it not be written as
>
> $str2 =~ s/[Uu]//g;
>
> I can fix it, just wanted to be sure.

<embarrassed> yes. did you have to put this one out on the general mailing list? ;)
From birney@ebi.ac.uk Tue Dec 5 10:22:37 2000 Date: Tue, 5 Dec 2000 10:22:37 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] added Bio::SeqIO::largefasta
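For the `_guess_type` substitution confirmed in the preceding exchange, a small standalone sketch of why the two patterns differ. The helper names `strip_literal` and `strip_class` and the sample string are made up for illustration; only the two substitutions themselves come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two substitutions from the thread.

# s/Uu//g removes only the literal two-character string "Uu".
sub strip_literal {
    my ($str) = @_;
    $str =~ s/Uu//g;
    return $str;
}

# s/[Uu]//g uses a character class, so every single "U" or "u" goes.
sub strip_class {
    my ($str) = @_;
    $str =~ s/[Uu]//g;
    return $str;
}

my $seq = 'AUGGCUuacg';
print strip_literal($seq), "\n";   # AUGGCacg  (first U survives)
print strip_class($seq),   "\n";   # AGGCacg   (all U/u removed)
```

With the literal pattern, an RNA sequence would almost never lose its U's, so the ACGT and ACGT+U counts would be compared on nearly the same string.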
On Mon, 4 Dec 2000, Jason Stajich wrote: > I have added support for reading in a large fasta file and making it a > Bio::Seq::LargePrimarySeq. Some more testing and debugging will > need to be done to insure all the weird fasta cases are handled > since I cannot use the same patterns as are possible in the fasta.pm > module since I can only read in one line at a time in order to meet > our not holding the sequence in memory requirements. Right. > > Please note that currently next_seq will return a PrimarySeq > until I decide if we can have or need a LargeSeq class or just a wrapper > as well. Also the Bio::Seq::LargePrimarySeq implementation means that it > will make a copy of the fasta file to your tmpdir (as defined by > File::Spec->tmpdir) which if overly large could make your machine very > unhappy as it could run out of swap space. You can override the location > of the tmp file by setting > $Bio::Seq::LargePrimarySeq::DEFAULT_TEMP_DIR = 'somedir' > BEFORE you instantiate a new LargePrimarySeq object. I am with hilmar that this should return a Seq object which has-a Bio::Seq::LargePrimarySeq. > > The test, largefasta.t has been added as well and some additional routines > were added LargePrimarySeq to bring it up to PrimarySeqI spec. > > Some likely uses, at least from my perspective, is the ability to read in > a large sequence file and chop it into smaller managable chunks for some > specific tasks. > Also for adding features put a massive coordinate scale (perhaps produced by some database group somewhere...) and then dumping out the sequence associated with that efficiently BTW - so that people know, LargePrimarySeq relies on the fact that people use the $seq->subseq(1000,1100); methods to get out regions, not substr($seq->seq,1000,100); > This will likely not be on the 0.7 branch as it is new code so we'll have > to omit it from the branch. > I, personally, think this is fine on the branch, but Hilmar is branch king, so he has the final say ... 
I don't think this is going to break anything. > Suggestions and Comments are always appreciated. > > -Jason > > Jason Stajich > jason@chg.mc.duke.edu > Center for Human Genetics > Duke University Medical Center > http://www.chg.mc.duke.edu/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From jason@chg.mc.duke.edu Tue Dec 5 13:54:47 2000 Date: Tue, 5 Dec 2000 08:54:47 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] added Bio::SeqIO::largefasta
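The note in the preceding message about preferring `$seq->subseq(...)` over `substr($seq->seq, ...)` can be illustrated with a toy file-backed sequence. This is only a sketch of the idea, not the Bio::Seq::LargePrimarySeq code: the package `FileBackedSeq` and its `append` method are hypothetical. Note also that `subseq` takes a 1-based inclusive start and end, whereas `substr` takes a 0-based offset and a length, so the two calls are not interchangeable argument for argument.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a sequence kept on disk, not in memory.
    package FileBackedSeq;
    use File::Temp qw(tempfile);

    sub new {
        my ($class) = @_;
        my ($fh) = tempfile(UNLINK => 1);   # read/write temp file
        return bless { fh => $fh, length => 0 }, $class;
    }

    # Append sequence text to the end of the backing file.
    sub append {
        my ($self, $str) = @_;
        seek($self->{fh}, 0, 2);            # 2 = SEEK_END
        print { $self->{fh} } $str;
        $self->{length} += length $str;
        return $self->{length};
    }

    # Residues $start..$end, 1-based inclusive; reads only that region.
    sub subseq {
        my ($self, $start, $end) = @_;
        die "bad range\n"
            if $start < 1 || $end < $start || $end > $self->{length};
        seek($self->{fh}, $start - 1, 0);   # 0 = SEEK_SET
        read($self->{fh}, my $buf, $end - $start + 1);
        return $buf;
    }

    # Whole sequence: pulls everything into memory (the expensive call).
    sub seq {
        my ($self) = @_;
        return $self->subseq(1, $self->{length});
    }
}

my $seq = FileBackedSeq->new;
$seq->append('ACGT' x 1000);
print $seq->subseq(1001, 1008), "\n";       # reads 8 bytes from disk
print substr($seq->seq, 1000, 8), "\n";     # reads all 4000, keeps 8
```

Both lines print the same eight residues; the difference is how much of the file has to be read to produce them.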
On Mon, 4 Dec 2000, Hilmar Lapp wrote: > Jason Stajich wrote: > > will make a copy of the fasta file to your tmpdir (as defined by > > File::Spec->tmpdir) which if overly large could make your machine very > > Hm. I recall someone claiming that tmpdir() is in that module or its > submodules even though it's not really documented. However, 'grep -i > tmp' on these .pm files doesn't reveal the hidden place ... > > use File::Spec; > print File::Spec->tmpdir, "\n"; > > gives > > Can't locate object method "tmpdir" via package "File::Spec" at - line > 2. > > I'm running Perl 5.005_03; do I have to upgrade File::Spec from CPAN to > a newer version, or where the hell is tmpdir()? Okay, it is located in File::Spec::Unix, File::Spec::Win32, File::Spec::MacOS even though it is not documented in File::Spec, go figure. But I am running perl 5.6 -- that is really weird that it is not present in 5.005_03 I am pretty sure when I was running that version I was able to use the same call.... But if you are getting errors, I'll have to setup a fallback in an eval block. Will do today. > > Hilmar > > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.mc.duke.edu/From M.W.E.J.Fiers@plant.wag-ur.nl Tue Dec 5 17:01:40 2000 Date: Tue, 05 Dec 2000 18:01:40 +0100 From: Fiers, M.W.E.J. M.W.E.J.Fiers@plant.wag-ur.nl Subject: [Bioperl-l] computation object
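The "fallback in an eval block" mentioned in the preceding message could look roughly like the sketch below. The helper name `find_tmpdir` and the list of candidate directories are assumptions for illustration; this is not the code that was committed to bioperl.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: prefer File::Spec->tmpdir, but survive a
# File::Spec that lacks the method by trying a few candidates.
sub find_tmpdir {
    # eval traps "Can't locate object method" on older installs
    my $dir = eval { File::Spec->tmpdir };
    return $dir if defined $dir && -d $dir && -w $dir;

    for my $cand (grep { defined $_ } $ENV{TMPDIR}, $ENV{TEMP}, '/tmp') {
        return $cand if -d $cand && -w $cand;
    }
    return File::Spec->curdir;   # last resort: current directory
}

print find_tmpdir(), "\n";
```

Checking that the result is an existing, writable directory guards against an environment variable that points somewhere stale.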
Hi

I'm rather new to this, so if I say strange things, or seem to behave in an improper way, please let me know. I like the bioperl/bioxml initiative a lot and hope I can contribute. My goal is to use bioperl/bioxml in a bigger database system to communicate with a diverse set of tools. I've discussed this with Bradley Marshall, and we seem to agree that although the seqFeature object can store most of the data needed, it would be nice to extend it so that it is more suited to holding the results of a computation and make import and export (to game) easier.

The new computation object would contain
* computation_id (for game)
* A way of storing several sets of sub_seqfeatures, much like the current 'tags' system, but returning an array of seqfeatures. This is to make it easier to parse and separate subsets of sub_seqfeatures. The sub_seqfeature method could stay intact and just return all sets. Advantages to this structure would be that if somebody inherits from this object and stores seqfeatures of children of seqfeatures in this structure, it would still be parsable without the parsing having to know exactly what subsets are there.
* A set of specific tags more geared toward a computation
  - computation_date
  - program_name, program_version, program_date, program_url
  - database_name, database_version, database_date, database_url
* Score-related data: this would look like the tags structure, but would be dedicated to storing scores. It might be a good idea to have a small score object which can also store the range of values which the score can have, but that might be over the top.

I am aware that this is slightly in conflict with the way the genscan module works now with the Gene object, but I see an advantage to a general way of handling data like this. If we choose to take this path, it would not be an enormous problem to have this object inherit from computation, I think.

I would have time to write things like this, by the way.
Tell me if it is a good idea or, if not, how to store the results of a diverse set of computations in a consistent way.

Mark Fiers
From mrp@sanger.ac.uk Tue Dec 5 17:32:36 2000 Date: Tue, 05 Dec 2000 17:32:36 +0000 From: Matthew Pocock mrp@sanger.ac.uk Subject: [Bioperl-l] computation object
"Fiers, M.W.E.J." wrote:
> Hi
> <snip/>
> * A way of storing several set's of sub_seqfeature's, much like the current
> 'tags' system, but returning an array of seqfeatures. This is to make it
> easier to parse and separate subsets of sub_seqfeature's. The sub_seqfeature
> method could stay intact and just return all sets.
> Advantages to this structure would be that if somebody inherits from this
> object and stores seqfeature's of children of seqfeature's in this
> structure, it would still be parsable without the parsing having to know
> exactly what subset's are there.

BioJava uses FeatureFilter objects to pull out a sub-set of child features. This has the added benefit that you can pull out any sub-set, not just those that the data-publisher thought would be useful. This ability to arbitrarily filter features turns out to be a real time-saver. If you code it up right, it doesn't affect object encapsulation, and the queries can be optimized into SQL, DAS or whatever other query languages (can anybody say 'compiler' / 'interpreter' ?). We have well-known filters for locations, names, key/value pairs and a collection of boolean operators (and, not, or etc.), but the user can always supply their own routine.

Just my 2c. Feel free to ignore me - Perl is not Java.

Matthew

> > Mark Fiers
> > _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
From DZhao1@prius.jnj.com Wed Dec 6 01:41:46 2000 Date: Tue, 5 Dec 2000 20:41:46 -0500 From: Zhao, David [PRI] DZhao1@prius.jnj.com Subject: [Bioperl-l] bioperl or perl module can perform SQL-like query on XML files
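The filter idea described in the preceding message (well-known filters plus and/or/not, with user-supplied routines allowed) can be sketched in Perl with closures. This is neither BioJava's FeatureFilter API nor anything in bioperl: features are plain hashes here, and every function name (`by_type`, `overlaps`, `f_and`, `f_or`, `f_not`, `filter_features`) is an assumption made for the example.

```perl
use strict;
use warnings;

# Toy features as plain hashes; real seqfeature objects would differ.
my @features = (
    { name => 'exon1', type => 'exon',   start => 100, end => 200 },
    { name => 'exon2', type => 'exon',   start => 500, end => 650 },
    { name => 'rep1',  type => 'repeat', start => 150, end => 400 },
);

# Hypothetical filter constructors: each returns a closure that takes
# one feature and returns true or false.
sub by_type {
    my ($type) = @_;
    return sub { $_[0]{type} eq $type };
}

sub overlaps {
    my ($from, $to) = @_;
    return sub { $_[0]{start} <= $to && $_[0]{end} >= $from };
}

# Boolean combinators over other filters.
sub f_not {
    my ($filter) = @_;
    return sub { !$filter->($_[0]) };
}

sub f_and {
    my @filters = @_;
    return sub {
        my ($feat) = @_;
        for my $f (@filters) { return 0 unless $f->($feat) }
        return 1;
    };
}

sub f_or {
    my @filters = @_;
    return sub {
        my ($feat) = @_;
        for my $f (@filters) { return 1 if $f->($feat) }
        return 0;
    };
}

# Apply a filter to a list of features, preserving their order.
sub filter_features {
    my ($filter, @feats) = @_;
    return grep { $filter->($_) } @feats;
}

my $wanted = f_and(by_type('exon'), overlaps(1, 300));
print join(',', map { $_->{name} } filter_features($wanted, @features)), "\n";
```

Because a filter is just a code reference, a user-supplied routine slots in next to the built-in ones without any extra interface.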
Hi there, Does bioperl have such perl module can perform SQL-like query on a XML file, including inserting, modifying and updating the XML file? Thanks in advance! David

David Zhao
Drug Discovery IM&T
The R.W.Johnson PRI
3210 Merryfield Row
San Diego, CA 92121
Tel: (858) 784-3184
From gordonp@niji.imb.nrc.ca Wed Dec 6 14:19:12 2000 Date: Wed, 6 Dec 2000 10:19:12 -0400 (AST) From: Paul Gordon gordonp@niji.imb.nrc.ca Subject: [Bioperl-l] bioperl or perl module can perform SQL-like query on XML files
> Hi there, > Does bioperl have such perl module can perform SQL-like query on a XML file, > including inserting, modifying and updating the XML file? > Thanks in advance! The great thing about XML is that people have already written generic modules to deal with most kinds of data massaging as long as you use the standard interfaces. You can do such editing of the file using one of the DOM modules on CPAN. I don't know about a SQL type of interface to do such things though... XQL lets you retrieve information only I believe. ________________________________________________________________________ Paul Gordon Paul.Gordon@nrc.ca Genomic Technologies http://maggie.cbr.nrc.ca Institute for Marine Biosciences National Research Council CanadaFrom jason@chg.mc.duke.edu Wed Dec 6 21:45:20 2000 Date: Wed, 6 Dec 2000 16:45:20 -0500 From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] bug #868
Murad, thanks for your bug report. We'll look at the bug when we get a chance.

The bioperl website - www.bioperl.org is a good place to start learning more about the bioperl project. You'll also find a link for the Mailing lists where you can subscribe to the general list where most of the design discussion takes place. That address is for mailing list subscription is http://bioperl.org/mailman/listinfo/bioperl-l

Jason Stajich
From kusalik@cs.usask.ca Wed Dec 6 21:53:01 2000 Date: Wed, 06 Dec 2000 15:53:01 -0600 From: Tony Kusalik kusalik@cs.usask.ca Subject: [Bioperl-l] faculty position available
At the Department of Computer Science at the University of Saskatchewan, we have several openings for faculty positions. At least one of the positions we are trying to fill is in the area of bioinformatics. The "generic" advertisement is below. However, there are some exciting initiatives and opportunities specifically in the area of bioinformatics to be aware of:
- The University recently received a CFI grant to establish a $900K bioinformatics laboratory. The laboratory involves researchers from 6 different departments as well as two national research centers located on campus. The laboratory will start "coming on stream" in the early spring of 2001.
- NRC's Plant Biotechnology Institute and Agriculture and Agri-Food Canada's Saskatoon Research Centre (both on campus) are each establishing bioinformatics centres, and welcome collaborations with faculty from the Dept of Comp. Sci. at the UofS.
- The UofS will have Canada's only synchrotron light source. This facility is under construction, and when complete, will be a nation-wide source for X-ray crystallographic data.
- The Department and University are pursuing the introduction of an undergraduate program in bioinformatics.

If you would like more information about bioinformatics at the UofS, contact me at kusalik@cs.usask.ca

General ad follows ...

The Department of Computer Science at the University of Saskatchewan has arguably the best climate in Canada. The friendly and supportive environment is ideal for helping new academics to get established. The collegiality and the healthy mix of teaching and research makes the Department an ideal place to launch a successful academic career. With substantial growth planned across all areas of Computer Science, the Department is seeking several good candidates. Applications are invited for tenure-track faculty positions at the Assistant Professor or Associate Professor level to start July 1, 2001.
The Department is interested in outstanding candidates from all areas of computer science. However, preference will be given to candidates interested in areas of database systems, software engineering, bioinformatics, computer networks, hardware systems, or human-computer interaction. We are seeking motivated researchers who are interested in collaborative, applied research that cuts across traditional boundaries. A successful applicant will be expected to build and sustain a strong research program and to make a commitment to excellence in teaching at both the undergraduate and graduate levels. Applicants must have a Ph.D. in computer science or equivalent. Located in Saskatoon, one of Canada's most liveable cities, the University of Saskatchewan is a major Western Canadian university with a wide range of academic programs and approximately 18,000 students. The city of Saskatoon is known as Canada's best place to raise a family, with safe neighbourhoods, excellent schools and great community services. The Department of Computer Science is highly respected both locally and nationally for the quality of its academic programs and research. The Department is well known for its collegiality and supportive environment. It offers graduate programs at the M.Sc. and Ph.D. levels, with approximately 65 students enrolled. There are professionally accredited undergraduate programs in Computer Science and in Software Engineering. The Department produces over 100 BSc's each year. Home to a diverse collection of vigorous research programs, the Department is seeking to develop new multi-disciplinary initiatives in a variety of areas including electronic commerce, bioinformatics, pervasive computing and telecommunications to complement its existing strengths. For further information about the Department, see http://www.cs.usask.ca. 
Please direct applications or inquiries to the Department Chair:

Professor Jim Greer
Department of Computer Science
57 Campus Drive
University of Saskatchewan
Saskatoon, SK S7N 5A9
Canada
greer@cs.usask.ca

Applications, including curriculum vitae and the names and addresses of three references, will be accepted until all positions are filled. Applications are invited from qualified individuals regardless of their immigration status in Canada. The University of Saskatchewan is committed to Employment Equity. Members of Designated Groups (women, aboriginal people, people with disabilities, and visible minorities) are encouraged to self-identify on their applications. Special efforts will be made to assist with locating positions for spouses.

From birney@ebi.ac.uk Thu Dec 7 08:53:59 2000 Date: Thu, 7 Dec 2000 08:53:59 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] faculty position available
I'd like to remind everyone on this list that this is *not* the right forum for job advertisements, including academic positions.

I realise that is not made clear on our web site, and we do not currently have moderation. If people could stick by this rule that would be great.

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From mrp@sanger.ac.uk Thu Dec 7 10:24:03 2000 Date: Thu, 07 Dec 2000 10:24:03 +0000 From: Matthew Pocock mrp@sanger.ac.uk Subject: [Bioperl-l] faculty position available
Hi all,

Could we have a specific SPAM mailing list for jobs/ads etc. & clearly state that that is the list to send spam to on each project homepage? biojobs-l or biospam-l or bioannounce-l would be fine. Then, at least people have a way to post to us without getting shouted at, and we can all opt not to subscribe to it if we don't like those sort of messages.

Matthew

Ewan Birney wrote:
> I'd like to remind everyone on this list that this is *not* the right
> forum for job advertisments, including academic positions.
>
> I realise that is not made clear on our web site, and we do not currently
> have moderation. If people could stick by this rule that would be great.
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>.
> -----------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

From jason@chg.mc.duke.edu Fri Dec 8 16:52:43 2000 Date: Fri, 8 Dec 2000 11:52:43 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Bio::Object
I was thinking, as I am starting to use the temp dir for code in (Bio::Seq::LargePrimarySeq, Bio::DB::WebSeqDBI), maybe it would make sense to have a tempdir routine in Bio::Root::RootI or some sort of Bio::Root::Util (depending on how we decide to clean things up).

This would be set in a BEGIN block with the code looking something like this

    use File::Spec;
    our $TEMPDIR;

    BEGIN {
        # ask File::Spec first; it knows the platform conventions
        eval { $TEMPDIR = File::Spec->tmpdir(); };
        if ( $@ || !defined $TEMPDIR || $TEMPDIR eq '' ) {
            # fall back to the environment, then /tmp, then the current dir
            if    ( defined $ENV{'TMPDIR'} )  { $TEMPDIR = $ENV{'TMPDIR'} }
            elsif ( defined $ENV{'TEMPDIR'} ) { $TEMPDIR = $ENV{'TEMPDIR'} }
            elsif ( -w '/tmp' )               { $TEMPDIR = '/tmp' }
            else                              { $TEMPDIR = "."; }
        }
    }

    # in initialize we would initialize this
    sub _initialize {
        my ($self, @args) = @_;
        $self->tmpdir($TEMPDIR);
    }

I'd also like to be able to have a tempfilename generator - we could use the File::Temp or File::MkTemp modules or write our own routine in RootI. It would just be important to standardize on one.

How does this sound?

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.mc.duke.edu/

From gordonp@niji.imb.nrc.ca Fri Dec 8 17:21:49 2000 Date: Fri, 8 Dec 2000 13:21:49 -0400 (AST) From: Paul Gordon gordonp@niji.imb.nrc.ca Subject: [Bioperl-l] Bio::Object
> I'd also like to be able to have a tempfilename generator - we could use > the File::Temp or File::MkTemp modules or write our own routine in RootI. > Just would be important to have a standardization. I use File::Temp and am quite happy with it. It has some nice features like unlinking files as soon as the stream is opened to avoid people messing with them, and automatic file deletion... My $0.02.From hlapp@gmx.net Fri Dec 8 18:55:04 2000 Date: Fri, 08 Dec 2000 10:55:04 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: Bio::DB::WebSeqDB
Aaron J Mackey wrote:
>
> > Jason Stajich wrote:
>
> > > b) Write our own LWP::Protocol class which extends LWP::Protocol::http
> > > and processes the stream as it goes rather than reading the whole thing
> > > in, not sure about the feasibility here.
>
> While this would seem to be the most desirable case, in actuality I'm not
> so sure: You'd be asking the http connection to remain open for as long as
> you needed it. Now if all you were doing was converting formats, no big
> deal, but if you were piping each sequence to some analysis program, I'm
> not sure you could count on the web server on the other side of the http
> connection to keep the connection open. I could be wrong, though, as I'm
> definitely not a web server expert. I know many servers have KeepAlive
> turned off (usually used for server-push applications), not sure how this
> would impact this application.
>

The performance penalty concern I had was that downloading the sequences may actually take the same time as processing them. So, if you could start firing something in the background on your first seq while the second keeps loading, it may save you significant time on a slow network connection, and the server certainly wouldn't shut down (would it?) the connection in the middle of the download.

But again, I don't think this is even close to an average use case of Bioperl, so it is probably overkill to try to accommodate this situation efficiently. People, correct me if I'm wrong.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From dag@sonsorol.org Fri Dec 8 19:18:44 2000 Date: Fri, 08 Dec 2000 14:18:44 -0500 From: Chris Dagdigian dag@sonsorol.org Subject: [Bioperl-l] Changes to the wiki-enabled portions of our website(s)
Hi folks,

Over the past few months there have been several incidents where people have abused the collaborative editing features contained within the wiki-enabled portions of the Open Bio websites (bioperl.org, biojava.org, biocorba.org, bioxml.org and biopython.org).

The most recent incident happened within the last 24 hours when someone deleted and/or attempted to change the bioperl wiki docs that outlined our release 0.7 roadmap and module checklist.

Although we have enough logs & audit data to start tracking these people down we haven't bothered - simple web vandals are not worth our time. The CVS integration within Wiki makes it easy to roll back the malicious deletions & changes whenever we detect them. Special thanks are owed to Jason Stajich who wrote some behind-the-scenes scripts that automate the rebuild/recover process.

The problem has now become one of administrative time and effort -- we have better things to do than monitor our wiki constantly. At the same time the obvious benefits of having anyone within our projects be able to create and update web content make it essential to keep the system around.

Hence a compromise (and a bit of a social experiment): We are making the assumption that the web vandals are just random surfers who chanced on our site and could not resist the temptation of web links that say "edit this page" and "delete this page". We are hoping that they are not also subscribers who are reading our mailing lists :)

So-- I have now password protected the "edit" and "delete" portions of all the various Open Bio project wiki sites. The 'experiment' is that this email is going to disclose the username and password so that all of you can continue to help improve and update our web content. We are hoping that this semi-public password will be enough to keep our site safe from the casual sort of mischief.
Wiki edit/delete access info: ====================== username: wiki password: wicked Our backup plan if this experiment fails is to change the password and reveal it only to people who ask for it. I'm hoping that we will not have to take this step as it will have the effect of slowing down our content creation and updating progress. Regards, Chris (and all the Open Bio admin folks) Chris Dagdigian -- Blackstone Technology Group (Work ) dagdigian@computefarm.com (Home) dag@sonsorol.org (Web ) http://ComputeFarm.com, http://open-bio.org, http://sonsorol.org (More ) Full contact info and schedule -- http://sonsorol.org/dag/contact.htmlFrom perl_adm@puny.vm.com Fri Dec 8 20:48:27 2000 Date: 8 Dec 2000 20:48:27 -0000 From: John van V perl_adm@puny.vm.com Subject: [Bioperl-l] Hello, Newbie Questions
I consulted for Merck & Co. and developed a strong interest in the possible use of perl in this field. I am, however, an Admin, not a scientist, but am nonetheless interested in a career in supporting health and computing.

I would be most grateful for links to documents outlining the field as well as specific uses for software like bioperl.

I also have an industry based perl club in NY, Perl Users of NY, and I am wondering if any of the list members are located in NY and would be interested in speaking about their products. http://puny.vm.com

Tia, John

From birney@ebi.ac.uk Sat Dec 9 19:03:10 2000 Date: Sat, 9 Dec 2000 19:03:10 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] An open letter to bioinformatcis researchers
Dear fellow bioinformatics developers:

By now you have probably heard that Celera Genomics has submitted their human genome paper to the journal Science. Science and Celera have agreed to special terms for the release of the human genome sequence data. It will be made available through the Celera website, and will not be submitted to the international DNA database consortium (GenBank, EMBL and DDBJ). Science's statement regarding the agreement is at:

http://www.sciencemag.org/feature/data/announcement/genomesequenceplan.shl

All major journals, including Science, have a policy of deposition of sequence data with the "appropriate data bank". The accepted community standard is submission to GenBank/EMBL/DDBJ. The reason for this deposition is to make the results of the work openly available for future research. This principle was specifically mentioned in the Clinton/Blair statement on human genome sequencing - http://www.usinfo.state.gov/topical/global/biotech/00031401.htm - which strongly upheld the view that "unencumbered access" to genome data was critical.

The terms of the Celera/Science agreement will give us access to the genome sequence, but not unencumbered access. Celera is suggesting publishing their data under an MTA (Material Transfer Agreement) which would prevent large scale downloads and incorporation of this data into GenBank/EMBL/DDBJ. In order to download the data, you and your institution will have to sign a contract guaranteeing that you will not "redistribute" the Celera data.

Science believes that the deal is an adequate compromise because it provides us the right to download the data and publish our results. We believe Science is thinking in terms of single gene biology, not large scale bioinformatics. It is probably not hard for you to imagine scenarios in bioinformatics in which "publication" and "redistribution" are virtually the same thing; we cannot imagine Celera allowing us to incorporate data into Pfam, for example, nor into Ensembl.
We are asking for your support in writing to Science to politely insist that genome sequence papers should be accompanied by unencumbered deposition to GenBank/EMBL/DDBJ.

Please note that we have no issue with Celera either keeping this data unpublished for commercial reasons, nor with them combining their data with freely available data from the public genome projects. We would defend their right to do either. Our view is simply that the genome community has established a clear principle that published genome data must be deposited in the international databases, that bioinformatics is fueled by this principle, and that Science therefore threatens to set a precedent that undermines our research.

We encourage you to express your views on this matter to Donald Kennedy (kennedyd@kennedyd.pobox.stanford.edu), the Editor-in-Chief of Science, and/or to Barbara Jasny (bjasny@aaas.org), the managing editor in charge of genomics papers at Science.

Here is a Q/A about some points.

* Why does this matter?

A classic example of how our field began to have an impact on molecular biology was Russ Doolittle's discovery of a significant sequence similarity between a viral oncogene and a cellular growth factor receptor. Russ could not have found that result if he did not have an aggregate database of previously published sequences.

We have come a long way from Russ and his son typing data into the NEWAT protein sequence database by hand. Throughout the 80's the international database community fought hard to insist that DNA sequence data be deposited into the public domain databases. Journals now generally require deposition as a condition of accepting a paper. The forming of these databases and the international agreements on data sharing between the European, American and Japanese databases fostered the rapid development of bioinformatics research.
We now all take for granted the fact that large DNA databases are accessible from a single point of contact, and the identifiers are coordinated worldwide.

Bioinformatics research relies on open data with minimal legal encumbrances submitted to public databases. Without these databases there is no real substrate for bioinformatics research.

* What would happen if this precedent was set?

There are a number of consequences if Science set a precedent that allowed people to publish DNA data under a variety of MTAs.

- One would not be able to form a single DNA database on which to do bioinformatics research, and the derivative databases (Swissprot, PIR, Pfam, PROSITE, etc.) would not be legal.
- Bench biologists would have to visit a number of websites and possibly enter into a number of different contracts for access to DNA data. Unexpected informative homologies could become prohibitively difficult to find.
- You may need to get a legal review before you can publish the results of an analysis, if your analysis is large-scale and detailed enough that it could be reasonably interpreted as a "redistribution" of the primary sequence data. You could be sued for breach of contract for a Web Supplement page that discloses extensive sequence data supporting your results.
- Scientific openness will be undermined. Efforts to engage the community in cooperative annotation of large genomes, for instance, would be blocked -- we can't usefully annotate a genome we can't freely redistribute.

* Celera paid for it. Can't they set their own access terms?

Absolutely. We have no issue with Celera's commercial data gathering, and their right to set their own access terms to their data.

We do feel, though, that scientific publications carry a certain ethical responsibility. The purpose of a paper is to enable the community to efficiently build on your work. There is always a tension between disclosing your work to your competitors (this is not unique to private companies!)
and receiving scientific credit for your work via publication. This tension is natural, and maintaining a consistent and acceptable balance is the reason that scientists and journals establish community standards that dictate how data are required to be disclosed. In this case, the clearly accepted community standard is that DNA sequence data are deposited in GenBank/EMBL/DDBJ upon publication.

We certainly do not blame Celera (much) for seeking a special deal that lets them have their cake and eat it too -- they would understandably like scientific credit for their terrific and important work in human sequencing, and they would also like a profitable business model. We do blame Science for failing to take a strong stand in upholding accepted scientific publication practices. We cannot accept that it is necessary to sacrifice ethics for expediency.

* Science claims they are honouring their own policy. What gives?

Science now claims that all their policy really requires is that archival data be available via a publicly accessible database. We think this is a conveniently revisionist view of their own policy, which states (in Instructions to Authors):

"archival data sets (such as sequence and structural data) must be deposited with the appropriate data bank and the identifier code should be sent to Science for inclusion in the published manuscript (coordinates must be released at the time of publication)"

Notice the use of the definite article "THE appropriate data bank", the notion of "deposition", and the additional rider that the identifier code should be sent. The spirit of this statement seems clear to us. Science's statement anticipates that there is an appropriate, single, aggregate community database for each sort of archival data, whether DNA sequence, protein structure coordinates, or something else. Sensibly, they don't name every possible database for every possible archival data set. They expect that recognized community standards exist.
In no way does Science's statement seem consistent with the view that an individual lab could start its own "public" DNA sequence database and send a meaningless internal database identifier; to try to read it that way is a post hoc rationalisation.

* What can Science do? This is a done deal.

It's true that this is a done deal. Science and Celera have mutually agreed to the general terms of data release. But there are two ways that we can minimize the damage.

First, the details of the agreement are not set. In particular, there is no definition of allowed "publication" versus prohibited "redistribution". Science could specify definitions that did not interfere with noncommercial uses of the data in bioinformatics, allowing us redistribution rights if it made sense in the context of our project (for example, a genome annotation project like Ensembl).

Second, and preferably, Science -- or even the peer reviewers -- can uphold Science's own data access policy, and reject the paper.

Incidentally, they might also choose to enforce Science's policy on prior publication, which states "...the main findings of a paper should not have been reported in the mass media. Authors are, however, permitted to present their data at open meetings but should not overtly seek media attention." If I issued a press release upon submission of a manuscript to Science, like Celera did, Science would rightly fire it back to me without review.

* What can I do?

Agitate. Let Science know that you care. They consider this deal to be a trial balloon for future genome papers. Even if we can't change the deal with Celera, we can try to make sure it's a one-time-only deal that's viewed as a Big Mistake.

Write a letter to Science and tell them how their actions would impact your research, both in the long term and in the short term.

Also, you can pass on this open letter to other bioinformatics researchers you know.
Dr Sean Eddy, Alvin Goldfarb Professor of Computational Biology, Howard Hughes Medical Institute, Washington University in St. Louis, USA Dr Ewan Birney Team Leader, Genomic Annotation European Bioinformatics Institute, UKFrom birney@ebi.ac.uk Sun Dec 10 13:44:31 2000 Date: Sun, 10 Dec 2000 13:44:31 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Update on Don Kennedy's address.
The address for don kennedy we gave out in our letter kennedyd@kennedyd.pobox.stanford.edu seems to bounce. kennedyd@stanford.edu seems not to bounce (hopefully because it is getting delivered) ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From nodrogluap@yahoo.com Sun Dec 10 15:44:27 2000 Date: Sun, 10 Dec 2000 07:44:27 -0800 (PST) From: Paul Gordon nodrogluap@yahoo.com Subject: [Bioperl-l] An open letter to bioinformatcis researchers
[Not cross-posted]

Just an idea... Depending on the wording of the MTA, I'm sure we could have a nice little bioperl script/module that facilitates the downloading of the data into an aggregate form... You'd be redistributing code, not the data it downloads.

> The terms of the Celera/Science agreement will give
> us access to the genome sequence, but not
> unencumbered access. Celera is suggesting publishing
> their data under a MTA (Material Transfer Agreement)
> which would prevent large scale downloads and
> incorporation of this data into GenBank/EMBL/DDBJ. In
> order to download the data, you and your institution
> will have to sign a contract guaranteeing that you
> will not "redistribute" the Celera data.

From jmh.neefs@pandora.be Sun Dec 10 21:25:13 2000 Date: Sun, 10 Dec 2000 22:25:13 +0100 From: Jean-Marc Neefs jmh.neefs@pandora.be Subject: [Bioperl-l] RE: An open letter to bioinformatcis researchers
Dear Ewan and Sean,

I would like to add my 2 cents. The biggest trouble with hiding the genome is keeping information away from the scientists, and slowing down research. On the other hand, playing devil's advocate, this could be another call to the public effort for quicker finishing. Anyway, Celera only has a small window of opportunity before the public data become available, and we all will have enough laboratory work to analyse and confirm the coming data deluge.

To end on a positive note: keep up the good work on Ensembl. I learn more and more about it each day and find it more and more useful.

I will contact Science.

Kind Regards,
Jean-Marc Neefs
Senior Bioinformatics Scientist

-----Original Message-----
From: Ewan Birney [SMTP:birney@ebi.ac.uk]
Sent: Saturday, December 09, 2000 8:03 PM
To: bioperl-l@bioperl.org; biojava-l@biojava.org; biopython@biopython.org; bioxml-dev@bioxml.org; ensembl-dev@ebi.ac.uk; apollo@ebi.ac.uk
Subject: An open letter to bioinformatcis researchers

From birney@ebi.ac.uk Mon Dec 11 16:17:37 2000 Date: Mon, 11 Dec 2000 16:17:37 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] LiveSeq
Joseph -

Thanks for your commits.

I think it would be nice, Joseph, to talk through your objects first on the list and get feedback, in particular for the naming of your objects. Once committed we can't do a great deal about changing their names.

This is partly my and partly heikki's fault for not making sure that you talked things through on the list first. Don't lose sleep over it, but be aware that you are now working in a collaborative environment, and you should post to the list before making large commits.

I think a nice post about what LiveSeq is and what it does to the list would be a good thing as well.

ewan

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From hlapp@gmx.net Mon Dec 11 18:24:57 2000
Date: Mon, 11 Dec 2000 10:24:57 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] LiveSeq
Ewan Birney wrote: > > I think a nice post about what LiveSeq is and what it does to the list > would be a good thing as well. > I agree. The code has been committed to the main trunk, but the logs say that it (only?) works with bioperl 0.6.2 (the main trunk *is* different, some modules e.g. changed their API). I noted that test scripts are pending; any code submissions without tests are likely to be excluded from the 0.7 branch. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From dblock@gene.pbi.nrc.ca Mon Dec 11 20:06:57 2000 Date: Mon, 11 Dec 2000 14:06:57 -0600 (CST) From: David Block dblock@gene.pbi.nrc.ca Subject: [Bioperl-l] Problems with revcom and translate in PrimarySeqI
I've been working with the main trunk - I know, I know, but I'm a big boy, I can handle it.

I get tons of errors when trying to either revcom or translate a PrimarySeq. It seems that can_call_new=1 for PrimarySeq objects, but new is not implemented properly, because it simply defaults to RootI's new. That expects a class as first argument, but gets an object instance, which screws it up, since the class is now Bio::PrimarySeq=HASH(0x<somebignumber>)

This leads to the error:

Can't locate object method "_initialize" via package "Bio::PrimarySeq=HASH(0x11392918)" at /home/dave/bioperl/bioperl-live/Bio/Root/RootI.pm line 79.

Two possible fixes - RootI could check to see if $class being sent to it is a reference, and if so, make $class the class of $class (now that was clear), or Bio::PrimarySeq could return 0 for can_call_new, which would lead to proper behaviour in this case (I think).

Code for the first fix (due to Damian Conway, OOPerl):

sub new {
    my ($caller, @arg) = @_;
    my $caller_is_obj = ref($caller);
    my $class = $caller_is_obj || $caller;
    my $self = bless {}, $class;

This is what we do in Workbench, and it works fine.

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan

From dblock@gene.pbi.nrc.ca Mon Dec 11 20:10:12 2000
Date: Mon, 11 Dec 2000 14:10:12 -0600 (CST)
From: David Block dblock@gene.pbi.nrc.ca
Subject: [Bioperl-l] revcom/translate fix
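For reference, the first fix from the preceding message (the ref($caller) || $caller idiom) can be tried in isolation. This is only a minimal sketch: My::Root and My::PrimarySeq are hypothetical stand-ins invented for the example, not the real Bio::Root::RootI or Bio::PrimarySeq, and the revcom here does nothing except call new() on an instance, which is the case that triggered the error.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Root::RootI; not the real BioPerl class.
package My::Root;

sub new {
    my ($caller, @args) = @_;
    # ref() returns the class name for a blessed object and '' for a
    # plain class-name string, so both call styles end up with a class.
    my $class = ref($caller) || $caller;
    my $self  = bless {}, $class;
    return $self;
}

# Hypothetical stand-in for Bio::PrimarySeq.
package My::PrimarySeq;
our @ISA = ('My::Root');

sub revcom {
    my ($self) = @_;
    # new() is called on an instance here, not on a class name.
    return $self->new();
}

package main;

my $seq = My::PrimarySeq->new();   # class-method call
my $rev = $seq->revcom();          # instance-method call inside revcom
print ref($rev), "\n";             # prints "My::PrimarySeq"
```

Without the ref() check, the string form of the object would be used as the package name, which is what produces the "Bio::PrimarySeq=HASH(...)" error quoted above.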
Setting can_call_new to 0 does work for Workbench...

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan

From birney@ebi.ac.uk Mon Dec 11 20:23:27 2000
Date: Mon, 11 Dec 2000 20:23:27 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Problems with revcom and translate in PrimarySeqI
On Mon, 11 Dec 2000, David Block wrote: > I've been working with the main trunk - I know, I know, but I'm a big boy, > I can handle it. > > I get tons of errors when trying to either revcom or translate a > PrimarySeq. It seems that can_call_new=1 for PrimarySeq objects, but new > is not implemented properly, because it simply defaults to RootI's > new. That expects a class as first argument, but gets an object instance, > which screws it up, since the class is now > Bio::PrimarySeq=HASH(0x<somebignumber>) > > This leads to the error: > > Can't locate object method "_initialize" via package > "Bio::PrimarySeq=HASH(0x11392918)" at > /home/dave/bioperl/bioperl-live/Bio/Root/RootI.pm line 79. > > Two possible fixes - RootI could check to see if $class being sent to it > is a reference, and if so, make $class the class of $class (now that was > clear), or > Bio::PrimarySeq could return 0 for can_call_new, which would lead to > proper behaviour in this case (I think). > > Code for the first fix (due to Damian Conway, OOPerl): > sub new { > my ($caller, @arg) = @_; > my $caller_is_obj = ref($caller); > my $class = $caller_is_obj || $caller; > my $self = bless {}, $class; > > This is what we do in Workbench, and it works fine. The first fix is the right fix. We should propagate this across all new modules. Jason, Hilmar - this is going to be a gotcha if we don't fix it, but the main area it will effect is primaryseq/seq. e. > -- > David Block > dblock@gene.pbi.nrc.ca > http://bioinfo.pbi.nrc.ca/dblock/wiki > Plant Biotechnology Institute > National Research Council of Canada > Saskatoon, Saskatchewan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. 
-----------------------------------------------------------------From jason@chg.mc.duke.edu Mon Dec 11 22:27:11 2000 Date: Mon, 11 Dec 2000 17:27:11 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Problems with revcom and translate in PrimarySeqI
Yes, this was a problem I fixed in my local copy of the code, but did not propigate out because I am confused. Do we want to have a generic new routine in RootI which can be overriden by classes that need special attention (Bio::SeqIO::* classes explicitly describe new otherwise they will get the new from the class they are extends - Bio::SeqIO)? I am fine with a generic 'new' in RootI, but there has been a little bit back and forth on this. What did we finally decide was the Right way? -Jason On Mon, 11 Dec 2000, Ewan Birney wrote: > On Mon, 11 Dec 2000, David Block wrote: > > > I've been working with the main trunk - I know, I know, but I'm a big boy, > > I can handle it. > > > > I get tons of errors when trying to either revcom or translate a > > PrimarySeq. It seems that can_call_new=1 for PrimarySeq objects, but new > > is not implemented properly, because it simply defaults to RootI's > > new. That expects a class as first argument, but gets an object instance, > > which screws it up, since the class is now > > Bio::PrimarySeq=HASH(0x<somebignumber>) > > > > This leads to the error: > > > > Can't locate object method "_initialize" via package > > "Bio::PrimarySeq=HASH(0x11392918)" at > > /home/dave/bioperl/bioperl-live/Bio/Root/RootI.pm line 79. > > > > Two possible fixes - RootI could check to see if $class being sent to it > > is a reference, and if so, make $class the class of $class (now that was > > clear), or > > Bio::PrimarySeq could return 0 for can_call_new, which would lead to > > proper behaviour in this case (I think). > > > > Code for the first fix (due to Damian Conway, OOPerl): > > sub new { > > my ($caller, @arg) = @_; > > my $caller_is_obj = ref($caller); > > my $class = $caller_is_obj || $caller; > > my $self = bless {}, $class; > > > > This is what we do in Workbench, and it works fine. > > The first fix is the right fix. We should propagate this across all new > modules. 
Jason, Hilmar - this is going to be a gotcha if we don't fix it, > but the main area it will effect is primaryseq/seq. > > e. > > > > -- > > David Block > > dblock@gene.pbi.nrc.ca > > http://bioinfo.pbi.nrc.ca/dblock/wiki > > Plant Biotechnology Institute > > National Research Council of Canada > > Saskatoon, Saskatchewan > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > <birney@ebi.ac.uk>. > ----------------------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/From jason@chg.mc.duke.edu Mon Dec 11 23:05:52 2000 Date: Mon, 11 Dec 2000 18:05:52 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] final proposal: Bio::DB::WebSeqDBI
The final proposal before I commit the code (all tests pass on my machine).

2 new modules:
Bio::DB::WebSeqDBI - ISA Bio::DB::RandomAccessI
Bio::DB::NCBIHelper - ISA Bio::DB::WebSeqDBI

plus rewrites of Bio::DB::GenBank, Bio::DB::GenPept, Bio::DB::SwissProt.

Bio::DB::WebSeqDBI - This interface encapsulates the standard data retrieval methods from a Web Sequence Database. Implementing classes must implement the method get_request, which takes as arguments a hash of qualifiers (uids, format, etc.) with which to query the database and returns an HTTP::Request object. The WebSeqDBI class manages a LWP::UserAgent for obtaining data from the web dbs and turning that data stream into a Bio::SeqIO. Because of the way LWP works right now, it is not possible to turn a data stream from the web server directly into a Bio::SeqIO; rather, one must read all the data from the server and then either store it in a tempfile or turn it into an IO::String, which can be treated as a filehandle. Also a pain: the retrieval method from NCBI has some HTML 'contamination' which needs to be screened out through a method call to postprocess_data.

One issue I am not sure how best to deal with is the temporary file removal at the end of the life of the Bio::DB::WebSeqDBI object. The following code illustrates a case where this will remove files too soon.

my $seqdb = new Bio::DB::GenBank(-retrievaltype => 'tempfile');
my $seqio = $seqdb->get_Stream_by_id($accession);
undef $seqdb;                  # this will remove the seqdb object and clean up
                               # the tempfile that was created
my $seq = $seqio->next_seq();  # bomb because no file exists now.

Anyone with better ideas on this feel free to let me know.

Bio::DB::NCBIHelper - Since Bio::DB::GenBank and Bio::DB::GenPept are so similar, I wrote a class that encapsulates all of the common functionality for retrieving sequence data from these databases.

I'm sure it will all make much more sense once I check the code in; I just wanted to see if anyone has comments or wants clarification before I check in major reworks to the current modules.

Is the name WebSeqDBI misleading - (ie looks like it would be a DBI module...?) We like to use 'I' at the end of a module name to denote interfaces.

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From dblock@gene.pbi.nrc.ca Mon Dec 11 23:18:01 2000
Date: Mon, 11 Dec 2000 17:18:01 -0600 (CST)
From: David Block dblock@gene.pbi.nrc.ca
Subject: [Bioperl-l] final proposal: Bio::DB::WebSeqDBI
This is good stuff... comments below. On Mon, 11 Dec 2000, Jason Stajich wrote: > The final proposal before I commit the code (all tests pass on my > machine). > > 2 new modules > Bio::DB::WebSeqDBI - ISA Bio::DB::RandomAccessI > Bio::DB::NCBIHelper ISA Bio::DB::WebSeqDBI > > Is the name WebSeqDBI misleading - (ie looks like it would be a DBI > module...?) We like to use 'I' at the end of a module name to denote > interfaces. Yes, DBI is pretty sacred, don't you think? How about WebSeqDB_I? > > -Jason > Jason Stajich > jason@chg.mc.duke.edu > Center for Human Genetics > Duke University Medical Center > http://www.chg.duke.edu/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- David Block dblock@gene.pbi.nrc.ca http://bioinfo.pbi.nrc.ca/dblock/wiki Plant Biotechnology Institute National Research Council of Canada Saskatoon, SaskatchewanFrom lapp@gnf.org Mon Dec 11 23:47:39 2000 Date: Mon, 11 Dec 2000 15:47:39 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] final proposal: Bio::DB::WebSeqDBI
Jason Stajich wrote:
>
> One issue I am not sure how to best deal with, the temporary file removal
> at the end of the life of the Bio::DB::WebSeqDBI object. The following
> code illustrates a case this will remove files too soon.
>
> my $seqdb = new Bio::DB::Genbank(-retrievaltype=>'tempfile');
> my $seqio = $seqdb->get_Stream_by_id($accession);
> undef $seqdb; # this will remove the seqdb object and cleanup the
> # tempfile that was created
> my $seq = $seqio->next_seq(); # bomb because no file exists now.
>

Provided that things work the same way as in e.g. C (and it ought to be so, because it's the OS that dictates it), the tempfile should not be physically removed as long as there is a stream (filehandle) open on it (it may be invisible to directory listings though). Since SeqIO::* modules keep a file handle open until $seqio->close() is called, there should be no problem. Am I missing something? Have you tested for the behaviour you're mentioning?

>
> Is the name WebSeqDBI misleading - (ie looks like it would be a DBI
> module...?) We like to use 'I' at the end of a module name to denote
> interfaces.
>

I agree with David, it's somewhat misleading. I don't have a strong view though. In general, I wouldn't have considered it as an interface anyway (why does it qualify as one?), so why not simply omit the trailing 'I'?

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From lapp@gnf.org Tue Dec 12 00:17:43 2000
Date: Mon, 11 Dec 2000 16:17:43 -0800
From: Hilmar Lapp lapp@gnf.org
Subject: [Bioperl-l] Problems with revcom and translate in PrimarySeqI
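On the tempfile question in the WebSeqDBI thread (preceding message): the claim that an unlinked file stays readable while a handle is open on it can be checked with a few lines of plain Perl. This is a sketch under the assumption of a Unix-like system; on platforms that refuse to delete open files the unlink call will fail and the script dies. Nothing here uses BioPerl, and read_after_unlink is a name made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small file, reopen it for reading, then unlink it while the
# read handle is still open and try to read the contents anyway.
sub read_after_unlink {
    my ($fh_out, $path) = tempfile();
    print {$fh_out} ">seq1\nACGT\n";
    close $fh_out or die "close failed: $!";

    open my $fh_in, '<', $path or die "cannot open $path: $!";
    unlink $path or die "cannot unlink $path: $!";   # directory entry removed
    my $gone = (-e $path) ? 0 : 1;                   # name no longer visible

    my @lines = <$fh_in>;                            # data is still readable
    close $fh_in;
    return ($gone, join '', @lines);
}

my ($gone, $data) = read_after_unlink();
print "file name removed: ", ($gone ? "yes" : "no"), "\n";
print $data;
```

If the sequence stream holds its own open filehandle, as described above, cleaning up the tempfile when the database object goes away should therefore be harmless on such systems; whether that also holds on non-Unix platforms is a separate question.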
Jason Stajich wrote:
>
> Yes, this was a problem I fixed in my local copy of the code, but did not
> propigate out because I am confused.
>
> Do we want to have a generic new routine in RootI which can be overriden
> by classes that need special attention (Bio::SeqIO::* classes
> explicitly describe new otherwise they will get the new from the class
> they are extends - Bio::SeqIO)? I am fine with a generic 'new' in RootI,
> but there has been a little bit back and forth on this.
>
> What did we finally decide was the Right way?
>

I don't know, to be honest. I've still not seen an agreed-upon template. For me initialization chaining is still hard to imagine without chaining to the inherited method, and how do you know that you will stay inheriting right from RootI? So the safest way would then be that RootI::new() becomes the routine creating the object, and all others just poke at the hash. E.g.:

package Bio::Root::RootI;

sub new {
    my ($class, @args) = @_;
    my $obj = bless {}, ref($class) || $class;
    return $obj;
}

package SomePkg::ClassA;
use Bio::Root::RootI;
@ISA = ('Bio::Root::RootI');

sub new {
    my ($class, @args) = @_;
    # pass on @args, not @_, so that $class is not handed over twice
    my $obj = $class->SUPER::new(@args);
    # do our own initialization
    #
    $obj->{'somekey'} = "blabla";
    # ...
    return $obj;
}

package SomePkg::ClassB;
use SomePkg::ClassA;
@ISA = ('SomePkg::ClassA');

sub new {
    my ($class, @args) = @_;
    my $obj = $class->SUPER::new(@args);
    # do our own initialization
    #
    # ...
    return $obj;
}

This should work (it actually does on my machine) whether or not new() is called on a class or an object. We don't even need can_call_new(). Am I missing something? Too complicated or too simple?

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From murad@godel.bioc.columbia.edu Mon Dec 11 19:27:38 2000
Date: Mon, 11 Dec 2000 20:27:38 +0100
From: Murad Nayal murad@godel.bioc.columbia.edu
Subject: [Bioperl-l] Bio::SeqIO::embl probelms reading swissprot
Hello All, It seems that Bio::SeqIO::embl is having some problems reading swissprot.dat file. (bioperl 0.6.2 and swissprot 38). for example it does not match alternative formats for the DR record which leads it to not instantiate the corresponding DBLink object and occasionally crash. I have started fixing some of this stuff but I thought I'd check with the list first. -is Bio::SeqIO::embl 'supposed' to be able to read swissprot? or is there another implementation of SeqIO to do that (which I couldn't find in the 0.6.2 distribution). -Have these bugs been reported/fixed already in the 0.7 distribution. -when will 0.7 be available? (is read access to CVS available now to everyone?). PS: I should mention that I am encountering these problems when accessing the .dat file via an Index! Regards -- Murad Nayal M.D. Ph.D. Department of Biochemistry and Molecular Biophysics College of Physicians and Surgeons of Columbia University 630 West 168th Street. New York, NY 10032 Tel: 212-305-6884 Fax: 212-305-6926From lapp@gnf.org Tue Dec 12 02:37:16 2000 Date: Mon, 11 Dec 2000 18:37:16 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] Bio::SeqIO::embl probelms reading swissprot
Murad Nayal wrote: > > -is Bio::SeqIO::embl 'supposed' to be able to read swissprot? or is > there another implementation of SeqIO to do that (which I couldn't find > in the 0.6.2 distribution). > SeqIO::embl is supposed to parse EMBL nucleotide entries. Swissprot is parsed by SeqIO::swiss. Normally you shouldn't have to worry because the right parser is instantiated automatically by Bio::SeqIO->new(), based on what you specify for -format (Bio::SeqIO->new('-file' => "myswissprot.dat", '-format' => "swiss") would return an Bio::SeqIO compatible stream of seq objects, which is actually a Bio::SeqIO::swiss object). Have you tried to instantiate Bio::SeqIO::* classes directly, and if so, why? > -Have these bugs been reported/fixed already in the 0.7 distribution. > As I mentioned, it is not a bug. The swissprot parser as of 0.6.2 has swissprot *writing* discouraged though, due to bugs. These are fixed in the development trunk, hence also in 0.7. > -when will 0.7 be available? (is read access to CVS available now to > everyone?). > Anonymous CVS access is available, see the website (somewhere under http://www.bioperl.org). However, due to the present 0.7-related code transitions I discourage taking the current codebase for anything serious unless you know what you're taking. It may stabilize again in 2-3 weeks. 0.7 is due by the end of January at the latest. > PS: I should mention that I am encountering these problems when > accessing the .dat file via an Index! > I have no idea what effect this may have. What is the particular way you're accessing the file? Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 -------------------------------------------------------------From murad@godel.bioc.columbia.edu Mon Dec 11 20:43:05 2000 Date: Mon, 11 Dec 2000 21:43:05 +0100 From: Murad Nayal murad@godel.bioc.columbia.edu Subject: [Bioperl-l] Bio::SeqIO::embl problems reading swissprot clarification
It seems that I was a bit too quick to post. Looking more carefully now, I see my previous message was somewhat inaccurate (sorry about that, I am just starting to get familiar with bioperl).

To clarify:

-I used Bio::Index::EMBL to index a swissprot file and subsequently retrieve sequences from it by id. The documentation for this class specifies that it can be used to index both embl and swissprot files. There is no Bio::Index::Swiss class.

-The problem I encountered arose when I tried to retrieve sequences using a Bio::Index::EMBL object from the previously indexed file. This object uses a Bio::SeqIO::embl object to read from the sequence file. This fails with multiple errors/warnings.

Apparently the thing to do is to write a Bio::Index::Swiss class (which really can be highly similar to Bio::Index::EMBL). As far as I can see, a subclass of Bio::Index::EMBL that overrides EMBL::_file_format() to get it to return "swiss" might be sufficient? This should lead to the usage of Bio::SeqIO::swiss instead for input (no need to modify Bio::SeqIO::embl).

-Has something like that been done in 0.7?

Again, of course I might be missing something! I appreciate all your comments/corrections.

Regards

Murad Nayal wrote:
>
> Hello All,
>
> It seems that Bio::SeqIO::embl is having some problems reading
> swissprot.dat file. (bioperl 0.6.2 and swissprot 38). for example it
> does not match alternative formats for the DR record which leads it to
> not instantiate the corresponding DBLink object and occasionally crash.
> I have started fixing some of this stuff but I thought I'd check with
> the list first.
>
> -is Bio::SeqIO::embl 'supposed' to be able to read swissprot? or is
> there another implementation of SeqIO to do that (which I couldn't find
> in the 0.6.2 distribution).
>
> -Have these bugs been reported/fixed already in the 0.7 distribution.
>
> -when will 0.7 be available? (is read access to CVS available now to
> everyone?).
> > PS: I should mention that I am encountering these problems when > accessing the .dat file via an Index! > > Regards > > -- > Murad Nayal M.D. Ph.D. > Department of Biochemistry and Molecular Biophysics > College of Physicians and Surgeons of Columbia University > 630 West 168th Street. New York, NY 10032 > Tel: 212-305-6884 Fax: 212-305-6926 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- Murad Nayal M.D. Ph.D. Department of Biochemistry and Molecular Biophysics College of Physicians and Surgeons of Columbia University 630 West 168th Street. New York, NY 10032 Tel: 212-305-6884 Fax: 212-305-6926From bradmars@yahoo.com Tue Dec 12 02:59:04 2000 Date: Mon, 11 Dec 2000 18:59:04 -0800 (PST) From: Bradley Marshall bradmars@yahoo.com Subject: [Bioperl-l] game.pm filehandle/chunkable update
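The subclassing idea from the preceding message (override _file_format() so that the index hands records to the Swiss-Prot parser) can be sketched without BioPerl installed. Everything below is an assumption made for illustration: My::Index::EMBL and My::Index::Swiss are hypothetical stand-ins, reader_format is an invented helper, and the real Bio::Index::EMBL does far more than this. Only the method name _file_format and the format strings "embl" and "swiss" come from the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Index::EMBL, reduced to the one hook
# that matters for this discussion.
package My::Index::EMBL;

sub new {
    my ($class) = @_;
    return bless {}, ref($class) || $class;
}

# Name of the format that would be handed to the sequence reader.
sub _file_format { return 'embl'; }

# Invented helper: what the index would ask for when opening a record.
sub reader_format {
    my ($self) = @_;
    return $self->_file_format();
}

# Hypothetical Swiss-Prot index: same indexing logic, different parser.
package My::Index::Swiss;
our @ISA = ('My::Index::EMBL');

sub _file_format { return 'swiss'; }

package main;

print My::Index::EMBL->new->reader_format(),  "\n";   # prints "embl"
print My::Index::Swiss->new->reader_format(), "\n";   # prints "swiss"
```

The point of the design is that the subclass changes only the format name, so the record-offset bookkeeping in the parent is reused unchanged. Whether the real index class looks up its parser solely through this method is something to confirm against the module source.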
I've updated the game.pm module to correctly (I think) support filehandles. You can now do:

$in = Bio::SeqIO::game->new(-file=>$file, -format=>'game');

OR:

$in = Bio::SeqIO::game->new(-fh=>$file, -format=>'game');

It also recognizes the top-level tag:

<bx-game:flavor>chunkable</bx-game:flavor>

To qualify as being chunkable, a document must be laid out as such:

<bx-seq:seq bx-seq:id='seq1'>
</bx-seq:seq>
ALL bx-annotation, bx-feature and bx-computation objects regarding seq1.
<bx-seq:seq bx-seq:id='seq2'>
</bx-seq:seq>
ALL bx-annotation, bx-feature and bx-computation objects regarding seq2.

NOTE: You can have NON TOP LEVEL sequences and still be "chunkable"; only top-level sequences will be parsed. ie:

<bx-seq:seq bx-seq:id='seq1'>
</bx-seq:seq>
<bx-annotation:annotation seq='seq1'>
  <bx-seq:seq id='seq2'>
  </bx-seq:seq>
</bx-annotation:annotation>

counts as being chunkable, but seq2 will be ignored by next_seq.

This revision is more memory intensive, since it loads the data from <bx-seq:seq> to the next <bx-seq:seq> into memory as a string and then parses that. Non-chunkable documents are loaded into memory in their entirety before being parsed.

FUTURE GOALS:
- Allow people to use other parsers (ie ones that don't need expat, which is non-CPAN). These will have to be detected at run-time.

That's all for now... Happy parsing!

Brad

From lapp@gnf.org Tue Dec 12 03:13:15 2000
Date: Mon, 11 Dec 2000 19:13:15 -0800
From: Hilmar Lapp lapp@gnf.org
Subject: [Bioperl-l] Bio::SeqIO::embl problems reading swissprot clarification
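To make the "chunkable" layout from the game.pm update above concrete, here is a rough sketch of cutting such a document into one string per top-level sequence. It is not how game.pm does it: split_chunks is a made-up name, and the sketch assumes that top-level <bx-seq:seq> start tags sit at the start of a line while nested ones are indented. A real implementation would track element depth with an XML parser.

```perl
use strict;
use warnings;

# Split a "chunkable" document into one string per top-level sequence.
# Each chunk runs from one top-level <bx-seq:seq> start tag up to (but
# not including) the next one.
sub split_chunks {
    my ($doc) = @_;
    # Zero-width split: cut just before every line that starts a
    # top-level sequence element.
    my @pieces = split /^(?=<bx-seq:seq\b)/m, $doc;
    # Drop whatever precedes the first sequence (flavor tag and so on).
    return grep { /^<bx-seq:seq\b/ } @pieces;
}

my $doc = <<'XML';
<bx-game:flavor>chunkable</bx-game:flavor>
<bx-seq:seq bx-seq:id='seq1'>
</bx-seq:seq>
<bx-annotation:annotation seq='seq1'>
  <bx-seq:seq bx-seq:id='seq2'>
  </bx-seq:seq>
</bx-annotation:annotation>
<bx-seq:seq bx-seq:id='seq3'>
</bx-seq:seq>
XML

my @chunks = split_chunks($doc);
print scalar(@chunks), " chunks\n";   # prints "2 chunks"
```

The nested seq2 stays inside the seq1 chunk together with its annotation, which matches the rule that only top-level sequences start a new chunk.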
Murad Nayal wrote: > > It seems that I was a bit too quick to post. looking more carefully now > I see my previous message was somewhat inaccurate (sorry about that, i > am just starting to get familiar with bioperl). > > to clarify: > > -I used Bio::Index::EMBL to index a swissprot file and subsequently > retrieve sequences from it by id. the documentation for this class > specifies that it can be used to index both embl and swissprot files. > There is no Bio::Index::Swiss class. > > -the problem i encountered arouse when I tried to retrieve sequences > using a Bio::Index::EMBL object from the previously indexed file. This > object uses a Bio::SeqIO::embl object to read into the sequence file. > this fails with multiple errors/warnings. > > Apparently the thing to do is to write a Bio::Index::Swiss class (which > really can be highly similar to Bio::Index::EMBL. as far as I can see a > subclass of Bio::Index::EMBL that overrides EMBL::_file_format() to get > it to return "swiss" might be sufficient? this should lead to the usage > of Bio::SeqIO::swiss instead for input (no need to modify > Bio::SeqIO::embl). > > -Has something like that been done in 0.7? > Not yet. As what you're describing sounds like a bug, could you file a bug report through the web-interface, copying your email as message body? Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------From lapp@gnf.org Tue Dec 12 03:15:20 2000 Date: Mon, 11 Dec 2000 19:15:20 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] game.pm filehandle/chunkable update
Sounds great Brad, thanks. (Is there already a test-suite, and if not, could you put together one when you have some spare time?) Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------From bradmars@yahoo.com Tue Dec 12 04:02:55 2000 Date: Mon, 11 Dec 2000 20:02:55 -0800 (PST) From: Bradley Marshall bradmars@yahoo.com Subject: [BioXML-dev] Re: [Bioperl-l] game.pm filehandle/chunkable update
Yup, It's at t/game.t Brad --- Hilmar Lapp <lapp@gnf.org> wrote: > Sounds great Brad, thanks. (Is there already a > test-suite, and if not, > could you put together one when you have some spare > time?) > > Hilmar > -- > ------------------------------------------------------------- > Hilmar Lapp email: > lapp@gnf.org > GNF, San Diego, Ca. 92121 phone: > +1-858-812-1757 > ------------------------------------------------------------- > _______________________________________________ > BioXML-dev mailing list - BioXML-dev@bioxml.org > http://bioxml.org/mailman/listinfo/bioxml-dev __________________________________________________ Do You Yahoo!? Yahoo! Shopping - Thousands of Stores. Millions of Products. http://shopping.yahoo.com/From dblock@gene.pbi.nrc.ca Tue Dec 12 05:57:09 2000 Date: Mon, 11 Dec 2000 23:57:09 -0600 (CST) From: David Block dblock@gene.pbi.nrc.ca Subject: [Bioperl-l] Problems with revcom and translate in PrimarySeqI
On Mon, 11 Dec 2000, Hilmar Lapp wrote:
>
> This should work (it actually does on my machine) whether or not new() is
> called on a class or an object. We don't even need can_call_new().

I agree with this. Be strict in what you emit, tolerant in what you accept (Larry Wall). can_call_new seems to be irrelevant with this code. Can we prune it - or I guess deprecate it? That eliminates a lot of duplicate code.

> Am I missing something? Too complicated or too simple?
>
> Hilmar
>
>

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan

From hlapp@gmx.net Tue Dec 12 06:20:00 2000
Date: Mon, 11 Dec 2000 22:20:00 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Re: Bio::DB::WebSeqDB and Bio::DB::GenBank
Jason Stajich wrote: > > BTW, have we ever thought about adding a ASN1.1 reader to SeqIO? I'm not > sure how useful, but would best way to get data + features from NCBI rather > than the sometime unhappy GenBank formats that come out of NCBI. This may > be a beast to write though so I'm not sure it is really that important. > ASN.1 has been NCBI's format of choice for years, long before they even thought about adopting XML. Their ASN.1 schema is probably also much stabler than the XML equivalent. If we only had a parser for it. There is even a module out on CPAN: http://search.cpan.org/doc/GBARR/Convert-ASN1-0.07/lib/Convert/ASN1.pod I don't know how useful this could be. Any people on the list with experience or feelings in this regard? Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From Francis Ouellette
On Mon, 11 Dec 2000, Hilmar Lapp wrote:
>
> ASN.1 has been NCBI's format of choice for years, long before they even
> thought about adopting XML. Their ASN.1 schema is probably also much
> stabler than the XML equivalent. If we only had a parser for it. There
> is even a module out on CPAN:
> http://search.cpan.org/doc/GBARR/Convert-ASN1-0.07/lib/Convert/ASN1.pod
>
> I don't know how useful this could be. Any people on the list with
> experience or feelings in this regard?

It should be noted that ASN.1 is a much richer format than GB/EMBL, and it can hold many types of annotations not present in the GB/EMBL format ... for example, things like 1) alignments or 2) quality of base call (from Ace/phred output).

Here we store all of our data in binary asn.1 in house (saves space) and can then write out anything to whatever format ... (typically GB or FASTA, but we can invent our own formats as well, like we are working on for SNPs, which also come into our system in ASN.1).

There is obviously a cost to doing this (the major one being that you need to work with the ncbi toolkit), but you gain from inheriting 12 years of code developed by pretty good programmers (like using bioperl I guess :-)

There are converters out there (asn<->xml) and one need not delve into the asn.1 world if you don't want to, but understanding it, and working with it, will give you access to a richer data format and richer data model ...

my Can$0.02

f.

--
| B.F. Francis Ouellette                   Tel: (604) 875-3815 |
| Director, Bioinformatics Core Facility   Fax: (425) 740-6978 |
| CMMT, UBC, Canada                    http://www.cmmt.ubc.ca |
| francis@cmmt.ubc.ca            http://www.bioinformatics.ca |

From hlapp@gmx.net Tue Dec 12 07:50:02 2000
Date: Mon, 11 Dec 2000 23:50:02 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Re: [Bioperl-guts-l] Notification: incoming/872
bioperl-bug-admin@bioperl.org wrote: > > JitterBug notification > > new message incoming/872 > > Message summary for PR#872 > From: murad@godel.bioc.columbia.edu > Subject: Bio::Index::EMBL can't read swissprot files > Date: Mon, 11 Dec 2000 22:59:45 -0500 > 0 replies 0 followups > > ====> ORIGINAL MESSAGE FOLLOWS <==== > > >From murad@godel.bioc.columbia.edu Mon Dec 11 22:59:45 2000 > Received: from localhost (localhost [127.0.0.1]) > by pw600a.bioperl.org (8.9.3/8.9.3) with ESMTP id WAA31898 > for <bioperl-bugs@pw600a.bioperl.org>; Mon, 11 Dec 2000 22:59:45 -0500 > Date: Mon, 11 Dec 2000 22:59:45 -0500 > From: murad@godel.bioc.columbia.edu > Message-Id: <200012120359.WAA31898@pw600a.bioperl.org> > To: bioperl-bugs@bioperl.org > Subject: Bio::Index::EMBL can't read swissprot files > > Full_Name: Murad Nayal > Module: Bio::Index::EMBL > Version: 0.1 > OS: IRIX > Submission from: godel.bioc.columbia.edu (156.111.6.57) > > Below is the email message describing the problem. > I am also including, at the end of the email, > a suggested module. Bio::Index::Swiss that I > adapted from Bio::Index::EMBL. This modules seems > to solve the problem mentioned. > Sorry for not giving you the right information. Such a module is indeed missing from the 0.6.x branch, but it is there in the development trunk, so it has been solved already some time ago. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From heikki@ebi.ac.uk Tue Dec 12 10:11:28 2000 Date: Tue, 12 Dec 2000 10:11:28 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] LiveSeq
Ewan Birney wrote: > > Joseph - > > Thanks for your commits. > > I think it would be nice Joseph to talk through your objects first on the > list and get feedback, in particular for the naming of your objects. Once > commited we can't do a great deal about changing their names. > > This is partly my and partly heikki's fault for not making sure that you > talked things through on the list first. Don't lose sleep over it, but be > aware that you are now working in a collaborative enviroment, and you > should post to the list before making large commits. > > I think a nice post about what LiveSeq is and what it does to the list > would be a good thing as well. My apologies for not announcing this to the mailing list beforehand. I had to be away yesterday and I had been putting pressure on Joseph to commit his code for so long that he very dutifully did it on first possible occasion after getting his login. LiveSeq has actually been announced but so long ago that no one can remember (see below for the text). Now that the files are in the repository, we'll start cleaning the code and adding tests. Any suggestions on naming and improving the code are welcome, -Heikki Subject: [Bioperl-l] Bio::Variation committed [Project: Computational Mutation Expression Toolkit] Date: Wed, 19 Jul 2000 14:35:46 +0100 From: Heikki Lehvaslaiho <heikki@ebi.ac.uk> Hi, I just committed Bio::Variation files to bioperl-live. All in all 10 classes and 185 tests in seven t files. 'cvs update -d' and you can see them all. Easiest way of getting grips with what these classes can do is to have a look at test input files, e.g. t/mutation.dat. You'll need Bio::LiveSeq to actually generate the information from sequences, but those files are not committed, yet. -Heikki Heikki Lehvaslaiho wrote: > > Dear Bioperlers, > > We'd like to announce a project to add classes into Bioperl to handle > sequence variations. 
> > The Computational Mutation Expression Toolkit project consists of two > sets of Bioperl classes: > > Bio::LiveSeq > > Bio::LiveSeq name space contains a set of modules to read in (EMBL > formatted) sequences and create a data structure (double linked list) > to represent the DNA sequence. A model of an eucaryotic gene is built > by creating virtual exon, transcript and translation objects which are > dependant on the DNA sequence object. This novel strategy allows us to > handle biological sequences in a way that makes it extremely easy to > deal with sequence variations and coordinate system changes. The > results can be written out as Bio::Variation objects. > > Bio::Variation > > Bio::Variation name space contains modules to store sequence variation > information as differences between the reference sequence and changed > sequences. Also included are classes to write out and recrete objects > from EMBL-like flat files and XML. Lastly, there are simple classes to > create some sequence change objects. At the moment, we do not have > methods to create bioperl sequence objects from diffs but they should > be easy to add. > > We've set up web pages with more information about the project: > > http://www.ebi.ac.uk/mutations/toolkit/ > > We'll start committing code to CVS shortly (or after summer holidays). > > Heikki Lehvaslaiho & Joseph Insana > heikki@ebi.ac.uk insana@ebi.ac.uk >From birney@ebi.ac.uk Tue Dec 12 10:21:16 2000 Date: Tue, 12 Dec 2000 10:21:16 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] final proposal: Bio::DB::WebSeqDBI
On Mon, 11 Dec 2000, Jason Stajich wrote: > The final proposal before I commit the code (all tests pass on my > machine). > > 2 new modules > Bio::DB::WebSeqDBI - ISA Bio::DB::RandomAccessI > Bio::DB::NCBIHelper ISA Bio::DB::WebSeqDBI > > rewrites of Bio::DB::GenBank, Bio::DB::GenPept, Bio::DB::SwissProt. > > Bio::DB::WebSeqI - > > This interface encapsulates the standard data retrieval methods from a > Web Sequence Database. Implementing classes must implement the method > get_request while takes as arguments a hash > of qualifiers - uids, format, etc with which to query the database and > returns a HTTP::Request object. The WebSeqDBI class manages a > LWP::UserAgent for obtaining data from the web dbs and turning that data > stream into a Bio::SeqIO. > > Because of the way LWP works right now, it is not possible to take a data > stream from webserver and transform it into a Bio::SeqIO, rather, one must > read all the data from the server and then either store that in a tempfile > or transform it into a IO::String which can be treated as a filehandle. > Also a pain, the retrieval method from NCBI has some HTML 'contamination' > which needs to be screened out through a method call to postprocess_data. > > One issue I am not sure how to best deal with, the temporary file removal > at the end of the life of the Bio::DB::WebSeqDBI object. The following > code illustrates a case this will remove files too soon. > > my $seqdb = new Bio::DB::Genbank(-retrievaltype=>'tempfile'); > my $seqio = $seqdb->get_Stream_by_id($accession); > undef $seqdb; # this will remove the seqdb object and cleanup the > # tempfile that was created > my $seq = $seqio->next_seq(); # bomb because no file exists now. > > Anyone with better ideas on this feel free to let me know. > > Bio::DB::NCBIHelper - > > Since the Bio::DB::GenBank and Bio::DB::GenPept are so similar I wrote a > class that encapsulates all the of common functionality for retrieving > sequence data from these databases. 
> > I'm sure it will all make much more sense once I check the code in, I just > wanted to check and see if anyone has comments or wants clarification > before I checkin major reworks to the current modules. > > Is the name WebSeqDBI misleading - (ie looks like it would be a DBI > module...?) We like to use 'I' at the end of a module name to denote > interfaces. I know where you are coming from, but I do think we have to do something different here in the naming. WebDBSeqI ? > > -Jason > Jason Stajich > jason@chg.mc.duke.edu > Center for Human Genetics > Duke University Medical Center > http://www.chg.duke.edu/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From birney@ebi.ac.uk Tue Dec 12 10:25:39 2000 Date: Tue, 12 Dec 2000 10:25:39 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Bio::SeqIO::embl problems reading swissprot clarification
On Mon, 11 Dec 2000, Hilmar Lapp wrote: > > Not yet. As what you're describing sounds like a bug, could you file a bug > report through the web-interface, copying your email as message body? > Sounds to me like we need to make the easy-to-write Bio::Index::Swiss module. ;) > Hilmar > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp@gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From insana@ebi.ac.uk Tue Dec 12 12:26:16 2000 Date: Tue, 12 Dec 2000 12:26:16 +0000 (GMT) From: Joseph Insana insana@ebi.ac.uk Subject: [Bioperl-l] LiveSeq committed modules
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi. As Heikki explained in his email I committed the modules in bona fide, since a shared environment was needed to continue the work on this project and BioPerl seemed the good place for this. Quoting from the bioperl.org/UserInfo webpage, I may be considered a bioperl or "unofficial" member]s[ who have standalone bio-related perl code that they wish to place under CVS [...] As I wrote in the log, the tests would have followed. And - as I'll write them - they will effectively follow, together with sample scripts and such. I have also been asked to clean up the code and the purpose of the CVS should be to facilitate that process. Additional information on what the LiveSeq is (and can do) can be found at http://www.ebi.ac.uk/mutations/toolkit and all the methods should have their spice of pod documentation. So here we go, thanks for the warm welcome. Bye, Joseph Insana - -- http://www.ebi.ac.uk/~insana -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.2 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE6Nhl0VyhgTs2EgB4RAkKSAKDDYw2hAN/3Bvjni4R0nLKqMBLl3wCgxRQ/ Rdwdqz6tov7fRAHYheDSg/A= =R3DN -----END PGP SIGNATURE-----From gordonp@niji.imb.nrc.ca Tue Dec 12 13:55:57 2000 Date: Tue, 12 Dec 2000 09:55:57 -0400 (AST) From: Paul Gordon gordonp@niji.imb.nrc.ca Subject: [Bioperl-l] Re: Bio::DB::WebSeqDB and Bio::DB::GenBank
> ASN.1 has been NCBI's format of choice for years, long before they even > thought about adopting XML. Their ASN.1 schema is probably also much > stabler than the XML equivalent. If we only had a parser for it. There > is even a module out on CPAN: > http://search.cpan.org/doc/GBARR/Convert-ASN1-0.07/lib/Convert/ASN1.pod > > I don't know how useful this could be. Any people on the list with > experience or feelings in this regard? I've tried it previously, albeit not in _great_ detail, but it didn't seem to handle the comprehensive NCBI ASN.1 specification and is a bit hackish in dealing with macros (or whatever the ASN.1 term is that I can't remember). I don't know if it would be easy to rescue or not... ________________________________________________________________________ Paul Gordon Paul.Gordon@nrc.ca Genomic Technologies http://maggie.cbr.nrc.ca Institute for Marine Biosciences National Research Council CanadaFrom lewisg@mail.nih.gov Tue Dec 12 14:56:16 2000 Date: Tue, 12 Dec 2000 09:56:16 -0500 From: Geer, Lewis (NCBI) lewisg@mail.nih.gov Subject: [Bioperl-l] Re: Bio::DB::WebSeqDB and Bio::DB::GenBank
The sequence XML DTD is derived algorithmically from the ASN.1 specification, so barring bugs, the XML DTD should be as stable as the ASN.1 specification. However, brand-new XML DTDs, like the simplified BLAST output in standalone BLAST, are not based on a existing ASN.1 specification and can change at a faster clip. If you want to know which XML DTD is new and which is not, send email or post. Lewis > -----Original Message----- > From: Paul Gordon [mailto:gordonp@niji.imb.nrc.ca] > Sent: Tuesday, December 12, 2000 8:56 AM > To: Bioperl > Subject: Re: [Bioperl-l] Re: Bio::DB::WebSeqDB and Bio::DB::GenBank > > > > ASN.1 has been NCBI's format of choice for years, long > before they even > > thought about adopting XML. Their ASN.1 schema is probably also much > > stabler than the XML equivalent. If we only had a parser > for it. There > > is even a module out on CPAN: > > > http://search.cpan.org/doc/GBARR/Convert-ASN1-0.07/lib/Convert > /ASN1.pod > > > > I don't know how useful this could be. Any people on the list with > > experience or feelings in this regard? > > I've tried it previously, albeit not in _great_ detail, but > it didn't seem > to handle the comprehensive NCBI ASN.1 specification and is a > bit hackish > in dealing with macros (or whatever the ASN.1 term is that I can't > remember). I don't know if it would be easy to rescue or not... > > ______________________________________________________________ > __________ > Paul Gordon Paul.Gordon@nrc.ca > Genomic Technologies http://maggie.cbr.nrc.ca > Institute for Marine Biosciences > National Research Council Canada > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l >From murad@godel.bioc.columbia.edu Tue Dec 12 09:22:32 2000 Date: Tue, 12 Dec 2000 10:22:32 +0100 From: Murad Nayal murad@godel.bioc.columbia.edu Subject: [Bioperl-l] Alignment::Clustalw and Bio::RootI
I am trying to do something that I probably shouldn't. I am trying to combine some 0.7 modules, specifically Bio::Tools::Alignment::Clustalw and support modules, with the current 0.62 release. one immediate problem I have encountered is that Clustalw inherits from Root::RootI (not from Object). However RootI in 0.62 does not have a new method (it does in 0.7). (hence I can't instantiate a Clustalw). the questions: 1- is Root::Object being phased out? 2- how foolish is it to start mingling 0.62 and 0.7 objects. I really need a clustalw wrapper and don't want to write one since it is already available in 0.7. I suppose if I replace RootI 0.62 with RootI 0.7, then 0.62 objects (which inherit from Object) will hook into Object methods while 0.7 objects (really just Clustalw and co) will access the RootI methods!!. 3- how many times can one say object in one sentence and still have it mean something. many thanks for the feedback. -- Murad Nayal M.D. Ph.D. Department of Biochemistry and Molecular Biophysics College of Physicians and Surgeons of Columbia University 630 West 168th Street. New York, NY 10032 Tel: 212-305-6884 Fax: 212-305-6926From jason@chg.mc.duke.edu Tue Dec 12 16:39:52 2000 Date: Tue, 12 Dec 2000 11:39:52 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Alignment::Clustalw and Bio::RootI
On Tue, 12 Dec 2000, Murad Nayal wrote: > > > I am trying to do something that I probably shouldn't. I am trying to > combine some 0.7 modules, specifically Bio::Tools::Alignment::Clustalw > and support modules, with the current 0.62 release. one immediate > problem I have encountered is that Clustalw inherits from Root::RootI > (not from Object). However RootI in 0.62 does not have a new method (it > does in 0.7). (hence I can't instantiate a Clustalw). > > the questions: > > 1- is Root::Object being phased out? Yes! > 2- how foolish is it to start mingling 0.62 and 0.7 objects. I really > need a clustalw wrapper and don't want to write one since it is already > available in 0.7. I suppose if I replace RootI 0.62 with RootI 0.7, then > 0.62 objects (which inherit from Object) will hook into Object methods > while 0.7 objects (really just Clustalw and co) will access the RootI > methods!!. Going to be very difficult if not impossible to mix versions right now. Once we started down this path, we acknowledged there would be a lot of incompatibilities between versions because of Root::Object dependence. You will have an easier time checking out the main branch and learning with that then struggling with compatibility issues. The Clustal module is just a wrapper around the clustalw program, are you trying to do something so tricky that you can't just run clustalw separately and then feed it into AlignIO (oh shoot, which is only on main-trunk too, well....). Let me know if I can help steer you in the right direction. Peter S (the author) is away on vacation for a few more weeks so I agreed to stand watch on the AlignIO and Bio::Tools::Alignment in the meantime. > > 3- how many times can one say object in one sentence and still have it > mean something. obfuscation obviates obfuscation > > > many thanks for the feedback. > > > > -- > Murad Nayal M.D. Ph.D. 
> Department of Biochemistry and Molecular Biophysics > College of Physicians and Surgeons of Columbia University > 630 West 168th Street. New York, NY 10032 > Tel: 212-305-6884 Fax: 212-305-6926 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/From jason@chg.mc.duke.edu Tue Dec 12 16:45:16 2000 Date: Tue, 12 Dec 2000 11:45:16 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] final proposal: Bio::DB::WebSeqDBI
On Mon, 11 Dec 2000, Hilmar Lapp wrote:

> Jason Stajich wrote:
> >
> > One issue I am not sure how to best deal with, the temporary file removal
> > at the end of the life of the Bio::DB::WebSeqDBI object. The following
> > code illustrates a case this will remove files too soon.
> >
> > my $seqdb = new Bio::DB::GenBank(-retrievaltype=>'tempfile');
> > my $seqio = $seqdb->get_Stream_by_id($accession);
> > undef $seqdb; # this will remove the seqdb object and cleanup the
> >               # tempfile that was created
> > my $seq = $seqio->next_seq(); # bomb because no file exists now.
> >
>
> Provided that things work the same way as in e.g. C (and it ought to be so,
> because it's the OS that dictates it), the tempfile should not be
> physically removed as long as there is a stream (filehandle) open on it (it
> may be invisible to directory listings though). Since SeqIO::* modules keep
> a file handle open until $seqio->close() is called, there should be no
> problem. Am I missing something? Have you tested for the behaviour you're
> mentioning?

It always helps to test something before crying wooolfie. This is in fact not a problem since the filehandle is opened in the method to create a SeqIO, so it is only unlinked when the SeqIO object is destroyed. I have added this to the test routines as well.

> >
> > Is the name WebSeqDBI misleading - (ie looks like it would be a DBI
> > module...?) We like to use 'I' at the end of a module name to denote
> > interfaces.
> >
>
> I agree with David, it's somewhat misleading. I don't have a strong view
> though. In general, I wouldn't have considered it as an interface anyway
> (why does it qualify as one?), so why not simply omit the trailing 'I'?

Ewan's suggestion WebDBSeqI has been used. Semantically it has to be an interface, since no one should instantiate a WebDBSeqI object directly. I think I'm ready to finalize some documentation and commit.
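[Archive note] The temporary-file behaviour discussed above (a file that is unlinked while a filehandle is still open on it stays readable) can be checked with a few lines of plain Perl. This is only a minimal sketch, not the test that was added to the BioPerl test suite: it assumes a POSIX-style filesystem (on Win32 the unlink of an open file would normally fail), it uses File::Temp only to obtain a scratch file, and the FASTA record is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a one-record FASTA file to a temporary file (record content is made up).
my ($out, $path) = tempfile();
print {$out} ">demo\nACGT\n";
close $out or die "close failed: $!";

# Open a read handle first, then unlink the file while the handle is still open.
open my $in, '<', $path or die "cannot open $path: $!";
unlink $path or die "cannot unlink $path: $!";

# On POSIX systems the directory entry is gone, but the data stays readable
# through the handle that was opened before the unlink.
my $gone  = !-e $path;
my @lines = <$in>;
close $in;

print "file removed from directory: ", ($gone ? "yes" : "no"), "\n";
print "lines still readable: ", scalar(@lines), "\n";
```

The same reasoning is why destroying the database object before calling next_seq() is harmless, provided the SeqIO object already holds an open handle on the file.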
All tests pass here assuming NCBI is not rejecting me for too many requests during the testing phase... =) Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/From jason@chg.mc.duke.edu Tue Dec 12 16:50:36 2000 Date: Tue, 12 Dec 2000 11:50:36 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Re: Bio::DB::WebSeqDB and Bio::DB::GenBank
On Mon, 11 Dec 2000, Francis Ouellette wrote:

> It should be noted that ASN.1 is a much richer format than GB/EMBL,
> and it can hold many types of annotations not present in the GB/EMBL
> format ... for example, things like 1) alignments or 2) quality of
> base call (from Ace/phred output).
>
> Here we store all of our data in binary asn.1 in house
> (saves space) and can then write out anything to what ever format ...
> (typically GB or FASTA, but we can invent our formats as well, like we
> are working on for SNPs, who also come into our system in ASN.1)
>
> There is obviously a cost at doing this (you need to work with the
> ncbi toolkit is the major one), but you gain from inheriting 12 years
> of code developed by pretty good programmers (like using bioperl I
> guess :-)
>
> There are converters out there (asn<->xml) and one need not delve
> into asn.1 world if you don't want to, but understanding it, and
> working with it will give you access to a richer data format and
> richer data model ...
>

Francis - thanks for the insight. I think we should try in earnest to add functionality for reading/writing NCBI XML and/or ASN.1 in bioperl. There are some obvious advantages, and we will be able to provide a useful platform for people with ASN.1 databases as well as cleaner data retrieval from GenBank, etc. But I think it will have to be post 0.7 since it represents a fair amount of work. I volunteer to investigate feasibility once we have 0.7 out the door.

>
> f.
>
> --
> | B.F.
Francis Ouellette Tel: (604) 875-3815 | > | Director, Bioinformatics Core Facility Fax: (425) 740-6978 | > | CMMT, UBC, Canada http://www.cmmt.ubc.ca | > | francis@cmmt.ubc.ca http://www.bioinformatics.ca | > > > > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/From hlapp@gmx.net Tue Dec 12 18:55:49 2000 Date: Tue, 12 Dec 2000 10:55:49 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] GenBank XML report format & DTD
"Geer, Lewis (NCBI)" wrote: > > The sequence XML DTD is derived algorithmically from the ASN.1 > specification, so barring bugs, the XML DTD should be as stable as the ASN.1 > specification. However, brand-new XML DTDs, like the simplified BLAST > output in standalone BLAST, are not based on a existing ASN.1 specification > and can change at a faster clip. If you want to know which XML DTD is new > and which is not, send email or post. > First, I have difficulties figuring out how to make the NCBI query tools to report sequences in XML format. The documentation pages I was able to find on NCBI's website (http://www.ncbi.nlm.nih.gov/entrez/query/static/linking.html, http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html) only mention GenBank/GenPept, FASTA, and ASN.1 report format options, and if you try, dopt=XML indeed yields an error. What did I miss? If we could retrieve XML other than through the HTML forms, we can certainly use that. Assuming the XML DTD evolves independent from the ASN.1 was of course stupid from me. Second, yes, a pointer to the most recent DTD would help, otherwise we'd have to find a way through the website (assuming that it's documented there at some place). Thanks for the post. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From hlapp@gmx.net Tue Dec 12 19:09:43 2000 Date: Tue, 12 Dec 2000 11:09:43 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] LiveSeq committed modules
Joseph Insana wrote: > > So here we go, thanks for the warm welcome. > > Bye, > Joseph Insana Thanks for warmly introducing yourself, Joseph. If you enter a room full of people, some things are just not the same as they are when you're alone. There is always a risk of stepping on other's toes. And if you do so accidentally because you didn't watch out, no-one will usually blame you. You may hear an 'Ouch' though. This doesn't mean that those people were unwilling to welcome you. Welcome in the Bioperl community, Joseph. And thanks for the submission. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From jason@chg.mc.duke.edu Tue Dec 12 19:26:40 2000 Date: Tue, 12 Dec 2000 14:26:40 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Bio::DB::WebDBSeqI
Checked in new modules Bio::DB::WebDBSeqI and Bio::DB::NCBIHelper, which provide common functionality for connecting to web-based sequence databases.

Bio::DB::GenBank, Bio::DB::GenPept, Bio::DB::SwissProt, and t/DB.t were all updated to migrate to this new code.

The interface to the original modules has not changed; however, more options are supported, and LWP is fully supported for those behind firewalls. For example:

  use Bio::DB::GenBank;
  my $db = new Bio::DB::GenBank;
  $db->ua->proxy('protocol', 'hostname');
  my $seq = $db->get_Seq_by_acc($accession);

Realize, as always, that NCBI Entrez does not distinguish between queries for unique identifiers (genbankid) and accession numbers, so the routines get_Seq_by_id() and get_Seq_by_acc() are identical in this implementation.

Bio::DB::SwissProt talks to expasy right now, but other swissprot providers could be added.

Also, about temporary files. I am using File::Temp, which behaves wonderfully on my solaris machines. I'd appreciate those with different architectures testing it out and letting me know if we are having any problems. I have also updated Bio::Seq::LargePrimarySeq to use File::Temp and find it behaves nicely.

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From lewisg@mail.nih.gov Tue Dec 12 19:34:35 2000
Date: Tue, 12 Dec 2000 14:34:35 -0500
From: Geer, Lewis (NCBI) lewisg@mail.nih.gov
Subject: [Bioperl-l] RE: GenBank XML report format & DTD
Hilmar, It's not documented, unfortunately. Please send mail to info@ncbi.nlm.nih.gov if you would like to see documentation. The form of the url is given below: http://www.ncbi.nlm.nih.gov/entrez/viewer.cgi?val=5174476&db=Nucleotide&dopt =xml&txt=on where val is the gi of the sequence. viewer.cgi is the program used by the entrez query engine, query.fcgi, to display sequence records. There is a new version of viewer.cgi that takes accessions as an argument, but I am unsure of its development status. The DTDs can be found at ftp://ncbi.nlm.nih.gov/toolbox/xmlspecs/ Lewis > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp@gmx.net] > Sent: Tuesday, December 12, 2000 1:56 PM > To: Geer, Lewis (NCBI) > Cc: 'Paul Gordon'; Bioperl > Subject: GenBank XML report format & DTD > First, I have difficulties figuring out how to make the NCBI > query tools > to report sequences in XML format. The documentation pages I > was able to > find on NCBI's website > (http://www.ncbi.nlm.nih.gov/entrez/query/static/linking.html, > http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html) > only mention > GenBank/GenPept, FASTA, and ASN.1 report format options, and > if you try, > dopt=XML indeed yields an error. What did I miss? If we could retrieve > XML other than through the HTML forms, we can certainly use that. > Assuming the XML DTD evolves independent from the ASN.1 was of course > stupid from me. > > Second, yes, a pointer to the most recent DTD would help, > otherwise we'd > have to find a way through the website (assuming that it's documented > there at some place). Thanks for the post. > > Hilmar > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 
92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- >From hlapp@gmx.net Tue Dec 12 19:44:58 2000 Date: Tue, 12 Dec 2000 11:44:58 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Alignment::Clustalw and Bio::RootI
Murad Nayal wrote: > > 2- how foolish is it to start mingling 0.62 and 0.7 objects. I really Very foolish. Just don't. If 0.7 is too far away for you and you do need features of the main trunk I suggest you use the date option of cvs and checkout a copy of the main trunk before say Nov 10th (Jason, when did we start the RootI transition?). It used to be pretty stable before the current transition. Getting the cvs checkout options right (I'd have to figure out myself) might take you some time, but mingling the two branches will take you 10x longer. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From hlapp@gmx.net Tue Dec 12 20:01:04 2000 Date: Tue, 12 Dec 2000 12:01:04 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Bio::DB::WebDBSeqI
Jason Stajich wrote:
>
> Also, about temporary files. I am using File::Temp, which behaves

Seems to be another external dependency. I don't have it in Perl 5.005_03. Did it come with the core 5.6 dist?

Since we're piling up external dependencies anyway, just add it to Makefile.PL ...

A general question to people on the list: what is your general feeling about the number of external dependencies? Do you feel fine with a growing number, or does it scare you? Do you think it might significantly heighten the barrier to using the package, or is it just normal for you that installing one package means installing another ten? Do you think maintaining a collection of the respective external modules on the bioperl FTP site mitigates the additional trouble, or are you afraid that these will be outdated most of the time?

Does it make sense to have someone responsible for keeping that stuff up-to-date? I think Chris has done this so far, voluntarily.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From ajm6q@virginia.edu Tue Dec 12 19:54:43 2000
Date: Tue, 12 Dec 2000 14:54:43 -0500 (EST)
From: Aaron J Mackey ajm6q@virginia.edu
Subject: [Bioperl-l] Bio::DB::WebDBSeqI
On Tue, 12 Dec 2000, Jason Stajich wrote: > Checked in new modules Bio::DB::WebDBSeqI, Bio::DB::NCBIHelper > which provide common functionality for connecting to Webbased Sequence > databases. > > Bio::DB::GenBank, Bio::DB::GenPept, Bio::DB::SwissProt, t/DB.t were all > updated to migrate to this new code. Thank you for doing this, we'll all appreciate it greatly. > use Bio::DB::GenBank; > my $db = new Bio::DB::GenBank; > $db->ua->proxy('protocol', 'hostname'); Excellent interface decision (to just use UserAgent's), I think. > Also, about temporary files. I am using File::Temp, which behaves > wonderfully on my solaris machines. I'd appreciate those with different > architectures testing out and letting me know if we are having any > problems. I've been pondering how to use UserAgent's callback mechanism to implement "What We Really Want". You could do it if you didn't mind forking and using shared memory (via IPC::Shareable or somesuch): have one process which executes the request with a callback that captures the data for one sequence, stores it away in shared memory, and then loops/waits until that sequence is used (the memory is cleared), after which it again collects enough data for another sequence, loops/waits etc. The other process implements next_seq(), and grabs the sequence data from shared memory, clears shared memory, and builds the Seq object. next_seq() would potentially have to wait until the shared memory sequence data is marked "ready", and there's other timing issues you'd need to keep track of but it wouldn't be that hard. Of course this wouldn't work anytime soon on Win32 or Mac ports. There might be other ways of doing IPC to get around using shared memory (bidirectional communication between the two processes in a server/client mode, etc), but the idea is the same: a "server" process which reads enough data for one sequence and then stalls until another is requested. 
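[Archive note] As a rough illustration of the two-process idea sketched above, here is a pipe-based variant in core Perl only. It is an assumption-laden sketch, not anything from the BioPerl code base: it swaps the shared-memory segment for an ordinary pipe, the records are hard-coded stand-ins for data that would really arrive from a web database, and next_record() is a hypothetical stand-in for next_seq(). With a pipe the producer does not stall after exactly one sequence; it blocks only once the operating system's pipe buffer is full, so the hand-over is coarser than the one described. Like the original idea, it needs a working fork().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Records the "server" process will hand over (made up for illustration).
my @records = (">seq1\nACGT\n", ">seq2\nGGCC\n", ">seq3\nTTAA\n");

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: the "server". It writes records into the pipe and blocks
    # whenever the pipe buffer is full, i.e. the consumer has fallen behind.
    close $reader;
    print {$writer} $_ for @records;
    close $writer;
    exit 0;
}

# Parent: the consumer.
close $writer;

# Header line read ahead while finishing the previous record.
my $pending;

# Hypothetical stand-in for next_seq(): returns one FASTA record per call,
# or undef once the stream is exhausted.
sub next_record {
    my $record = $pending;
    undef $pending;
    while (defined(my $line = <$reader>)) {
        if ($line =~ /^>/ && defined $record) {
            $pending = $line;    # start of the next record; keep it for the next call
            return $record;
        }
        $record .= $line;
    }
    return $record;
}

my @seen;
while (defined(my $rec = next_record())) {
    push @seen, $rec;
}
close $reader;
waitpid($pid, 0);

printf "read %d records\n", scalar @seen;
```

The consumer never holds more than one complete record plus one look-ahead line, which is the property the tempfile and IO::String approaches give up.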
Coding the beast is left as an exercise ;) -AaronFrom jason@chg.mc.duke.edu Tue Dec 12 20:08:06 2000 Date: Tue, 12 Dec 2000 15:08:06 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Alignment::Clustalw and Bio::RootI
This will get you the state of the main trunk as of Nov 12.

  cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository co -D 2000-11-12 bioperl-live

Should be recent enough to have all the changes.

On Tue, 12 Dec 2000, Hilmar Lapp wrote:

> Murad Nayal wrote:
> >
> > 2- how foolish is it to start mingling 0.62 and 0.7 objects. I really
>
> Very foolish. Just don't.
>
> If 0.7 is too far away for you and you do need features of the main
> trunk I suggest you use the date option of cvs and checkout a copy of
> the main trunk before say Nov 10th (Jason, when did we start the RootI
> transition?). It used to be pretty stable before the current transition.
> Getting the cvs checkout options right (I'd have to figure out myself)
> might take you some time, but mingling the two branches will take you
> 10x longer.
>
> Hilmar
>
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From jason@chg.mc.duke.edu Tue Dec 12 20:09:29 2000
Date: Tue, 12 Dec 2000 15:09:29 -0500 (EST)
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] Bio::DB::WebDBSeqI
It is a standard CPAN module. Not sure whether it comes standard with perl 5.6.0, but I added the dependency to Makefile.PL. Sorry, I know that adding extra modules is annoying, but it makes coding much cleaner, IMHO.

As for them being out of date, we could write some code to ensure that our copies are up-to-date via a cron job and the CPAN module, but that may be more than necessary. A bioperl bundle in CPAN perhaps?

-Jason

On Tue, 12 Dec 2000, Hilmar Lapp wrote:

> Jason Stajich wrote:
> >
> > Also, about temporary files. I am using File::Temp, which behaves
>
> Seems to be another external dependency. I don't have it in Perl
> 5.005_03. Did it come with the core 5.6 dist?
>
> Since we're piling up external dependencies anyway, just add it to
> Makefile.PL ...
>
> A general question to people on the list: what is your general feeling
> about the number of external dependencies? Do you feel fine with a
> growing number, or does it scare you, do you think it might
> significantly heighten the barrier to using the package, or is it just
> normal for you that installing one package means installing another ten.
> Do you think maintaining a collection of the respective external modules
> on the bioperl FTP site mitigates the additional trouble, or are you
> afraid that these most of the time will be outdated?
>
> Does it make sense to have someone responsible for keeping that stuff
> up-to-date? I think Chris has done this so far, voluntarily.
>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From dblock@gene.pbi.nrc.ca Tue Dec 12 20:24:15 2000
Date: Tue, 12 Dec 2000 14:24:15 -0600 (CST)
From: David Block dblock@gene.pbi.nrc.ca
Subject: [Bioperl-l] Bio::DB::WebDBSeqI
On Tue, 12 Dec 2000, Hilmar Lapp wrote: > Jason Stajich wrote: > > > > Also, about temporary files. I am using File::Temp, which behaves > > Seems to be another external dependency. I don't have it in Perl > 5.005_03. Did it come with the core 5.6 dist? > > Since we're piling up external depencies anyway, just add it to > Makefile.PL ... > > A general question to people on the list: what is your general feeling > about the number of external dependencies? Do you feel fine with a > growing number, or does it scare you, do you think it might > significantly heighten the barrier to using the package, or is it just > normal for you that installing one package means installing another ten. For those of us with broadband access and big hard drives, it is little more than a momentary inconvenience. CPAN is one of the great strengths of perl, and if we can leverage other people's work, so much the better. However, at 28.8 kbps and with a small hard drive, what are you doing in bioinformatics? I'd rather not re-invent any more wheels than I have to. > Do you think maintaining a collection of the respective external modules > on the bioperl FTP site mitigates the additional trouble, or are you > afraid that these most of the time will be outdated? > A clear description of how to use CPAN would be all that's really necessary, don't you think? > Does it make sense to have someone responsible for keeping that stuff > up-to-date? I think Chris has done this so far, voluntarily. > Better to have a minimum requirement, and not worry about this week's beta release. > Hilmar > (my $0.02 Cdn) -- David Block dblock@gene.pbi.nrc.ca http://bioinfo.pbi.nrc.ca/dblock/wiki Plant Biotechnology Institute National Research Council of Canada Saskatoon, SaskatchewanFrom murad@godel.bioc.columbia.edu Tue Dec 12 14:18:29 2000 Date: Tue, 12 Dec 2000 15:18:29 +0100 From: Murad Nayal murad@godel.bioc.columbia.edu Subject: [Bioperl-l] Alignment::Clustalw and Bio::RootI
>
> > 2- how foolish is it to start mingling 0.62 and 0.7 objects. I really
> > need a clustalw wrapper and don't want to write one since it is already
> > available in 0.7. I suppose if I replace RootI 0.62 with RootI 0.7, then
> > 0.62 objects (which inherit from Object) will hook into Object methods
> > while 0.7 objects (really just Clustalw and co) will access the RootI
> > methods!!.
>
> Going to be very difficult if not impossible to mix versions right now.
> Once we started down this path, we acknowledged there would be a lot of
> incompatibilities between versions because of Root::Object dependence. You
> will have an easier time checking out the main branch and learning with
> that than struggling with compatibility issues. The Clustal module is
> just a wrapper around the clustalw program, are you trying to do something
> so tricky that you can't just run clustalw separately and then feed it
> into AlignIO (oh shoot, which is only on main-trunk too, well....).
>
> Let me know if I can help steer you in the right direction. Peter S (the
> author) is away on vacation for a few more weeks so I agreed to stand
> watch on the AlignIO and Bio::Tools::Alignment in the meantime.

Thanks Jason,

I wasn't sure of the maturity of the 0.7 code, so the question was which option would give me the least amount of trouble. I actually tried briefly to use RootI 0.7 in place of RootI 0.62 and it seems to work for the examples I tried. The reason, I think, is that the 0.62 objects simply don't see the new RootI, as Root::Object stands in the way. In any event, I will retreat to 0.62 or some stable version of the main trunk the minute problems start arising. Needless to say, waiting anxiously for the release of 0.7 :-)

Murad

From cstrassel@netgenics.com Tue Dec 12 20:31:26 2000
Date: Tue, 12 Dec 2000 15:31:26 -0500
From: Strassel, Chris cstrassel@netgenics.com
Subject: [Bioperl-l] (no subject)
Hi all, I've been learning the modules for parsing genbank records. Pretty impressive. I am about to make a couple of additions to provide some functionality I need, and wanted to ask a couple questions before I begin... I have seen some postings about parsing fuzzy locations for features, but I get the impression that this isn't a function that exists yet. Can anyone confirm? Same for sequence versions (i.e. the gene index number on the version line). The author line(s) are not currently parsed. Has anyone done/tried to do this? Comments would be appreciated. Finally, I am unsure about how to go about adding functionality. Create my own objects that inherit from the bioperl objects? Add functions directly to the bioperl objects? Something else? Thanks in advance, ChrisFrom murad@godel.bioc.columbia.edu Tue Dec 12 14:27:05 2000 Date: Tue, 12 Dec 2000 15:27:05 +0100 From: Murad Nayal murad@godel.bioc.columbia.edu Subject: [Bioperl-l] Alignment::Clustalw and Bio::RootI
Hilmar Lapp wrote: > > Murad Nayal wrote: > > > > 2- how foolish is it to start mingling 0.62 and 0.7 objects. I really > > Very foolish. Just don't. > > If 0.7 is too far away for you and you do need features of the main > trunk I suggest you use the date option of cvs and checkout a copy of > the main trunk before say Nov 10th (Jason, when did we start the RootI > transition?). It used to be pretty stable before the current transition. > Getting the cvs checkout options right (I'd have to figure out myself) > might take you some time, but mingling the two branches will take you > 10x longer. > I know it sounds like a heap of trouble. I will probably end up getting the Nov 10 version of the main trunk as you suggested. thanks for looking into it. MuradFrom Brian.Osborne@osip.com Tue Dec 12 20:40:48 2000 Date: Tue, 12 Dec 2000 20:40:48 -0000 From: Osborne, Brian Brian.Osborne@osip.com Subject: [Bioperl-l] Bio::DB::WebDBSeqI
Bioperl, > Do you think maintaining a collection of the respective external modules > on the bioperl FTP site mitigates the additional trouble, or are you > afraid that these most of the time will be outdated? Yes, I am afraid of that. In addition, I think of CPAN as one of Perl's great strengths. 'perl -e shell -MCPAN' is the best installation program I've ever used, I think. Brian O. -----Original Message----- From: David Block [mailto:dblock@gene.pbi.nrc.ca] Sent: Tuesday, December 12, 2000 3:24 PM To: Hilmar Lapp Cc: Jason Stajich; Bioperl Subject: Re: [Bioperl-l] Bio::DB::WebDBSeqI On Tue, 12 Dec 2000, Hilmar Lapp wrote: > Jason Stajich wrote: > > > > Also, about temporary files. I am using File::Temp, which behaves > > Seems to be another external dependency. I don't have it in Perl > 5.005_03. Did it come with the core 5.6 dist? > > Since we're piling up external depencies anyway, just add it to > Makefile.PL ... > > A general question to people on the list: what is your general feeling > about the number of external dependencies? Do you feel fine with a > growing number, or does it scare you, do you think it might > significantly heighten the barrier to using the package, or is it just > normal for you that installing one package means installing another ten. For those of us with broadband access and big hard drives, it is little more than a momentary inconvenience. CPAN is one of the great strengths of perl, and if we can leverage other people's work, so much the better. However, at 28.8 kbps and with a small hard drive, what are you doing in bioinformatics? I'd rather not re-invent any more wheels than I have to. > Do you think maintaining a collection of the respective external modules > on the bioperl FTP site mitigates the additional trouble, or are you > afraid that these most of the time will be outdated? > A clear description of how to use CPAN would be all that's really necessary, don't you think? 
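As a small illustration of the point about describing CPAN usage, the sketch below checks whether a few modules named in this thread can be loaded and, if not, prints the CPAN shell commands to try. The module list and the minimum versions (0 meaning any version) are assumptions for illustration, not bioperl's actual requirements, and have_module() is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Modules mentioned in this thread. The minimum versions are placeholders
# (0 = any version), not bioperl's real requirements.
my %wanted = (
    'File::Temp'     => 0,
    'File::Spec'     => 0,
    'LWP::UserAgent' => 0,
);

# Hypothetical helper: true if $module loads and, when a minimum is given,
# reports at least that version.
sub have_module {
    my ($module, $min_version) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 0 unless eval { require $file; 1 };
    return 1 unless $min_version;
    return eval { $module->VERSION($min_version); 1 } ? 1 : 0;
}

my @missing = grep { !have_module($_, $wanted{$_}) } sort keys %wanted;

if (@missing) {
    print "Missing or too old: @missing\n";
    print "Start the CPAN shell with: perl -MCPAN -e shell\n";
    print "then at the cpan> prompt type: install $_\n" for @missing;
}
else {
    print "All listed modules are available.\n";
}
```

In a Makefile.PL, the same module-to-version list would normally be passed as the PREREQ_PM argument to WriteMakefile from ExtUtils::MakeMaker, which is how the CPAN shell learns what else to install.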
> Does it make sense to have someone responsible for keeping that stuff > up-to-date? I think Chris has done this so far, voluntarily. > Better to have a minimum requirement, and not worry about this week's beta release. > Hilmar > (my $0.02 Cdn) -- David Block dblock@gene.pbi.nrc.ca http://bioinfo.pbi.nrc.ca/dblock/wiki Plant Biotechnology Institute National Research Council of Canada Saskatoon, Saskatchewan _______________________________________________ Bioperl-l mailing list Bioperl-l@bioperl.org http://bioperl.org/mailman/listinfo/bioperl-lFrom Malcolm.Cook@ppgx.com Tue Dec 12 20:53:45 2000 Date: Tue, 12 Dec 2000 12:53:45 -0800 From: Malcolm Cook Malcolm.Cook@ppgx.com Subject: [Bioperl-l] Bio::DB::WebDBSeqI
> A general question to people on the list: what is your general feeling
> about the number of external dependencies?

> Do you feel fine with a
> growing number,

yes - if they are in CPAN, that's all I need.

Why is bioperl not in CPAN?

> or does it scare you, do you think it might
> significantly heighten the barrier to using the package,

More like heighten the shoulders of the giants you're standing on

> or is it just
> normal for you that installing one package means installing another ten.

yes, and there are some modules that automate this. The CPAN module will offer to automatically install dependent modules (if they are declared some way, I'm not sure how, worth looking into). Thus, all I've had to issue is one command: install module, and all the extra modules are installed too (of course this runs the risk of breaking code that wants a specific older version of a module - yech).

> Do you think maintaining a collection of the respective external modules
> on the bioperl FTP site mitigates the additional trouble,

no - use CPAN for this. You can require versions of modules if need be.

> or are you
> afraid that these most of the time will be outdated?
>

yup.

> Does it make sense to have someone responsible for keeping that stuff
> up-to-date? I think Chris has done this so far, voluntarily.
>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------
>

From dag@sonsorol.org Tue Dec 12 21:10:51 2000
Date: Tue, 12 Dec 2000 16:10:51 -0500
From: chris dagdigian dag@sonsorol.org
Subject: [Bioperl-l] external module dependencies in bioperl (was Re: [Bioperl-l] Bio::DB::WebDBSeqI)
I have updated our external/ FTP area at ftp://bioperl.org/pub/external/ so that it contains the File::Temp module. This is an FTP directory that I maintain manually, so it may not always be up to date. It is present for convenience only...people should check for the availability of newer versions.

I've also updated our 'externals' web page at http://bioperl.org/Core/external.shtml to reflect this new 0.7 dependency. This is where I have been trying to track and explain all of our various dependencies.

My $.02 is that external dependencies are fine as long as we document the need for them well. The LWP stuff is a perfect example of when and where it makes sense to rely on other people's code rather than try to roll our own HTTP handling code.

That being said-- if File::Temp is so great we should be consistent about it....in release 0.7 all of our modules that need to create temporary files in a portable way should be changed so that they use File::Temp.

-Chris

--
Chris Dagdigian (Home:Work) Blackstone Technology Group
dag@sonsorol.org : dagdigian@ComputeFarm.com
http://www.sonsorol.org : http://www.computefarm.com
http://open-bio.org : Mobile (617) 877-5498
--
Schedule & full contact info http://www.sonsorol.org/dag/contact.html
--

From jason@chg.mc.duke.edu Tue Dec 12 21:15:48 2000
Date: Tue, 12 Dec 2000 16:15:48 -0500 (EST)
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] (no subject)
Chris -

We'd love to have your additions to the module(s) if they are general purpose. There are a couple of options depending on how deep you want to wade into the bioperl code pool.

You can obviously make the changes you need by inheriting from the Bio::SeqIO::genbank module and adding extra functionality and going on your merry way at your own location.

If it is general purpose functionality (better author parsing, fuzzy feature location, etc) then we'd enjoy having it added to the main branch because otherwise someone else has to write this eventually... If you do this, your code needs to be additions/corrections to a recent version of the module(s) checked out via cvs (see http://cvs.bioperl.org for info if you haven't already). You can send your changes as diffs to the list and someone with a developer account can volunteer to make the changes - or, probably more effective, send them to the release coordinator (as we are working towards the 0.7 release of bioperl) Hilmar Lapp <hlapp@gmx.net>, or to the bioperl project coordinator Ewan Birney (birney@ebi.ac.uk), or to me. Usually we will look over the code, bless it as okay, and commit it to the main trunk.

If you think you might be contributing more than 1 or 2 bug fixes to the bioperl project, you could also consider becoming a bioperl developer and getting an account on our development server. If that suits your fancy let Ewan know.

Glad to have you involved.

-Jason

On Tue, 12 Dec 2000, Strassel, Chris wrote:

> Hi all,
>
> I've been learning the modules for parsing genbank records. Pretty
> impressive. I am about to make a couple of additions to provide some
> functionality I need, and wanted to ask a couple questions before I begin...
>
> I have seen some postings about parsing fuzzy locations for features, but I
> get the impression that this isn't a function that exists yet. Can anyone
> confirm?
>
> Same for sequence versions (i.e. the gene index number on the version line).
>
> The author line(s) are not currently parsed. Has anyone done/tried to do
> this? Comments would be appreciated.
>
> Finally, I am unsure about how to go about adding functionality. Create my
> own objects that inherit from the bioperl objects? Add functions directly to
> the bioperl objects? Something else?
>
> Thanks in advance,
> Chris
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From todd@andrew2.stanford.edu Wed Dec 13 00:05:52 2000
Date: Tue, 12 Dec 2000 16:05:52 -0800
From: Todd Richmond todd@andrew2.stanford.edu
Subject: [Bioperl-l] External dependencies [was Bio::DB::WebDBSeqI]
On 12/12/00 12:01 PM, "Hilmar Lapp" <hlapp@gmx.net> wrote: > A general question to people on the list: what is your general feeling > about the number of external dependencies? Do you feel fine with a > growing number, or does it scare you, do you think it might > significantly heighten the barrier to using the package, or is it just > normal for you that installing one package means installing another ten. Speaking as someone who uses the bioperl package on a Mac - every external dependency you add is one more thing that's likely to break for me. Anything that has to be compiled is especially bad unless binaries are made available, since most Mac users don't have a compiler handy (and even if they do it's not as simple as "make"). Hopefully most of these concerns will go away with OSX, but they are definitely a concern right now. -- Dr Todd Richmond http://cellwall.stanford.edu/todd Carnegie Institution email: todd@andrew2.stanford.edu Department of Plant Biology fax: 1-650-325-6857 260 Panama Street phone: 1-650-325-1521 x431 Stanford, CA 94305From Eugene.Leitl@lrz.uni-muenchen.de Tue Dec 12 23:52:29 2000 Date: Wed, 13 Dec 2000 00:52:29 +0100 (CET) From: Eugene.Leitl@lrz.uni-muenchen.de Eugene.Leitl@lrz.uni-muenchen.de Subject: [Bioperl-l] Genomics Gets a New Code: GEML
http://www.wired.com/news/print/0,1294,40621,00.html Genomics Gets a New Code: GEML by Kristen Philipkoski 2:00 a.m. Dec. 12, 2000 PST The Internet uses HTML, and soon perhaps genomics will use GEML. At least that's what Rosetta Inpharmatics (RSTA), the creators of Genetic Expression Markup Language, or GEML, is hoping. The prestigious science journal Nature adopted the language on Monday, which should give a significant boost to its acceptance in the scientific community. See also: Myhrvold: Genomics Will Rule Genome Map Heralds Cheap Drugs Genetic Data Glut Looms Gene Researchers Get SNPpy Check yourself into Med-Tech Standardization of data is a big worry for genetic researchers at the moment, with the unprecedented glut of information generated by the Human Genome Project, an effort to locate every human gene. A working draft of the map was completed in June. The project has spawned over 400 individual databases at companies and academic institutions, containing information about the jobs that genes and proteins perform -- data that researchers need to share and exchange in order to make discoveries that will benefit human health. "It's not who's got the best technology, but who knows best how to share the information," said Friedrich von Bohlen, CEO of Lion Bioscience (LEON) at a conference in October. "We have to integrate all of the types of data in the world and in the end bring intelligence to the system." GEML is a standardized format that helps scientists do just that. In November, Rosetta launched the GEML community, a group of organizations -- including Harvard University, Agilent Technologies, Spotfire, and Europroteome -- that will develop and promote the language. "Standardization of gene expression data sets is necessary for both the exchange and publication of genomic research," said Annett Thomas, managing director at the Nature Publishing Group, in a statement. 
Nature and its sister publication Nature Genetics have published some of the most cutting edge genetic research. The GEML format is designed to consistently label genetic information coming from biochips -- chips that can show researchers tens of thousands of genes at a time, and point out which are active. Companies like Affymetrix and Agilent have developed biochips that can look at up to 60,000 genes at a time. Other companies have their own solutions to the standardization problem. Lion Bioscience has its own standardization platform, and IBM's life sciences unit is working on a product called DiscoveryLink -- a virtual database that will allow scientists to mine information from different types of files, from graphic to database to text, to find genetic or protein information. Physiome Sciences has developed a similar technology using an XML-based language called CellML. It helps researchers create models of living systems to predict which drugs will work before they begin clinical trials. Scientists can create a mathematical representation of any type of cell -- from heart, to lung, to kidney -- and perform simulations to test drugs. According to Metcalfe's law, penned by 3Com founder Robert Metcalfe, the more people who use any system, the more valuable it becomes. And since researchers will now be required to submit papers using GEML, the value of the language should increase exponentially.From hlapp@gmx.net Wed Dec 13 07:48:50 2000 Date: Tue, 12 Dec 2000 23:48:50 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Bio::DB::WebDBSeqI
"Osborne, Brian" wrote: > > Bioperl, > > > Do you think maintaining a collection of the respective external modules > > on the bioperl FTP site mitigates the additional trouble, or are you > > afraid that these most of the time will be outdated? > > Yes, I am afraid of that. In addition, I think of CPAN as one of Perl's > great > strengths. 'perl -e shell -MCPAN' is the best installation program I've > ever used, I think. > I agree. Just tried it, it's excellent. It upgraded on-the-fly my File::Spec upon unmet version required by File::Temp. Now I have File::Spec::tmpdir(), too. Cool. This should make LWP installation and the tens of packages it depends on a smooth ride. I can imagine that this is also superior to downloading all packages from a bioperl FTP site, unless you don't have internet connection (but then, how do you download bioperl?). Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From krbou@pgsgent.be Wed Dec 13 08:03:47 2000 Date: Wed, 13 Dec 2000 09:03:47 +0100 From: Kris Boulez krbou@pgsgent.be Subject: [Bioperl-l] Bio::DB::WebDBSeqI
Quoting Hilmar Lapp (hlapp@gmx.net):
>
> A general question to people on the list: what is your general feeling
> about the number of external dependencies? Do you feel fine with a
> growing number, or does it scare you, do you think it might
> significantly heighten the barrier to using the package, or is it just
> normal for you that installing one package means installing another ten.
> Do you think maintaining a collection of the respective external modules
> on the bioperl FTP site mitigates the additional trouble, or are you
> afraid that these most of the time will be outdated?
>

What I find a problem with CPAN modules is that it is not clear what these modules in their turn depend on (BioPerl needs module A, A needs B). It is also sometimes hard to find which distribution a certain module is located in (finding IO::Scalar in IO-stringy-1.211 is not straightforward).

I do think that we should not depend on external packages from outside CPAN, as we do now with 'expat', which is needed for one of the XML modules. There are companies where only the installation of CPAN modules is allowed.

P.S. I know about CPAN.pm, but had some bad experiences with it :(

Kris,

From hlapp@gmx.net Wed Dec 13 08:45:52 2000
Date: Wed, 13 Dec 2000 00:45:52 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Re: GenBank XML report format & DTD
"Geer, Lewis (NCBI)" wrote: > > It's not documented, unfortunately. Please send mail to > info@ncbi.nlm.nih.gov if you would like to see documentation. The form of > the url is given below: > http://www.ncbi.nlm.nih.gov/entrez/viewer.cgi?val=5174476&db=Nucleotide&dopt > =xml&txt=on Thanks for the hint. The actual nucleotide sequence is bit-encoded, right? (Easy to decode, sure.) Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From Jean-Marc.BONNEVILLE@ujf-grenoble.fr Wed Dec 13 10:54:51 2000 Date: Wed, 13 Dec 2000 11:54:51 +0100 From: Jean-Marc BONNEVILLE Jean-Marc.BONNEVILLE@ujf-grenoble.fr Subject: [Bioperl-l] Celera deal
Dear Mrs Jasny,

I am writing to you as the managing editor in charge of genomics papers at Science. I heard Science has agreed to publish the Celera sequence data on the human genome without them being deposited in the GenBank database. I am shocked by this deal, and here is why.

I am a bench scientist working in plant biology. We published last year the description of some Arabidopsis EMS mutants controlling endoreplication. We now have a T-DNA insertion mutant allelic to one of these EMS mutants, and the insertion disrupts a gene whose best homolog is a human gene. This 3-month old finding was totally unexpected, and I learned that from the very first Blast. I would not know that today if the human gene in question had not been deposited in A PUBLIC SEQUENCE DATABASE WITH A SINGLE GATE. My conviction is that if money considerations lead to the fact that sequences will have to be searched in a daedalus of semi-private Websites, the kind of approach I am presently involved in will be considerably slowed down.

The deal you have with Celera does not serve science and does not honor Science. I think the non-deposition of sequence data in a public database is a strong case for rejection of a scientific paper, and I hope the referees will raise this unprecedented fact, and that they can stand some pressure...

Let me quote to finish the French writer Francois Rabelais, who wrote four centuries ago: "Science sans conscience n'est que ruine de l'ame."

Best regards
--
Jean-Marc Bonneville

==============================================================================
Dr Jean-Marc Bonneville
Laboratoire de Génétique Moléculaire des Plantes
CNRS/Universite J. Fourier
BP 53
38041 GRENOBLE Cedex 09
FRANCE
tel (+33)4 76 51 48 92
fax (+33)4 76 51 43 36

From birney@ebi.ac.uk Wed Dec 13 10:45:23 2000
Date: Wed, 13 Dec 2000 10:45:23 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Bio::DB::WebDBSeqI
On Tue, 12 Dec 2000, Malcolm Cook wrote: > > A general question to people on the list: what is your general feeling > > about the number of external dependencies? > > Do you feel fine with a > > growing number, > > yes - if they are in CPAN, that's all I need. > > Why is bioperl not in CPAN? Bioperl is in CPAN - has been for a long time. ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From birney@ebi.ac.uk Wed Dec 13 10:49:35 2000 Date: Wed, 13 Dec 2000 10:49:35 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] External dependencies [was Bio::DB::WebDBSeqI]
On Tue, 12 Dec 2000, Todd Richmond wrote: > On 12/12/00 12:01 PM, "Hilmar Lapp" <hlapp@gmx.net> wrote: > > > A general question to people on the list: what is your general feeling > > about the number of external dependencies? Do you feel fine with a > > growing number, or does it scare you, do you think it might > > significantly heighten the barrier to using the package, or is it just > > normal for you that installing one package means installing another ten. > > Speaking as someone who uses the bioperl package on a Mac - every external > dependency you add is one more thing that's likely to break for me. Anything > that has to be compiled is especially bad unless binaries are made > available, since most Mac users don't have a compiler handy (and even if > they do it's not as simple as "make"). Hopefully most of these concerns will > go away with OSX, but they are definitely a concern right now. Most (?all) our dependencies are pure perl modules, so they should work for Macs... I personally believe that it is ok to have dependencies on thing like LWP, File::Temp etc but not ok to have dependencies on New::SuperClever::MySQL::AutoBinder or something else which is likely to be more buggy than bioperl ;) I reckon we have the balance good at the moment > > -- > Dr Todd Richmond http://cellwall.stanford.edu/todd > Carnegie Institution email: todd@andrew2.stanford.edu > Department of Plant Biology fax: 1-650-325-6857 > 260 Panama Street phone: 1-650-325-1521 x431 > Stanford, CA 94305 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. 
-----------------------------------------------------------------

From Brian.Osborne@osip.com Wed Dec 13 13:44:25 2000
Date: Wed, 13 Dec 2000 13:44:25 -0000
From: Osborne, Brian Brian.Osborne@osip.com
Subject: [Bioperl-l] Bio::DB::WebDBSeqI
Kris, >>P.S. I know about CPAN.pm, but had some bad experiences with it :( That's odd. I use CPAN.pm and the CPAN "shell" frequently, and I've never seen it fail in finding and installing "dependent" modules. This is on Linux. And you're always upgrading to the latest CPAN as well? Brian O. -----Original Message----- From: Kris Boulez [mailto:krbou@pgsgent.be] Sent: Wednesday, December 13, 2000 3:04 AM To: Hilmar Lapp Cc: Jason Stajich; Bioperl Subject: Re: [Bioperl-l] Bio::DB::WebDBSeqI Quoting Hilmar Lapp (hlapp@gmx.net): > > A general question to people on the list: what is your general feeling > about the number of external dependencies? Do you feel fine with a > growing number, or does it scare you, do you think it might > significantly heighten the barrier to using the package, or is it just > normal for you that installing one package means installing another ten. > Do you think maintaining a collection of the respective external modules > on the bioperl FTP site mitigates the additional trouble, or are you > afraid that these most of the time will be outdated? > What I find a problem with CPAN modules is that it is not clear what these modules in their turn depend on (BioPerl needs module A, A needs B). It is also sometimes hard to find which module a certain module is located (finding IO::Scalar in IO-stringy-1.211 is not straightforward). I do think that we should not depend on external packages like we do now with 'expat' which is needed for one of the XML modules. There are companies where only the installation of CPAN modules is allowed. P.S. I know about CPAN.pm, but had some bad experiences with it :( Kris, _______________________________________________ Bioperl-l mailing list Bioperl-l@bioperl.org http://bioperl.org/mailman/listinfo/bioperl-lFrom Brian.Osborne@osip.com Wed Dec 13 14:05:11 2000 Date: Wed, 13 Dec 2000 09:05:11 -0500 From: Osborne, Brian Brian.Osborne@osip.com Subject: [Bioperl-l] Celera deal
Bioperl,

Jean-Marc adds:

>> "Science sans conscience n'est que ruine de l'ame."

One way to translate this might be "To do science without awareness is to lose your soul."

Brian O.

-----Original Message-----
From: Jean-Marc BONNEVILLE [mailto:Jean-Marc.BONNEVILLE@ujf-grenoble.fr]
Sent: Wednesday, December 13, 2000 5:55 AM
To: bjasny@aaas.org
Cc: Bioperl-l@bioperl.org; Jean-Marc.Bonneville@ujf-grenoble.fr
Subject: [Bioperl-l] Celera deal

Dear Mrs Jasny,

I am writing to you as the managing editor in charge of genomics papers at Science. I heard Science has agreed to publish the Celera sequence data on the human genome without them being deposited in the GenBank database. I am shocked by this deal, and here is why.

I am a bench scientist working in plant biology. We published last year the description of some Arabidopsis EMS mutants controlling endoreplication. We now have a T-DNA insertion mutant allelic to one of these EMS mutants, and the insertion disrupts a gene whose best homolog is a human gene. This 3-month old finding was totally unexpected, and I learned that from the very first Blast. I would not know that today if the human gene in question had not been deposited in A PUBLIC SEQUENCE DATABASE WITH A SINGLE GATE. My conviction is that if money considerations lead to the fact that sequences will have to be searched in a daedalus of semi-private Websites, the kind of approach I am presently involved in will be considerably slowed down.

The deal you have with Celera does not serve science and does not honor Science. I think the non-deposition of sequence data in a public database is a strong case for rejection of a scientific paper, and I hope the referees will raise this unprecedented fact, and that they can stand some pressure...

Let me quote to finish the French writer Francois Rabelais, who wrote four centuries ago: "Science sans conscience n'est que ruine de l'ame."

Best regards
--
Jean-Marc Bonneville

==============================================================================
Dr Jean-Marc Bonneville
Laboratoire de Génétique Moléculaire des Plantes
CNRS/Universite J. Fourier
BP 53
38041 GRENOBLE Cedex 09
FRANCE
tel (+33)4 76 51 48 92
fax (+33)4 76 51 43 36

From krbou@pgsgent.be Wed Dec 13 14:47:28 2000
Date: Wed, 13 Dec 2000 15:47:28 +0100
From: Kris Boulez krbou@pgsgent.be
Subject: [Bioperl-l] Bio::DB::WebDBSeqI
Quoting Osborne, Brian (Brian.Osborne@osip.com):
> Kris,
>
> >>P.S. I know about CPAN.pm, but had some bad experiences with it :(
>
> That's odd. I use CPAN.pm and the CPAN "shell" frequently, and I've never
> seen it fail in finding and installing "dependent" modules. This is on
> Linux. And you're always upgrading to the latest CPAN as well?
>

I tried it out a few weeks ago and found that all of a sudden it started upgrading my perl from 5.005_03 to 5.6, without my asking for it. Perhaps I'm just becoming an anachronism :)

P.S. After having read a bit through the doc for CPAN.pm, I wonder if it wouldn't make sense to make a Bundle of all the modules needed for Bioperl (or does this happen automatically when trying to install bioperl using CPAN.pm?).

Kris,

From Jonathan_Epstein@nih.gov Wed Dec 13 17:21:29 2000 Date: Wed, 13 Dec 2000 12:21:29 -0500 From: Jonathan Epstein Jonathan_Epstein@nih.gov Subject: [Bioperl-l] XBLAST replacement?
Does anyone have a BioPerl-friendly replacement for XBLAST? http://analysis.molbiol.ox.ac.uk/pise_html/xblast.html http://bioweb.pasteur.fr/docs/man/man/xblast.1.html The idea is to mask out one domain in a set of selected sequences so that one can then perform a more sensitive search on a second domain. It doesn't look too hard to implement, but thought I would ask before reinventing the wheel. TIA, -Jonathan Jonathan Epstein Jonathan_Epstein@nih.gov Head, Unit on Biologic Computation (301)402-4563 Office of the Scientific Director Bldg 31, Room 2A47 Nat. Inst. of Child Health & Human Development 31 Center Drive National Institutes of Health Bethesda, MD 20892From lewisg@mail.nih.gov Wed Dec 13 17:43:26 2000 Date: Wed, 13 Dec 2000 12:43:26 -0500 From: Geer, Lewis (NLM/NCBI) lewisg@mail.nih.gov Subject: [Bioperl-l] Re: GenBank XML report format & DTD
Yes, the sequence is bit encoded, but it shouldn't be. I've asked the developer if he could fix it and he's looking into it -- there is a performance issue (but not with the decoding algorithm itself). Lewis > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp@gmx.net] > Sent: Wednesday, December 13, 2000 3:46 AM > To: Geer, Lewis (NLM/NCBI) > Cc: Bioperl > Subject: [Bioperl-l] Re: GenBank XML report format & DTD > > > "Geer, Lewis (NCBI)" wrote: > > > > It's not documented, unfortunately. Please send mail to > > info@ncbi.nlm.nih.gov if you would like to see > documentation. The form of > > the url is given below: > > > http://www.ncbi.nlm.nih.gov/entrez/viewer.cgi?val=5174476&db=N > ucleotide&dopt > > =xml&txt=on > > Thanks for the hint. The actual nucleotide sequence is bit-encoded, > right? (Easy to decode, sure.) > > Hilmar > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l >From hlapp@gmx.net Wed Dec 13 17:56:42 2000 Date: Wed, 13 Dec 2000 09:56:42 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] External dependencies [was Bio::DB::WebDBSeqI]
Ewan Birney wrote: > > > > Speaking as someone who uses the bioperl package on a Mac - every external > > dependency you add is one more thing that's likely to break for me. Anything > > that has to be compiled is especially bad unless binaries are made > > available, since most Mac users don't have a compiler handy (and even if > > they do it's not as simple as "make"). Hopefully most of these concerns will > > go away with OSX, but they are definitely a concern right now. > > Most (?all) our dependencies are pure perl modules, so they should work > for Macs... > Well, the expat library is compiled I guess, and XML::Parser depends on it, that is, all modules reading XML. Has anyone managed to get expat built on Mac? Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From schattner@alum.mit.edu Wed Dec 13 20:00:11 2000 Date: Wed, 13 Dec 2000 12:00:11 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] Developing Improved Bioperl Documentation
Hello Kris & Brian

I gather that the two of you have volunteered to develop improved documentation for (release 0.7 of) Bioperl. I would be glad to contribute to this important task assuming you could still use some help.

I think that a good starting point would be a proposed outline / table-of-contents for the documentation. That would make it clear what topics would be covered and should make it easier for people to sign up to write different sections. Has either of you written such an outline? Or have you been envisioning a different way of organizing the work?

If you have written an outline, could you send me a copy? If not (and you agree that having an outline might be useful), I’d be glad to take a shot at writing one.

Let me know what you think.

Cheers
Peter Schattner

(PS Sorry for not contacting you sooner but I was out of town for several weeks and away from my computer :-) . However, I’m back and have time available now.)

From birney@ebi.ac.uk Wed Dec 13 20:52:32 2000 Date: Wed, 13 Dec 2000 20:52:32 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Gene Interface discussion
Hilmar and I had a quick trans-timezone phone call yesterday (Hilmar at the end of his day and myself at the start) to bash out the Gene interface.

Key points -

(a) We call the interface GeneStructureI as "Gene" is a bit too ambiguous

(b) GeneStructureI has-a array of TranscriptI

(c) TranscriptI has-a array of ExonI

    There is a method to get out all the Exons from a GeneStructure which must be equivalent to getting all the Exons out of each Transcript and making that list unique on the start/end/strand of the exons

(d) GeneStructureI and TranscriptI inherit from SeqFeatureI

    The definition of SeqFeatureI is extended somewhat. SeqFeatureI objects are now allowed to have component SeqFeatures (sub_SeqFeature call) which are on different sequences to the parent SeqFeature.

    There is a new method to SeqFeatureI - is_single_sequence which returns TRUE if all component SeqFeatures are on the same sequence as this SeqFeature, and returns FALSE if not. This will allow clients to easily find (and possibly skip) these "expanded" seq features.

    The ->start and ->end calls on a non single sequence composite seqfeature should return the start and end point of the component sequence features which lie on the "focus" sequence of this seqfeature (ie, whatever ->entire_seq and ->seqname implies).

    Clients should be aware that when is_single_sequence == 0, concepts like "overlap" and "length" are not necessarily easy to define or interpret. This is for the client code to deal with. (apologies to the clients - we can't do much more for you inside the objects)

    EMBL/GenBank join(AL00012:120..123,1..5) should be parsed into a SeqFeature::Generic structure supporting the above calls.

(e) The interface definition does not indicate where or how additional information (annotation, dblink) is stored.
This is left up to implementations to add if wished, for example by inheriting from the DBLinkContainerI interface.

Here is the complete proposal (Note to Hilmar - I have just dreamt up this business of dealing with the difference between utr/cds/all exons being arguments to the exon call. An alternative could be methods exon, cds, utr. Feel free to complain loudly).

All interfaces in the Bio::SeqFeature:: namespace

GeneStructureI - inherits from SeqFeatureI

  (inherited methods: start, end, strand, seq, entire_seq, seqname,
   primary_tag, source_tag, is_single_sequence, sub_SeqFeatures)

  Notes: sub_SeqFeatures must delegate to ->transcripts.
       : primary_tag must be 'genestructure'

  methods

    # returns an array of TranscriptI
    @transcripts = $gs->transcripts();

    # returns an array of exons. Allowed arguments 'all','cds','utr'
    # this call must be equivalent to
    # foreach $t ( $gs->transcripts() ) {
    #    get exons, make unique start/end/strand
    # }
    @exons = $gs->exons('all');
    @cds   = $gs->exons('cds');
    @utr   = $gs->exons('utr');

    # GeneStructureI must implement this, even if it returns an empty list
    @promotors = $gs->promotors(); # could be empty

    # GeneStructureI must implement this, even if it returns an empty list
    @polya = $gs->polya(); # could be empty

TranscriptI - inherits from SeqFeatureI

  (inherited methods: start, end, strand, seq, entire_seq, seqname,
   primary_tag, source_tag, is_single_sequence, sub_SeqFeatures)

  Notes: sub_SeqFeatures delegates to ->exons('all'),promotor,polya;
       : primary_tag must be 'transcript'

    # returns an array of exons. Allowed arguments 'all','cds','utr'
    @exons = $tr->exons('all');
    @cds   = $tr->exons('cds');
    @utr   = $tr->exons('utr');

    $promotor = $tr->promotor(); # could be undef, meaning unknown
    $polya    = $tr->polya();    # could be undef, meaning unknown

  Transcript must have the following two methods

    $transcript->cdna();    # returns a Bio::PrimarySeqI of the cDNA
    $transcript->protein(); # returns a Bio::PrimarySeqI of the protein

ExonI - inherits from SeqFeatureI, cannot be composite, primary_tag must return one of 'exon' or 'cds' or 'utr'

  There are no additional requirements to the ExonI interface, though of course, implementations may require their own system

To Do list:

(a) discuss this proposal. Sane? Any more issues to be worked out? I am not 100% on the exons('argument') style call. The exon primary_tag is actually a hard thing to provide. Should the primary_tag change depending on the argument - this is very nasty for the implementation objects.

(b) figure out how to get these things in and out of EMBL/GenBank format without loss of information

(c) Ditto with GAME

Implementations:

  Hilmar/Ewan to do bioperl implementations
  Hilmar to do bioperl parsing modules
  Ewan/Hilmar to do the interface files
  Ewan to do Ensembl definitions when appropriate (when Ensembl moves to bioperl 0.7 compliance)
  Ewan/Jason to look at EMBL/GenBank dumping issues
  Brad to look at Game dumping/reading issues

ewan

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From M.W.E.J.Fiers@plant.wag-ur.nl Wed Dec 13 23:28:19 2000 Date: Thu, 14 Dec 2000 00:28:19 +0100 From: Fiers, M.W.E.J. M.W.E.J.Fiers@plant.wag-ur.nl Subject: [Bioperl-l] Computation.pm
Hi

I've committed a Bio::SeqFeature::Computation.pm to the cvs. It is a variation on the Generic.pm SeqFeature object but more geared towards containing the results of a computation. A big goal of writing this is also to make the game export more straightforward.

Since I'm kind of new to this, I hope I did everything alright.

Biggest changes are
* The sub_SeqFeatures are grouped into subsets, i.e. $feat->add_sub_SeqFeature($subfeat,'repeat')
* There is a data structure exactly like tag, but named score
* It contains a computation_id (like game does)
* Lots of items like program_name and database_version have been added

It is, as far as I've tested, completely compatible with the Generic object, meaning that score and sub_SeqFeature can be used as in the Generic.

I still have to write the .t for this

Greetings
Mark Fiers

From david.lapointe@umassmed.edu Thu Dec 14 00:52:59 2000 Date: Wed, 13 Dec 2000 19:52:59 -0500 From: David Lapointe david.lapointe@umassmed.edu Subject: [Bioperl-l] Genomics Gets a New Code: GEML
Is this the same project as at the NCGR ? See http://www.ncgr.org/research/genex/geml.html. > The Internet uses HTML, and soon perhaps genomics will use GEML. > At least that's what Rosetta Inpharmatics (RSTA), the creators of > Genetic Expression Markup Language, or GEML, is hoping. The > prestigious science journal Nature adopted the language on Monday, > which should give a significant boost to its acceptance in the > scientific community. -- .david David Lapointe If you're not living on the edge, you're taking up too much space.From lapp@gnf.org Thu Dec 14 01:48:07 2000 Date: Wed, 13 Dec 2000 17:48:07 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] Computation.pm
"Fiers, M.W.E.J." wrote: > > Hi > > I've commited a Bio::SeqFeature::Computation.pm to the cvs. It is a > variation on the Generic.pm SeqFeature object but more geared towards > containing the results of a computation result. > A big goal of writing this is also to make the game export more > straightforward. > > Since I'm kind of new to this, I hope I did everything alright. > > Biggest changes are > * The subSeqFeature's are grouped into subset's. i.e. > $feat->add_sub_SeqFeature($subfeat,'repeat') > * There is a data structure exactly like tag, but named score > * It contains a computation_id (like game does) > * Lot of items like program_name and database_version added > Thanks for the submission Mark. Have you checked against Bio::Tools::AnalysisResult? This one is supposed to be the base class for analysis result parsers. It's not a feature though. Do we really want to have all-encompassing feature objects? Could it be smarter to have features which can have an object attached that describes their computational origin? I personally tend to prefer to have classes as slim as they can be, and rather have more classes and a more complex object tree (objects having several other objects attached) which of course increases the traversing complexity. In addition I think we need to keep the number of classes in Bio::SeqFeature as comprehensive as possible, because people using the package will try to comprehend which class to use for what. That is, the extent of overlap between classes should be as little as possible. In the case of SeqFeature::Computation this would mean that _every_ feature object derived from a computation shall inherit from it (rather than some do and some deliberately don't). What do people feel about these aspects? Do you disagree, do you think this would lead us towards imposing to strict requirements on module implementors? In other words, what degree of controlling the bazaar's growth is sensible? 
Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------From todd@andrew2.stanford.edu Thu Dec 14 03:52:43 2000 Date: Wed, 13 Dec 2000 19:52:43 -0800 From: Todd Richmond todd@andrew2.stanford.edu Subject: [Bioperl-l] External dependencies and Macs [long]
On 12/13/00 2:49 AM, "Ewan Birney" <birney@ebi.ac.uk> wrote:

> Most (?all) our dependencies are pure perl modules, so they should work
> for Macs...

Pure perl <=> fully portable

> I personally believe that it is ok to have dependencies on thing like LWP,
> File::Temp etc but not ok to have dependencies on
> New::SuperClever::MySQL::AutoBinder or something else which is likely to
> be more buggy than bioperl ;)

Let me give you an example of what us poor Mac users are up against. Here's a brief blow-by-blow of my attempt to get the current external dependencies installed on my machine:

1st Pass:

External Module Ace, Aceperl, is not installed on this computer.
  The Bio::DB::Ace in Bioperl needs it for access of ACeDB database

External Module File::Temp, Temporary File creation, is not installed on this computer.
  The in Bioperl needs it for Bio::DB::WebDBSeqI, Bio::Seq::LargePrimarySeq

External Module XML::Writer, Parsing + writing of XML documents, is not installed on this computer.
  The Bio::SeqIO::game,Bio::Variation::* in Bioperl needs it for Bio::Variation code, GAME parser

External Module IO::Scalar, IO handle to read or write to a scalar, is not installed on this computer.
  The Bio::Tools::Blast::Run::Webblast in Bioperl needs it for remote http Blast jobs

External Module IO::String, IO handle to read or write to a string, is not installed on this computer.
  The Bio::DB::*,Bio::Variation::*,Bio::Tools::Blast::Run::Webblast in Bioperl needs it for GenBank+GenPept sequence retrieval, Variation code

External Module XML::Parser::PerlSAX, Parsing of XML documents, is not installed on this computer.
  The Bio::SeqIO::game,Bio::Variation::* in Bioperl needs it for Bio::Variation code, GAME parser

0) Attempt to use CPAN. Quit because it tries to install Perl 5.6 before it will do anything else. Since Perl 5.6 isn't available for Macs, abort this and do this stuff manually.

1) Manually go gather the necessary files from CPAN
   a) unpack the files with Chris Nandor's untarzipme.plx because Stuffit chokes on some tar.z files with a strange disk error. Installation is done using Chris's installme.plx which uses Makefile.PL. No simple "make","make test","make install" for us poor Mac users. Test scripts have to be run manually one by one.

2) First, install File::Temp. Should be easy, only one little file...
   a) unpack distribution, make, and try the tests
   b) discover that I need Test.pm (not part of the standard Mac installation; only Test::Harness is)
   c) download Test-1.15
   d) discover that Test-1.15 needs Perl 5.004_05
   e) manually edit this out - since MacPerl is sitting at 5.004
   f) manually install Test.pm, since Makefile.PL fails on Mac...
   g) try test on Test.pm -> this fails -> Test::Harness version 1.1601 required--this is only version 1.1502.
   h) try to get Test::Harness 1.1601. Discover that I have to download whole perl 5.004_05 distribution because it doesn't come separately
   i) run tests - success!
   j) run tests on File::Temp. Fails because it requires Perl 5.005. Hand-edit the change to require 5.004...
   k) run tests on File::Temp. Fails because File::Spec version 0.8 required--this is only version 0.7.
   l) grab File::Spec 0.8 and install. Passes the appropriate tests (who cares about VMS?)
   m) run test on File::Temp. What the hell is Errno.pm? Something required by File::Spec. Grab that and install it. Oops - fails in a messy way. Requires a compiler. Well maybe things will work okay without it. Of course, all the tests for File::Spec fail, which means the tests for File::Temp fail. Move on.

3) Try installing IO::Scalar
   a) unpack distribution (IO::Stringy), make, and try tests
   b) Not so good... 4 out of 18 tests for IO::Scalar fail (specifically 5,6,17, and 18 in case anyone cares)

4) How about IO::String?
   a) unpack distribution (IO::String), manually install, and try tests
   b) run tests - Fails -> Perl 5.00503 required--this is only version 5.004, stopped.
   c) manually edit to require 5.004 instead
   d) Try again - all tests fail -> "Can't locate object method "TIEHANDLE" via package "IO::String=GLOB(0xa1f6568)". Is this why 5.00503 is required?

5) How about XML::Writer?
   a) unpack distribution (XML-Writer-0.4), make, and try tests
   b) Everything works!

6) How about XML::Parser::PerlSAX?
   a) unpack distribution (XML-Writer-0.4), make, and try tests
   b) 3 out of 45 tests fail...

7) How about Ace?
   a) unpack distribution (AcePerl-1.67), make, and try tests
   b) Hmmm, guess I should take the pure Perl option and skip the options that require a compiler
   c) Initial failure at generating the makefile, change "ace" folder name to "Ace", retry
   d) Generates Makefile but apparently in nonstandard format, as the Mac installation process won't do anything with it. Manually move the files after consulting the makefile.
   e) Tests? Well I don't intend to use the AceDB stuff and the network connection stuff doesn't work, so we'll skip these.

8) Okay - let's try the bioperl stuff again

External Module File::Temp, Temporary File creation, is not installed on this computer.
  The in Bioperl needs it for Bio::DB::WebDBSeqI, Bio::Seq::LargePrimarySeq

## Small note - it is installed! How does bioperl check for dependencies? ##

Checking if your kit is complete...
Warning: the following files are missing in your kit:
  :Bio:LiveSeq:AARange.pm
  :Bio:LiveSeq:Analyser.pm
  :Bio:LiveSeq:Chain.pm
  :Bio:LiveSeq:ChainI.pm
  :Bio:LiveSeq:DNA.pm
  :Bio:LiveSeq:Exon.pm
  :Bio:LiveSeq:Gene.pm
  :Bio:LiveSeq:IO:BioPerl.pm
  :Bio:LiveSeq:IO:Loader.pm
  :Bio:LiveSeq:IO:SRS.pm
  :Bio:LiveSeq:Intron.pm
  :Bio:LiveSeq:Mutation.pm
  :Bio:LiveSeq:Mutator.pm
  :Bio:LiveSeq:Prim_Transcript.pm
  :Bio:LiveSeq:Range.pm
  :Bio:LiveSeq:Repeat_Region.pm
  :Bio:LiveSeq:Repeat_Unit.pm
  :Bio:LiveSeq:SeqI.pm
  :Bio:LiveSeq:Transcript.pm
  :Bio:LiveSeq:Translation.pm
Please inform the author.

Keep in mind that this is just prepping for the bioperl install. Only one of the six external dependencies installs correctly and passes all tests. I'll spare the list all of the failed tests after you've actually installed bioperl - it isn't pretty ...

Todd

From M.W.E.J.Fiers@plant.wag-ur.nl Thu Dec 14 04:01:54 2000 Date: Thu, 14 Dec 2000 05:01:54 +0100 From: Fiers, M.W.E.J. M.W.E.J.Fiers@plant.wag-ur.nl Subject: [Bioperl-l] Computation.pm
Hi Hilmar,

I have seen the Bio::Tools::AnalysisResult object and thought it to be a class which the parsers inherit from (as you say) and not so much a class to actually contain the results of the computation. But correct me if I am wrong.

The computation object has the advantage that it can store more complex sets of results. You can, for example, store a set of exons and introns and retrieve them separately.

There are also advantages to having all computation results inherit from the same class: it will make parsing of the objects much more straightforward. That is a large advantage when producing, for example, game output, and there is a lot of standard information related to analysis results.

If you do not accept the computation object, it might be an option to merge some of its structures into SeqFeature::Generic and others into the AnalysisResult object, just to have somewhat more uniformity in storage?

I haven't had a chance to take an in-depth look at the AnalysisResult object, so maybe my arguments are false. I won't be able to read my mail before Monday, so answers could be slow.

I do agree with slim and comprehensible class sets.

Thanks,
Mark

"Fiers, M.W.E.J." wrote:
>
> Hi
>
> I've commited a Bio::SeqFeature::Computation.pm to the cvs. It is a
> variation on the Generic.pm SeqFeature object but more geared towards
> containing the results of a computation result.
> A big goal of writing this is also to make the game export more
> straightforward.
>
> Since I'm kind of new to this, I hope I did everything alright.
>
> Biggest changes are
> * The subSeqFeature's are grouped into subset's. i.e.
> $feat->add_sub_SeqFeature($subfeat,'repeat')
> * There is a data structure exactly like tag, but named score
> * It contains a computation_id (like game does)
> * Lot of items like program_name and database_version added
>

Thanks for the submission Mark. Have you checked against Bio::Tools::AnalysisResult? This one is supposed to be the base class for analysis result parsers.
It's not a feature though. Do we really want to have all-encompassing feature objects? Could it be smarter to have features which can have an object attached that describes their computational origin? I personally tend to prefer to have classes as slim as they can be, and rather have more classes and a more complex object tree (objects having several other objects attached) which of course increases the traversing complexity. In addition I think we need to keep the number of classes in Bio::SeqFeature as comprehensive as possible, because people using the package will try to comprehend which class to use for what. That is, the extent of overlap between classes should be as little as possible. In the case of SeqFeature::Computation this would mean that _every_ feature object derived from a computation shall inherit from it (rather than some do and some deliberately don't). What do people feel about these aspects? Do you disagree, do you think this would lead us towards imposing to strict requirements on module implementors? In other words, what degree of controlling the bazaar's growth is sensible? Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- _______________________________________________ Bioperl-l mailing list Bioperl-l@bioperl.org http://bioperl.org/mailman/listinfo/bioperl-lFrom lapp@gnf.org Thu Dec 14 04:29:59 2000 Date: Wed, 13 Dec 2000 20:29:59 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] Computation.pm
"Fiers, M.W.E.J." wrote: > > Hi Hilmar, > > I have seen the Bio::Tools::Analysis object and thought it to be a class > which the parsers inherit (as you say) and not so much a class to actually > contain the results of the computation. But correct me if I am wrong. As I already said, this is indeed the difference. > > The computation object has the advantage that it can store more complex sets > of results. You can, for example, store a set of exons and introns and > retrieve them separatly. > Hmm. Not sure why we need the Computation object for this. GeneStructure already implements this. > There are also advantages to having all computation results inheriting from > the same class, it will make parsing of the objects much more > straigthforward. A large advantage when producing, for example, game output, > and there is a lot of standard information related to analysis results. I don't see so much why it should ease parsing, because I guess you have to get the information out of your data source anyway (the Computation class doesn't parse, it only stores, right?). But I agree with you that there are advantages if every object obtained from a computation is guaranteed to implement a certain interface pertaining to computation-specific attributes. > > If you do not accept the computation object, it might be an option to merge Remember what Ewan said? The ones with the working code win. There is no such thing as someone rejecting a module (provided it works :) I was trying to get a picture of how and where this fits into the Bioperl framework, and I wanted to solicit people's feedback to the points I brought up. > > I haven't had a change to take an in depth look at the Analysisresult I don't think that solves any of your problems, as you correctly realized. Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 -------------------------------------------------------------From krbou@pgsgent.be Thu Dec 14 08:29:33 2000 Date: Thu, 14 Dec 2000 09:29:33 +0100 From: Kris Boulez krbou@pgsgent.be Subject: [Bioperl-l] Re: Developing Improved Bioperl Documentation
Quoting Peter Schattner (schattner@alum.mit.edu):
> Hello Kris & Brian
>
> I gather that the two of you have volunteered to develop improved
> documentation for (release 0.7 of) Bioperl. I would be glad to
> contribute to this important task assuming you could still use some
> help.
>

I think we certainly can. Real life always takes more time than expected :)

> I think that a good starting point would be a proposed outline /
> table-of-contents for the documentation. That would make it clear what
> topics would be covered and should make it easier for people to sign up
> to write different sections. Has either of you written such an outline?
> Or have you been envisioning a different way of organizing the work?
>
> If you have written an outline, could you send me a copy? If not (and
> you agree that having an outline might be useful), I’d be glad to take a
> shot a writing one.
>

I don't think something like this exists yet. My plan is to start writing cookbook-like documentation (see an earlier mail from Ewan), based on the SYNOPSIS part of each module's documentation. In the meantime, I will be checking these sections for correctness.

Kris,

From birney@ebi.ac.uk Thu Dec 14 09:02:38 2000 Date: Thu, 14 Dec 2000 09:02:38 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] External dependencies and Macs [long]
On Wed, 13 Dec 2000, Todd Richmond wrote: > On 12/13/00 2:49 AM, "Ewan Birney" <birney@ebi.ac.uk> wrote: > > > Most (?all) our dependencies are pure perl modules, so they should work > > for Macs... > > Pure perl <=> fully portable > > > I personally believe that it is ok to have dependencies on thing like LWP, > > File::Temp etc but not ok to have dependencies on > > New::SuperClever::MySQL::AutoBinder or something else which is likely to > > be more buggy than bioperl ;) > > Let me give you an example of what us poor Mac users are up against. Here's > a brief blow-by-blow of my attempt to get the current external dependencies > installed on my machine: > Wow. I guess it is going to be MacOS X to save the day here. We are going to need (with your help Todd) good instructions on what is or is not going to work from bioperl on Macs. I think getting the entire bioperl suite to work on Macs looks as if we need to fix lots of "Perl on Mac" issues. But I hope a good subset of bioperl can work ok on Macs and we can have good walk through instructions about what errors to ignore/skip over etc... Sounds good? ewan... ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From birney@ebi.ac.uk Thu Dec 14 09:10:07 2000 Date: Thu, 14 Dec 2000 09:10:07 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Computation.pm
On Wed, 13 Dec 2000, Hilmar Lapp wrote:

>
> Thanks for the submission Mark. Have you checked against
> Bio::Tools::AnalysisResult? This one is supposed to be the base class for
> analysis result parsers. It's not a feature though.
>
> Do we really want to have all-encompassing feature objects? Could it be
> smarter to have features which can have an object attached that describes
> their computational origin?
>

My feeling is that we should not do this by implementation inheritance but by interface inheritance or a composition/delegation model. I would suggest something like this:

Bio::ComputationalResultI enforces that the implementation has a ->computation call returning a Bio::Tools::Computation object which has parameters "program" etc (is that what we need?)

(This is like Bio::DBLinkContainerI - an object which contains an array of DBLinks)

Then we can have *implementations* inherit from, say,

  SeqFeatureI and ComputationalResultI, indicating that this object is both a SeqFeature and has a ComputationalResult

or

  GeneStructureI and ComputationalResultI

or

  WeirdNewAnalysisObject and ComputationalResultI

etc etc...

What do people feel --- I really like having "lots of interfaces and fewer implementations" --- I think it future proofs us and is good for bioinformatics where you want to mix-and-match attributes on objects very often.

It also probably means I should program more in java and less in perl ;)

Oh well...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From hlapp@gmx.net Thu Dec 14 09:28:37 2000 Date: Thu, 14 Dec 2000 01:28:37 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live MANIFEST file mismatch (Bio/liveseq/Analyser.pm)
Chris Dagdigian wrote: > > The current bioperl-live snapshot will not pass 'perl Makefile.PL' because > the MANIFEST file has an entry for liveseq/Analyser.pm which does not > actually exist in the current distro (or maybe was removed?). It was removed from the repository. (I have no idea if it was on purpose.) Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From M.W.E.J.Fiers@plant.wag-ur.nl Thu Dec 14 14:31:46 2000 Date: Thu, 14 Dec 2000 15:31:46 +0100 From: Fiers, M.W.E.J. M.W.E.J.Fiers@plant.wag-ur.nl Subject: [Bioperl-l] Computation.pm
Hi

>Hmm. Not sure why we need the Computation object for this. GeneStructure
>already implements this.

Yes, I agree, but computation.pm can, without adaptation, store more types; for example if you would like to store acceptor/donor site locations, transcription factor binding sites or whatever.

Another point in favour of the computation object, in my view, is that you do not have to know what is in there: you can get an array from the object containing the names of the stored subfeature types.

Genestructure is, on the other hand, more sophisticated than Computation because it can return a CDS. In my opinion, the genestructure object could inherit from computation?

>Remember what Ewan said? The ones with the working code win. There is no
>such thing as someone rejecting a module (provided it works :)

Actually I didn't remember, but I would like to see a consensus, and if the bioperl community does not think that it is a valuable addition, I will remove it.

Concerning Ewan's remarks about producing an interface, I agree; it won't be a problem to rewrite.

Mark

From MEColosimo@alumni.carnegiemellon.edu Thu Dec 14 17:01:42 2000
Date: Thu, 14 Dec 2000 12:01:42 -0500
From: Marc Colosimo MEColosimo@alumni.carnegiemellon.edu
Subject: [Bioperl-l] Re: External dependencies and Macs
I am also trying to use BioPerl on a Mac and have found several things difficult. I for one really hate downloading a package, setting it up and running it, only to find out that I'm missing X number of things. I don't think it matters which platform anyone is on; I think to some extent everyone has this same problem.

As for external 'C' libraries, I think that considerable thought should be put in before using them. Most of us on this list probably can build or tweak the C stuff to get it to work. However, I think my time is better spent using BioPerl than getting it to work.

I have three suggestions, two of which I hate to suggest.

1) There should be a bioperl script for installing the package (using PERL itself). I think there is a UNIX install script, and until MAC OS X comes rolling out I can't use it. Plus, the user should have PERL.

2) This was already suggested: include the externals (PERL modules) in the BioPerl distribution.

3) Change the 'C' libraries over to use JAVA (BioJava anyone). This last one I think is sort of evil, but JAVA is pretty standard across systems and the compiled code is fast (not as fast as being native).

One other thing (this should be on its own): line endings are a pain to handle. The Mac uses a different code for \n than PC/UNIX (actually so does the PC vs UNIX but it probably does not show up as a problem). If I down load a fasta file (large one with many sequences in it) as a UNIX file (I could have it translated when decompressing, but that messes up stuff for MAME :). Stuff that looks for end of line or \n don't find it and end up choking. I don't know if there is a fix for this, but it is something to think about when getting data from the web.

Marc

From dag@sonsorol.org Thu Dec 14 17:27:53 2000
Date: Thu, 14 Dec 2000 12:27:53 -0500
From: Chris Dagdigian dag@sonsorol.org
Subject: [Bioperl-l] new dependency on XML::Node in bioperl-live?
Hi folks, In testing the build of bioperl-live on my new OpenBSD 2.8 box I got a new error message from the Variation_IO.t test script telling me (among other things) that I now need the XML::Node module. This module is not mentioned in our Makefile.PL and I have not listed it on our web page (http://bioperl.org/Core/external.shtml) that attempts to track our dependencies. if someone can tell me that this dependency is here to stay I would appreciate it...I'll add it to the makefile and update the web/FTP sites accordingly. Thanks! Chris Chris Dagdigian -- Blackstone Technology Group (Work ) dagdigian@computefarm.com (Home) dag@sonsorol.org (Web ) http://ComputeFarm.com, http://open-bio.org, http://sonsorol.org (More ) Full contact info and schedule -- http://sonsorol.org/dag/contact.htmlFrom mcs2+@pitt.edu Thu Dec 14 17:26:21 2000 Date: Thu, 14 Dec 2000 12:26:21 -0500 From: Martin Schmidt mcs2+@pitt.edu Subject: [Bioperl-l] newbie question
Hi, I am just getting started with BioPerl and am unable to get any of it to compile. Just for starters, I have tried to use the restriction.pl script that comes with BioPerl release 6.2 I changed the line... use lib "/home/sac"; to use lib "macintosh HD:MacPerl:bioperl-0.6.2:libwww-perl-5.48:lib"; and this does not give an error message. The next line does give an error. use Bio::Seq; It says "# Can't locate Bio/Seq.pm in @INC. I can't find 'Bio::Seq' in the BioPerl package, nor can I find it at the BioPerl web site. Any advice would be appreciated. thanks, Martin -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Martin Schmidt, Ph.D. * * Department of Molecular Genetics and Biochemistry * * University of Pittsburgh School of Medicine * * Pittsburgh, PA 15261 * * Tel. (412) 648-9243 * * FAX (412) 624-1401 * * Email mcs2@pitt.edu * * www.pitt.edu/~mcs2/ * * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~From dag@sonsorol.org Thu Dec 14 17:36:56 2000 Date: Thu, 14 Dec 2000 12:36:56 -0500 (EST) From: Chris Dagdigian dag@sonsorol.org Subject: [Bioperl-l] coming soon: Bundle::BioPerl
I have been playing happily with the CPAN.pm module and its features that allow automated installation of modules from the CPAN archives. This includes experiencing firsthand how it attempts to arbitrarily upgrade me to perl 5.06 so I see why some people stay away. heh.

Regardless, someone here made the great suggestion to create a Bundle::BioPerl module that lists all of our CPAN-specific requirements. This would allow end users to install (most) of our external modules in one shot:

    $ perl -MCPAN -e shell
    cpan> install Bundle::BioPerl
    cpan> quit

I hereby volunteer to create this module and have obtained a PAUSE login that will allow me to upload it to CPAN.

Once it is ready for testing I'll post a URL. Assuming it survives the testing process I'll upload it to the master repository shortly afterwards.

Regards,
Chris

--
Chris Dagdigian (Home:Work) Blackstone Technology Group
dag@sonsorol.org : dagdigian@ComputeFarm.com
http://www.sonsorol.org : http://www.computefarm.com
http://open-bio.org : Mobile (617) 877-5498
--
Schedule & full contact info http://www.sonsorol.org/dag/contact.html
--

From heikki@ebi.ac.uk Thu Dec 14 17:48:43 2000
Date: Thu, 14 Dec 2000 17:48:43 +0000
From: Heikki Lehvaslaiho heikki@ebi.ac.uk
Subject: [Bioperl-l] new dependency on XML::Node in bioperl-live?
Chris,

Strange you have not got that message before, as Bio::Variation::IO::xml.pm has been using it for a long time. XML::Node gives a greatly simplified interface to XML::Parser (although there is a glitch in attribute parsing that forces one to use the full path to the element).

I do not intend to rewrite the module in the near future, so I guess you have to add it to the dependency list.

-Heikki

Chris Dagdigian wrote:
>
> Hi folks,
>
> In testing the build of bioperl-live on my new OpenBSD 2.8 box I got a new
> error message from the Variation_IO.t test script telling me (among other
> things) that I now need the XML::Node module.
>
> This module is not mentioned in our Makefile.PL and I have not listed it on
> our web page (http://bioperl.org/Core/external.shtml) that attempts to
> track our dependencies.
>
> if someone can tell me that this dependency is here to stay I would
> appreciate it...I'll add it to the makefile and update the web/FTP sites
> accordingly.
>
> Thanks!
> Chris
>
> Chris Dagdigian -- Blackstone Technology Group
> (Work ) dagdigian@computefarm.com (Home) dag@sonsorol.org
> (Web ) http://ComputeFarm.com, http://open-bio.org, http://sonsorol.org
> (More ) Full contact info and schedule -- http://sonsorol.org/dag/contact.html
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From hlapp@gmx.net Thu Dec 14 18:49:28 2000
Date: Thu, 14 Dec 2000 10:49:28 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] newbie question
Martin Schmidt wrote: > > It says "# Can't locate Bio/Seq.pm in @INC. > > I can't find 'Bio::Seq' in the BioPerl > package, nor can I find it at the BioPerl > web site. > It must be there. It is one of the central modules. I don't know what happened, but I'm pretty sure it's in the distribution unless someone has tampered with it (the dist) lately. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From todd@andrew2.stanford.edu Thu Dec 14 18:52:53 2000 Date: Thu, 14 Dec 2000 10:52:53 -0800 From: Todd Richmond todd@andrew2.stanford.edu Subject: [Bioperl-l] External dependencies and Macs [long]
On 12/14/00 1:02 AM, "Ewan Birney" <birney@ebi.ac.uk> wrote: > Wow. > > I guess it is going to be MacOS X to save the day here. > > We are going to need (with your help Todd) good instructions on what is or > is not going to work from bioperl on Macs. I think getting the entire > bioperl suite to work on Macs looks as if we need to fix lots of "Perl on > Mac" issues. > > But I hope a good subset of bioperl can work ok on Macs and we can have > good walk through instructions about what errors to ignore/skip over > etc... > I'm certainly willing to do what I can to help out, though I'm putting most of my faith in MacOS X. I think there are too many things that are broken in too many separate modules to fix easily. Certainly many of them will be beyond my ability to trouble-shoot. Todd -- Dr Todd Richmond http://cellwall.stanford.edu/todd Carnegie Institution email: todd@andrew2.stanford.edu Department of Plant Biology fax: 1-650-325-6857 260 Panama Street phone: 1-650-325-1521 x431 Stanford, CA 94305From vastrik@mappi.helsinki.fi Thu Dec 14 18:53:18 2000 Date: Thu, 14 Dec 2000 20:53:18 +0200 (EET) From: vastrik@mappi.helsinki.fi vastrik@mappi.helsinki.fi Subject: [Bioperl-l] newbie question
Quoting Martin Schmidt <mcs2+@pitt.edu>: > Hi, > I am just getting started with BioPerl and > am unable to get any of it to compile. > Just for starters, I have tried to use the > restriction.pl script that comes with BioPerl > release 6.2 > > I changed the line... > use lib "/home/sac"; > > to > > use lib "macintosh HD:MacPerl:bioperl-0.6.2:libwww-perl-5.48:lib"; > > and this does not give an error message. > > The next line does give an error. > > use Bio::Seq; > > It says "# Can't locate Bio/Seq.pm in @INC. Either: -make an alias of Bio folder and stick it into MacPerl/lib (or should I say MacPerl:lib). or -open 'Preferences' from MacPerl's 'Edit' menu and add the path there. Both work for me (not that I use MacPerl much). Rgds., imrFrom hlapp@gmx.net Thu Dec 14 18:57:45 2000 Date: Thu, 14 Dec 2000 10:57:45 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: External dependencies and Macs
Marc Colosimo wrote: > Mac uses a different code for \n than PC/UNIX (actually so does the PC vs UNIX but it > probably does not show up as a problem). If I down load a fasta file (large one with > many sequences in it) as a UNIX file (I could have it translated when decompressing, but > that messes up stuff for MAME :). Stuff that looks for end of line or \n don't find it > and end up choking. I don't know if there is a fix for this, but it is something to > think about when getting data from the web. Hm. Is there no conversion utility for Macs? Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From Shailesh L Mistry"
On Thu, 14 Dec 2000, Fiers, M.W.E.J. wrote: > Hi > > > >Hmm. Not sure why we need the Computation object for this. GeneStructure > >already implements this. > > Yes, I agree, but computation.pm can, without adaption, store more types; > for example if you would like to store Acceptor/Donor site location, > transcriptions factor binding site's or whatever. > And another pre to using the computation object in my view is that you do > not have to know what is in there, you can get an array from the object > containing the names of the stored subfeature types. > Genestructure is, on the other side, more sophisticated than Computation > because it can return a cds. In my opinion, the genestructure object could > inherit from computation? I can't see computation.pm or GeneStructureI (is this on a different branch) so not sure if this applies, but rather than inheritance shouldn't it be something like derived_from GeneStructureI --------------------->[n] ComputationI in our (BDGP) objects we would have this implemented by the gene's exons and transcripts returning the result span (hsps) has as evidence > >Remember what Ewan said? The ones with the working code win. There is no > >such thing as someone rejecting a module (provided it works :) > > Actually i didn't remember, but I would like to see a consensus, and if the > bioperl community does not think that it is a valuable addition, I will > remove it. > > Concerning Ewan's remarks of producing an interface, I could agree, it won't > be a problem to rewrite. > > Mark > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l >From lapp@gnf.org Thu Dec 14 20:03:04 2000 Date: Thu, 14 Dec 2000 12:03:04 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] Bioperl NT update
Shailesh L Mistry wrote: > > 2) A decision needs to be made about the use of the alarm function in > utilities.pm. Is it possible to just put a delay counter in? Or make a bio:alarm > function that can distinguish the platform and either make a count delay or call > the real alarm function? > What's the purpose of the code? Does it need the interrupt feature of alarm()? Couldn't sleep() do as well, or doesn't Perl have that either? > The other minor problems relate to programs that I do not have on my machine > specifically blast, clustalw and TCoffee. > Do I assume that only people who use these programs should worry about the > testing of them or will they definitely be part of the distribution? > I guess they will, however many people will not use every part. Still, if you want to take the effort and get all those programs installed (as far as they are available for NT), that'd be great. I'm wondering whether the respective modules can live with a precomputed output, too. Peter? Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------From tam@akkadix.com Thu Dec 14 20:38:33 2000 Date: Thu, 14 Dec 2000 12:38:33 -0800 From: Brian Tam tam@akkadix.com Subject: [Bioperl-l] script to visit biosite question
Hi,

I don't know if this is the appropriate forum to post this question, but since it's Perl- and computational biology-related I might as well give it a try.

I'm trying to query Genscan (http://genes.mit.edu/GENSCAN.html), a gene prediction program, from the Unix shell with a Perl script. Yes, I know I could paste in the sequences and 'run' the software from the web page itself, but I've got hundreds of sequences to input, too many to do manually. And I get no response from their mail server, so that alternative won't do, either.

Anyway, I've created the following user, request, and response objects:

    use LWP::UserAgent;
    $ua = new LWP::UserAgent;
    $ua->agent("AgentName/0.1 " . $ua->agent);
    my $req = new HTTP::Request POST => 'http://genes.mit.edu/cgi-bin/genescanw.cgi';
    $req->content_type('multipart/form-data');
    $req->content([-o => Arabidopsis, -n => $a_sequence, -p => 'Predicted CDS and peptides', submit => 'Run GENSCAN']);
    my $res = $ua->request($req);
    if ($res->is_success) {
        print $res->content;
    } else {
        print "Bad luck this time\n";
    }

$a_sequence is a sequence of "A"s, "T"s, "G"s, and "C"s you can randomly distribute as a test case, of course.

When I run the script, it never succeeds in returning anything. In fact, I have to kill the process because it just 'hangs'. Am I calling Genscan incorrectly, inputting the parameters wrong, or what? Maybe I just don't have the knack of handling forms yet, but the control names, "-o", "-n", "-p", sure look weird!

Any help would be greatly appreciated. Please e-mail responses to btam@akkadix.com. Thanks.

---Brian Tam
Scientific Programmer
Akkadix Corp.
La Jolla, CA 92037

From MEColosimo@alumni.carnegiemellon.edu Thu Dec 14 20:53:42 2000
Date: Thu, 14 Dec 2000 15:53:42 -0500
From: Marc Colosimo MEColosimo@alumni.carnegiemellon.edu
Subject: [Bioperl-l] Re: External dependencies and Macs
Hilmar Lapp wrote: > > Marc Colosimo wrote: > > Mac uses a different code for \n than PC/UNIX (actually so does the PC vs UNIX but it > > probably does not show up as a problem). If I down load a fasta file (large one with > > many sequences in it) as a UNIX file (I could have it translated when decompressing, but > > that messes up stuff for MAME :). Stuff that looks for end of line or \n don't find it > > and end up choking. I don't know if there is a fix for this, but it is something to > > think about when getting data from the web. > > Hm. Is there no conversion utility for Macs? > There are several ways to handle this. Stuffit Expander (which handles tar zips etc.) will automatically convert UNIX EOL to MAC EOL. I have it turned off because some of the things I do want the UNIX EOL and die if not. I also have BBEdit, which is great and can handle 13Meg text files. I used this to fix my problem. I was just pointing out that these little things exist and if BioPerl, at sometime, directly downloads or accesses files across the net, this could be a minor problem. MarcFrom tam@akkadix.com Thu Dec 14 21:27:27 2000 Date: Thu, 14 Dec 2000 13:27:27 -0800 From: Brian Tam tam@akkadix.com Subject: [Bioperl-l] script to visit biosite question
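On Marc's line-ending point in the "External dependencies and Macs" thread: if a script wants to cope with downloaded files itself rather than rely on a conversion utility, one possible approach is to normalize all three conventions after reading. The sketch below is an assumption about how that could be done, not something bioperl is known to do. The function name normalize_eol is hypothetical, and the text is assumed to have been read in whole (or in binary mode) so that no end-of-line bytes were translated on the way in.

```perl
use strict;
use warnings;

# Convert CR+LF (DOS), bare CR (classic Mac) and bare LF (Unix) line ends
# to the local "\n". Octal escapes are used in the pattern because the
# meaning of "\n" itself differs between platforms.
sub normalize_eol {
    my ($text) = @_;
    $text =~ s/\015\012|\015|\012/\n/g;
    return $text;
}

# A small fasta-like test string with all three conventions mixed.
my $mixed = ">seq1\015ACGT\015\012>seq2\012TTGA\012";
my @lines = split /\n/, normalize_eol($mixed);
print scalar(@lines), " lines\n";
```

The alternation tries CR+LF first so that a DOS line end is replaced once rather than counted as two separate line ends.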
Sorry, there's a typo in the script I posted. The cgi script in the actual script IS 'genscanw.cgi'. As a check I put the request object in a loop:

    while (1) {
        print "dum-dee-dum dum duuuuuuum\n"
        if ($res->is_success) {
            last;
        };
    };

When the program is run, the print line gets outputted repeatedly, forever...

---Brian

-----Original Message-----
From: Mark Wilkinson [mailto:mwilkinson@gene.pbi.nrc.ca]
Sent: Thursday, December 14, 2000 12:54 PM
To: Brian Tam
Subject: Re: [Bioperl-l] script to visit biosite question

Brian Tam wrote:

> my $req = new HTTP::Request POST =>
> 'http://genes.mit.edu/cgi-bin/genescanw.cgi';

is this a direct copy/paste from your script, or a typo in your message? the CGI name is "genscanw.cgi", not "genescanw.cgi".

M

--
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada

From cstrassel@netgenics.com Thu Dec 14 21:31:50 2000
Date: Thu, 14 Dec 2000 16:31:50 -0500
From: Strassel, Chris cstrassel@netgenics.com
Subject: [Bioperl-l] Parsing fuzzy locations
I am about to embark on trying to write a parser for fuzzy feature locations, and was hoping to gather some advice before starting. Does anyone have any experience that they would be willing to share? Pitfalls? Is this already done somewhere? Is it totally hopeless? Thanks, Chris ============================= Chris Strassel Software Engineer NetGenics, Inc. Statler Office Building 20 Park Plaza Blvd, Suite 637 Boston, MA 02166 (617) 556-0198 Fax: (617) 556-0888From schattner@alum.mit.edu Thu Dec 14 22:41:17 2000 Date: Thu, 14 Dec 2000 14:41:17 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] Bioperl NT update
Hilmar Lapp wrote: > > Shailesh L Mistry wrote: > > > > > The other minor problems relate to programs that I do not have on my machine > > specifically blast, clustalw and TCoffee. > > Do I assume that only people who use these programs should worry about the > > testing of them or will they definitely be part of the distribution? > > > > I guess they will, however many people will not use every part. Still, if > you want to take the effort and get all those programs installed (as far as > they are available for NT), that'd be great. I'm wondering whether the > respective modules can live with a precomputed output, too. Peter? > > Hilmar > The test scripts for the clustalw.pm, standaloneblast.pm and tcoffee.pm modules check for the presence of the respective underlying programs they need on the host machine when in a Unix (or Linux) environment. If they are not found, the respective tests are skipped. However, under NT this approach is untested and probably won’t work. I suspect the underlying programs may not even be available under NT (or Mac). I think the best bet would be to completely skip all the tests relating to these modules unless the host OS is a Unix variant. Alternatively, the tests could presumably live with precomputed output to be used if the Bioperl installation test-harness detects a non-unix environment . However I do not know how to incorporate either of these approaches in the Bioperl installation. Can anyone help? PeterFrom lapp@gnf.org Thu Dec 14 22:58:51 2000 Date: Thu, 14 Dec 2000 14:58:51 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] Bioperl NT update
Peter Schattner wrote:
> The test scripts for the clustalw.pm, standaloneblast.pm and tcoffee.pm
> modules check for the presence of the respective underlying programs
> they need on the host machine when in a Unix (or Linux) environment. If
> they are not found, the respective tests are skipped. However, under NT
> this approach is untested and probably won't work. I suspect the
> underlying programs may not even be available under NT (or Mac).
>
> I think the best bet would be to completely skip all the tests relating
> to these modules unless the host OS is a Unix variant. Alternatively,
> the tests could presumably live with precomputed output to be used if
> the Bioperl installation test-harness detects a non-unix environment .
> However I do not know how to incorporate either of these approaches in
> the Bioperl installation. Can anyone help?
>

    $ perl -e 'print $^O,"\n";'

(or $OSNAME; see 'perldoc perlvar') lets you figure out the OS of the host. (Could people using Mac and NT send what it prints on their machines?)

What is the problem with precomputed results?

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From jason@chg.mc.duke.edu Thu Dec 14 22:59:36 2000
Date: Thu, 14 Dec 2000 17:59:36 -0500 (EST)
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] Bio::SeqFeatureProducer
(Hilmar, if you can help be my verifier that we have completed all the parts of the task, we can start to tick off some of the things from the tasklist...)

I committed a new class Bio::SeqFeatureProducer, a test file, and more test data.

One can use this class to simplify the way Features from an analysis are added to a sequence (Genes from Bio::Tools::Genscan or Bio::Tools::MZEF). In some cases you will want more fine grained control over how features are added or seen so that more information is produced in a Genbank/EMBL dump. This can be done by cycling through the features on the sequence and processing them. For example, if I want to dump CDS and predicted protein translation in genbank files from a genscan analysis, I have to take each gene on the seq and add a new feature for CDS with a translated field. Perhaps we can work out a way for the GeneStruct objects to be dumped a little more informatively?

I suspect we will want to remove Bio::SeqFeatureProducerI as I don't think it is necessary unless we will have many different ways to add_features to sequences with non AnalysisParserI objects...

Anyone else writing parsers (GAME w/o seq and gff will certainly fit here with a simple wrapper) should make sure they have a class that implements SeqAnalysisParserI, and then SeqFeatureProducer can take advantage of the class. You will have to add them to the %DRIVERVALUES variable in SeqFeatureProducer since we don't have parsers in a central location.

Hope this is useful.

-jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From lapp@gnf.org Thu Dec 14 23:06:36 2000
Date: Thu, 14 Dec 2000 15:06:36 -0800
From: Hilmar Lapp lapp@gnf.org
Subject: [Bioperl-l] Parsing fuzzy locations
"Strassel, Chris" wrote: > > I am about to embark on trying to write a parser for fuzzy feature > locations, and was hoping to gather some advice before starting. > I think this is up to the BioXML/BioCorba guys to speak up, because there was already some sort of commitment (it's actually in the task list) that we want to have this in the upcoming 0.7 release in order to support the latest BioCorba spec. So, Jason, Brad, what's up? Does it make sense to simply copy the BioJava approach to this problem? Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------From lapp@gnf.org Thu Dec 14 23:16:50 2000 Date: Thu, 14 Dec 2000 15:16:50 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] script to visit biosite question
Brian Tam wrote:
>
> Sorry, there's a typo in the script I posted. The cgi script in the
> actual script IS 'genscanw.cgi'. As a check I put the request object
> in a loop:
>
> while (1) {
> print "dum-dee-dum dum duuuuuuum\n"
> if ($res->is_success) {
> last;
> };
> };
>

That's clearly an infinite loop or a loop done only once. Perl doesn't multi-thread; there is no background mystery going on that might change the result of is_success(). If $res->is_success() returns FALSE it means exactly that, and it won't change.

So, either you made an error in setting up the query and it's LWP that complains, or the server does indeed return an error condition (for some reason or another).

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From jdiggans@genelogic.com Thu Dec 14 23:33:54 2000
Date: Thu, 14 Dec 2000 18:33:54 -0500
From: J.C. Diggans jdiggans@genelogic.com
Subject: [Bioperl-l] Bioperl NT update
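Following up on the GENSCAN script discussed above: a minimal sketch, assuming the trouble lies in how the form is built rather than on the server. HTTP::Request's content() takes an already-encoded string, so handing it an array reference does not produce a multipart body; the POST() function from HTTP::Request::Common does that encoding. The URL and field names are the ones Brian posted. The test sequence, the 60 second timeout and the --send switch are assumptions made purely for illustration, and whether the server accepts the request is untested.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Hypothetical test sequence; substitute a real one.
my $a_sequence = 'ATGGCGTACGTTAGC' x 20;

# POST() encodes the field list as multipart/form-data, boundary included.
# The field names are quoted so that Perl cannot mistake -o or -p for
# file test operators.
my $req = POST 'http://genes.mit.edu/cgi-bin/genscanw.cgi',
    Content_Type => 'form-data',
    Content      => [
        '-o'   => 'Arabidopsis',
        '-n'   => $a_sequence,
        '-p'   => 'Predicted CDS and peptides',
        submit => 'Run GENSCAN',
    ];

# Send the request exactly once and report the outcome.
sub run_query {
    my ($request) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->agent('AgentName/0.1 ' . $ua->agent);
    $ua->timeout(60);    # give up after 60 seconds instead of hanging
    my $res = $ua->request($request);
    return $res->is_success
        ? $res->content
        : 'Request failed: ' . $res->status_line . "\n";
}

# Only touch the network when explicitly asked to.
print run_query($req) if @ARGV && $ARGV[0] eq '--send';
```

Printing status_line on failure, instead of a fixed message, shows whether LWP or the server is the one complaining, which is the distinction Hilmar draws.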
NT gives 'MSWin32', I would imagine that sounds generic enough that 9x platforms would probably print the same thing.

- jc

Hilmar Lapp wrote:
>
> Peter Schattner wrote:
> > The test scripts for the clustalw.pm, standaloneblast.pm and tcoffee.pm
> > modules check for the presence of the respective underlying programs
> > they need on the host machine when in a Unix (or Linux) environment. If
> > they are not found, the respective tests are skipped. However, under NT
> > this approach is untested and probably won't work. I suspect the
> > underlying programs may not even be available under NT (or Mac).
> >
> > I think the best bet would be to completely skip all the tests relating
> > to these modules unless the host OS is a Unix variant. Alternatively,
> > the tests could presumably live with precomputed output to be used if
> > the Bioperl installation test-harness detects a non-unix environment .
> > However I do not know how to incorporate either of these approaches in
> > the Bioperl installation. Can anyone help?
> >
>
> $ perl -e 'print $^O,"\n";'
>
> (or $OSNAME; see 'perldoc perlvar') lets you figure out the OS of the host.
> (Could people using Mac and NT send what it prints on their machines?)
>
> What is the problem with precomputed results?
>
> Hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp@gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

--
-------------------------------------------------
James Diggans Phone: 301.987.1756
Gene Logic, Inc. FAX: 301.987.1701
jdiggans@genelogic.com Cell: 301.908.2477
-------------------------------------------------

From todd@andrew2.stanford.edu Thu Dec 14 23:55:25 2000
Date: Thu, 14 Dec 2000 15:55:25 -0800
From: Todd Richmond todd@andrew2.stanford.edu
Subject: [Bioperl-l] Bioperl NT update
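On Peter's question in this thread of how a test script could skip itself outside Unix: one possible shape, using the $^O check Hilmar suggested, is sketched below. It is a sketch under stated assumptions, not how the bioperl test scripts actually work. The helper names (is_unix_like, program_in_path, skip_reason), the test count, and the decision to treat everything other than 'MSWin32' and 'MacOS' as Unix-like are all hypothetical.

```perl
use strict;
use warnings;

# 'MSWin32' and 'MacOS' are the $^O values for Windows and MacPerl.
# Treating every other value as Unix-like is an assumption of this sketch.
sub is_unix_like {
    my ($os) = @_;
    return $os ne 'MSWin32' && $os ne 'MacOS';
}

# Hypothetical helper: true if an executable file of this name is found
# in one of the colon-separated PATH directories.
sub program_in_path {
    my ($name) = @_;
    for my $dir (split /:/, $ENV{PATH} || '') {
        return 1 if -f "$dir/$name" && -x _;
    }
    return 0;
}

# Returns '' if the tests should run, otherwise the reason to skip them.
sub skip_reason {
    my ($os, $program) = @_;
    return "not a Unix variant ($os)" unless is_unix_like($os);
    return "$program not installed"    unless program_in_path($program);
    return '';
}

my $ntests = 3;    # hypothetical number of tests in this script
my $reason = skip_reason($^O, 'clustalw');

print "1..$ntests\n";
if ($reason) {
    # Report every test as skipped so the harness does not count a failure.
    print "ok $_ # skip $reason\n" for 1 .. $ntests;
}
else {
    # ... the real tests would go here ...
}
```

The same skip_reason() result could instead select a precomputed output file, which is the alternative Hilmar asks about.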
On 12/14/00 2:58 PM, "Hilmar Lapp" <lapp@gnf.org> wrote: > $ perl -e 'print $^O,"\n";' > > (or $OSNAME; see 'perldoc perlvar') lets you figure out the OS of the host. > (Could people using Mac and NT send what it prints on their machines?) > MacOS prints "MacOS" Windows 2000 prints "MSWin32"From mdalphin@amgen.com Fri Dec 15 00:05:32 2000 Date: Thu, 14 Dec 2000 16:05:32 -0800 From: Mark Dalphin mdalphin@amgen.com Subject: [Bioperl-l] Parsing fuzzy locations
"Strassel, Chris" wrote:

> I am about to embark on trying to write a parser for fuzzy feature
> locations, and was hoping to gather some advice before starting.
>
> Does anyone have any experience that they would be willing to share?
> Pitfalls? Is this already done somewhere? Is it totally hopeless?

I have recently written a Genbank LOCATION parser (just the location part, not the whole Feature Table; I do that elsewhere in Perl). It parses the fuzzy locations without a problem. I wrote it in flex and bison, creating a function in C, which I then wrapped in a Perl XS subroutine. It is quick and stable enough for our purposes.

Here are some of the problems I encountered.

First, the grammar specified at NCBI for the Feature Table locations:

    http://www.ncbi.nlm.nih.gov/collab/FT/index.html

lists a Backus-Naur representation of the grammar which is not completely correct. For example, it does not specify parens anywhere except in functions like "join()". Yet Genbank releases locations like "(9.10)..(20.22)". By the specified grammar, that should be "9.10..20.22".

There were some minor programming problems I encountered along the way as well, but as this was my first attempt at flex/bison, I can't really complain. It was also less than pretty for me to return an array from C into Perl via XS. That too was due to inexperience.

I did look at writing the parser in pure Perl, using "Parse::RecDescent" (see D. Conway, "The man(1) of descent", The Perl Journal, 12:46-58, winter 1998). I suspect the grammar I developed (modified from the Genbank B-N form) would almost work for Parse::RecDescent, but some of the recursions might need to be re-ordered. I went the flex/bison route as we have other programmers who wanted a parser that could be accessed via C and C++.

I have not looked at giving it to BioPerl, in part because linking in the C function would be a problem for most users.
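To make the parenthesis problem concrete, here is a rough pure-Perl illustration. It is not the flex/bison grammar described in this message and it handles only a single "begin..end" range, with no join(), complement() or accession prefixes. The function name parse_range and the keys of the returned hash are hypothetical. The pattern is deliberately lenient: it accepts each endpoint with or without parentheses, so it would also let an unbalanced parenthesis through.

```perl
use strict;
use warnings;

# One endpoint: optional '(' , optional '<' or '>', a number, optionally
# ".number" for a fuzzy endpoint such as 9.10, optional ')'.
my $pos = qr/\(?([<>]?)(\d+)(?:\.(\d+))?\)?/;

# Parse "begin..end"; returns a hash ref, or nothing if the text does not
# look like a simple range.
sub parse_range {
    my ($loc) = @_;
    my @m = $loc =~ /^$pos\.\.$pos$/ or return;
    my ($bflag, $b1, $b2, $eflag, $e1, $e2) = @m;
    return {
        begin_flag => $bflag,    # '<', '>' or ''
        begin      => $b1,
        begin_to   => $b2,       # undef unless the endpoint is like 9.10
        end_flag   => $eflag,
        end        => $e1,
        end_to     => $e2,
    };
}

for my $loc ('(9.10)..(20.22)', '9.10..20.22', '<9..20', '10..20') {
    my $r = parse_range($loc) or next;
    my @fields = map { defined $_ && length $_ ? $_ : '-' }
        @{$r}{qw(begin_flag begin begin_to end_flag end end_to)};
    print "$loc => @fields\n";
}
```

With this pattern the parenthesized form that Genbank releases and the form the published grammar calls for parse to the same fields.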
I also am not sure how BioPerl plans to carry fuzzy locations; I have my parser return a ref to an array of refs to arrays. The final level of arrays contains: [0]=AccNum [1]=isComplement(0 or 1) [2]=Beg-Position [3]=Beg-Fuzzy-Type [4]=Beg-Fuzzy-Amount [5]=End-Position [6]=End-Fuzzy-Type [7]=End-Fuzzy-Amount Examples: 10..20 --> [0]='', [1]=0, [2]=10, [3]=undef, [4]=undef, [5]=20, [6]=undef, [7]=undef (9.10)..20 --> as above, except: [2]=9, [3]='dot', [4]=1 <9..20 --> as above, except [2]=9, [3]='<', [4]=undef complement(AC000134:10..50) --> [0]='AC000134', [1]=1, [2]=10, [3]=undef, [4]=undef, [5]=50, [6]=undef, [7]=undef Let me know if you wish to see any of this: the tokenizer input for flex, the grammar for bison, the wrapper for Perl XS. I even have a makefile for both the SGI running Irix and our Dec Alphas (running a different version of Perl; this makes the XS output incompatible! Yuck!). Cheers, Mark PS I still think that I should learn ASN.1 and use the NCBI parser directly from the NCBI toolkit. Avoid all the trouble of re-inventing the wheel. I advocate this, despite the fact that I don't enjoy the NCBI code. In my case, I am actually getting the "locations" from a non-Genbank source as well as Genbank, so I needed to create this parser. -- Mark Dalphin email: mdalphin@amgen.com Mail Stop: 29-2-A phone: +1-805-447-4951 (work) One Amgen Center Drive +1-805-375-0680 (home) Thousand Oaks, CA 91320 fax: +1-805-499-9955 (work)From birney@ebi.ac.uk Fri Dec 15 09:17:21 2000 Date: Fri, 15 Dec 2000 09:17:21 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Parsing fuzzy locations
On Thu, 14 Dec 2000, Strassel, Chris wrote: > I am about to embark on trying to write a parser for fuzzy feature > locations, and was hoping to gather some advice before starting. > > Does anyone have any experience that they would be willing to share? > Pitfalls? Is this already done somewhere? Is it totally hopeless? Good luck ;) The discussion we need to have is the object which you'll store the fuzzy locations in. I suspect the best thing is to get stuck in there and then propose back to this list what you think is the best object, we'll kick that idea around and see if we can fit it in... > > Thanks, > Chris > > ============================= > Chris Strassel > Software Engineer > NetGenics, Inc. > Statler Office Building > 20 Park Plaza Blvd, Suite 637 > Boston, MA 02166 > (617) 556-0198 > Fax: (617) 556-0888 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From Shailesh L Mistry"
I am a new user to this bio-perl service. Can anyone help me in finding the source code for the bioinformatics applications like BLAST FASTA CLUSTAL TRANSLATE Thanks -GovindFrom kdj@sanger.ac.uk Fri Dec 15 12:54:10 2000 Date: 15 Dec 2000 12:54:10 +0000 From: Keith James kdj@sanger.ac.uk Subject: [Bioperl-l] Parsing fuzzy locations
I have some code which does this, including an EMBL parser and generic feature object which supports fuzzies. This aspect of parsing hasn't been heavily tested and there are odd things it won't cope with e.g. I discovered a couple of days ago that it balked at locations like (343.345)..(343.345) which cropped up in an E. coli EMBL entry. I'm in the process of fixing that now. I've also got some preliminary support for reverse-complementing and sub-sequencing sequences with such features attached. Due to lack of free time I've had to put aside plans to incorporate this into Bioperl, but the Bio::PSU package and method documentation can be obtained from http://www.sanger.ac.uk/Users/kdj/software.html or ftp://ftp.sanger.ac.uk/pub/pathogens/software/biopsu/Bio-PSU-0.04.tar.gz -- -= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =- The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SAFrom mrp@sanger.ac.uk Fri Dec 15 14:56:56 2000 Date: Fri, 15 Dec 2000 14:56:56 +0000 From: Matthew Pocock mrp@sanger.ac.uk Subject: [Bioperl-l] Parsing fuzzy locations
> I did look at writing the parser in pure Perl, using "Parse::RecDescent" (see > D.Conway, "The man(1) of descent", The Perl Journal, 12:46-58, winter 1998). I > suspect the grammar I developed (modified from the Genbank B-N form) would > almost work for Parse::RecDescent, but some of the recursions might need to be > re-ordered. I went the flex/bison route as we have other programmers who wanted > a parser that could be accessed via C and C++. RecDescent was what I used back in the mists of time, pretty much directly from the genbank B-N. It works fine. You have to think about how to represent the fuzziness in your location object-model (BioJava just decorates another location object adding boolean properties fuzzyMin and fuzzyMax). As for complement & join, I think these should in the simple case be propagated up as properties of the feature, but your mileage may vary. Good luck. MatthewFrom birney@ebi.ac.uk Fri Dec 15 15:16:24 2000 Date: Fri, 15 Dec 2000 15:16:24 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] back to the grind of 0.7
Ok. Over the weekend and on Monday I will have time to work again on bioperl. Here is my proposed to-do list. (1) Drudgery: Bio::PrimarySeq / PrimarySeqI to be pulled up to QC criteria. Bio::SimpleAlign up to QC criteria as well. (2) More fun stuff. Refactor Bio::Seq set to be more flexible for other formats. I am going to propose a class Bio::Seq::GenEMBL which derives off Bio::Seq, and has extensions for GenEMBL stuff, being the date, sv etc etc stuff. Does this sound ok? I don't think we need separate GenBank/EMBL classes as the information in the files is pretty concordant (of course, if someone wants to help me with ASN.1 parsing into this, join the party!) (3) See what I can get out of Keith's PSU code into bioperl. Sounds good? Hilmar - how are we doing - there aren't many green lights on the task list, but I think that's because we haven't updated the Wiki... ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From heikki@ebi.ac.uk Fri Dec 15 17:15:04 2000 Date: Fri, 15 Dec 2000 17:15:04 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] back to the grind of 0.7
Ewan Birney wrote: > Sounds good? Hilmar - how are we doing - there aren't many green lights on > the task list, but I think that's because we haven't updated the Wiki... Updating the wiki page http://www.bioperl.org/wiki/html/BioPerl/ModuleQCList.html is harder than fixing the code! After the initial shock after seeing the HTML, I managed to put those Xs into their proper places on CodonTable and Variation modules. :-) -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________From schattner@alum.mit.edu Fri Dec 15 18:40:25 2000 Date: Fri, 15 Dec 2000 10:40:25 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] Bioperl NT update
Hilmar Lapp wrote: > > Peter Schattner wrote: > > I think the best bet would be to completely skip all the tests relating > > to these modules unless the host OS is a Unix variant. Alternatively, > > the tests could presumably live with precomputed output to be used if > > the Bioperl installation test-harness detects a non-unix environment . > > However I do not know how to incorporate either of these approaches in > > the Bioperl installation. Can anyone help? > > > > $ perl -e 'print $^O,"\n";' > Thanks for the tip. Thanks also to Todd & JC for the info re what to expect in windows & Macs. I will use this info to skip around the relevant tests in Clustalw.t, etc. (Shelly, do I understand from your last post that the problem has gone away - under NT at least?) > > What is the problem with precomputed results? Perhaps I don’t understand what is meant by precomputed results. I assumed you meant making a file with the answers that the test program expects and using those answers if the OS is non-unix. That approach seems less maintainable than simply skipping the tests if the OS is non-unix. -- PeterFrom birney@ebi.ac.uk Fri Dec 15 18:46:19 2000 Date: Fri, 15 Dec 2000 18:46:19 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Bioperl NT update
On Fri, 15 Dec 2000, Peter Schattner wrote: > > > > > > What is the problem with precomputed results? > > Perhaps I don’t understand what is meant by precomputed results. I > assumed you meant making a file with the answers that the test program > expects and using those answers if the OS is non-unix. That approach > seems less maintainable than simply skipping the tests if the OS is > non-unix. I think the tests work best with precomputed results. Trying to call out to executables in the test suite sounds like we are just going to get ourselves into trouble on unix as well as mac/NT. Precomputed results are fine in my view. > > -- Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From hlapp@gmx.net Fri Dec 15 18:59:40 2000 Date: Fri, 15 Dec 2000 10:59:40 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] back to the grind of 0.7
Ewan Birney wrote: > > (2) More fun stuff. > > Refactor Bio::Seq set to be more flexible for other formats. > > I am going to proprose a class > > Bio::Seq::GenEMBL > > which derives off Bio::Seq, and has extensions for GenEMBL stuff, being > the date, sv etc etc stuff. Does this sound ok? > Sounds very ok :-) In addition, if no-one else has done it already, over the weekend I'm going to delete documentation of the deprecated methods (Seq::ary() etc). > I don't think we need separate GenBank/EMBL classes as the information in > the files are pretty concordant (of course, if someone wants to help me > with ASN.1 parsing into this, join the party!) > I wrote an email to info@ncbi.nlm.nih.gov asking for a pointer to documentation of viewer.cgi mentioned by Lewis Geer, but no response yet. So, it might be that XML support is not in a productive state yet. Hmm. > (3) See what I can get out of Keith's PSU code into bioperl. > > Sounds good? Hilmar - how are we doing - there aren't many green lights on > the task list, but I think that's because we haven't updated the Wiki... > Sounds good. Just updated the task list. We're doing okay I think. There may not be many greens, but there are some other colors than only red. In a sense we're behind schedule, but given the fact that this is a volunteer-only game and most of us appear to be heavily involved in their primary jobs, I'm not worried as long as the table keeps changing (and a red-shift is by definition excluded :-) If we hadn't Jason on board, things would look significantly worse. He's been the driving force code-wise so far. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From hlapp@gmx.net Fri Dec 15 19:10:18 2000 Date: Fri, 15 Dec 2000 11:10:18 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Bioperl NT update
Peter Schattner wrote: > > > > What is the problem with precomputed results? > > Perhaps I don't understand what is meant by precomputed results. I > assumed you meant making a file with the answers that the test program > expects and using those answers if the OS is non-unix. That approach > seems less maintainable than simply skipping the tests if the OS is > non-unix. > I'm not sure either what you mean. Trying to be clear, I do not mean setting up a file with the output you expect from the test script. I meant, instead of calling out to an executable clustalw, precompute the output said executable is supposed to produce, capture it in a file and feed this file into your module. This way your module can be tested on any platform, and it can also be used on any platform whether or not clustalw and friends are there, provided the user has means to obtain the results from another machine than the one he runs bioperl on. This of course requires that your modules can accept either a stream containing input or a pipe fed by one of the external executables. Am I missing something? Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From birney@ebi.ac.uk Fri Dec 15 19:18:41 2000 Date: Fri, 15 Dec 2000 19:18:41 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] wow - external dependencies
OK. Now I understand why people are complaining about external dependencies so much. It is pretty scary out there when you don't have things installed. I propose that we have a less "in your face" set of warnings, with Makefile.PL telling what will work and what won't work in a nice, polite way, making a file about what needs to be done to get a working system. Perhaps printing out in large letters ** DONT PANIC ** might also help ;) ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From murad@godel.bioc.columbia.edu Fri Dec 15 13:14:16 2000 Date: Fri, 15 Dec 2000 14:14:16 +0100 From: Murad Nayal murad@godel.bioc.columbia.edu Subject: [Bioperl-l] printing UnivAlgn
Hello, I cannot find a straightforward way to get a SimpleAlign from a UnivAln. Why would you want a SimpleAlign you may ask. Well, because UnivAln seems limited in the number of output formats possible (only fasta and raw, neither is optimal in printing alignments for visual inspection). This seems to still be true in 0.7. Also, AlignIO only takes SimpleAlign (is that a bug or a 'feature'?). So I suppose the questions are 1- how do you get a SimpleAlign from UnivAln (short of saving UnivAln to a file in fasta format and reading it back again in SimpleAlign, which by the way does not seem to work smoothly)? 2- how do you pretty print a UnivAln (say clustalw format)? thanks for the help. -- Murad Nayal M.D. Ph.D. Department of Biochemistry and Molecular Biophysics College of Physicians and Surgeons of Columbia University 630 West 168th Street. New York, NY 10032 Tel: 212-305-6884 Fax: 212-305-6926From dag@sonsorol.org Fri Dec 15 19:54:04 2000 Date: Fri, 15 Dec 2000 14:54:04 -0500 (EST) From: Chris Dagdigian dag@sonsorol.org Subject: [Bioperl-l] help test Bundle::BioPerl
Hi folks, Bundle::BioPerl 1.0 is not in CPAN yet but is available at ftp://bioperl.org/pub/dag/Bundle-BioPerl-1.00.tar.gz If you would like to help test the bundle all you need to do is download and install the module locally. Then you can use CPAN.pm to test its functionality. e.g.: perl -MCPAN -e 'install Bundle::BioPerl' The POD docs are appended below. Things get really messy when CPAN.pm tries to install the XML modules without the presence of expat... After the module has been uploaded to CPAN anyone will be able to run the command 'install Bundle::BioPerl'.... Regards, Chris ===================================================================== NAME Bundle::BioPerl - A bundle to install external CPAN modules used by BioPerl SYNOPSIS Perl one liner using CPAN.pm: perl -MCPAN -e 'install Bundle::BioPerl' Use of CPAN.pm in interactive mode: $> perl -MCPAN -e shell cpan> install Bundle::BioPerl cpan> quit Just like the manual installation of perl modules, the user may need root access during this process to ensure write permission is allowed within the installation directory. CONTENTS Bundle::LWP - recommended, used for network access File::Temp - recommended, used for safe/portable tempfile creation IO::Scalar - optional, used only in Bio::Tools::Blast::Run::WebBlast.pm IO::String - recommended, used by Bio::DB::WebDBSeqI HTTP::Request::Common - recommended, used for web access HTTP::Status - recommended, used for web access LWP::UserAgent - recommended, used for web access URI::Escape - recommended, used for web access XML::Parser - recommended for bioperl releases after 0.6.2 XML::Parser::PerlSAX - recommended for bioperl releases after 0.6.2 XML::Writer - recommended for bioperl releases after 0.6.2 XML::Node - recommended for bioperl releases after 0.6.2 DESCRIPTION The BioPerl distribution from http://bioperl.org contains code and modules that may use or require additional 'external' perl modules for advanced functionality. 
Many of the external modules are not contained within the standard Perl distribution. These external modules can be obtained from the Comprehensive Perl Archive Network (CPAN) located at http://www.cpan.org. This perl module (Bundle::BioPerl) contains NO functionality or real code at all. It is essentially a special perl module meant to be used by the CPAN.pm module to simplify the task of automatically installing multiple modules in one easy step. Essentially users can tell CPAN.pm to 'install Bundle::BioPerl' and CPAN.pm will download, install and configure all of the modules listed in the BioPerl Bundle module. See the SYNOPSIS section or do `perldoc CPAN' to learn about how to use the CPAN.pm module to install bundles. NOTE: This process is complicated by the fact that some BioPerl external modules themselves have their own dependencies and prerequisites. In particular the XML::Parser module requires the prior installation of the 'expat' package which resides outside of CPAN at http://sourceforge.net/projects/expat/. The `install Bundle::BioPerl' process may need to be repeated several times to complete the full installation of all listed modules. NOTE: This Bundle does not install BioPerl :) Just the additional modules that BioPerl code occasionally makes use of. You will still need to get the BioPerl distribution from CPAN or http://bioperl.org and install it the usual way: perl Makefile.PL make make test make install CPAN.pm has many features - including the ability to download but not install the modules listed in the BioPerl bundle. `perldoc CPAN' is your friend :) AUTHOR Chris Dagdigian <dag@sonsorol.org> (Author only of this bundle, not any of the modules it lists)From jason@chg.mc.duke.edu Fri Dec 15 20:09:45 2000 Date: Fri, 15 Dec 2000 15:09:45 -0500 From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Bioperl NT update
> Peter Schattner wrote: > > > > > > What is the problem with precomputed results? > > > > Perhaps I don't understand what is meant by precomputed results. I > > assumed you meant making a file with the answers that the test program > > expects and using those answers if the OS is non-unix. That approach > > seems less maintainable than simply skipping the tests if the OS is > > non-unix. > > > > I'm not sure either what you mean. Trying to be clear, I do not mean > setting up a file with the output you expect from the test script. I > meant, instead of calling out to an executable clustalw, precompute the > output said executable is supposed to produce, capture it in a file and > feed this file into your module. This way your module can be tested on > any platform, and it can also be used on any platform whether or not > clustalw and friends are there, provided the user has means to obtain > the results from another machine than the one he runs bioperl on. > > This of course requires that your modules can accept either a stream > containing input or a pipe fed by one of the external executables. > > Am I missing something? The StandAloneBlast, ClustalW, and TCoffee modules are intended to be wrappers around actually running these programs (one of my problems with Bio::Tools is that we do not have a directory structure that differentiates processing results and running programs!) rather than processing results. The results processing for TCoffee and ClustalW (MSA tools) is done in Bio::AlignIO, while the StandAloneBlast result is sent to Bio::Tools::BPlite or Bio::Tools::Blast. So there really is no way to run these on Win32 or Mac so these tests should just be skipped (except for blast which can be run on Win32). -jason > > Hilmar > > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 
92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l >From birney@ebi.ac.uk Fri Dec 15 20:40:46 2000 Date: Fri, 15 Dec 2000 20:40:46 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] PrimarySeq, Seq fixes
I have moved PrimarySeq and Seq over to direct new() syntax. This had knock on effects all over the place (Bio::Seq inheriting modules) which I then chained up. I am not sure whether this was good or bad. On my linux box, the new LargePrimarySeq stuff did not work at all. I tried to fix it by making a filehandle object, but this didn't work. We are failing tests here --- Jason - any chance of looking over this - I can't seem to get a FileHandle to open read/write... I'm moving towards Bio::Seq::GenEMBL e. ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From lapp@gnf.org Fri Dec 15 20:41:38 2000 Date: Fri, 15 Dec 2000 12:41:38 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] Bioperl NT update
Jason Stajich wrote: > > > Peter Schattner wrote: > > > > > > > > What is the problem with precomputed results? > > > > > > Perhaps I don't understand what is meant by precomputed results. I > > > assumed you meant making a file with the answers that the test program > > > expects and using those answers if the OS is non-unix. That approach > > > seems less maintainable than simply skipping the tests if the OS is > > > non-unix. > > > > > > > I'm not sure either what you mean. Trying to be clear, I do not mean > > setting up a file with the output you expect from the test script. I > > meant, instead of calling out to an executable clustalw, precompute the > > output said executable is supposed to produce, capture it in a file and > > feed this file into your module. This way your module can be tested on > > any platform, and it can also be used on any platform whether or not > > clustalw and friends are there, provided the user has means to obtain > > the results from another machine than the one he runs bioperl on. > > > > This of course requires that your modules can accept either a stream > > containing input or a pipe fed by one of the external executables. > > > > Am I missing something? > > The StandAloneBlast, ClustalW, and TCoffee modules are intended to be > wrappers around actually running these programs (one of my problems with > Bio::Tools is that we do not have a directory structure that differentiates > processing results and running programs!) rather than processing results. > The results processing for TCoffee and ClustalW (MSA tools) is done in > Bio::AlignIO, while the StandAloneBlast result is sent to Bio::Tools::BPlite > or Bio::Tools::Blast. So there really is no way to run these on Win32 Okay. Sorry Peter for confusing this. What about Bio::Tools::Run as the home for modules wrapping real execution of something? Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 -------------------------------------------------------------From dag@sonsorol.org Fri Dec 15 21:16:08 2000 Date: Fri, 15 Dec 2000 16:16:08 -0500 From: chris dagdigian dag@sonsorol.org Subject: [Bioperl-l] help test Bundle::BioPerl
Brian-- {cc'ing this to bioperl-l in case others have the same issue...} You are correct that the Bundle::BioPerl distribution only contains one simple perl module called BioPerl.pm. This is actually perfectly correct :) The way that CPAN.pm works with Bundle modules is that it looks inside the POD documentation for a specially typed section called 'CONTENTS'. When it finds this section CPAN.pm will actually go out and download/install all of the modules that are mentioned in the CONTENTS section of the module POD docs. I know this is weird -- essentially CPAN.pm is reading the documentation of the Bundle::BioPerl module to determine what to install. In real world use you would never actually locally install Bundle::BioPerl (or even look at the embedded POD docs) you would simply invoke 'install Bundle::BioPerl' within the CPAN.pm interactive shell and it would download the Bundle file from CPAN and then start working on the contents. Make sense? This will be very confusing to anyone who has not used the CPAN.pm module to handle automatic installation and upgrading of perl modules. This is sort of a 'fringe' use of CPAN but it is becoming more and more useful. I can't tell you how happy I am at being able to simply type 'perl -MCPAN -e 'install Bundle::libnet'' instead of downloading all those darn modules one at a time... Regards, Chris At 04:06 PM 12/15/00 -0500, you wrote: >Chris, > >I'm having trouble with this file, I'm only seeing BioPerl.pm, Makefile.PL, >Changes, README, and MANIFEST inside the Bundle-Bioperl-1.00 directory. Is >this just me? > >Brian O.From jason@chg.mc.duke.edu Fri Dec 15 22:08:15 2000 Date: Fri, 15 Dec 2000 17:08:15 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] PrimarySeq, Seq fixes
After reading some more IO documentation I realized there are some other options: solution 1: remove Tempfile creation, replace line to wrap $tfh into FileHandle with the following: my $fh = IO::File->new_tmpfile(); Not sure if this works on Win32 or Mac and I'm not sure where the temporary file is stored exactly so I'm guessing it's not the best solution... Solution 2: Keep using File::Temp; replace use of FileHandle as the following: my $fh = IO::File->new($file, O_RDWR); Both work on linux 2.2.17 x86 running perl 5.00503 and solaris 2.7 running perl 5.6.0 I think File::Temp says that the problem with #2 is that it can't guarantee that the filename will not be used again by another process - using a filehandle is better, but I think as long as we don't close the filehandle until after a new one is opened on the file, there is no problem. So I will implement the second solution unless anyone has suggestions or comments to the contrary or can think of how other solutions might work better. -Jason On Fri, 15 Dec 2000, Ewan Birney wrote: > > > I have moved PrimarySeq and Seq over to direct new() syntax. This had > knock on effects all over the place (Bio::Seq inheriting modules) which I > then chained up. I am not sure whether this was good or bad. > > On my linux box, the new LargePrimarySeq stuff did not work at all. I tried > to fix it by making a filehandle object, but this didn't work. We are > failing tests here --- Jason - any chance of looking over this - I can't > seem to get a FileHandle to open read/write... > > > > I'm moving towards Bio::Seq::GenEMBL > > > e. > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > <birney@ebi.ac.uk>. 
> ----------------------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/From schattner@alum.mit.edu Sat Dec 16 00:55:23 2000 Date: Fri, 15 Dec 2000 16:55:23 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] printing UnivAlgn
Murad Nayal wrote: > Hello, > > I cannot find a straightforward way to get a SimpleAlign from a > UnivAln. Why would you want a SimpleAlign you may ask. Well, because > UnivAln seems limited in the number of output formats possible (only > fasta and raw, neither is optimal in printing alignments for visual > inspection). This seems to still be true in 0.7. Also, AlignIO only > takes SimpleAlign (is that a bug or a 'feature'?). > > So I suppose the questions are > 1- how do you get a SimpleAlign from UnivAln (short of saving UnivAln to a > file in fasta format and reading it back again in SimpleAlign, which by > the way does not seem to work smoothly)? > 2- how do you pretty print a UnivAln (say clustalw format)? > > thanks for the help. Unfortunately there currently is no straightforward way to convert between SimpleAlign and UnivAln objects. This is neither a "feature" nor a bug. Rather an unfortunate historical consequence of the fact that the two modules were developed independently. In developing AlignIO.pm, I chose to use SimpleAlign for the alignment objects because it was easier and met all of my needs at the time. As I've later needed a few features from UnivAln, I've added them to SimpleAlign recently. If you need some specific feature of UnivAln, let me know and if it's not too complicated I'll see about adding it to the methods of SimpleAlign. As for converting all of UnivAln's capabilities to SimpleAlign format, it's a bear I don't want to take on at this point, but if someone else wants to, that would be fine (then UnivAln could just disappear). In the interim, both UnivAln and SimpleAlign (via AlignIO) read and write fasta formatted alignment files so you can always convert between the two alignment objects that way (it's kludgy but it should work) - PeterFrom murad@godel.bioc.columbia.edu Fri Dec 15 20:07:19 2000 Date: Fri, 15 Dec 2000 21:07:19 +0100 From: Murad Nayal murad@godel.bioc.columbia.edu Subject: [Bioperl-l] printing UnivAlgn
Is UnivAln being phased out? If not then maybe it is worth it to make UnivAln conform to 'the SimpleAlign interface'. I am guessing this is probably a simple thing to do and would make at least AlignIO able to print either alignment object in all formats etc. Peter Schattner wrote: > > Murad Nayal wrote: > > > Hello, > > > > I cannot find a straightforward way to get a SimpleAlign from a > > UnivAln. Why would you want a SimpleAlign you may ask. Well, because > > UnivAln seems limited in the number of output formats possible (only > > fasta and raw, neither is optimal in printing alignments for visual > > inspection). This seems to still be true in 0.7. Also, AlignIO only > > takes SimpleAlign (is that a bug or a 'feature'?). > > > > So I suppose the questions are > > 1- how do you get a SimpleAlign from UnivAln (short of saving UnivAln to a > > file in fasta format and reading it back again in SimpleAlign, which by > > the way does not seem to work smoothly)? > > 2- how do you pretty print a UnivAln (say clustalw format)? > > > > thanks for the help. > > Unfortunately there currently is no straightforward way to convert between > SimpleAlign and UnivAln objects. This is neither a "feature" nor a bug. > Rather an unfortunate historical consequence of the fact that the two > modules were developed independently. > > In developing AlignIO.pm, I chose to use SimpleAlign for the alignment > objects because it was easier and met all of my needs at the time. As I've > later needed a few features from UnivAln, I've added them to SimpleAlign > recently. If you need some specific feature of UnivAln, let me know and if > it's not too complicated I'll see about adding it to the methods of > SimpleAlign. As for converting all of UnivAln's capabilities to > SimpleAlign format, it's a bear I don't want to take on at this point, but > if someone else wants to, that would be fine (then UnivAln could just > disappear). 
> > In the interim, both UnivAln and SimpleAlign (via AlignIO) read and write > fasta formatted alignment files so you can always convert between the two > alignment objects that way (it's kludgy but it should work) > > - Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- Murad Nayal M.D. Ph.D. Department of Biochemistry and Molecular Biophysics College of Physicians and Surgeons of Columbia University 630 West 168th Street. New York, NY 10032 Tel: 212-305-6884 Fax: 212-305-6926From todd@andrew2.stanford.edu Sat Dec 16 02:57:04 2000 Date: Fri, 15 Dec 2000 18:57:04 -0800 From: Todd Richmond todd@andrew2.stanford.edu Subject: [Bioperl-l] PrimarySeq, Seq fixes
On 12/15/00 2:08 PM, "Jason Stajich" <jason@chg.mc.duke.edu> wrote: > After reading some more IO documentation I realized > there are some other options: > > solution 1: > remove Tempfile creation, > replace line to wrap $tfh into FileHandle with the following: > > my $fh = IO::File->new_tmpfile(); > > Not sure if this work on Win32 or Mac and I'm not sure where the > temporary file is stored exactly so I'm guessing its not the best > solution... > This works fine on Macs. As for temporary file storage - we have the ability to specify a TMPDIR in the MacPerl preferences. In fact it's specified by default as the MacOS's temporary folder. I would assume this is where temp files are stored. -- Dr Todd Richmond http://cellwall.stanford.edu/todd Carnegie Institution email: todd@andrew2.stanford.edu Department of Plant Biology fax: 1-650-325-6857 260 Panama Street phone: 1-650-325-1521 x431 Stanford, CA 94305From murad@godel.bioc.columbia.edu Fri Dec 15 23:03:22 2000 Date: Sat, 16 Dec 2000 00:03:22 +0100 From: Murad Nayal murad@godel.bioc.columbia.edu Subject: [Bioperl-l] printing UnivAlgn
This is a multi-part message in MIME format.
--------------488DFB5ACD9A6682D9C2A501
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi Peter, Ok, so implementing all of SimpleAlign interface in UnivAln is not the most straightforward thing in the world. for one the internal representation of sequences in the two are very different. nonetheless you only use three functions in AlignIO to output the alignment (at least in AlignIO::clustalw and a couple of other classes). I implemented these functions in UnivAln (in terms of UnivAln interface) and it seems to allow AlignIO to print out UnivAln as you would expect. While I was at it I implemented a function to get a SimpleAlign from UnivAln. these new functions, getSimpleAlign() and eachSeq(), are inefficient, they create brand new LocatableSeqs every time they're called. but to augment UnivAln and have it maintain a permanent set of LocatableSeqs needs some substantial effort to ensure consistency between these sequences and the UnivAln->{seq} array, which is too much work for tonight! :-) the diffs are attached. By the way, I found it useful to modify AlignIO::clustalw a bit to make sure that the sequence name does not exceed the space allocated to it in the printed alignment. diffs for this is attached as well. Regards, Peter Schattner wrote: > > Murad Nayal wrote: > > > is UnivAln being phased out? > > It would be nice if UnivAln were phased out. But since it still has lots of > features that some people may be using this doesn't seem likely to happen very > soon. > > > if not then maybe it is worth it to make > > UnivAln conform to 'the SimpleAlign interface'. I am guessing this is > > probably a simple thing to do > > Well it didn't seem simple to me, but take a look at it and if you can see a > simple way of doing it, do let me know (or better yet, implement it! :-) > > - Peter -- Murad Nayal M.D. Ph.D. Department of Biochemistry and Molecular Biophysics College of Physicians and Surgeons of Columbia University 630 West 168th Street. New York, NY 10032 Tel: 212-305-6884 Fax: 212-305-6926

--------------488DFB5ACD9A6682D9C2A501
Content-Type: text/plain; charset=us-ascii; name="clustalw.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="clustalw.diff"

*** /local/lib/perl5/site_perl/5.6.0/Bio/AlignIO/clustalw.pm Fri Dec 15 23:41:43 2000
--- /local/lib/perl5/site_perl/5.6.0/Bio/AlignIO/clustalw.pm.bk1 Fri Dec 15 23:40:46 2000
***************
*** 133,141 ****
  $substring = "";
  }
  
! $self->_print (sprintf("%-22s %s\n",
!     substr($aln->get_displayname($seq->get_nse()),0,20),$substring))
!     or return;
  }
  $self->_print (sprintf("\n\n")) or return;
  $count += 50;
--- 133,139 ----
  $substring = "";
  }
  
! $self->_print (sprintf("%-22s %s\n",$aln->get_displayname($seq->get_nse()),$substring)) or return;
  }
  $self->_print (sprintf("\n\n")) or return;
  $count += 50;

--------------488DFB5ACD9A6682D9C2A501
Content-Type: text/plain; charset=us-ascii; name="UnivAln.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="UnivAln.diff"

*** /local/lib/perl5/site_perl/5.6.0/Bio/UnivAln.pm Fri Dec 15 23:42:17 2000
--- /local/lib/perl5/site_perl/5.6.0/Bio/UnivAln.pm.bk1 Mon Oct 2 17:20:59 2000
***************
*** 3368,3416 ****
  print "Caught internal error";
  }
  
- # the following subs were added 12/15/2000 (Murad Nayal)
- 
- sub length_aln() {
-   my $self = shift;
-   return $self->width();
- }
- 
- sub get_displayname() {
-   my $self = shift;
-   my $name = shift;
-   my $id = $self->id();
-   if(defined($id) && $id ne "_") {
-     return $id;
-   } else {
-     return $name;
-   }
- }
- 
- sub eachSeq() {
-   my $self = shift;
- 
-   my @seqStrings = map {join("",@$_)} $self->seqs();
-   my $seqIds = $self->row_ids();
-   my @seqs;
-   foreach my $seqIdx (0..$#seqStrings) {
-     push(@seqs,Bio::LocatableSeq->new('-seq' => $seqStrings[$seqIdx],
-                                       '-id' => $$seqIds [$seqIdx] ));
-   }
-   return @seqs;
- }
- 
- sub getSimpleAlign() {
-   my $self = shift;
-   my $aln = Bio::SimpleAlign->new();
-   my @seqStrings = map {join("",@$_)} $self->seqs();
-   my $seqIds = $self->row_ids();
-   foreach my $seqIdx (0..$#seqStrings) {
-     $aln->addSeq(Bio::LocatableSeq->new('-seq' => $seqStrings[$seqIdx],
-                                         '-id' => $$seqIds [$seqIdx] ));
-   }
-   return $aln;
- }
- 
  1;
  
  __END__
--- 3368,3373 ----

--------------488DFB5ACD9A6682D9C2A501--
From j_martin@lbl.gov Sat Dec 16 05:25:28 2000 Date: Fri, 15 Dec 2000 21:25:28 -0800 From: Joel Martin j_martin@lbl.gov Subject: [Bioperl-l] PrimarySeq, Seq fixes
For Windows NT 4 using ActiveState Perl build 522 (5.003), this is creating the tempfile in the root of the system disk, disregarding what I've set for TEMP and TMP. joel lbnl, jgi, doe and pga informatics drone At 06:57 PM 12/15/00 -0800, you wrote: >On 12/15/00 2:08 PM, "Jason Stajich" <jason@chg.mc.duke.edu> wrote: > > > After reading some more IO documentation I realized > > there are some other options: > > > > solution 1: > > remove Tempfile creation, > > replace line to wrap $tfh into FileHandle with the following: > > > > my $fh = IO::File->new_tmpfile(); > > > > Not sure if this work on Win32 or Mac and I'm not sure where the > > temporary file is stored exactly so I'm guessing its not the best > > solution... > > > >This works fine on Macs. As for temporary file storage - we have the ability >to specify a TMPDIR in the MacPerl preferences. In fact it's specified by >default as the MacOS's temporary folder. I would assume this is where temp >files are stored. > >-- >Dr Todd Richmond http://cellwall.stanford.edu/todd >Carnegie Institution email: todd@andrew2.stanford.edu >Department of Plant Biology fax: 1-650-325-6857 >260 Panama Street phone: 1-650-325-1521 x431 >Stanford, CA 94305 > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@bioperl.org >http://bioperl.org/mailman/listinfo/bioperl-l
From birney@ebi.ac.uk Sat Dec 16 10:55:43 2000 Date: Sat, 16 Dec 2000 10:55:43 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] printing UnivAlgn
On Fri, 15 Dec 2000, Murad Nayal wrote: > > > is UnivAln being phased out? if not then maybe it is worth it to make > UnivAln conform to 'the SimpleAlign interface'. I am guessing this is > probably a simple thing to do and would make at least AlignIO able to > print either alignment object in all formats etc. Feel free to jump in and sort this out! (whoever codes it, wins the argument...) I agree with Peter that we should focus efforts on one of them, which seems to be SimpleAlign and move functionality into it. What do you use UnivAln for? > > Peter Schattner wrote: > > > > Murad Nayal wrote: > > > > > Hello, > > > > > > I can not find a straightforward way to get a SimpleAlgn from a > > > UnivAlgn. Why would you want a SimpleAlgn you may ask. well, because > > > UnivAlgn seems limited in the number of output formats possible (only > > > fasta and raw, neither is optimal in printing alignments for visual > > > inspection). This seems to still be true in 0.7. also, AlignIO only > > > takes SimpleAlgn (is that a bug or a 'feature'?). > > > > > > so I suppose the questions are > > > 1- how do you get a SimpleAln from UnivAln (short of saving UnivAln to a > > > file in fasta format and reading it back again in SimpleAln, which by > > > the way does not seems to work smoothly? > > > 2- how do you pretty print a UnivAlgn (say clustalw format)? > > > > > > thanks for the help. > > > > Unfortunately there currently is no straightforward way to convert between > > SimpleAlign and UnivAln objects. This is neither a "feature" nor a bug. > > Rather an unfortunate historical consequence of the fact that the two > > modules were developed independently. > > > > In developing AlignIO.pm, I chose to use SimpleAlign for the alignment > > objects because it was easier and met all of my needs at the time. As I've > > later needed a few features from UnivAln, I've added them to SimpleAlign > > recently. 
If you need some specific feature of UnivAln, let me know and if > > it's not too complicated I'll see about adding it to the methods of > > SimpleAlign. As for converting all of UnivAln's capabilities to > > SimpleAlign format, it's a bear I don't want to take on at this point, but > > if some else wants to, that would be fine (then UnivAln could just > > disappear). > > > > In the interim, both UnivAln and SimpleAlign (via AlignIO) read and write > > fasta formatted alignment files so you can always convert between the two > > alignment objects that way (it's kludgy but it should work) > > > > - Peter > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > -- > Murad Nayal M.D. Ph.D. > Department of Biochemistry and Molecular Biophysics > College of Physicians and Surgeons of Columbia University > 630 West 168th Street. New York, NY 10032 > Tel: 212-305-6884 Fax: 212-305-6926 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From birney@ebi.ac.uk Sat Dec 16 11:04:45 2000 Date: Sat, 16 Dec 2000 11:04:45 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] printing UnivAlgn
On Sat, 16 Dec 2000, Murad Nayal wrote: > > > Hi Peter, > > Ok, so implementing all of SimpleAlign interface in UnivAln is not the > most straightforward thing in the world. for one the internal > representation of sequences in the two are very different. nonetheless > you only use three functions in AlignIO to output the alignment (at > least in AlignIO::clustalw and a couple of other classes). I implemented > these functions in UnivAln (in terms of UnivAln interface) and it seems > to allow AlignIO to print out UnivAln as you would expect. While I was > at it I implemented a function to get a SimpleAlign from UnivAln. these > new functions, getSimpleAlign() and eachSeq(), are inefficient, they > create brand new LocatableSeqs every time they're called. but to augment > UnivAln and have it maintain a permanent set of LocatableSeqs needs some > substantial effort to ensure consistency between these sequences and the > UnivAln->{seq} array, which is too much work for tonight! :-) I am impressed! It sounds like you need to get a cvs login... > > the diffs are attached. > > By the way, I found it useful to modify AlignIO::clustalw a bit to make > sure that the sequence name does not exceed the space allocated to it in > the printed alignment. diffs for this is attached as well. > > Regards, > > Peter Schattner wrote: > > > > Murad Nayal wrote: > > > > > is UnivAln being phased out? > > > > It would be nice if UnivAln were phased out. But since it still has lots of > > features that some people may be using this doesn't seem likely to happen very > > soon. > > > > > if not then maybe it is worth it to make > > > UnivAln conform to 'the SimpleAlign interface'. I am guessing this is > > > probably a simple thing to do > > > > Well it didn't seem simple to me, but take a look at it and if you can see a > > simple way of doing it, do let me know (or better yet, implement it! :-) > > > > - Peter > > -- > Murad Nayal M.D. Ph.D. 
> Department of Biochemistry and Molecular Biophysics > College of Physicians and Surgeons of Columbia University > 630 West 168th Street. New York, NY 10032 > Tel: 212-305-6884 Fax: 212-305-6926 ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From murad@godel.bioc.columbia.edu Sat Dec 16 08:47:50 2000 Date: Sat, 16 Dec 2000 09:47:50 +0100 From: Murad Nayal murad@godel.bioc.columbia.edu Subject: [Bioperl-l] printing UnivAlgn
Ewan Birney wrote: > > On Fri, 15 Dec 2000, Murad Nayal wrote: > > > > > > > is UnivAln being phased out? if not then maybe it is worth it to make > > UnivAln conform to 'the SimpleAlign interface'. I am guessing this is > > probably a simple thing to do and would make at least AlignIO able to > > print either alignment object in all formats etc. > > Feel free to jump in and sort this out! (whoever codes it, wins the > argument...) > > I agree with Peter that we should focus efforts on one of them, which seems > to be SimpleAlign and move functionality into it. What do you use UnivAln > for? > Hi Ewan that's how I feel too. maintaining them as separate classes with incompatible interfaces is 'inelegant' I feel as they represent a single concept. I like SimpleAlign better because it maintains the encapsulation of the sequence idea internally (as LocatableSeq), while UnivAln essentially reimplements the Seq object. that makes its code more complicated than it probably needs to be. I actually haven't needed to use UnivAln specialized facilities yet. I had to deal with it because it is the Alignment object that Tools::Blast::HSP builds out of Blast reports! Regards Murad
From murad@godel.bioc.columbia.edu Sat Dec 16 08:51:45 2000 Date: Sat, 16 Dec 2000 09:51:45 +0100 From: Murad Nayal murad@godel.bioc.columbia.edu Subject: [Bioperl-l] printing UnivAlgn
Ewan Birney wrote: > > On Sat, 16 Dec 2000, Murad Nayal wrote: > > > > > > > Hi Peter, > > > > Ok, so implementing all of SimpleAlign interface in UnivAln is not the > > most straightforward thing in the world. for one the internal > > representation of sequences in the two are very different. nonetheless > > you only use three functions in AlignIO to output the alignment (at > > least in AlignIO::clustalw and a couple of other classes). I implemented > > these functions in UnivAln (in terms of UnivAln interface) and it seems > > to allow AlignIO to print out UnivAln as you would expect. While I was > > at it I implemented a function to get a SimpleAlign from UnivAln. these > > new functions, getSimpleAlign() and eachSeq(), are inefficient, they > > create brand new LocatableSeqs every time they're called. but to augment > > UnivAln and have it maintain a permanent set of LocatableSeqs needs some > > substantial effort to ensure consistency between these sequences and the > > UnivAln->{seq} array, which is too much work for tonight! :-) > > I am impressed! It sounds like you need to get a cvs login... > Thanks Ewan, that would be very cool! :-). -- Murad Nayal M.D. Ph.D. Department of Biochemistry and Molecular Biophysics College of Physicians and Surgeons of Columbia University 630 West 168th Street. New York, NY 10032 Tel: 212-305-6884 Fax: 212-305-6926From schattner@alum.mit.edu Sat Dec 16 19:26:32 2000 Date: Sat, 16 Dec 2000 11:26:32 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] printing UnivAlgn
Hi Murad First let me ditto Ewan’s comments – I think it’s great that you’re jumping in to the SimpleAlign – UnivAln "jungle". Combining these two modules into a single alignment object would be very desirable – and probably not that difficult. It simply hasn’t been where my interests have been lately – so I am glad to see someone else taking up the challenge. I do agree with Ewan that it probably would be better to add UnivAln functionality to SimpleAlign than the other way around. My only other suggestion is that if you are planning major change(s) to SimpleAlign.pm and/or AlignIO.pm, you first post a summary of what you plan to change to the Bioperl list, so you can get feedback on the possible implications of such changes before you get too deeply involved in coding. Welcome aboard. --- PeterFrom murad@godel.bioc.columbia.edu Sun Dec 17 02:58:35 2000 Date: Sun, 17 Dec 2000 03:58:35 +0100 From: Murad Nayal murad@godel.bioc.columbia.edu Subject: [Bioperl-l] printing UnivAlgn
Peter Schattner wrote: > > Hi Murad > > First let me ditto Ewan's comments - I think it's great that you're > jumping in to the SimpleAlign - UnivAln "jungle". Combining these two > modules into a single alignment object would be very desirable - and > probably not that difficult. It simply hasn't been where my interests > have been lately - so I am glad to see someone else taking up the challenge. > > I do agree with Ewan that it probably would be better to add UnivAln > functionality to SimpleAlign than the other way around. My only other > suggestion is that if you are planning major change(s) to SimpleAlign.pm > and/or AlignIO.pm, you first post a summary of what you plan to change > to the Bioperl list, so you can get feedback on the possible > implications of such changes before you get too deeply involved in coding. > > Welcome aboard. > > --- Peter Thank you Peter. There are a couple of applications that I want to code using bioperl. I will probably wait till the completion of this work before I suggest major changes. I am thinking that this experience will probably be a good source of ideas as to what additional facilities might be useful in 'the' Alignment object. I will certainly post the propositions to the list before I start hacking away. I kind of was wondering about the bioperl coding style (these questions did not seem to be answered in the docs). -it seems people like the C++ over the java style in naming functions. i.e. function name words are small letter throughout and separated by an underscore. but this is not universally true in bioperl code. I really like the 'java' style where you capitalize the first letter of every word (except the first) and no underscore. is there a 'bioperl' standard in this regard? -what seems to be very consistent style in bioperl is the use of a dual purpose accessor/mutator function that has the same name as the associated property. 
The issue that I have with this approach is that you are spending runtime resources to resolve something (whether the function was used as an accessor or a mutator) that was well defined at compile time. as a result I prefer usually to use separate accessor/mutator functions i.e. getProperty() and setProperty(). (again, the java influence). I know it is more of a hassle to write two functions instead of one, but still. -functions that return a set of objects are named using differing conventions in bioperl. one convention that I like is for a function that say returns a set of alignments is just to be named getAlignments() as opposed to names like each_alignment() which I have found confusing since it sounds like representing an internal iterator that you have to call repeatedly to iterate through the sequence (similar to the each keyword in perl). anyway, just random thoughts trying to get clear about the 'style biases' of the group. Regards, Murad NayalFrom heikki@ebi.ac.uk Sun Dec 17 10:35:18 2000 Date: Sun, 17 Dec 2000 10:35:18 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Variation_IO.t [was: External dependencies]
We better move this discussion back to bioperl-l so that others can help. I've never run perl on a Mac. It seems to me that either the XML modules are not installed properly or the Variation_IO.t is not going through the motions as it should. -Heikki Todd Richmond wrote: > > On 12/15/00 1:30 AM, "Heikki Lehvaslaiho" <heikki@ebi.ac.uk> wrote: > > > > > I think I know what is going on. In perl distribution 5.004 and before > > Text::Wrap (version 97.011701) was able to wrap only on word boundary. > > The latest version (98.112902) has this fixed. > > > > This indeed fixes the problem - however, it appears that not all of the > tests are run. It starts with "1..26", then runs through the first 17 tests > successfully and then just stops. Two new files are created: > "mutations.out.xml" and "polymorphism.out.xml", but they are both empty. In > addition, to run the tests on a Mac, I had to remove the "t/" from in front > of the file names within Variation_IO.t, because otherwise it couldn't find > them. This is probably a problem with the calling directory and differences > between Unix and MacOS in defining what that is. Moving Variation_IO.t up > one level outside of the "t" directory failed to fix the problem (which is > where I assume the "make test" would have been run from). > > -- > Dr Todd Richmond http://cellwall.stanford.edu/todd > Carnegie Institution email: todd@andrew2.stanford.edu > Department of Plant Biology fax: 1-650-325-6857 > 260 Panama Street phone: 1-650-325-1521 x431 > Stanford, CA 94305 -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________
From Shailesh L Mistry"
On Sun, 17 Dec 2000, Murad Nayal wrote: > > Thank you Peter. There are a couple of applications that I want to code > using bioperl. I will probably wait till the completion of this work > before I suggest major changes. I am thinking that that this experience > will probably be a good source of ideas as to what additional facilities > might be useful in 'the' Alignment object. I will certainly post the > propositions to the list before I start hacking away. Great. You've got a login, so - post before any big commits and discuss things here where there could be controversy, but in general whoever codes it, wins... > > I kind of was wondering about the bioperl coding style (these questions > did not seem to be answered in the docs). > > -it seems people like the C++ over the java style in naming functions. > i.e. function name words are small letter throughout and separated by an > underscore. but this is not universally true in bioperl code. I really > like the 'java' style where you capitalize the first letter of every > word (except the first) and no underscore. is there a 'bioperl' standard > in this regard? > This is what I tend to write. I do prefer underscores. I reserve capitalisation to mean "Object" like "SimpleAlign" > -what seems to be very consistent style in bioperl is the use of a dual > purpose accessor/mutator function that have the same name as the > associated property. The issue that I have with this approach is that > you are spending runtime resources to resolve something (whether the > function was used as an accessor or a mutator) that was well defined at > compile time. as a result I prefer usually to use separate > accessor/mutator functions i.e. getProperty() and setProperty(). (again, > the java influence). I know it is more of a hassle to write two > functions instead of one, but still. > In a *long* time of profiling bioperl code, I have never seen an accessor being a performance bottleneck. I think a dual get/set is fine. 
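A minimal sketch of the two accessor styles being compared here, in plain Perl. The package name Demo::Align and its 'id' property are hypothetical, made up for illustration only; this is not code from the bioperl tree.

```perl
use strict;
use warnings;

package Demo::Align;    # hypothetical class, not part of bioperl

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->{'id'} = $args{'-id'} if exists $args{'-id'};
    return $self;
}

# Dual-purpose accessor: stores the argument if one is given,
# and returns the current value either way.
sub id {
    my ($self, $value) = @_;
    $self->{'id'} = $value if defined $value;
    return $self->{'id'};
}

# The separate accessor/mutator pair, for comparison.
sub get_id { return $_[0]->{'id'} }
sub set_id { $_[0]->{'id'} = $_[1]; return }

package main;

my $aln = Demo::Align->new( -id => 'aln1' );
print $aln->id(), "\n";        # read through the dual accessor
$aln->id('aln2');              # write through the same method
print $aln->get_id(), "\n";    # read through the separate getter
```

The only extra runtime work in the dual form is one defined() test per call, which fits the observation that accessors rarely show up in profiles. One side effect of this particular sketch is that the dual accessor cannot reset the value to undef.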
> -functions that return a set of objects are named using differing > conventions in bioperl. one convention that I like is for a function > that say returns a set of alignments is just to be named getAlignments() > as opposed to names like each_alignment() which I have found confusing > since it sounds like representing an internal iterator that you have to > call repeatedly to iterate through the sequence (similar to the each > keyword in perl). > > The each_ syntax is my fault. I still like it, but I know that it is not universally loved. get_Alignments would be probably the ideal I think... > anyway, just random thoughts trying to get clear about the 'style > biases' of the group. > > Regards, > > Murad Nayal > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From birney@ebi.ac.uk Sun Dec 17 17:22:03 2000 Date: Sun, 17 Dec 2000 17:22:03 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Bad news for 5.0004 compatibility
It looks like we are seriously out of whack for 5.0004 compatibility.

Failed Test      Status Wstat Total Fail  Failed  List of failed
-------------------------------------------------------------------------------
t/BPbl2seq.t          2   512    ??   ??       %  ??
t/BPpsilite.t         2   512    ??   ??       %  ??
t/LiveSeq.t           2   512    44   44 100.00%  1-44
t/StandAloneBla                   8    8 100.00%  1-8
t/Variation_IO.       2   512    26   23  88.46%  4-26
t/largefasta.t        2   512    16   15  93.75%  2-16
t/largepseq.t         2   512     6    6 100.00%  1-6
Failed 7/48 test scripts, 85.42% okay. 96/587 subtests failed, 83.65% okay.

A fair proportion of these are

 - Not skipping tests correctly when File::Temp is not installed. (I hope File::Temp becomes part of the standard Perl bundle ...)

 - The entire LiveSeq modules are not quoting the hash values, ie, going

     $self->{seq} = $something

   rather than

     $self->{'seq'} = $something

   5.0004 takes offence to the former.

I'm going to start clearing these things up ...

----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------
From birney@ebi.ac.uk Sun Dec 17 18:01:19 2000 Date: Sun, 17 Dec 2000 18:01:19 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Compatibility testing.
Ok. We are much closer now...

Failed Test      Status Wstat Total Fail  Failed  List of failed
-------------------------------------------------------------------------------
t/Variation_IO.       2   512    26   23  88.46%  4-26
Failed 1/48 test scripts, 97.92% okay. 23/590 subtests failed, 96.10% okay.
make: *** [test_dynamic] Error 29

I need to talk to Heikki about this - something about Text::Wrap versions...

perl 5.0004 has a really *bad* feature that you can't call exit() in a BEGIN { } block. This causes our "skipping test" feature not to work, sadly... I had to rewrite these tests to use a more complex require system with flags. <sigh>. Check out t/largepseq.t for an example.

----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------
From birney@ebi.ac.uk Sun Dec 17 18:50:50 2000 Date: Sun, 17 Dec 2000 18:50:50 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Bio::Seq::GenEMBLI proposal
Ok. I have finally given myself some "fun coding" time (if GenBank/EMBL format compatibility is considered fun) to factoring out the more esoteric parts of GenBank/EMBL format off Bio::Seq into its own little area....

The proposal is an interface called Bio::Seq::GenEMBLI and an implementation Bio::Seq::GenEMBL. (interface allows other people to comply with the interface without using the same implementation. A "good thing" tm, in particular for database implementors).

At the moment I have just taken what is in the Bio::Seq object and moved it into its own interface, written below.

Decisions:

(a) should we keep with the each_ syntax, or would people prefer "something returning an array of things" to have a different naming convention?

(b) should dates be formatted strings or something else? (if so, what?)

(c) should keyword lines be split on keywords and each_keyword methods or not?

(d) should the interface extend to cover swissprot, in which case
    - name change?
    - additional methods?

Here is what I have so far for this interface definition, waiting to be committed once I get the "ok"

=head1 NAME

Bio::Seq::GenEMBLI - Interface to a Sequence object supporting GenBank/EMBL format

=head1 SYNOPSIS

    # Bio::Seq::GenEMBLI is-a Bio::SeqI, hence your usual
    # ->seq, ->subseq, ->id, ->top_SeqFeatures() is going to work

    # additional methods on Bio::Seq::GenEMBLI supporting
    # EMBL/GenBank format

    if( $seq->isa('Bio::Seq::GenEMBLI') ) {
        foreach $date ( $seq->each_date() ) {
            print "date is $date\n"; # currently formatted string
        }

        foreach $key ( $seq->each_keyword() ) {
            print "key word is $key\n";
        }

        foreach $sec ( $seq->each_secondary_accession() ) {
            print "secondary accession number $sec\n";
        }

        print "Entry is in ",$seq->division()," and has molecular identifier ",
            $seq->molecule(),"\n";
    }

ewan

----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. 
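The interface/implementation split proposed above can be sketched in plain Perl without bioperl installed. The packages Demo::GenEMBLI and Demo::GenEMBL below are hypothetical stand-ins (the proposed modules would sit under Bio::Seq and be a Bio::SeqI); only the method names each_date, each_keyword, each_secondary_accession, division and molecule come from the proposal. The constructor arguments and the sample values are made up for illustration.

```perl
use strict;
use warnings;

package Demo::GenEMBLI;    # hypothetical interface: method names only

sub _abstract {
    my ($self, $method) = @_;
    die ref($self) . " does not implement $method\n";
}
sub each_date                { $_[0]->_abstract('each_date') }
sub each_keyword             { $_[0]->_abstract('each_keyword') }
sub each_secondary_accession { $_[0]->_abstract('each_secondary_accession') }
sub division                 { $_[0]->_abstract('division') }
sub molecule                 { $_[0]->_abstract('molecule') }

package Demo::GenEMBL;     # hypothetical in-memory implementation
our @ISA = ('Demo::GenEMBLI');

sub new {
    my ($class, %args) = @_;
    return bless {
        'dates'    => $args{'-dates'}                || [],
        'keywords' => $args{'-keywords'}             || [],
        'secacc'   => $args{'-secondary_accessions'} || [],
        'division' => $args{'-division'},
        'molecule' => $args{'-molecule'},
    }, $class;
}
sub each_date                { @{ $_[0]->{'dates'} } }
sub each_keyword             { @{ $_[0]->{'keywords'} } }
sub each_secondary_accession { @{ $_[0]->{'secacc'} } }
sub division                 { $_[0]->{'division'} }
sub molecule                 { $_[0]->{'molecule'} }

package main;

# Sample values are invented, not taken from a real entry.
my $seq = Demo::GenEMBL->new(
    -dates    => ['17-DEC-2000'],
    -keywords => [ 'kinase', 'membrane' ],
    -division => 'HUM',
    -molecule => 'mRNA',
);

# Client code only asks whether the object complies with the interface.
if ( $seq->isa('Demo::GenEMBLI') ) {
    print "date is $_\n"     foreach $seq->each_date();
    print "key word is $_\n" foreach $seq->each_keyword();
    print "Entry is in ", $seq->division(), " and has molecular identifier ",
        $seq->molecule(), "\n";
}
```

A database-backed class could inherit from the same interface package and answer the same calls from SQL instead of a hash, which is the point made above about database implementors.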
-----------------------------------------------------------------From jason@chg.mc.duke.edu Sun Dec 17 19:26:24 2000 Date: Sun, 17 Dec 2000 14:26:24 -0500 From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] tempfile creation and Bio::Tools::Run
Since File::Temp seems to be causing problems, does it make more sense to migrate tempfile creation to RootI and use File::Temp if it exists and fall back on a more simplified method we write in bioperl if it doesn't? We definitely needed to go to tempfile creation for StandAloneBlast rather than hardcoded tmp1.fa as it is conceivable to have >1 blast process running in the same directory. Will need to do the same for the ClustalW and TCoffee running as well. As Hilmar has suggested, how about moving modules that run external applications to Bio::Tools::Run - while things that are typically parsing output from programs can stay in Bio::Tools. -Jason
From birney@ebi.ac.uk Sun Dec 17 19:21:54 2000 Date: Sun, 17 Dec 2000 19:21:54 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Bio::Tools::HMMER refactoring
[apologies for spelling mistakes in the previous email...]

Today is becoming a very productive bioperl day. I have moved Bio::Tools::HMMER over to the SimilarityPair object. This was very smooth except for a requirement to do:

    sub _initialize {
        my($self,@args) = @_;
        my $make = $self->SUPER::_initialize(@args);
        $self->{'alignlines'} = [];

        # make sure we have actually created the feature2 object
        # not ideal this...
        $self->subject();
        return $make;
    }

in the Bio::Tools::HMMER::Domain object because it needs the feature2 object created. SimilarityPair seems to make sure feature1 is created - why not do the same thing for feature2? (hilmar?)

I don't 100% understand what I should be doing with SeqFeatureAnalysisI here. I have to implement parse

=head2 parse

 Title   : parse
 Usage   : $obj->parse(-input=>$inputobj, [ -params=>[@params] ], [ -method => $method ] )
 Function: sets up parsing for feature retrieval from an analysis file, or object
 Example :
 Returns : void
 Args    : B<input> - object/file where analysis are coming from,
           B<params> - parameter to use when parsing/running analysis
           B<method> - method of analysis (optional)

=cut

Is it ok to

(a) assume that -input is always a filehandle (ie, I can go <$input>)?

(b) ignore everything else?

What should parse() return?

Then I have to implement next_feature (surely next_seq_feature or next_SeqFeature would have been a better name....)

I really want to implement next_feature on what I return from parse() because in HMMER, I need to read the whole damn file before I can return a properly parsed seqfeature (don't ask...) (second issue is that I really have a Set of SimilarityPair objects, but that also is another matter).

I am not 100% on this interface. Who uses it and is this the best way to do things here?

----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. 
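One possible shape for the "next_feature on what parse() returns" idea above, sketched in plain Perl. Everything here is an assumption for illustration: Demo::Parser and Demo::Results are hypothetical packages, -input is assumed to be an open filehandle, features are plain hashes rather than SimilarityPair objects, and the "name start end score" line layout is invented and is not HMMER output.

```perl
use strict;
use warnings;

package Demo::Results;    # hypothetical holder returned by parse()

sub new {
    my ($class, @features) = @_;
    return bless { 'features' => [@features], 'pos' => 0 }, $class;
}

# Hand back one feature per call; nothing (undef) when exhausted.
sub next_feature {
    my $self = shift;
    return if $self->{'pos'} >= @{ $self->{'features'} };
    return $self->{'features'}->[ $self->{'pos'}++ ];
}

package Demo::Parser;     # hypothetical parser

sub new { return bless {}, shift }

# Assumes -input is an open filehandle and reads all of it before returning,
# so every feature is complete by the time the caller starts iterating.
sub parse {
    my ($self, %args) = @_;
    my $fh = $args{'-input'} or die "parse() needs -input\n";
    my @features;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*(#|$)/;    # skip comments and blank lines
        my ($name, $start, $end, $score) = split ' ', $line;
        push @features,
            { 'name' => $name, 'start' => $start, 'end' => $end, 'score' => $score };
    }
    return Demo::Results->new(@features);
}

package main;

# Made-up input in the invented layout described above.
my $text = "# demo\nSH3 10 60 45.2\nSH2 80 170 31.0\n";
open( my $in, '<', \$text ) or die "cannot open in-memory handle: $!";

my $results = Demo::Parser->new->parse( -input => $in );
while ( my $f = $results->next_feature() ) {
    print join( "\t", @{$f}{qw(name start end score)} ), "\n";
}
```

Keeping the iterator on the returned object rather than on the parser means two result sets can be walked independently, at the cost of holding the whole parsed file in memory.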
-----------------------------------------------------------------From birney@ebi.ac.uk Sun Dec 17 19:23:59 2000 Date: Sun, 17 Dec 2000 19:23:59 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Re: tempfile creation and Bio::Tools::Run
On Sun, 17 Dec 2000, Jason Stajich wrote:

> Since File::Temp seems to be causing problems, does it make more sense to
> migrate tempfile creation to RootI and use File::Temp if it exists and
> fall back on a more simplified method we write in bioperl if it doesn't?

Sounds good to me.

>
> We definitely needed to go to tempfile creation for StandAloneBlast rather
> than hardcoded tmp1.fa as it is conceivable to have >1 blast process running
> in the same directory. Will need to do the same for the ClustalW and
> TCoffee running as well.
>

Right.

> As Hilmar has suggested, how about moving modules that run external
> applications to Bio::Tools::Run - while things that are typically parsing
> output from programs can stay in Bio::Tools.
>

Also sounds good. I was going to suggest waiting for Hilmar to say ok, but as he suggested it and you and I are ok with this, I think it passes the "main developers" litmus test.

Go for it.

> -Jason
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From schattner@alum.mit.edu Sun Dec 17 19:49:07 2000
Date: Sun, 17 Dec 2000 11:49:07 -0800
From: Peter Schattner schattner@alum.mit.edu
Subject: [Bioperl-l] tempfile creation and Bio::Tools::Run
Jason Stajich wrote:
>
> As Hilmar has suggested, how about moving modules that run external
> applications to Bio::Tools::Run - while things that are typically parsing
> output from programs can stay in Bio::Tools.
>
> -Jason
>

Fine with me, Jason. Can you do the moving ...? :-)

Peter

From jason@chg.mc.duke.edu Sun Dec 17 20:36:27 2000
Date: Sun, 17 Dec 2000 15:36:27 -0500
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] tempfile creation and Bio::Tools::Run
Will do. I am also adding tempfile and tempdir methods to RootI. I will fix ClustalW and TCoffee to use tempfile properly and then move these modules to Bio::Tools::Run::Alignment.

The only problem is that we'll lose our logs from cvs, but we can always do some diffs when needed.

-Jason

----- Original Message -----
From: "Peter Schattner" <schattner@alum.mit.edu>
To: "Jason Stajich" <jason@chg.mc.duke.edu>
Cc: <bioperl-l@bioperl.org>
Sent: Sunday, December 17, 2000 2:49 PM
Subject: Re: [Bioperl-l] tempfile creation and Bio::Tools::Run

> Jason Stajich wrote:
> >
> > As Hilmar has suggested, how about moving modules that run external
> > applications to Bio::Tools::Run - while things that are typically parsing
> > output from programs can stay in Bio::Tools.
> >
> > -Jason
> >
>
> Fine with me, Jason. Can you do the moving ...? :-)
>
> Peter
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

From jason@chg.mc.duke.edu Sun Dec 17 21:46:02 2000
Date: Sun, 17 Dec 2000 16:46:02 -0500 (EST)
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] migrated modules to a run dir
Migrated Bio::Tools::StandAloneBlast.pm and Bio::Tools::Alignment::* to Bio::Tools::Run.

I have also added the capability of tempfile and tempdir creation to Bio::Root::RootI. All modules that use tempfiles should probably use these methods, as they handle the absence of File::Temp. Currently the code automatically deletes temporary files upon destruction of the object that called tempfile.

Unfortunately the modules in Bio::Tools::Run::Alignment::* produce .dnd dendrogram files, which are created in the CWD and not in the dir where the temporary files are located (/tmp). So my code to remove them is not looking in the right place... I'll add some better code to allow users to specify whether or not to delete these files (in case you want to use the dendrogram files). I will remove them for sure when they are created by the tests. But is it more correct for them to be put in the current dir or where the sequence files are located? I can imagine arguments for both sides.

I can't do any of this till Monday though, so users be warned when you run make test on the main trunk.

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From schattner@alum.mit.edu Sun Dec 17 23:28:33 2000
Date: Sun, 17 Dec 2000 15:28:33 -0800
From: Peter Schattner schattner@alum.mit.edu
Subject: [Bioperl-l] Re: Developing Improved Bioperl Documentation
Kris Boulez wrote:
>
> My plan is to start writing cookbook-like documentation (see an earlier
> mail from Ewan), based on the SYNOPSIS part of each module's
> documentation. In the meantime checking these sections for correctness.
>
> Kris,

Kris,

I gather now that the sort of cookbook that you (and Brian?) are planning is rather different from the type of tutorial that I have been envisioning. I don't think this is bad. After all, in learning new software (as in coding our favorite computer language) there is something to be said for "having more than one way to do it".

I think that your cookbook would be complementary (and I would hope also complimentary ;-) to the tutorial I would like to write. I plan to go less into the precise syntax for using the various modules (I think that information would fit better in your cookbook) and more into "motivation" - describing what tasks a (computational) biologist could use bioperl for and where they should look to find those capabilities within the Bioperl package. I know that when I was learning bioperl, not knowing what tools were available and where to look for them was one of the bigger stumbling blocks for me.

I have attached an outline of the proposed tutorial, below. I would be grateful for feedback from anybody on the list regarding uses of bioperl I've omitted, modules that should be included or omitted from a tutorial, or any other suggestions that you think might be helpful in a tutorial of the type I am describing.

Thanks.

-- Peter

=========================
Bioperl Tutorial - outline

Introduction
  What Bioperl is intended to do
  User required capabilities
  Software requirements
    Minimal installation
      Perl
      Bioperl "core"
    Complete installation
      Perl - CPAN extensions (LWP, File::Temp, etc)
      Bioperl Perl-extensions: bp-gui, bp-ensembl, bp-biocorba
      Bioperl c-extensions
      Non-perl bio-informatics c programs: clustalw, ncbi blast, tcoffee
  Installation
    Obtaining the core components
    Installing the external components / extensions
    Additional info for non-unix users
  Where to go for more information

Brief intro to Bioperl's objects
  Motivation: (or why understanding a little about the relationships among Bioperl's basic objects will make the user's life easier)
  Sequence objects: (Seq, PrimarySeq, LocatableSeq, LiveSeq, LargeSeq)
  Alignment objects (SimpleAlign, UnivAln)
  Where to go for more information

Using Bioperl
  Overview of molecular biology tasks where bioperl can help
  Accessing sequence data from local and remote databases
    Accessing remote databases (Bio::DB::GenBank, etc)
    Indexing and accessing local databases (bpindex.pl, bpfetch.pl)
  Transforming formats of database / file records
    Transforming sequence files (SeqIO)
    Transforming alignment files (AlignIO)
  Manipulating individual sequences
    Obtaining basic sequence statistics - eg MW, nucleotide & codon frequencies (SeqStats, SeqWords)
    Expanding sequences with ambiguous nts or aas (SeqPattern, IUPAC)
    Reverse-complementing nt seqs (SeqPattern)
    Translating nt seqs (CodonTable)
    Identifying aa characteristics - eg charge, hydrophobicity (OddCodes)
    Identifying restriction enzyme sites (RestrictionEnzyme)
    Identifying aa cleavage sites (Sigcleave)
  Searching for "similar" sequences
    Running BLAST locally (StandAloneBlast)
    Running BLAST remotely (Blast)
    Parsing BLAST reports (Blast, BPlite, BPpsilite)
  Creating and manipulating sequence alignments
    Aligning 2 sequences with Smith-Waterman (pSW)
    Aligning 2 sequences with Blast (StandAloneBlast, BPbl2seq)
    Aligning multiple sequences (Clustalw, TCoffee)
    Manipulating / displaying alignments (SimpleAlign, UnivAln)
  Searching for genes and other structures on genomic DNA
    Parsing reports of gene-searching programs (Genscan, ESTScan, MZEF)
    Parsing HMM reports (HMMER::Results)
  Developing machine readable sequence annotations
    Representing sequence annotations for a single sequence (Annotation, SeqFeature, GeneStructure)
    Representing and annotating genomic and/or very large sequences (LiveSeq, LargeSeq)
    Representing related sequences - mutations, polymorphisms etc (Allele, SeqDiff, etc)
    Sequence XML representations - generation and parsing (SeqIO::game)
    Graphically displaying annotated sequences (Bioperl-gui)
  Where to go for more information

From hlapp@gmx.net Mon Dec 18 05:20:42 2000
Date: Sun, 17 Dec 2000 21:20:42 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Bio::Tools::Run
Jason Stajich wrote:
>
> Only problem is we'll lose our logs from cvs but we can always do some diffs
> when needed.
>

Not necessarily. The cvs docs (in info format) themselves suggest that if you had something under RCS and want to keep the log information under cvs, you simply put the RCS file into the repository, provided you can (permission- and access-wise).

So, you could have simply moved the ,v files in the repository from one directory to another. I know it doesn't sound clean, but it should work.

Anyway, I'm too late with this ... :o)

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Mon Dec 18 06:37:55 2000
Date: Sun, 17 Dec 2000 22:37:55 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Bio::Tools::HMMER refactoring
Ewan Birney wrote:
>
> in the Bio::Tools::HMMER::Domain object because it needs the feature2
> object created. SimilarityPair seems to make sure feature1 is created -
> why not do the same thing for feature2? (hilmar?)
>

There's no problem in changing SimilarityPair such that the existence of feature2 is also ensured. The reason I didn't implement it initially is that I have a tendency to safeguard only those things that I'm sure need it (i.e., don't secure yourself from your own bugs popping up).

> I don't 100% understand what I should be doing with SeqFeatureAnalysisI
> here. I have to implement parse
>

I'm almost sure you don't want to implement this. I still need to discuss with Jason whether Bio::SeqFeatureAnalysis suffices in the core as the implementing class, but it probably does.

A parser almost certainly doesn't want to implement SeqFeatureAnalysis, but if the result of its parsing is SeqFeatureI objects (as is the case for Tools::HMMER), it probably should try to implement Bio::SeqAnalysisParserI. So, that's probably what you should implement. The easiest way to do so might be inheriting from Bio::Tools::AnalysisParser. SeqAnalysisParserI requires a method parse(), too, but Tools::AnalysisParser already does most of the job here. Check its documentation; it's not that poor, as I just realized ... :)

>
> Is it ok to
>
> (a) assume that -input is always a filehandle (ie, I can go <$input>)?
>
> (b) ignore everything else?
>

Neither is ok. A good starting point to understand what's required is probably Bio::Tools::AnalysisParser::parse(), as mentioned above.

>
> Then I have to implement next_feature (surely next_seq_feature or
> next_SeqFeature would have been a better name....)
>
> I really want to implement next_feature on what I return from parse()
> because in HMMER, I need to read the whole damn file before I can return a
> properly parsed seqfeature (don't ask...)
>

That's not a problem (unless reading the whole file is a problem).
I realize that in fact you were talking about SeqAnalysisParserI ...

The return type of parse() is really void; one purpose is to be able to specify multiple inputs, that is, one purpose is to reset the state of the parser (that's exactly what AnalysisParser::parse() primarily does, together with _initialize_state(), so most likely you want to override this method if you decide to inherit from Tools::AnalysisParser; see Tools::Genscan.pm as an example).

next_feature() is required to return one feature at a time, but these can obviously be taken from an array built on parsing the file. It is up to the implementor when the file is parsed. You could do it in your implementation of parse(), since the user is required to call that method before being able to retrieve features by calling next_feature().

The classes I wrote follow Tools::AnalysisParser, meaning that parse() mainly re-initializes, and every call to next_feature() parses the next chunk of data from input. In Genscan.pm the first call to next_feature triggers parsing of the whole prediction section (but not the predicted seqs).

> (second issue is that I really have a Set of SimilarityPair objects, but
> that also is another matter).
>

If they are really somewhat independent pairs, you can return one at a time when next_feature() is called. If they rather make up one feature, they maybe should better be encapsulated anyway. Maybe I don't understand.

>
> I am not 100% on this interface. Who uses it and is this the best way to
> do things here?
>

If you're talking about SeqAnalysisParserI, it is presently implemented by Tools::AnalysisParser and therefore by all classes inheriting from it: Genscan.pm, MZEF.pm, ESTScan.pm, and BPlite.pm should be migrated to it, too.

The whole idea Jason and I had in mind for AnalysisParserI and SeqFeatureProducerI is the ability to implement very generic programs for annotating sequences with features.
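
To make the parse()/next_feature() contract described above concrete, here is a minimal sketch. It is not code from bioperl: the package name My::WholeFileParser, the three-column input format and the plain hashes standing in for features are all made up for illustration, and a real parser would more likely inherit from Bio::Tools::AnalysisParser and return Bio::SeqFeatureI objects. It takes the "read everything inside parse()" route and then hands features out one at a time.

```perl
use strict;
use warnings;

# Hypothetical parser class for illustration only; not part of bioperl.
package My::WholeFileParser;

sub new {
    my ($class) = @_;
    return bless { 'features' => [] }, $class;
}

# parse() returns nothing. It resets the parser state and, in this
# variant, reads the whole input up front.
sub parse {
    my ($self, %args) = @_;
    my $fh = $args{'-input'} or die "parse() needs -input => filehandle\n";
    $self->{'features'} = [];    # forget anything left from an earlier run
    while ( defined( my $line = <$fh> ) ) {
        chomp $line;
        next if $line =~ /^\s*$/ or $line =~ /^\s*#/;    # blank or comment
        my ($name, $start, $end) = split ' ', $line;
        push @{ $self->{'features'} },
            { 'name' => $name, 'start' => $start, 'end' => $end };
    }
    return;
}

# next_feature() hands out one feature per call, undef when exhausted.
sub next_feature {
    my ($self) = @_;
    return shift @{ $self->{'features'} };
}

package main;

# Made-up report with columns: name, start, end.
my $report = "domainA 10 50\n# a comment\ndomainB 60 120\n";
open( my $fh, '<', \$report ) or die "cannot open in-memory report: $!";

my $parser = My::WholeFileParser->new();
$parser->parse( -input => $fh );

# The generic client loop: it only knows about next_feature().
my @seen;
while ( my $feat = $parser->next_feature() ) {
    push @seen, "$feat->{'name'}:$feat->{'start'}-$feat->{'end'}";
}
print join( ', ', @seen ), "\n";
```

Because parse() empties the feature list before reading, calling it again with a new -input discards whatever was left over from the previous run, which is the "reset the state" behaviour mentioned above.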
The scope is methods and parsers that really produce something fitting SeqFeatureI. The concept is that a generic program that has a sequence and a parser object implementing SeqAnalysisParserI can obtain features from the parser and add these to the sequence object, which can then for instance be submitted to a module making the annotations persistent in a database.

SeqFeatureProducerI is the driver part, similar to the SeqIO system: given a method (by name), it returns an object implementing SeqAnalysisParserI. So, for implementing SeqFeatureProducerI there are two mechanisms we can follow: for each new parser module add code to a single driver Bio::SeqFeatureProducer to make it recognize it, or add a simple module named after the method (a la SeqIO). Presently Jason suggests following the first approach for simplicity, and I tend to agree.

The overall point is that you do not have to change or add anything to your generic program, and it would still accommodate any new method. You just specify input and method (and update bioperl :-).

I realize that SeqFeatureProducer doesn't exactly follow what I just said ... :o| Jason and I need a few more thoughts here I guess.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Mon Dec 18 07:35:52 2000
Date: Sun, 17 Dec 2000 23:35:52 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] printing UnivAlgn
Ewan Birney wrote:
>
> > -it seems people like the C++ over the java style in naming functions.
> > i.e. function name words are small letter throughout and separated by an
> > underscore. but this is not universally true in bioperl code. I really
> > like the 'java' style where you capitalize the first letter of every
> > word (except the first) and no underscore. is there a 'bioperl' standard
> > in this regard?
> >
>
> This is what I tend to write. I do prefer underscores. I reserve
> capitalisation to mean "Object" like "SimpleAlign"
>

Java naming style is nice, but so is C++. What confused me in Bioperl, and still does from time to time, is that both are mixed, like sub_SeqFeature(). Still, I'm probably not the only developer who appreciates it if a package has a *consistent* naming style ...

>
> In a *long* time of profiling bioperl code, I have never seen an accessor
> being a performance bottleneck. I think a dual get/set is fine.
>

I agree regarding the performance. I even like the dual purpose style.

> > -functions that return a set of objects are named using differing
> > conventions in bioperl. one convention that I like is for a function
> > that say returns a set of alignments is just to be named getAlignments()
> > as opposed to names like each_alignment() which I have found confusing
> > since it sounds like representing an internal iterator that you have to
> > call repeatedly to iterate through the sequence (similar to the each
> > keyword in perl).
> >
>
> The each_ syntax is my fault. I still like it, but I know that it is not
> universally loved. get_Alignments would be probably the ideal I think...
>

I still find the each_ style confusing and counter-intuitive. For instance, each_tag_value() is ambiguous in the first place: does it mean the value(s!) of each tag (in the sense of all tags), or each (in the sense of all) value(s) of a particular tag?
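
As an illustration of the two points above (the dual-purpose get/set accessor and the two readings of each_tag_value()), here is a small sketch. The class My::TaggedFeature and its method names are hypothetical, chosen only to show how get_-style names can separate "the names of all tags" from "all values of one tag"; they are not taken from any existing bioperl module.

```perl
use strict;
use warnings;

# Hypothetical class for illustration; the method names are just one
# possible spelling of a get_<type>s naming convention.
package My::TaggedFeature;

sub new {
    my ($class) = @_;
    return bless { 'tags' => {} }, $class;
}

# Dual-purpose get/set accessor: sets when given a value, always returns
# the current value.
sub primary_tag {
    my ($self, $value) = @_;
    $self->{'primary_tag'} = $value if defined $value;
    return $self->{'primary_tag'};
}

sub add_tag_value {
    my ($self, $tag, $value) = @_;
    push @{ $self->{'tags'}{$tag} }, $value;
    return;
}

# The names of all tags on the feature.
sub get_all_tags {
    my ($self) = @_;
    my @names = sort keys %{ $self->{'tags'} };
    return @names;
}

# All values stored under one particular tag.
sub get_tag_values {
    my ($self, $tag) = @_;
    return @{ $self->{'tags'}{$tag} || [] };
}

package main;

my $feat = My::TaggedFeature->new();
$feat->primary_tag('exon');
$feat->add_tag_value( 'note', 'first' );
$feat->add_tag_value( 'note', 'second' );
$feat->add_tag_value( 'gene', 'abc1' );

my @tags  = $feat->get_all_tags();
my @notes = $feat->get_tag_values('note');
print "tags: @tags\n";
print "notes: @notes\n";
```

With two separately named methods the caller states which reading is meant; "the values of every tag" is then just a loop over get_all_tags() calling get_tag_values() for each name.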
Anyway, apart from flattening your learning curve, the each_ style is not incomprehensible ... ;)

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Mon Dec 18 07:42:51 2000
Date: Sun, 17 Dec 2000 23:42:51 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Bad news for 5.0004 compatibility
Ewan Birney wrote:
>
> - The entire LiveSeq modules are not quoting the hash values, ie, going
>
> $self->{seq} = $something
>

This BTW is not restricted to Bio::LiveSeq::*. You can find it all over the place. People seem to dislike quoting hash keys. If one quote takes you 1 second to type (if you need to switch constantly between American and European keyboards, you're never sure where the quote key sits), omitting it could amount to saving at least 1 or 2 minutes per module written ...

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From krbou@pgsgent.be Mon Dec 18 07:56:22 2000
Date: Mon, 18 Dec 2000 08:56:22 +0100
From: Kris Boulez krbou@pgsgent.be
Subject: [Bioperl-l] Seq(IO) documentation thoughts
Over the weekend I've been writing some example code (for the seq/seqio part of BioPerl). The practical issues I encountered I'll save for my answer to Peter's mail.

One thing I want to ask here: how far do we want to go in having BioPerl (SeqIO) be a format converter? I've played around a bit with converting one sequence format (a) to another (b) and using (b) as input for another round. It turns out that after some rounds (mostly <10) BioPerl isn't -w clean anymore ('use of uninitialized value ...') or just throws an error. Is it worth investigating this, or do we just say that we only support one conversion?

Something else I noted, which we'll have to explain well in the docs, is the difference between Seq (the sequence object) and ->seq() (the sequence as a string). People might expect that $seqobj->subseq(10,50) returns a new sequence object and not the string. Would such a method make sense?

Kris,

From hlapp@gmx.net Mon Dec 18 07:57:49 2000
Date: Sun, 17 Dec 2000 23:57:49 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Bio::Seq::GenEMBLI proposal
Ewan Birney wrote:
>
> The proposal is an interface called
>
> Bio::Seq::GenEMBLI
>
> and an implementation Bio::Seq::GenEMBL. (interface allows other people to
> comply with the interface without using the same implementation. A "good
> thing" tm, in particular for database implementors).
>

Just a remark: why can't I comply with a module's API by just subclassing it and overriding all its methods? Why do I need an implementation-less interface for this? (I thought the interface hype was initiated to justify Java's lack of multiple inheritance.)

> At the moment I have just taken what is in the Bio::Seq object and moved
> it into its own interface, written below.
>
> Decisions:
>
> (a) should we keep with the each_ syntax, or would people prefer
> "something returning an array of things" to have a different naming
> convention?
>

If you ask, I say let's change it to something more commonly used and more intuitive for newcomers. I'm not sure, however, that either changing all such names to a new naming style or introducing inconsistencies is good. What do people feel about this?

>
> (b) should dates be formatted strings or something else? (if so, what?)
>

If something structured, it is clear that we will have to parse the date ...

> (c) should keyword lines be split on keywords and each_keyword methods
> or not?
>

? What do you mean?

> (d) should the interface extend to cover swissprot, in which case
>
> - name change?
>
> - additional methods?

Hm. If they share a lot, can't we make swissprot inherit from GenEMBL?

>
> Here is what I have so far for this interface definition, waiting to be
> committed once I get the "ok"
>

Ok, apart from the comments above. Do we really need the interface here?

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Mon Dec 18 07:59:53 2000
Date: Sun, 17 Dec 2000 23:59:53 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Re: Developing Improved Bioperl Documentation
Peter Schattner wrote:
>
> =========================
> Bioperl Tutorial - outline
>

Excellent. I'm seriously impressed.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From birney@ebi.ac.uk Mon Dec 18 09:05:59 2000
Date: Mon, 18 Dec 2000 09:05:59 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Bio::Seq::GenEMBLI proposal
On Sun, 17 Dec 2000, Hilmar Lapp wrote:

> Ewan Birney wrote:
> >
> > The proposal is an interface called
> >
> > Bio::Seq::GenEMBLI
> >
> > and an implementation Bio::Seq::GenEMBL. (interface allows other people to
> > comply with the interface without using the same implementation. A "good
> > thing" tm, in particular for database implementors).
> >
>
> Just a remark: why can't I comply with a module's API by just
> subclassing it and overriding all its methods? Why do I need an
> implementation-less interface for this? (I thought the interface-hype
> was initiated to justify Java's disability of multiple inheritance.)

Because in general an implementation has many more methods (in particular the set methods, such as "add_date") than an interface. Interfaces are more likely to be read only.

Inside Ensembl and other projects, like bioperl-corba-client, we basically comply with the interfaces and do not override the implementation. Overriding implementations in my view (a) makes the code less clear (ie, someone has to figure out that you really have overridden each method) and (b) gives ample opportunity for unintentional screwups when the implementations change; eg, adding a function to the implementation that assumes it is implemented as a hash can produce a *segmentation fault* for some implementations (yikes!)

>
> > At the moment I have just taken what is in the Bio::Seq object and moved
> > it into its own interface, written below.
> >
> > Decisions:
> >
> > (a) should we keep with the each_ syntax, or would people prefer
> > "something returning an array of things" to have a different naming
> > convention?
> >
>
> If you ask, I say let's change it to something more commonly used and
> more intuitive for newcomers. I'm not sure, however, that either
> changing all such names to a new naming style, or introducing
> inconsistencies is good. What do people feel about this?
>
> >
> > (b) should dates be formatted strings or something else? (if so, what?)
> >
>
> If something structured, it is clear that we will have to parse the date
> ...
>
> > (c) should keyword lines be split on keywords and each_keyword methods
> > or not?
> >
>
> ? What do you mean?
>
> > (d) should the interface extend to cover swissprot, in which case
> >
> > - name change?
> >
> > - additional methods?
>
> Hm. If they share a lot, can't we make swissprot inherit from GenEMBL?
>
> >
> > Here is what I have so far for this interface definition, waiting to be
> > committed once I get the "ok"
> >
>
> Ok, apart from the comments above. Do we really need the interface here?
>

Absolutely. Ensembl is going to need to hit this interface from the ContigI interface (ie, Ensembl's Bio::EnsEMBL::DB::ContigI will inherit from this). ContigI is implemented inside Ensembl in two radically different ways. We need this to allow Ensembl to use the SeqIO system for GenBank/EMBL dumping (write_seq on genbank/embl will be smart, and do an ->isa() on the incoming seq to see if it supports this interface).

We definitely need an interface here. In fact, I think we need interfaces nearly everywhere - it really future proofs the code...

> Hilmar
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From birney@ebi.ac.uk Mon Dec 18 09:07:17 2000
Date: Mon, 18 Dec 2000 09:07:17 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Seq(IO) documentation thoughts
On Mon, 18 Dec 2000, Kris Boulez wrote:

> Over the weekend I've been writing some example code (for the seq/seqio
> part of BioPerl). The practical issues I encountered I'll save for my
> answer to Peter's mail.
>
> One thing I want to ask here. How far do we want to go in having BioPerl
> (SeqIO) being a format converter ? I've played around a bit with
> converting one sequence format (a) to another (b) and using (b) as
> input for another round. It turns out that after some rounds (mostly <10)
> BioPerl isn't -w clean anymore ('use of uninitialized value ...') or
> just throws an error. Is it worth investigating this, or do we just say
> that we only support one conversion.

I think we should aim for complete round-trip information transfer. We just haven't tested this area hard enough.

>
> Something else I noted, that we'll have to explain well in the
> docs is the difference between Seq (the sequence object) and ->seq()
> (the sequence as a string). People might want to expect that
> $seqobj->subseq(10,50) returns a new sequence object and not the string.
> Would such a method make sense.
>

$seqobj->trunc(10,50) does this.

> Kris,
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From hlapp@gmx.net Mon Dec 18 09:19:03 2000
Date: Mon, 18 Dec 2000 01:19:03 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Bio::Seq::GenEMBLI proposal
Ewan Birney wrote:
>
> We definitely need an interface here. In fact, I think we need interfaces
> nearly everywhere - it really future proofs the code...
>

I already thought so ... :)

I agree with your points on clarity and safety.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From birney@ebi.ac.uk Mon Dec 18 09:25:56 2000
Date: Mon, 18 Dec 2000 09:25:56 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Bio::Tools::HMMER refactoring
Re: Bio::SeqAnalysisParserI

I can live with this interface now that it has been explained. It is actually reminiscent of the Ensembl pipeline "Runnable" system, although the Runnable system encapsulates the actual running of the program.

I do think though that we are making life more complex for the implementers of the interface and the clients. I can imagine the following scenario:

    $new_analysis = yadda-yadda

    $new_analysis->parse(-fh => \*INPUT);

    &complex_process_results($new_analysis);

    sub complex_process_results {
        $ana = shift;

        while( $next_feature = $ana->next_feature ) {
            ... lots of stuff ...

            # stupidly the client reuses the new analysis for a
            # new analysis, maybe because it needs the parameterisation
            # made from the first one...

            $ana->parse(-fh => \*NEW_INPUT);

            # yikes - hard bug to catch back at while loop
        }
    }

In addition, this interface will not go easily into a corba / time-sliced / threaded framework.

Why not have

    Bio::SeqAnalysisParserFactoryI

        $parser = $factory->create_parser(-fh => \*FILE);

    Bio::SeqAnalysisParserI

        while( $next_feature = $parser->next_feature ) {
        }

Same number of functions defined. Twice the number of interfaces, but these are the interfaces I would argue we want. An implementation could implement ParserFactoryI and ParserI in the same module if so wished.

Whaddya reckon? Too complex for your taste hilmar?

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From birney@ebi.ac.uk Mon Dec 18 09:33:43 2000
Date: Mon, 18 Dec 2000 09:33:43 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Bio::Seq::GenEMBLI proposal
On Mon, 18 Dec 2000, Hilmar Lapp wrote:

> Ewan Birney wrote:
> >
> > We definitely need an interface here. In fact, I think we need interfaces
> > nearly everywhere - it really future proofs the code...
> >
>
> I already thought so ... :)
>
> I agree with your points in clarity and safety.
>

Thanks!

Ok. Now for the "naming proposal".

I am going to suggest we vote on either:

    each_<type>

    eg, each_keyword or each_Seq

or

    get_<type>s

    eg, get_keywords or get_Seqs

    (notice plural)

for methods returning an array of things.

After the vote, we will say that new modules should conform to this convention, with Hilmar to decide whether retrofitting is a 0.7 branch criterion (I would vote that it is *not* a 0.7 branch criterion).

My vote is for....

    get_<type>s

Your vote counts ;)

> Hilmar
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From hlapp@gmx.net Mon Dec 18 10:08:03 2000
Date: Mon, 18 Dec 2000 02:08:03 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Seq(IO) documentation thoughts
Kris Boulez wrote:
>
> One thing I want to ask here. How far do we want to go in having BioPerl
> (SeqIO) being a format converter ? I've played around a bit with
> converting one sequence format (a) to another (b) and using (b) as
> input for another round. It turns out that after some rounds (mostly <10)
> BioPerl isn't -w clean anymore ('use of uninitialized value ...') or
> just throws an error. Is it worth investigating this, or do we just say
> that we only support one conversion.
>

We had a similar discussion some months ago. Quoting myself :O) from http://bioperl.org/pipermail/bioperl-l/2000-September/001282.html:

--- quote on
The point I'd like to make may be best illustrated by comparing with automated language translators that are around (like babelfish; babelfish.altavista.com). Try to translate an only slightly complicated sentence from one language into another, which already screws it up half-way, and then translate the result into a third.

I think it is pointless for BioPerl to aim at clean and complete conversion from any rich format into another rich format for sequences. The only way this could be achieved with a reasonable effort is by mapping languages to a common meta-representation, like XML or ASN.1 (and anything the meta-format doesn't cover will still be lost).
--- quote off

This was more or less the approved consensus, at least to my humble understanding. I can easily imagine that there are still parsing bugs making things worse, and these obviously need to be eliminated.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Mon Dec 18 10:18:20 2000
Date: Mon, 18 Dec 2000 02:18:20 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Bio::Seq::GenEMBLI proposal
Ewan Birney wrote:
>
> I am going to suggest we vote on either:
>
>   each_<type>
>
>   eg, each_keyword or each_Seq
>
> or
>
>   get_<type>s
>
>   eg, get_keywords or get_Seqs
>
>   (notice plural)
>
> for methods returning an array of things.
>
> After the vote, we will say that new modules should conform to this
> convention, but Hilmar to decide if retrofitting is a 0.7 branch
> criterion (I would vote that it is *not* a 0.7 branch criterion).
>
> My vote is for....
>
>   get_<type>s
>
> Your vote counts ;)
>

We're voting without standardizing the ballots first? How do we recount
votes in bounced mails? And after all, how do I punch the email?

Jokes aside, I vote for get_<type>s and for it not being a 0.7 criterion.

        Hilmar

--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From krbou@pgsgent.be Mon Dec 18 10:22:37 2000
Date: Mon, 18 Dec 2000 11:22:37 +0100
From: Kris Boulez krbou@pgsgent.be
Subject: [Bioperl-l] Seq(IO) documentation thoughts
Quoting Ewan Birney (birney@ebi.ac.uk):
> On Mon, 18 Dec 2000, Kris Boulez wrote:
>
> I think we should aim for complete round-trip information transfer. We
> just haven't tested this area hard enough.
>

OK, I'll have a closer look at it and report my findings.

> > Something else I noted, that we'll have to explain well in the
> > docs is the difference between Seq (the sequence object) and ->seq()
> > (the sequence as a string). People might expect that
> > $seqobj->subseq(10,50) returns a new sequence object and not the string.
> > Would such a method make sense?
> >
>
> $seqobj->trunc(10,50) does this.
>

Hadn't gotten that far in the docs (and writing the example code) yet.
Thanks for the info.

Kris,

From birney@ebi.ac.uk Mon Dec 18 10:34:00 2000
Date: Mon, 18 Dec 2000 10:34:00 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Seq(IO) documentation thoughts
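A minimal sketch of the subseq/trunc distinction raised in this thread:
subseq hands back a plain string, trunc a new sequence object. ToySeq is
a hypothetical stand-in written only for illustration, not BioPerl's
Bio::Seq; the 1-based, inclusive coordinates are an assumption of the
sketch.

```perl
use strict;

# ToySeq is a hypothetical stand-in, NOT BioPerl's Bio::Seq.
package ToySeq;

sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{-id}, seq => $args{-seq} };
    return bless $self, $class;
}

# The sequence as a plain string, and its identifier.
sub seq { return $_[0]->{seq} }
sub id  { return $_[0]->{id} }

# Returns a STRING: residues $start..$end, 1-based and inclusive (assumed).
sub subseq {
    my ($self, $start, $end) = @_;
    return substr($self->{seq}, $start - 1, $end - $start + 1);
}

# Returns a new OBJECT covering the same range.
sub trunc {
    my ($self, $start, $end) = @_;
    return ToySeq->new(-id  => $self->{id},
                       -seq => $self->subseq($start, $end));
}

package main;

my $seqobj = ToySeq->new(-id => 'demo', -seq => 'ACGTACGTACGT');
my $string = $seqobj->subseq(3, 6);   # plain string
my $object = $seqobj->trunc(3, 6);    # new ToySeq object

print "subseq gives '$string', ref is '", ref($string), "'\n";
print "trunc gives an object of class ", ref($object),
      " holding '", $object->seq, "'\n";
```

Calling ref() on the two results is a quick way to see which of the two
a method returned: it is empty for the string and the class name for the
object.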
On Mon, 18 Dec 2000, Hilmar Lapp wrote:

> Kris Boulez wrote:
> >
> > One thing I want to ask here. How far do we want to go in having BioPerl
> > (SeqIO) being a format converter? I've played around a bit with
> > converting one sequence format (a) to another (b) and using (b) as
> > input for another round. It turns out that after some rounds (mostly <10)
> > BioPerl isn't -w clean anymore ('use of uninitialized value ...') or
> > just throws an error. Is it worth investigating this, or do we just say
> > that we only support one conversion?
> >
>
> We had a similar discussion some months ago. Quoting myself :O) from
> http://bioperl.org/pipermail/bioperl-l/2000-September/001282.html:
>
> --- quote on
> The point I'd like to make may be best illustrated by comparing with
> automated language translators that are around (like babelfish;
> babelfish.altavista.com). Try to translate an only slightly complicated
> sentence from one language into another, which already screws it up
> half-way, and then translate the result into a third. I think it is
> pointless for BioPerl to aim at clean and complete conversion from any
> rich format into another rich format for sequences.
>
> The only way this could be achieved with a reasonable effort is by
> mapping languages to a common meta-representation, like XML or ASN.1
> (and anything the meta-format doesn't cover will still be lost).
> --- quote off
>
> This more or less was approved consensus, at least to my humble
> understanding. I can easily imagine that there are still parsing bugs
> making things worse, and these obviously need to be eliminated.

I agree with Hilmar here for transfer between rich formats, but I do think
that multiple read->write cycles of one format should have minimal if not
0 information loss. We could also try for an embl->genbank->embl loop
being close to 0.

The sort of thing I am ok with "losing" is white space formatting in
the comments. Maintaining this inside the objects effectively means
maintaining the file as it was read in an object. Not nice in my view...

>
> Hilmar
>
> --
> -----------------------------------------------------------------
> Hilmar Lapp                              email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
> -----------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From hlapp@gmx.net Mon Dec 18 10:35:58 2000
Date: Mon, 18 Dec 2000 02:35:58 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Bio::SeqAnalysisParserI [was: Bio::Tools::HMMER refactoring]
Ewan Birney wrote:
>
> In addition, this interface will not go easily into a corba
> /time-sliced/threaded framework.
>
> Why not have
>
> Bio::SeqAnalysisParserFactoryI
>
>    $parser = $factory->create_parser(-fh => \*FILE);
>
> Bio::SeqAnalysisParserI
>
>    while( $next_feature = $parser->next_feature ) {
>
>    }
>
> same number of functions defined. Twice the number of interfaces, but
> these are the interfaces I would argue we want.
>
> An implementation could implement ParserFactoryI and ParserI in the same
> module if so wished.
>
> Whaddya reckon? Too complex for your taste hilmar?
>

Well, Jason and I had such a layout in mind first, but the question was
how significant the performance hit might be in a CORBA context. A
likely situation is that you have fewer than 10 methods for which you
need parsers, and thousands of sequences, that is, thousands of inputs
for each parser. We thought that in a CORBA context creating 10 objects
instead of 10,000 does matter (in pure Perl you probably wouldn't notice
a difference), and that therefore we wanted to be able to reuse a
once-created parser object.

Of course you could let the parser implement the factory, too, and abuse
it as a 'reset', but IMHO this is abuse.

So, what I wanted to say, I guess both Jason and I are in principle
happy with a factory. Based on my experience with CORBA, however, there
is a performance issue, but my experience is somewhat not up-to-date,
and not that extensive, so it's up to you and Jason to make a decision
here.

        Hilmar

--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From birney@ebi.ac.uk Mon Dec 18 10:41:07 2000
Date: Mon, 18 Dec 2000 10:41:07 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Bio::SeqAnalysisParserI [was: Bio::Tools::HMMER refactoring]
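A minimal sketch of the factory/parser split proposed in this thread.
Only the method names create_parser and next_feature and the -fh
argument come from the proposal; the ToyParserFactory and ToyParser
classes, the tab-separated input and the hash-ref features are all
invented here for illustration and are not BioPerl code.

```perl
use strict;

# Hypothetical class playing the factory role from the proposal.
package ToyParserFactory;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Each call returns a fresh parser, so no parsing state is shared
# between callers.
sub create_parser {
    my ($self, %args) = @_;
    return ToyParser->new($args{-fh});
}

# Hypothetical class playing the parser role from the proposal.
package ToyParser;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh }, $class;
}

# Returns the next feature as a hash ref, or nothing at end of input.
# Assumed input: one "name<TAB>start<TAB>end" record per line.
sub next_feature {
    my ($self) = @_;
    my $fh = $self->{fh};
    while (defined(my $line = <$fh>)) {
        chomp $line;
        next unless $line =~ /\S/;      # skip blank lines
        my ($name, $start, $end) = split /\t/, $line;
        return { name => $name, start => $start, end => $end };
    }
    return;
}

package main;

# An in-memory filehandle stands in for real analysis output.
my $data = "exon1\t10\t50\nexon2\t80\t120\n";
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";

my $factory = ToyParserFactory->new;
my $parser  = $factory->create_parser(-fh => $fh);

my @features;
while (my $next_feature = $parser->next_feature) {
    push @features, $next_feature;
    print "$next_feature->{name}: ",
          "$next_feature->{start}-$next_feature->{end}\n";
}
```

Because the parser holds all per-input state, the factory itself stays
stateless, which is the property the thread cares about for
multi-client or threaded use. The cost is one parser object per input,
which is the CORBA overhead Hilmar raises.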
On Mon, 18 Dec 2000, Hilmar Lapp wrote:

> Ewan Birney wrote:
> >
> > In addition, this interface will not go easily into a corba
> > /time-sliced/threaded framework.
> >
> > Why not have
> >
> > Bio::SeqAnalysisParserFactoryI
> >
> >    $parser = $factory->create_parser(-fh => \*FILE);
> >
> > Bio::SeqAnalysisParserI
> >
> >    while( $next_feature = $parser->next_feature ) {
> >
> >    }
> >
> > same number of functions defined. Twice the number of interfaces, but
> > these are the interfaces I would argue we want.
> >
> > An implementation could implement ParserFactoryI and ParserI in the same
> > module if so wished.
> >
> > Whaddya reckon? Too complex for your taste hilmar?
> >
>
> Well, Jason and I had such a layout in mind first, but the question was
> how significant the performance hit might be in a CORBA context. A
> likely situation is that you have fewer than 10 methods for which you
> need parsers, and thousands of sequences, that is, thousands of inputs
> for each parser. We thought that in a CORBA context creating 10 objects
> instead of 10,000 does matter (in pure Perl you probably wouldn't notice
> a difference), and that therefore we wanted to be able to reuse a
> once-created parser object.
>
> Of course you could let the parser implement the factory, too, and abuse
> it as a 'reset', but IMHO this is abuse.
>
> So, what I wanted to say, I guess both Jason and I are in principle
> happy with a factory. Based on my experience with CORBA, however, there
> is a performance issue, but my experience is somewhat not up-to-date,
> and not that extensive, so it's up to you and Jason to make a decision
> here.

From a CORBA perspective I think both schemes would be implemented
similarly, but with the current scheme being less clean - indeed -
potentially dangerously assuming a single client, single thread mode.

The overhead is going to be in making the SeqFeatures in a CORBA context,
not the analysis routines. The analysis routine creation only becomes a
bottleneck in certain cases (though this does happen - for example, we
have hit this bottleneck in Ensembl...)

So - we are ok in splitting the interfaces into two now? Final thoughts
from jason?

>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                              email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
> -----------------------------------------------------------------
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From hlapp@gmx.net Mon Dec 18 10:47:57 2000
Date: Mon, 18 Dec 2000 02:47:57 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Seq(IO) documentation thoughts
Ewan Birney wrote:
>
> I agree with Hilmar here for transfer between rich formats, but I do think
> that multiple read->write cycles of one format should have minimal if not
> 0 information loss.

Right. If information loss in this case is a monotonically increasing
function of the number of r-w cycles, that's a bug. (There may be a loss
in the initial read, but already the second write should yield the same
result as the first.)

        Hilmar

--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From krbou@pgsgent.be Mon Dec 18 10:59:17 2000
Date: Mon, 18 Dec 2000 11:59:17 +0100
From: Kris Boulez krbou@pgsgent.be
Subject: [Bioperl-l] Seq(IO) documentation thoughts
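A minimal sketch of the round-trip criterion stated in this thread: the
first read may lose something, but every later write should reproduce
the first write exactly. The "key = value" toy format and the functions
parse_record, write_record and first_unstable_cycle are invented for
illustration; a real check would plug in the format's own reader and
writer instead.

```perl
use strict;

# Toy "format": key = value lines. Parsing discards surrounding white
# space, so the first read may lose formatting (allowed by the criterion).
sub parse_record {
    my ($text) = @_;
    my %record;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
        $record{$1} = $2;
    }
    return \%record;
}

# Writes keys in sorted order so that the output is deterministic.
sub write_record {
    my ($record) = @_;
    return join '', map { "$_=$record->{$_}\n" } sort keys %$record;
}

# Returns the first read->write cycle (2 .. $max) whose output differs
# from the previous cycle's output, or 0 if the output never changed.
sub first_unstable_cycle {
    my ($text, $max) = @_;
    my $previous = write_record(parse_record($text));
    for my $cycle (2 .. $max) {
        my $current = write_record(parse_record($previous));
        return $cycle if $current ne $previous;
        $previous = $current;
    }
    return 0;
}

my $input  = "id   =  seq1\nkw =  kinase  \n";
my $first  = write_record(parse_record($input));
my $status = first_unstable_cycle($input, 10);

print "first write differs from input: ",
      ($first ne $input ? "yes" : "no"), "\n";
print "stable after first write: ",
      ($status == 0 ? "yes" : "no, changed at cycle $status"), "\n";
```

A nonzero result from first_unstable_cycle is the situation the thread
calls a bug: output that keeps changing with the number of cycles.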
Quoting Hilmar Lapp (hlapp@gmx.net):
> Ewan Birney wrote:
> >
> > I agree with Hilmar here for transfer between rich formats, but I do think
> > that multiple read->write cycles of one format should have minimal if not
> > 0 information loss.
>
> Right. If information loss in this case is a monotonically increasing
> function of the number of r-w cycles, that's a bug. (There may be a loss
> in the initial read, but already the second write should yield the same
> result as the first.)
>

As said earlier, I will look into it more deeply. As this is my evening
project, I'll be back later.

Kris,

From krbou@pgsgent.be Mon Dec 18 13:16:17 2000
Date: Mon, 18 Dec 2000 14:16:17 +0100
From: Kris Boulez krbou@pgsgent.be
Subject: [Bioperl-l] Re: Developing Improved Bioperl Documentation
Quoting Peter Schattner (schattner@alum.mit.edu):
> Kris Boulez wrote:
> >
> > My plan is to start writing cookbook-like documentation (see an earlier
> > mail from Ewan), based on the SYNOPSIS part of each module's
> > documentation. In the meantime, checking these sections for correctness.
> >
>
> Kris,
>
> Kris, I gather now that the sort of cookbook that you (and Brian?) are
> planning is rather different than the type of tutorial that I have been
> envisioning. I don’t think this is bad. After all, in learning new
> software (as in coding our favorite computer language) there is
> something to be said for "having more than one way to do it". I think
> that your cookbook would be complementary (and I would hope also
> complimentary ;-) to the tutorial I would like to write.
>
>
> =========================
> Bioperl Tutorial - outline
> [ .. ]

Wow, I think this will be a very good thing. I also think that our
approaches are complementary (and should not interfere).

Two (related) questions:

- In which format will this be written (Wiki, LaTeX, plain text, POD,
  HTML, ...)?
- Will this be put under bioperl-live or will we use a different CVS
  repository (there exists a bioperl-cookbook)?

Kris,

From jason@chg.mc.duke.edu Mon Dec 18 14:23:22 2000
Date: Mon, 18 Dec 2000 09:23:22 -0500 (EST)
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] Bio::Seq::GenEMBLI proposal
I vote for get_<type>s > On Mon, 18 Dec 2000, Hilmar Lapp wrote: > > > Ewan Birney wrote: > > > > > > We definitely need an interface here. In fact, I think we need interfaces > > > nearly everywhere - it really future proofs the code... > > > > > > > I already thought so ... :) > > > > I agree with your points in clarity and safety. > > > > Thanks! > > Ok. Now for the "naming proposal" > > I am going to suggst we vote on either: > > > each_<type> > > eg, each_keyword or each_Seq > > or > > get_<type>s > > eg, get_keywords or get_Seqs > > (notice plural) > > for methods returning an array of things. > > > After the vote, we will say that new modules should conform to this > convention, but Hilmar to decide if this is a 0.7 branch criteria > retrofitting (I would vote that it is *not* a 0.7 branch criteria) > > > > My vote is for.... > > > get_<type>s > > Your vote counts ;) > > > > Hilmar > > > > -- > > ----------------------------------------------------------------- > > Hilmar Lapp email: hlapp@gmx.net > > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > > ----------------------------------------------------------------- > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > <birney@ebi.ac.uk>. > ----------------------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/From gordonp@niji.imb.nrc.ca Mon Dec 18 14:32:54 2000 Date: Mon, 18 Dec 2000 10:32:54 -0400 (AST) From: Paul Gordon gordonp@niji.imb.nrc.ca Subject: [Bioperl-l] Bio::Seq::GenEMBLI proposal
> Decisions:
>
> (a) should we keep with the each_ syntax, or would people prefer
> "something returning an array of things" to have a different naming
> convention?
>
>
> (b) should dates be formatted strings or something else? (if so, what?)

In my experience, the best format for dates is the number of seconds since
the epoch (as is returned by time). No possible ambiguities, plus
functions in Perl to get the desired fields from it (localtime & gmtime).
Compact too, with no need for structures and references.

My CAN$0.02 worth.

-Paul

________________________________________________________________________
Paul Gordon                                            Paul.Gordon@nrc.ca
Genomic Technologies                             http://maggie.cbr.nrc.ca
Institute for Marine Biosciences
National Research Council Canada

From jason@chg.mc.duke.edu Mon Dec 18 14:37:59 2000
Date: Mon, 18 Dec 2000 09:37:59 -0500 (EST)
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] Bio::SeqAnalysisParserI [was: Bio::Tools::HMMER refactoring]
On Mon, 18 Dec 2000, Ewan Birney wrote:

> On Mon, 18 Dec 2000, Hilmar Lapp wrote:
>
> > Ewan Birney wrote:
> > >
> > > In addition, this interface will not go easily into a corba
> > > /time-sliced/threaded framework.
> > >
> > > Why not have
> > >
> > > Bio::SeqAnalysisParserFactoryI
> > >
> > >    $parser = $factory->create_parser(-fh => \*FILE);
> > >
> > > Bio::SeqAnalysisParserI
> > >
> > >    while( $next_feature = $parser->next_feature ) {
> > >
> > >    }
> > >
> > > same number of functions defined. Twice the number of interfaces, but
> > > these are the interfaces I would argue we want.
> > >
> > > An implementation could implement ParserFactoryI and ParserI in the same
> > > module if so wished.
> > >
> > > Whaddya reckon? Too complex for your taste hilmar?
> > >
> >
> > Well, Jason and I had such a layout in mind first, but the question was
> > how significant the performance hit might be in a CORBA context. A
> > likely situation is that you have fewer than 10 methods for which you
> > need parsers, and thousands of sequences, that is, thousands of inputs
> > for each parser. We thought that in a CORBA context creating 10 objects
> > instead of 10,000 does matter (in pure Perl you probably wouldn't notice
> > a difference), and that therefore we wanted to be able to reuse a
> > once-created parser object.
> >
> > Of course you could let the parser implement the factory, too, and abuse
> > it as a 'reset', but IMHO this is abuse.
> >
> > So, what I wanted to say, I guess both Jason and I are in principle
> > happy with a factory. Based on my experience with CORBA, however, there
> > is a performance issue, but my experience is somewhat not up-to-date,
> > and not that extensive, so it's up to you and Jason to make a decision
> > here.
>
> From a CORBA perspective I think both schemes would be implemented
> similarly, but with the current scheme being less clean - indeed -
> potentially dangerously assuming a single client, single thread mode.
>
> The overhead is going to be in making the SeqFeatures in a CORBA context,
> not the analysis routines. The analysis routine creation only becomes a
> bottleneck in certain cases (though this does happen - for example, we
> have hit this bottleneck in Ensembl...)
>
>
> So - we are ok in splitting the interfaces into two now? Final thoughts
> from jason?

[ A final thought from Jason ... This is probably lost on those that don't
get the bastion of American culture - The Jerry Springer Show ]

I am happy with an interface split. I did a sort-of factory in
SeqAnalysisParser recently, to simplify how to work with analysis parsers
and adding those features to sequences (which I am sure has just become
even more confusing for the onlookers). This proposal does a good job
establishing boundaries of where functionality should come from, so I like
it.

I think the CORBA performance questions will have to be evaluated as we
get into it, but I suspect there will be other things as bottlenecks first
- let's also see what we can learn from Ensembl.

I also know that we have very much avoided anything to do with analysis in
the current BioCorba spec to see what we could learn from the LSR idl and
to delay that battle for a little while. I think we may want to revisit
that in the BioCorba idl some time in the future though, as Bioperl begins
to provide this functionality that other language projects may want to use
(instead of each of them writing a Genscan, HMMER, Blast parser).

So you have my vote (and every vote should count unless there is a 'rule
of law' preventing that from happening).

-jason

> >
> > Hilmar
> > --
> > -----------------------------------------------------------------
> > Hilmar Lapp                              email: hlapp@gmx.net
> > GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
> > -----------------------------------------------------------------
> >
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>.
> -----------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From birney@ebi.ac.uk Mon Dec 18 15:09:15 2000
Date: Mon, 18 Dec 2000 15:09:15 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Bio::Seq::GenEMBLI proposal
On Mon, 18 Dec 2000, Paul Gordon wrote:

> > Decisions:
> >
> > (a) should we keep with the each_ syntax, or would people prefer
> > "something returning an array of things" to have a different naming
> > convention?
> >
> >
> > (b) should dates be formatted strings or something else? (if so, what?)
>
> In my experience, the best format for dates is the number of seconds since
> the epoch (as is returned by time). No possible ambiguities, plus
> functions in Perl to get the desired fields from it (localtime & gmtime).
> Compact too, with no need for structures and references.

If only life was this easy...

The date format is mangled VMS, to the nearest day in EMBL. (No idea what
it is in GenBank). We need to handle, robustly, this sort of accuracy of
dates. We could claim the day is the 1st second at 12am and use unix time
(seconds since epoch).... it is going to confuse the new users...

I have a nasty feeling we might roll our own date object here. <sigh>.

Other suggestions?

>
> My CAN$0.02 worth.
>
> -Paul
>
> ________________________________________________________________________
> Paul Gordon                                            Paul.Gordon@nrc.ca
> Genomic Technologies                             http://maggie.cbr.nrc.ca
> Institute for Marine Biosciences
> National Research Council Canada
>
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From birney@ebi.ac.uk Mon Dec 18 15:13:48 2000
Date: Mon, 18 Dec 2000 15:13:48 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] 5.004 falling down
Jason - I can't get your latest tmpdir system to work anywhere (!). Does
it pass the tests on your end?

I'm going to have a play now...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From gordonp@niji.imb.nrc.ca Mon Dec 18 15:39:59 2000
Date: Mon, 18 Dec 2000 11:39:59 -0400 (AST)
From: Paul Gordon gordonp@niji.imb.nrc.ca
Subject: [Bioperl-l] Bio::Seq::GenEMBLI proposal
> > > (b) should dates be formatted strings or something else? (if so, what?)
> >
> > In my experience, the best format for dates is the number of seconds since
> > the epoch (as is returned by time). No possible ambiguities, plus
> > functions in Perl to get the desired fields from it (localtime & gmtime).
> > Compact too, with no need for structures and references.

> The date format is mangled VMS, to the nearest day in EMBL. (No idea what
> it is in GenBank). We need to handle, robustly, this sort of accuracy of
> dates. We could claim the day is the 1st second at 12am and use unix time
> (seconds since epoch).... it is going to confuse the new users...

Date rounding shouldn't be too confusing...
I think new users are likely to be familiar with the time and localtime
commands, or at least have them in the camel book. If a new date class is
created, users will have to depend on the perldocs for that new class,
which may be a little harder. If the plan is to go the object route,
at least there are good ones around that could probably be reused, such as
DateTime::Precise which has some nice parsing.

From mrp@sanger.ac.uk Mon Dec 18 16:19:43 2000
Date: Mon, 18 Dec 2000 16:19:43 +0000
From: Matthew Pocock mrp@sanger.ac.uk
Subject: [Bioperl-l] Bio::Seq::GenEMBLI proposal
Paul Gordon wrote:

> > > > (b) should dates be formatted strings or something else? (if so, what?)
> > >
> > > In my experience, the best format for dates is the number of seconds since
> > > the epoch (as is returned by time). No possible ambiguities, plus
> > > functions in Perl to get the desired fields from it (localtime & gmtime).
> > > Compact too, with no need for structures and references.
>
> > The date format is mangled VMS, to the nearest day in EMBL. (No idea what
> > it is in GenBank). We need to handle, robustly, this sort of accuracy of
> > dates. We could claim the day is the 1st second at 12am and use unix time
> > (seconds since epoch).... it is going to confuse the new users...
>
> Date rounding shouldn't be too confusing...
> I think new users are likely to be familiar with the time and localtime
> commands, or at least have them in the camel book. If a new date class is
> created, users will have to depend on the perldocs for that new class,
> which may be a little harder. If the plan is to go the object route,
> at least there are good ones around that could probably be reused, such as
> DateTime::Precise which has some nice parsing.

Could you bless a scalar ref to a date epoch number into a date package?
Then you can easily add accessor methods like day, month, year without
users needing to know how the date is actually stored.

>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

From murad@godel.bioc.columbia.edu Mon Dec 18 10:15:27 2000
Date: Mon, 18 Dec 2000 11:15:27 +0100
From: Murad Nayal murad@godel.bioc.columbia.edu
Subject: [Bioperl-l] tempfile creation and Bio::Tools::Run
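A minimal sketch of Matthew Pocock's suggestion in the Bio::Seq::GenEMBLI
thread: a scalar ref to an epoch number blessed into a date package, with
day, month and year accessors. ToyDate is a hypothetical class, not part
of BioPerl, and the "18-DEC-2000" input layout is an assumption made for
the example rather than a statement about the EMBL format. The day is
stored as 00:00:00 UTC, along the lines Ewan describes.

```perl
use strict;

# ToyDate is a hypothetical class, not part of BioPerl.
package ToyDate;
use Time::Local qw(timegm);

my @MONTHS = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);
my %MONTH_INDEX;
@MONTH_INDEX{@MONTHS} = (0 .. $#MONTHS);

# Builds a date from a day-precision string such as "18-DEC-2000"
# (assumed layout). The object is a blessed reference to the epoch value.
sub new_from_string {
    my ($class, $string) = @_;
    my ($day, $mon, $year) = $string =~ /^(\d{1,2})-([A-Za-z]{3})-(\d{4})$/
        or die "unrecognised date '$string'";
    my $index = $MONTH_INDEX{uc $mon};
    die "unknown month '$mon'" unless defined $index;
    my $epoch = timegm(0, 0, 0, $day, $index, $year);
    return bless \$epoch, $class;
}

# Accessors hide the storage; callers never touch the epoch directly.
sub epoch { return ${ $_[0] } }
sub day   { my @t = gmtime ${ $_[0] }; return $t[3] }
sub month { my @t = gmtime ${ $_[0] }; return $t[4] + 1 }
sub year  { my @t = gmtime ${ $_[0] }; return $t[5] + 1900 }

# Writes the date back in the same day-precision layout it was read in.
sub as_string {
    my ($self) = @_;
    return sprintf '%02d-%s-%04d',
        $self->day, $MONTHS[ $self->month - 1 ], $self->year;
}

package main;

my $date = ToyDate->new_from_string('18-DEC-2000');
printf "day %d, month %d, year %d, written back as %s\n",
    $date->day, $date->month, $date->year, $date->as_string;
```

Using gmtime and timegm rather than localtime keeps the stored day from
shifting with the time zone of the machine that reads it.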
Jason Stajich wrote: > > will do. I am also adding tempfile and tempdir methods to RootI. I will > fix ClustalW and TCoffee to use tempfile properly and then move these > modules to Bio::Tools::Run::Alignment My feeling is that tempfile and tempdir would be better put in a separate package. They are utilities that are only used by classes that need temp file creation and not common behavior for all bioperl objects. in my own perl library I have a FileTools package where I put all such methods in (including methods to extract directory names, file extensions and so on as well as methods to find executables etc. etc.). encapsulating this behavior in a package allows me to do things like clean up the temp files automatically when the program terminates. I will be happy to contribute that to bioperl if there is interest. It is not tested on NT/MAC though and will probably need some modification for that. > Only problem is we'll lose our logs from cvs but we can always do some diffs > when needed. > > -Jason > ----- Original Message ----- > From: "Peter Schattner" <schattner@alum.mit.edu> > To: "Jason Stajich" <jason@chg.mc.duke.edu> > Cc: <bioperl-l@bioperl.org> > Sent: Sunday, December 17, 2000 2:49 PM > Subject: Re: [Bioperl-l] tempfile creation and Bio::Tools::Run > > > Jason Stajich wrote: > > > > > > > > > As Hilmar has suggested, how about moving modules that run external > > > applications to Bio::Tools::Run - while things that are typically > parsing > > > output from programs can stay in Bio::Tools. > > > > > > -Jason > > > > > > > Fine with me, Jason. Can you do the moving ...? :-) > > > > Peter > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- Murad Nayal M.D. Ph.D. 
Department of Biochemistry and Molecular Biophysics College of Physicians and Surgeons of Columbia University 630 West 168th Street. New York, NY 10032 Tel: 212-305-6884 Fax: 212-305-6926From jason@chg.mc.duke.edu Mon Dec 18 17:09:32 2000 Date: Mon, 18 Dec 2000 12:09:32 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Variation_IO.t [was: External dependencies]
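A minimal sketch of the separate file-utility package Murad Nayal
proposes in the tempfile thread, with temp files cleaned up
automatically at program exit. ToyFileTools is a hypothetical package
name; it is neither Murad's FileTools nor a BioPerl module. It is
assumed here that the File::Temp module is available and can do the
actual work.

```perl
use strict;

# ToyFileTools is a hypothetical utility package, not Murad's FileTools
# and not a BioPerl module.
package ToyFileTools;
use File::Temp ();

# Creates a scratch directory under the system temp dir; File::Temp
# removes it (and its contents) when the program exits.
sub tempdir {
    return File::Temp::tempdir('toyXXXXXX', TMPDIR => 1, CLEANUP => 1);
}

# Creates a temp file inside $dir and returns (filehandle, filename).
# UNLINK => 1 asks File::Temp to delete the file at program exit.
sub tempfile {
    my ($dir, $suffix) = @_;
    return File::Temp::tempfile('seqXXXXXX',
        DIR => $dir, SUFFIX => $suffix, UNLINK => 1);
}

package main;

my $dir = ToyFileTools::tempdir();
my ($fh, $filename) = ToyFileTools::tempfile($dir, '.fa');

print $fh ">demo\nACGT\n";
close $fh or die "cannot close $filename: $!";

print "wrote a temporary FASTA file to $filename\n";
```

Keeping these helpers in their own package, as the message argues, means
only the classes that run external programs need to load them.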
I tested Variation_IO.t on irix 6.5, perl 5.004_04 got this error for make test on Variation_IO.t t/Variation_IO......Ambiguous use of id => resolved to "id" => at blib/lib/Bio/Variation/SeqDiff.pm line 915. couldn't wrap 'I>TRTRAPGTQRPRAQHLPAPVCCCCSSSSSSSSSSSSSSSSSSSSKRLAPGSSSSSRVRMVLPKPIVEAPQATWSWMRNSNLHSRSRPWSATPREVASQSLEPPWPPARGCRSSCQHLRTRMTQLPHPRCPCWAPLSPA*' at /usr/share/lib/perl5/Text/Wrap.pm line 58, <GEN0> chunk 11. dubious Test returned status 2 (wstat 512, 0x200) Number found where operator expected at (eval 103) line 1, near "*(512" (Missing operator before 512?) DIED. FAILED tests 4-26 Failed 23/26 tests, 11.54% okay perl5.004 or irix is actually really picky - complaining about dos linefeeds and is also doing this for both DB.t and game.t in the BEGIN block where we tell it to skip this test, I am really not sure why this happens - in fact any print statement in the block seems to spawn uninitialized messages... t/game..............XML::Parser::PerlSAX not loaded. This means game test cannot be executed. Skipping Use of uninitialized value at t/game.t line 26. Use of uninitialized value at t/game.t line 27. Use of uninitialized value at t/game.t line 28. Use of uninitialized value at t/game.t line 29. ok -jason On Sun, 17 Dec 2000, Heikki Lehvaslaiho wrote: > We better move this discussion back to bioperl-l so that other can > help. I've never ran perl in a Mac. > > It seems to me that either the XML modules are not installed properly > or the > Variation_IO.t is not going through the motions as it should. > > -Heikki > > Todd Richmond wrote: > > > > On 12/15/00 1:30 AM, "Heikki Lehvaslaiho" <heikki@ebi.ac.uk> wrote: > > > > > > > > I think I know what is going on. In perl distribution 5.004 and before > > > Text::Wrap (version 97.011701) was able to wrap only on word boundary. > > > The latest version (98.112902) has this fixed. > > > > > > > This indeed fixes the problem - however, it appears that not all of the > > tests are run. 
It starts with "1..26", then runs through the first 17 tests > > successfully and then just stops. Two new files are created: > > "mutations.out.xml" and "polymorphism.out.xml", but they are both empty. In > > addition, to run the tests on a Mac, I had to remove the "t/" from in front > > of the file names within Variation_IO.t, because otherwise it couldn't find > > them. This is probably a problem with the calling directory and differences > > between Unix and MacOS in defining what that is. Moving Variation_IO.t up > > one level outside of the "t" directory failed to fix the problem (which is > > where I assume the "make test" would have been run from). > > > > -- > > Dr Todd Richmond http://cellwall.stanford.edu/todd > > Carnegie Institution email: todd@andrew2.stanford.edu > > Department of Plant Biology fax: 1-650-325-6857 > > 260 Panama Street phone: 1-650-325-1521 x431 > > Stanford, CA 94305 > > -- > ______ _/ _/_____________________________________________________ > _/ _/ http://www.ebi.ac.uk/mutations/ > _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/From birney@ebi.ac.uk Mon Dec 18 17:34:32 2000 Date: Mon, 18 Dec 2000 17:34:32 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Variation_IO.t [was: External dependencies]
On Mon, 18 Dec 2000, Jason Stajich wrote: > I tested Variation_IO.t on irix 6.5, perl 5.004_04 > > got this error for make test on Variation_IO.t > t/Variation_IO......Ambiguous use of id => resolved to "id" => at > blib/lib/Bio/Variation/SeqDiff.pm line 915. > couldn't wrap > 'I>TRTRAPGTQRPRAQHLPAPVCCCCSSSSSSSSSSSSSSSSSSSSKRLAPGSSSSSRVRMVLPKPIVEAPQATWSWMRNSNLHSRSRPWSATPREVASQSLEPPWPPARGCRSSCQHLRTRMTQLPHPRCPCWAPLSPA*' > at /usr/share/lib/perl5/Text/Wrap.pm line 58, <GEN0> chunk 11. > dubious > Test returned status 2 (wstat 512, 0x200) > Number found where operator expected at (eval 103) line 1, near "*(512" > (Missing operator before 512?) > DIED. FAILED tests 4-26 > Failed 23/26 tests, 11.54% okay > > perl5.004 or irix is actually really picky - complaining about dos > linefeeds and is also doing this for both DB.t and game.t in the BEGIN > block where we tell it to skip this test, I am really not sure why this > happens - in fact any print statement in the block seems to spawn > uninitialized messages... Hmmm. Are you up to date? I squashed some of these things 2 days ago... > > t/game..............XML::Parser::PerlSAX not loaded. This means game test > cannot be executed. Skipping > Use of uninitialized value at t/game.t line 26. > Use of uninitialized value at t/game.t line 27. > Use of uninitialized value at t/game.t line 28. > Use of uninitialized value at t/game.t line 29. > ok > > -jason > On Sun, 17 Dec 2000, Heikki Lehvaslaiho wrote: > > > We better move this discussion back to bioperl-l so that other can > > help. I've never ran perl in a Mac. > > > > It seems to me that either the XML modules are not installed properly > > or the > > Variation_IO.t is not going through the motions as it should. > > > > -Heikki > > > > Todd Richmond wrote: > > > > > > On 12/15/00 1:30 AM, "Heikki Lehvaslaiho" <heikki@ebi.ac.uk> wrote: > > > > > > > > > > > I think I know what is going on. 
In perl distribution 5.004 and before > > > > Text::Wrap (version 97.011701) was able to wrap only on word boundary. > > > > The latest version (98.112902) has this fixed. > > > > > > > > > > This indeed fixes the problem - however, it appears that not all of the > > > tests are run. It starts with "1..26", then runs through the first 17 tests > > > successfully and then just stops. Two new files are created: > > > "mutations.out.xml" and "polymorphism.out.xml", but they are both empty. In > > > addition, to run the tests on a Mac, I had to remove the "t/" from in front > > > of the file names within Variation_IO.t, because otherwise it couldn't find > > > them. This is probably a problem with the calling directory and differences > > > between Unix and MacOS in defining what that is. Moving Variation_IO.t up > > > one level outside of the "t" directory failed to fix the problem (which is > > > where I assume the "make test" would have been run from). > > > > > > -- > > > Dr Todd Richmond http://cellwall.stanford.edu/todd > > > Carnegie Institution email: todd@andrew2.stanford.edu > > > Department of Plant Biology fax: 1-650-325-6857 > > > 260 Panama Street phone: 1-650-325-1521 x431 > > > Stanford, CA 94305 > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ http://www.ebi.ac.uk/mutations/ > > _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > Jason Stajich > jason@chg.mc.duke.edu > Center for Human Genetics > Duke University Medical Center > http://www.chg.duke.edu/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From ltzhong@yahoo.com Mon Dec 18 17:58:49 2000 Date: Mon, 18 Dec 2000 09:58:49 -0800 (PST) From: Lt Zhong ltzhong@yahoo.com Subject: [Bioperl-l] Database Management System
All,

What database management system do you use internally for bioinformatics purposes? Oracle? AS400? Or others? Any suggestions? We are trying to set up a new database and can choose between Oracle and AS400.

I am not sure if this is a proper question for this list. If not, I apologize.

Thank you very much.

LTZ

From dblock@gene.pbi.nrc.ca Mon Dec 18 19:19:36 2000 Date: Mon, 18 Dec 2000 13:19:36 -0600 (CST) From: David Block dblock@gene.pbi.nrc.ca Subject: [Bioperl-l] Database Management System
On Mon, 18 Dec 2000, Lt Zhong wrote: > All, > > What database Management System do you use internally > for bioinformatics purpose? Oracle? AS400? or others? > Any suggestions? We try to set up a new database and > have choices to use either Oracle or AS400. There is Oracle expertise on the list, but the majority of us work with MySQL, which is free, GPL, and very fast. www.mysql.com. For bioinformatics purposes, with proper backups and a recovery plan, MySQL is everything we need. BioPerl itself is database-agnostic- the backend stuff is up to you to implement. > > I am not sure if it's a proper question here. If not, > I apologize for that. > > Thank you very much. > > LTZ > > __________________________________________________ > Do You Yahoo!? > Yahoo! Shopping - Thousands of Stores. Millions of Products. > http://shopping.yahoo.com/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > For an example installation of MySQL (with a sample schema), see the website below. -- David Block dblock@gene.pbi.nrc.ca http://bioinfo.pbi.nrc.ca/dblock/wiki Plant Biotechnology Institute National Research Council of Canada Saskatoon, SaskatchewanFrom MEColosimo@alumni.carnegiemellon.edu Mon Dec 18 20:37:01 2000 Date: Mon, 18 Dec 2000 15:37:01 -0500 From: Marc Colosimo MEColosimo@alumni.carnegiemellon.edu Subject: [Bioperl-l] Bioperl Documentation (PrimarySeq)
I just want to point out that there is an error in the POD for PrimarySeq. Under SYNOPSIS, in the example that makes a seqobj from memory, the key -accession => 'X78121' is used. That does not work. Under new, it is listed correctly as -accession_number => 'AL000012'. In 0.62 there is no documentation for new, and I had to go look at the code to see what was up. Granted, it only took a few minutes, but this is a case where the POD was wrong.

I would bet that similar errors can be found in the other PODs. When updating the Bioperl documentation, care should be taken to make sure that what is written is correct.

Marc

From jason@chg.mc.duke.edu Mon Dec 18 21:43:56 2000 Date: Mon, 18 Dec 2000 16:43:56 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Bioperl Documentation (PrimarySeq)
-accession was phased out in the last release, my fault for not checking the synopsis, thanks for alerting us. This is fixed now. -jason On Mon, 18 Dec 2000, Marc Colosimo wrote: > I just want to point out that there is an error in the POD for > PrimarySeq. Under SYNOPSIS, making a seqobj from memory, the key > -accession => 'X78121' is used. Well, that does not work. Under, new it > is listed correctly as -accession_number => 'AL000012'. In 0.62 there is > no documentation for new and I had to go look at the code to see what > was up. Granted it was only a few minutes. But this is a case that the > POD was wrong. > > I would bet that similar errors can be found in the PODs. When updating > the Bioperl Documentation, I guess care should be taken to make sure > that what is written is correct. > > Marc > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/From lapp@gnf.org Mon Dec 18 22:23:58 2000 Date: Mon, 18 Dec 2000 14:23:58 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] Bioperl Documentation (PrimarySeq)
Marc Colosimo wrote:
>
> I would bet that similar errors can be found in the PODs.

I'd bet so, too. That's why people should submit incorrect documentation they find as bug reports: these are simply bugs that need to be fixed. Some people may think it's not worth reporting, but by not reporting you let others run into the same problem.

(In fact, doc bugs are the ones I like most because they're the easiest to fix :-)

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
From schattner@alum.mit.edu Mon Dec 18 22:53:40 2000 Date: Mon, 18 Dec 2000 14:53:40 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] Re: Developing Improved Bioperl Documentation
Kris Boulez wrote:
> - In which format will this [tutorial] be written (Wiki, LaTex, plain text, POD,
> HTML, ...) ?

Good question. I'm not very experienced with developing documentation in any of these formats. Initially I am writing (I'm a bit embarrassed to admit) in Microsoft Word 98 for the Macintosh. However, when I am ready to upload I will convert either to plain text or to some more widely accessible format. Any suggestions from you or anyone else on the list on how to facilitate this process would be appreciated.

> - Will this be put under bioperl-live or will we use a different CVS
> repository (there exists a bioperl-cookbook) ?

My inclination is to include the tutorial with bioperl-live so that it gets released with the software it documents (like bioperl.pod or biostart.pod).

From jason@chg.mc.duke.edu Mon Dec 18 22:47:11 2000 Date: Mon, 18 Dec 2000 17:47:11 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] 5.004 falling down
Let me know if you are still getting the problems with the changes I checked in this afternoon. I needed to use require 'File/Temp.pm' instead of use File::Temp, as the eval was not picking up the error otherwise. I had to do the same for File::Spec, as it could conceivably not be installed...

I am happy to move tempfile creation to yet another module if you think it makes more sense to put it in something like Bio::Root::FileTools.

-Jason

On Mon, 18 Dec 2000, Ewan Birney wrote:
>
> Jason -
>
> I can't get your latest tmpdir system to work anywhere (!). Does it pass
> test on your end..
>
> I'm going to have a play now...
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>.
> -----------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From jason@chg.mc.duke.edu Mon Dec 18 22:57:15 2000 Date: Mon, 18 Dec 2000 17:57:15 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Variation_IO.t [was: External dependencies]
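[Editorial sketch for the "5.004 falling down" message that ends just above this header.] The point about require versus use can be illustrated with core Perl only. A use statement is resolved at compile time, so a block eval around it cannot trap a missing module, whereas require runs at run time and can be trapped. The helper name try_load and the module name No::Such::Module::For::Demo are made up for illustration and are not part of Bioperl.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not part of Bioperl): try to load a module at run
# time and return 1 if it loaded, 0 if it did not, without dying.
sub try_load {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # File::Temp -> File/Temp.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Made-up module name, used only to show the failure path.
my $missing = 'No::Such::Module::For::Demo';

for my $module ('File::Spec', $missing) {
    if (try_load($module)) {
        print "$module loaded\n";
    } else {
        print "$module not available, falling back\n";
    }
}
```

A caller could use the return value to decide whether to take a code path that needs the optional module or to fall back to something simpler.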
On Mon, 18 Dec 2000, Ewan Birney wrote: > On Mon, 18 Dec 2000, Jason Stajich wrote: > > > I tested Variation_IO.t on irix 6.5, perl 5.004_04 > > > > got this error for make test on Variation_IO.t > > t/Variation_IO......Ambiguous use of id => resolved to "id" => at > > blib/lib/Bio/Variation/SeqDiff.pm line 915. > > couldn't wrap > > 'I>TRTRAPGTQRPRAQHLPAPVCCCCSSSSSSSSSSSSSSSSSSSSKRLAPGSSSSSRVRMVLPKPIVEAPQATWSWMRNSNLHSRSRPWSATPREVASQSLEPPWPPARGCRSSCQHLRTRMTQLPHPRCPCWAPLSPA*' > > at /usr/share/lib/perl5/Text/Wrap.pm line 58, <GEN0> chunk 11. > > dubious > > Test returned status 2 (wstat 512, 0x200) > > Number found where operator expected at (eval 103) line 1, near "*(512" > > (Missing operator before 512?) > > DIED. FAILED tests 4-26 > > Failed 23/26 tests, 11.54% okay > > > > perl5.004 or irix is actually really picky - complaining about dos > > linefeeds and is also doing this for both DB.t and game.t in the BEGIN > > block where we tell it to skip this test, I am really not sure why this > > happens - in fact any print statement in the block seems to spawn > > uninitialized messages... > > Hmmm. Are you up to date? I squashed some of these things 2 days ago... yeah, build is up to date, but the sgi machine I can get on is running with Text::Wrap module version 97.011701, perl 5.004_04 ( which I know Heikki said causes problems). We can always skip tests if Text::Wrap version is not at least > 97.011701. -jason > > > > > > t/game..............XML::Parser::PerlSAX not loaded. This means game test > > cannot be executed. Skipping > > Use of uninitialized value at t/game.t line 26. > > Use of uninitialized value at t/game.t line 27. > > Use of uninitialized value at t/game.t line 28. > > Use of uninitialized value at t/game.t line 29. > > ok > > > > -jason > > On Sun, 17 Dec 2000, Heikki Lehvaslaiho wrote: > > > > > We better move this discussion back to bioperl-l so that other can > > > help. I've never ran perl in a Mac. 
> > > > > > It seems to me that either the XML modules are not installed properly > > > or the > > > Variation_IO.t is not going through the motions as it should. > > > > > > -Heikki > > > > > > Todd Richmond wrote: > > > > > > > > On 12/15/00 1:30 AM, "Heikki Lehvaslaiho" <heikki@ebi.ac.uk> wrote: > > > > > > > > > > > > > > I think I know what is going on. In perl distribution 5.004 and before > > > > > Text::Wrap (version 97.011701) was able to wrap only on word boundary. > > > > > The latest version (98.112902) has this fixed. > > > > > > > > > > > > > This indeed fixes the problem - however, it appears that not all of the > > > > tests are run. It starts with "1..26", then runs through the first 17 tests > > > > successfully and then just stops. Two new files are created: > > > > "mutations.out.xml" and "polymorphism.out.xml", but they are both empty. In > > > > addition, to run the tests on a Mac, I had to remove the "t/" from in front > > > > of the file names within Variation_IO.t, because otherwise it couldn't find > > > > them. This is probably a problem with the calling directory and differences > > > > between Unix and MacOS in defining what that is. Moving Variation_IO.t up > > > > one level outside of the "t" directory failed to fix the problem (which is > > > > where I assume the "make test" would have been run from). > > > > > > > > -- > > > > Dr Todd Richmond http://cellwall.stanford.edu/todd > > > > Carnegie Institution email: todd@andrew2.stanford.edu > > > > Department of Plant Biology fax: 1-650-325-6857 > > > > 260 Panama Street phone: 1-650-325-1521 x431 > > > > Stanford, CA 94305 > > > > > > -- > > > ______ _/ _/_____________________________________________________ > > > _/ _/ http://www.ebi.ac.uk/mutations/ > > > _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk > > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > > _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom > > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > > ___ _/_/_/_/_/________________________________________________________ > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@bioperl.org > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > > Jason Stajich > > jason@chg.mc.duke.edu > > Center for Human Genetics > > Duke University Medical Center > > http://www.chg.duke.edu/ > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > <birney@ebi.ac.uk>. > ----------------------------------------------------------------- > > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/From heikki@ebi.ac.uk Tue Dec 19 10:09:57 2000 Date: Tue, 19 Dec 2000 10:09:57 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Variation_IO.t [was: External dependencies]
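[Editorial sketch for the Text::Wrap discussion in this thread.] One way a test script might decide whether to run tests that depend on a newer Text::Wrap is to ask the module for a minimum version at run time. The sketch below uses only core Perl; the helper name module_at_least is an assumption for illustration, and the threshold of 98 follows the "Text::Wrap 98" wording used in the thread. It is not the check that was actually committed to Variation_IO.t.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not Bioperl code): return 1 if $module can be
# loaded and reports a version of at least $min_version, else 0.
sub module_at_least {
    my ($module, $min_version) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    # UNIVERSAL::VERSION dies when the installed version is too low,
    # so the eval turns "missing" and "too old" into a false value.
    return eval { require $file; $module->VERSION($min_version); 1 } ? 1 : 0;
}

if (module_at_least('Text::Wrap', 98)) {
    print "Text::Wrap is recent enough: run the wrapping tests\n";
} else {
    print "Text::Wrap is missing or older than 98: skip the wrapping tests\n";
}
```

How the skip is reported to the test harness is left open here, since that depends on the harness version in use.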
Jason, Could you cvs update and try again. I added the quotes around -id and put in the test the dependency to Text::Wrap 98. I can not test this properly as my systems all have the latest Text::Wrap module. -Heikki Jason Stajich wrote: > > On Mon, 18 Dec 2000, Ewan Birney wrote: > > > On Mon, 18 Dec 2000, Jason Stajich wrote: > > > > > I tested Variation_IO.t on irix 6.5, perl 5.004_04 > > > > > > got this error for make test on Variation_IO.t > > > t/Variation_IO......Ambiguous use of id => resolved to "id" => at > > > blib/lib/Bio/Variation/SeqDiff.pm line 915. > > > couldn't wrap > > > 'I>TRTRAPGTQRPRAQHLPAPVCCCCSSSSSSSSSSSSSSSSSSSSKRLAPGSSSSSRVRMVLPKPIVEAPQATWSWMRNSNLHSRSRPWSATPREVASQSLEPPWPPARGCRSSCQHLRTRMTQLPHPRCPCWAPLSPA*' > > > at /usr/share/lib/perl5/Text/Wrap.pm line 58, <GEN0> chunk 11. > > > dubious > > > Test returned status 2 (wstat 512, 0x200) > > > Number found where operator expected at (eval 103) line 1, near "*(512" > > > (Missing operator before 512?) > > > DIED. FAILED tests 4-26 > > > Failed 23/26 tests, 11.54% okay > > > > > > perl5.004 or irix is actually really picky - complaining about dos > > > linefeeds and is also doing this for both DB.t and game.t in the BEGIN > > > block where we tell it to skip this test, I am really not sure why this > > > happens - in fact any print statement in the block seems to spawn > > > uninitialized messages... > > > > Hmmm. Are you up to date? I squashed some of these things 2 days ago... > > yeah, build is up to date, but the sgi machine I can get on is running > with Text::Wrap module version 97.011701, perl 5.004_04 ( which I know > Heikki said causes problems). We can always skip tests if Text::Wrap > version is not at least > 97.011701. > > -jason > > > > > > > > > > t/game..............XML::Parser::PerlSAX not loaded. This means game test > > > cannot be executed. Skipping > > > Use of uninitialized value at t/game.t line 26. > > > Use of uninitialized value at t/game.t line 27. 
> > > Use of uninitialized value at t/game.t line 28. > > > Use of uninitialized value at t/game.t line 29. > > > ok > > > > > > -jason > > > On Sun, 17 Dec 2000, Heikki Lehvaslaiho wrote: > > > > > > > We better move this discussion back to bioperl-l so that other can > > > > help. I've never ran perl in a Mac. > > > > > > > > It seems to me that either the XML modules are not installed properly > > > > or the > > > > Variation_IO.t is not going through the motions as it should. > > > > > > > > -Heikki > > > > > > > > Todd Richmond wrote: > > > > > > > > > > On 12/15/00 1:30 AM, "Heikki Lehvaslaiho" <heikki@ebi.ac.uk> wrote: > > > > > > > > > > > > > > > > > I think I know what is going on. In perl distribution 5.004 and before > > > > > > Text::Wrap (version 97.011701) was able to wrap only on word boundary. > > > > > > The latest version (98.112902) has this fixed. > > > > > > > > > > > > > > > > This indeed fixes the problem - however, it appears that not all of the > > > > > tests are run. It starts with "1..26", then runs through the first 17 tests > > > > > successfully and then just stops. Two new files are created: > > > > > "mutations.out.xml" and "polymorphism.out.xml", but they are both empty. In > > > > > addition, to run the tests on a Mac, I had to remove the "t/" from in front > > > > > of the file names within Variation_IO.t, because otherwise it couldn't find > > > > > them. This is probably a problem with the calling directory and differences > > > > > between Unix and MacOS in defining what that is. Moving Variation_IO.t up > > > > > one level outside of the "t" directory failed to fix the problem (which is > > > > > where I assume the "make test" would have been run from). 
> > > > > > > > > > -- > > > > > Dr Todd Richmond http://cellwall.stanford.edu/todd > > > > > Carnegie Institution email: todd@andrew2.stanford.edu > > > > > Department of Plant Biology fax: 1-650-325-6857 > > > > > 260 Panama Street phone: 1-650-325-1521 x431 > > > > > Stanford, CA 94305 > > > > > > > > -- > > > > ______ _/ _/_____________________________________________________ > > > > _/ _/ http://www.ebi.ac.uk/mutations/ > > > > _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk > > > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > > > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > > > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > > > ___ _/_/_/_/_/________________________________________________________ > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@bioperl.org > > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > > > > > Jason Stajich > > > jason@chg.mc.duke.edu > > > Center for Human Genetics > > > Duke University Medical Center > > > http://www.chg.duke.edu/ > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@bioperl.org > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > > ----------------------------------------------------------------- > > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > > <birney@ebi.ac.uk>. > > ----------------------------------------------------------------- > > > > > > Jason Stajich > jason@chg.mc.duke.edu > Center for Human Genetics > Duke University Medical Center > http://www.chg.duke.edu/ -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________From clay@shirky.com Tue Dec 19 15:32:06 2000 Date: Tue, 19 Dec 2000 10:32:06 -0500 (EST) From: Clay Shirky clay@shirky.com Subject: [Bioperl-l] OT: List of bio software online?
This is not specifically bioperl related, but is there a good list of available biological software anywhere on the Web? I used to use Biocat (http://www.ebi.ac.uk/biocat/), but it seems they have not updated in half a year, and I know that the field is exploding, so I wonder if there is a more up-to-date list?

Thanks,

-clay shirky

From jdiggans@genelogic.com Tue Dec 19 23:24:12 2000 Date: Tue, 19 Dec 2000 18:24:12 -0500 From: J.C. Diggans jdiggans@genelogic.com Subject: [Bioperl-l] Empty FASTA files with Bio::SeqIO
I'm fairly sure this has come up in months past but a quick search of the archives turned up nothing. Bio::SeqIO chokes when trying to read in files that have one or more empty FASTA sequences. Was this functionality desired or just a by-product of the parsing method? Empty sequences are all too possible in a production environment so was this decision intentional or would a patch be useful? Regards, - jc ------------------------------------------------- James Diggans Phone: 301.987.1756 Gene Logic, Inc. FAX: 301.987.1701 jdiggans@genelogic.com Cell: 301.908.2477 -------------------------------------------------From lapp@gnf.org Tue Dec 19 23:41:28 2000 Date: Tue, 19 Dec 2000 15:41:28 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] Empty FASTA files with Bio::SeqIO
"J.C. Diggans" wrote: > > I'm fairly sure this has come up in months past but a quick search of > the archives turned up nothing. Bio::SeqIO chokes when trying to read in > files that have one or more empty FASTA sequences. Was this > functionality desired or just a by-product of the parsing method? Empty > sequences are all too possible in a production environment so was this > decision intentional or would a patch be useful? > It indeed came up some months ago, and it should be somewhere in the archives. The current design is intentional, but we decided to support empty sequences and reading and writing them in FASTA format (NOT in other formats). This will be part of the 0.7 release features. If you look at the task list, you'll find it. Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------From krbou@pgsgent.be Wed Dec 20 08:04:16 2000 Date: Wed, 20 Dec 2000 09:04:16 +0100 From: Kris Boulez krbou@pgsgent.be Subject: [Bioperl-l] SeqIO (stress) testing
I played around a bit with SeqIO (starting from one format and writing/reading it back in different formats) and found some interesting things. I didn't have time to get into all of these; I will hopefully do so this evening.

One thing I already found out (and was able to document properly) is that, starting from t/test.genbank and writing it out in PIR format, it is then impossible for BioPerl to read this file back in. As I have little or no knowledge about the PIR format, I submitted a bug report (#876).

For the following I don't have a demo script ready yet (will do this evening):

- starting from t/test.genbank, writing a swiss-prot file gives (we die, no error thrown)

Programming error - cannot called write_line_swissprot_regex with different length pre1 and pre2 tags! at /usr/lib/perl5/site_perl/5.005/Bio/SeqIO/swiss.pm line 949, <GEN2> chunk 176.

( when adding $pre1 and $pre2 to the die() )

Programming error - cannot called write_line_swissprot_regex with different length pre1 and pre2 tags (FT sig_peptide 76 123 ) (FT )! at /usr/lib/perl5/site_perl/5.005/Bio/SeqIO/swiss.pm line 949, <GEN2> chunk 176.

- starting from t/test.genbank, writing a gcg file, reading this gcg file gives

-------------------- EXCEPTION --------------------
MSG: Looks like start of another sequence. See documentation.
CONTEXT: Error in uNKNOWN CONTEXT
SCRIPT: seqtest.pl
STACK:
Bio::SeqIO::gcg::next_seq(123)
main::seqtest.pl(14)
---------------------------------------------------

- starting from t/test.embl, there is a problem for SeqIO to read a gcg file it wrote itself (it just loops forever). I will investigate this one further as it's not clear when/what happens.

By looking at the tests (and test sequences) we have now, I saw that we only try to read the first sequence from our test sequence files (apart from GCG, which reads more than one file). The test.embl even contains only one sequence. I think that we should test for reading/writing multiple sequences from one file.

Kris,
From hlapp@gmx.net Wed Dec 20 09:54:59 2000 Date: Wed, 20 Dec 2000 01:54:59 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: SeqIO bug report
bioperl-bug-admin@bioperl.org wrote:
>
> The following program (change the location of the test
> genbank file)
>
> -------------- program ------------------------
> #!/usr/local/bin/perl -w
>
> use strict;
> use Bio::SeqIO;
>
> my $genbank_file = "/home/kris/src/cvs/bioperl/bioperl-live/t/test.genbank";
> my $seqin = Bio::SeqIO->new(-file => $genbank_file, -format => 'genbank');
> my $seqout = Bio::SeqIO->new(-file => ">pir.out", -format => 'pir');
> while (my $seq = $seqin->next_seq() ) {
>     $seqout->write_seq($seq);
> }
> print STDOUT "wrote pir file\n";
> my $seqin = Bio::SeqIO->new(-file => "pir.out", -format => 'pir');

I suppose this is not the cause of the error you see but be aware that opening a stream on a file you haven't closed yet may give unexpected results. You should either first

$seqout = undef;

or (cleaner) call

$seqout->close();

With either of these in effect, does the error persist? (Probably it does.)

Hilmar

> my $seqout = Bio::SeqIO->new(-file => ">ttt", -format => 'gcg'); # whatever
> while (my $seq = $seqin->next_seq() ) {
>     $seqout->write_seq($seq);
> }
> -------------------- end program -------------
>
> gives the following output
>
> [ warnings about parsing genbank file ]
> wrote pir file
>
> -------------------- EXCEPTION --------------------
> MSG:
> CONTEXT: Error in uNKNOWN CONTEXT
> SCRIPT: seqtest_pir.pl
> STACK:
> Bio::Root::RootI::new(87)
> Bio::SeqIO::pir::next_seq(111)
> main::seqtest_pir.pl(15)
> ---------------------------------------------------
>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-guts-l

--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------
From krbou@pgsgent.be Wed Dec 20 09:59:06 2000 Date: Wed, 20 Dec 2000 10:59:06 +0100 From: Kris Boulez krbou@pgsgent.be Subject: [Bioperl-l] Re: SeqIO bug report
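[Editorial sketch for Hilmar Lapp's reply, which ends just above this header.] The advice to close an output stream before reopening the same file for reading can be shown with plain core-Perl filehandles. Plain filehandles are used here instead of Bio::SeqIO so that the sketch runs without Bioperl installed; the helper name write_then_read, the scratch file name, and the record text are all made up. With buffered output, data may not have reached the file until the writing handle is closed.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: write records to $file, close the writer, then
# reopen the same file and return the lines that were read back.
sub write_then_read {
    my ($file, @records) = @_;

    open(my $out, '>', $file) or die "cannot write $file: $!";
    print {$out} $_ for @records;

    # Close the writer first so that buffered data is flushed to disk.
    close($out) or die "cannot close $file: $!";

    # Only now open the same file for reading.
    open(my $in, '<', $file) or die "cannot read $file: $!";
    my @lines = <$in>;
    close($in);
    return @lines;
}

my $file  = "roundtrip_demo.$$.txt";   # made-up scratch file name
my @lines = write_then_read($file, ">seq1\nACGT\n", ">seq2\nGGCC\n");
unlink $file;

print "read back ", scalar(@lines), " lines\n";
```

With Bio::SeqIO, the corresponding step is the $seqout->close() call (or $seqout = undef) suggested in the reply.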
Quoting Hilmar Lapp (hlapp@gmx.net): > bioperl-bug-admin@bioperl.org wrote: > > > > > > The following program (change the location of the test > > genbank file) > > > > -------------- program ------------------------ > > #!/usr/local/bin/perl -w > > > > use strict; > > use Bio::SeqIO; > > > > my $genbank_file = "/home/kris/src/cvs/bioperl/bioperl-live/t/test.genbank"; > > my $seqin = Bio::SeqIO->new(-file => $genbank_file, -format => 'genbank'); > > my $seqout = Bio::SeqIO->new(-file => ">pir.out", -format => 'pir'); > > while (my $seq = $seqin->next_seq() ) { > > $seqout->write_seq($seq); > > } > > print STDOUT "wrote pir file\n"; > > my $seqin = Bio::SeqIO->new(-file => "pir.out", -format => 'pir'); > > I suppose this is not the cause of the error you see but be aware that > opening a stream on a file you haven't closed yet may give unexpected > results. You should either first > > $seqout = undef; > > or (cleaner) call > > $seqout->close(); > > With either of these in effect, does the error persist? (Probably it > does.) > Minutes after submitting the bug report I realized this myself (the stress testing script I used to detect these, does it). When adding $seqout->close(); system ("cat pir.out"); I get --------- output -------------------- wrote pir file >P1;DDU63596 Dictyostelium discoideum Tdd-4 transposable element flanking sequence, clone p427/428 right end. >P1;DDU63596 GTGACAGTTG GCTGTCAGAC ATACAATGAT TGTTTAGAAG AGGAGAAGAT TGATCCGGAG TACCGTGATA GTATTTTAAA AACTATGAAA GCGGGAATAC TTAATGGTAA ACTAGTTAGA [ ... 
] TTACGGCGAG ATGGTTTCTC CTCGCCTGGC CACTCAGCCT TAGTTGTCTC TGTTGTCTTA TAGAGGTCTA CTTGAAGAAG GAAAAACAGG GGTCATGGTT TGACTGTCCT GTGAGCCCTT CTTCCCTGCC TCCCCCACTC ACAGTGACCC GGAATCTGCA GTGCTAGTCT CCCGGAACTA TC -------------------- EXCEPTION -------------------- MSG: CONTEXT: Error in uNKNOWN CONTEXT SCRIPT: seqtest_pir.pl STACK: Bio::Root::RootI::new(87) Bio::SeqIO::pir::next_seq(111) main::seqtest_pir.pl(18) --------------------------------------------------- ------------ end of output ---------------------From hlapp@gmx.net Wed Dec 20 10:13:33 2000 Date: Wed, 20 Dec 2000 02:13:33 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] SeqIO (stress) testing
Kris Boulez wrote: > > - starting from t/test.genbank, writing a swiss-prot file gives (we die, > no error thrown) test.genbank is DNA. Do you translate it? Genbank (DNA) and Swissprot feature tables are basically incompatible. The post I quoted lately contains an example I think. (E.g., you can't have 'source' in a Swissprot feature table; the latter is supposed to contain only protein sites.) > > - starting from t/test.genbank, writing a gcg file, reading this gcg > file gives > -------------------- EXCEPTION -------------------- > MSG: Looks like start of another sequence. See documentation. > CONTEXT: Error in uNKNOWN CONTEXT > SCRIPT: seqtest.pl > STACK: > Bio::SeqIO::gcg::next_seq(123) > main::seqtest.pl(14) > --------------------------------------------------- > > - starting from t/test.embl, there is a problem for SeqIO to read a gcg > file it wrote himself (it just loops forever). I will investigate this > one further as it's not clear when/what happens. > The GCG module seems to be broken. I wanted to use it some time ago, but it even didn't want to read simple sequence files. At that time we had GCG 10, maybe something in the format has changed. GCG format is problematic, because there really isn't a genuine GCG format. A Genbank sequence in GCG format is in fact the sequence in Genbank format with 1 header line prepended and the sequence formatted specially (with a line containing checksum etc, and the notorious two dots). Likewise for a EMBL sequence. How many people have a serious interest in this module? If there are some, could you also provide some example files of a recent GCG version (e.g., 10.1); I personally don't have access to GCG presently. > By looking at the test (and test sequences) we have now I saw that we > only try to read the first sequence from our test sequence files (apart > >from GCG, which reads more then one file). The test.embl even contains > only one sequence. 
I think that we should test for reading/writing > multiple sequences from one file. > Genbank format and FASTA are tested for reads of multiple entries. (Check further down the script.) Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From hlapp@gmx.net Wed Dec 20 10:24:38 2000 Date: Wed, 20 Dec 2000 02:24:38 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: SeqIO bug report
Kris Boulez wrote:
>
> -------------------- EXCEPTION --------------------
> MSG:
> CONTEXT: Error in uNKNOWN CONTEXT
> SCRIPT: seqtest_pir.pl
> STACK:
> Bio::Root::RootI::new(87)
> Bio::SeqIO::pir::next_seq(111)
> main::seqtest_pir.pl(18)
> ---------------------------------------------------
>

That's weird. Line 87 in RootI is the beginning of the BEGIN block. If
there's something wrong with it, why doesn't it simply always pop up?

Which perl version are you using on your Linux box? I guess 5.005.

Can we extend the bug submission page so that people are asked to state
their version of Perl, too? Chris?

    Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From krbou@pgsgent.be Wed Dec 20 10:59:11 2000
Date: Wed, 20 Dec 2000 11:59:11 +0100
From: Kris Boulez krbou@pgsgent.be
Subject: [Bioperl-l] Re: SeqIO bug report
Quoting Hilmar Lapp (hlapp@gmx.net):
> Kris Boulez wrote:
> >
> > -------------------- EXCEPTION --------------------
> > MSG:
> > CONTEXT: Error in uNKNOWN CONTEXT
> > SCRIPT: seqtest_pir.pl
> > STACK:
> > Bio::Root::RootI::new(87)
> > Bio::SeqIO::pir::next_seq(111)
> > main::seqtest_pir.pl(18)
> > ---------------------------------------------------
> >
>
> That's weird. Line 87 in RootI is the begin of the BEGIN block. If
> there's something wrong with it, why doesn't it simply always pop up?
>
> Which perl version are you using on your Linux box? I guess 5.005.
>

This is perl, version 5.005_03 built for i386-linux

Could this be because the PIR file contains a DNA sequence? (It shouldn't
be, as G, A, T and C are all legal amino acid codes.)

Kris

From krbou@pgsgent.be Wed Dec 20 11:13:57 2000
Date: Wed, 20 Dec 2000 12:13:57 +0100
From: Kris Boulez krbou@pgsgent.be
Subject: [Bioperl-l] SeqIO (stress) testing
Quoting Hilmar Lapp (hlapp@gmx.net):
> Kris Boulez wrote:
> >
> > - starting from t/test.genbank, writing a swiss-prot file gives (we die,
> > no error thrown)
>
> test.genbank is DNA. Do you translate it?
>

Nope, checked test.fasta to be protein, forgot this one.
Should this matter (i.e. does Swissprot check that it is writing a
protein sequence)?

> Genbank (DNA) and Swissprot feature tables are basically incompatible.
> The post I quoted lately contains an example I think. (E.g., you can't
> have 'source' in a Swissprot feature table; the latter is supposed to
> contain only protein sites.)
>
> >
> > - starting from t/test.genbank, writing a gcg file, reading this gcg
> > file gives
> > -------------------- EXCEPTION --------------------
> > MSG: Looks like start of another sequence. See documentation.
> > CONTEXT: Error in uNKNOWN CONTEXT
> > SCRIPT: seqtest.pl
> > STACK:
> > Bio::SeqIO::gcg::next_seq(123)
> > main::seqtest.pl(14)
> > ---------------------------------------------------
> >
> > - starting from t/test.embl, there is a problem for SeqIO to read a gcg
> > file it wrote himself (it just loops forever). I will investigate this
> > one further as it's not clear when/what happens.
> >
>
> The GCG module seems to be broken. I wanted to use it some time ago, but
> it even didn't want to read simple sequence files. At that time we had
> GCG 10, maybe something in the format has changed. GCG format is
> problematic, because there really isn't a genuine GCG format. A Genbank
> sequence in GCG format is in fact the sequence in Genbank format with 1
> header line prepended and the sequence formatted specially (with a line
> containing checksum etc, and the notorious two dots). Likewise for a
> EMBL sequence.
>
> How many people have a serious interest in this module? If there are
> some, could you also provide some example files of a recent GCG version
> (e.g., 10.1); I personally don't have access to GCG presently.
>

Given the widespread use of GCG there is (I guess) an interest. We ran
into this lack of a well-defined GCG format in another project as well.

> > By looking at the test (and test sequences) we have now I saw that we
> > only try to read the first sequence from our test sequence files (apart
> > from GCG, which reads more then one file). The test.embl even contains
> > only one sequence. I think that we should test for reading/writing
> > multiple sequences from one file.
> >
>
> Genbank format and FASTA are tested for reads of multiple entries.
> (Check further down the script.)
>

I missed the Genbank test. As far as I can see, the test for Fasta is
using Bio::SeqIO::MultiFile (test 17) or works on one sequence (tests
2-5).

Kris

From hlapp@gmx.net Wed Dec 20 18:23:39 2000
Date: Wed, 20 Dec 2000 10:23:39 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] SeqIO (stress) testing
Kris Boulez wrote:
> I missed the Genbank test. As far as I can see the test for Fasta is
> using Bio::SeqIO::MultiFile (test 17) or works on one sequence (tests
> 2-5).
>

There is multiple_fasta.t which at least checks whether looping over all
seqs crashes the program.

    Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Wed Dec 20 18:51:41 2000
Date: Wed, 20 Dec 2000 10:51:41 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] SeqIO (stress) testing
Kris Boulez wrote:
>
> Quoting Hilmar Lapp (hlapp@gmx.net):
> > Kris Boulez wrote:
> > >
> > > - starting from t/test.genbank, writing a swiss-prot file gives (we die,
> > > no error thrown)
> >
> > test.genbank is DNA. Do you translate it?
> >
> Nope, checked test.fasta to be protein, forgot this one.
> Should this matter (i.e. does Swissprot checks it is writing a protein
> sequence) ?
>

Probably it shouldn't matter. But I can imagine that you run into
trouble if you try to write a Genbank feature table in Swissprot format.
One should check what the problem is, though.

Significant information loss is, however, almost unavoidable. E.g., in
Genpept you have an active site and a binding site both annotated as
site, and a tag site_type tells you what sort of site it is. In
Swissprot both sites would have different keys, and there are no tags
apart from 1 note-like comment.

    Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From ewing@genome.stanford.edu Wed Dec 20 20:11:17 2000
Date: Wed, 20 Dec 2000 12:11:17 -0800 (PST)
From: Rob Ewing ewing@genome.stanford.edu
Subject: [Bioperl-l] Exceptions thrown when parsing embl format
Hi,
When parsing embl format sequence entries, I run into problems due to
slight 'errors' in the embl format.

For example, I get the following error message:

-------------------- EXCEPTION --------------------
MSG: Weird location line [1280..(1547.1700)] in reading GenBank
CONTEXT: Error in uNKNOWN CONTEXT
SCRIPT: e
STACK:
Bio::SeqIO::FTHelper::_generic_seqfeature(160)
Bio::SeqIO::embl::next_seq(296)
main::-e(1)
---------------------------------------------------

when parsing an embl format entry that has the line:

FT exon 1280..(1547.1700)

(I assume that the parser cannot figure out the start and end of this
exon feature.)
How can I deal with this? Is there any way to prevent an exception
being thrown and just move on to the next entry in the file? Or should
I look at ways of excluding the 'bad' entries from the file?
(All I want to do is convert a large embl format file to fasta format!)

thanks

Rob.

From jdiggans@genelogic.com Wed Dec 20 20:25:29 2000
Date: Wed, 20 Dec 2000 15:25:29 -0500
From: J.C. Diggans jdiggans@genelogic.com
Subject: [Bioperl-l] Empty FASTA files with Bio::SeqIO

> It indeed came up some months ago, and it should be somewhere in the
> archives. The current design is intentional, but we decided to support
> empty sequences and reading and writing them in FASTA format (NOT in other
> formats). This will be part of the 0.7 release features. If you look at the
> task list, you'll find it.

I went ahead and patched my local version from 0.6.2 (patch below). It
was a quick fix; can anyone think of a case in which this would fail?

- jc

122,123c122,134
<     my ($top,$sequence) = $entry =~ /^(.+?)\n([^>]+)/s
<         or $self->throw("Can't parse entry");
---
>     # Check for empty sequences and handle gracefully
>     my ($top,$sequence);
>     if( $entry =~ /^(.+?)\n([^>]+)/s ) {
>         # There is valid sequence present
>         ($top,$sequence) = $entry =~ /^(.+?)\n([^>]+)/s
>             or $self->throw("Can't parse entry");
>     } else {
>         # There is no sequence present,
>         $top = $entry =~ /^(.+?)\n/
>             or $self->throw("Can't parse entry"); # save top
>         $sequence = ""; # set sequence to empty string
>     }
>

-------------------------------------------------
James Diggans                Phone: 301.987.1756
Gene Logic, Inc.             FAX: 301.987.1701
jdiggans@genelogic.com       Cell: 301.908.2477
-------------------------------------------------

From birney@ebi.ac.uk Wed Dec 20 20:34:00 2000
Date: Wed, 20 Dec 2000 20:34:00 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Exceptions thrown when parsing embl format
On Wed, 20 Dec 2000, Rob Ewing wrote:

> Hi,
> When parsing embl format sequence entries, I run into problems due to
> slight 'errors' in the embl format.

We are trying to have better error control and also fuzzy location
parsing in 0.7, but no one has really got this under control yet.

I realise it is a pain in the arse. GenBank/EMBL is a pain in the arse,
period (it is not their fault... legacy data...).

>
> for example , I get the following error message:
>
> -------------------- EXCEPTION --------------------
> MSG: Weird location line [1280..(1547.1700)] in reading GenBank
> CONTEXT: Error in uNKNOWN CONTEXT
> SCRIPT: e
> STACK:
> Bio::SeqIO::FTHelper::_generic_seqfeature(160)
> Bio::SeqIO::embl::next_seq(296)
> main::-e(1)
> ---------------------------------------------------
>
> when parsing an embl format entry that has the line :
>
> FT exon 1280..(1547.1700)
>
> (I assume that the parser cannot figure out the start and end of this
> exon feature).
> How can I deal with this - is there any way to prevent an exception
> being thrown and just move on to the next entry in the file. Or should
> I look at ways of excluding the 'bad' entries from the file?
> (All I want to do is convert a large embl format file to fasta format!)
>
> thanks
>
> Rob.
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From lapp@gnf.org Thu Dec 21 00:03:10 2000
Date: Wed, 20 Dec 2000 16:03:10 -0800
From: Hilmar Lapp lapp@gnf.org
Subject: [Bioperl-l] Empty FASTA files with Bio::SeqIO

"J.C. Diggans" wrote:
>
> I went ahead and patched my local version from 0.6.2 (patch below). It
> was a quick fix, can anyone think of a case in which this would fail?
>
> - jc
>
> 122,123c122,134
> <     my ($top,$sequence) = $entry =~ /^(.+?)\n([^>]+)/s
> <         or $self->throw("Can't parse entry");
> ---
> >     # Check for empty sequences and handle gracefully
> >     my ($top,$sequence);
> >     if( $entry =~ /^(.+?)\n([^>]+)/s ) {
> >         # There is valid sequence present
> >         ($top,$sequence) = $entry =~ /^(.+?)\n([^>]+)/s
> >             or $self->throw("Can't parse entry");
> >     } else {
> >         # There is no sequence present,
> >         $top = $entry =~ /^(.+?)\n/
> >             or $self->throw("Can't parse entry"); # save top
> >         $sequence = ""; # set sequence to empty string
> >     }
> >
>

A correctly FASTA-formatted empty seq ought to have an empty line after
the '>'-line. I think we should check for that, just to be sure we're
not misinterpreting something.

Second, Bio::Seq currently won't let you define an empty seq. This needs
to be fixed, too.

If your fix works for you, that's fine. 0.7 will still take a while
anyway, unless someone donates a fuzzy-location full-coverage package
for Christmas.

    Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From MEColosimo@alumni.carnegiemellon.edu Thu Dec 21 00:16:28 2000
Date: Wed, 20 Dec 2000 19:16:28 -0500
From: Marc Colosimo MEColosimo@alumni.carnegiemellon.edu
Subject: [Bioperl-l] Different File Formats
I have a somewhat novice question. Is there a paper or place (book, web
site, etc.) that describes all of the generally used formats? I know
FASTA and can pick the parts I need out of a genbank file. But there
seem to be a lot of other formats out there (GCG, EMBL, and others) and
I really don't know what use they might be or what's in them.

Thanks in advance.

Marc

From cstrassel@netgenics.com Thu Dec 21 14:37:07 2000
Date: Thu, 21 Dec 2000 09:37:07 -0500
From: Strassel, Chris cstrassel@netgenics.com
Subject: [Bioperl-l] Different File Formats
Marc,

Here are 3 locations that might be helpful. They describe databases, but
also go through the formats and define each line/field.

Chris

For GenBank: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
SwissProt: http://www.expasy.ch/txt/userman.txt
GeneSeq: http://www.derwent.com/data/geneuserguide.pdf

-----Original Message-----
From: Marc Colosimo [mailto:MEColosimo@alumni.carnegiemellon.edu]
Sent: Wednesday, December 20, 2000 7:16 PM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] Different File Formats

I have a somewhat novice question. Is there a paper or place (book, web
site, etc.) that describes all of the generally used formats? I know
FASTA and can pick the parts out of a genbank file that I need. But
there seems to be a lot of other formats out there (GCG, EMBL, and
others) and I really don't know what use they might be or what's in
them.

Thanks in advance.

Marc

From ajm6q@virginia.edu Fri Dec 29 02:44:59 2000
Date: Thu, 28 Dec 2000 21:44:59 -0500 (EST)
From: Aaron J Mackey ajm6q@virginia.edu
Subject: [Bioperl-l] Empty FASTA files with Bio::SeqIO
On Wed, 20 Dec 2000, Hilmar Lapp wrote:

> The correctly FASTA-formatted empty seq ought to have an empty line after
> the '>'-line. I think we should check for that, just to be sure we're not
> misinterpreting something.

The fasta programs accept:

>sequence 1
>sequence 2
ATCGCGCA
>sequence 3
>
>sequence 5
>sequence 6
GATTACA

Note the "sequence 4" that has no description at all. Of course these
examples represent garbage input, but I just wanted to clarify *all* the
possibilities.

-Aaron

--
 o ~ ~ ~ ~ ~ ~ o
/ Aaron J Mackey \
\ Dr. Pearson Laboratory /
 \ University of Virginia \
 / (804) 924-2821 \
 \ amackey@virginia.edu /
  o ~ ~ ~ ~ ~ ~ o

From hlapp@gmx.net Fri Dec 29 10:35:55 2000
Date: Fri, 29 Dec 2000 02:35:55 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Bio::DB::*
The t/DB.t script failed on my machine, I don't know where it really
worked. I fixed the immediate problem in Bio::DB::WebDBSeqI, and did a
couple of other things, see the cvs log messages.

My remaining concerns are that

1) GenBank retrieval by accession sometimes fails arbitrarily (I haven't
   tracked it down, but apparently there is no sequence part sometimes
   -- weird. Problem at NCBI? Note that in contrast to before we now use
   the qmap.cgi interface. Retrieval by ID didn't fail a single time.
   Does anyone else observe this?), and

2) the docs in WebDBSeqI stipulate that an array ref be passed to the
   get_Stream_by_XXX() methods, in contrast to the get_Seq_by_XXX()
   methods, which require a scalar. Do we want to keep this?

    Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Fri Dec 29 10:38:33 2000
Date: Fri, 29 Dec 2000 02:38:33 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Empty FASTA files with Bio::SeqIO
Aaron J Mackey wrote:
>
> On Wed, 20 Dec 2000, Hilmar Lapp wrote:
>
> > The correctly FASTA-formatted empty seq ought to have an empty line after
> > the '>'-line. I think we should check for that, just to be sure we're not
> > misinterpreting something.
>
> The fasta programs accept:
>

The fasta programs might indeed do so, but many others are much pickier.
I think it was Kris who did the survey some time ago, and the empty line
following the id-line seems to be reasonably safe.

    Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------

From ajm6q@virginia.edu Fri Dec 29 13:38:11 2000
Date: Fri, 29 Dec 2000 08:38:11 -0500 (EST)
From: Aaron J Mackey ajm6q@virginia.edu
Subject: [Bioperl-l] Empty FASTA files with Bio::SeqIO
On Fri, 29 Dec 2000, Hilmar Lapp wrote:

> The fasta programs might indeed do, but many others are much
> pickier. I think it was Kris who did the survey some time ago, and
> the empty line following the id-line seems to be reasonably safe.

I'm just saying that the fasta "standard" is not so picky, so perhaps
you should consider not being as picky as the peck of pickier programs.
The only rule is that a '>' at the beginning of a line marks the
beginning of a new sequence, no matter what follows or precedes it.

-Aaron

--
 o ~ ~ ~ ~ ~ ~ o
/ Aaron J Mackey \
\ Dr. Pearson Laboratory /
 \ University of Virginia \
 / (804) 924-2821 \
 \ amackey@virginia.edu /
  o ~ ~ ~ ~ ~ ~ o

From birney@ebi.ac.uk Fri Dec 1 08:21:06 2000
From: birney@ebi.ac.uk (Ewan Birney)
Date: Fri, 1 Dec 2000 08:21:06 +0000 (GMT)
Subject: [Bioperl-l] 0.7 release: tasks & assignments
In-Reply-To: <3A26A446.E96B7EDE@gmx.net>
Message-ID:
Hi there,
Does bioperl have a Perl module that can perform SQL-like queries on an
XML file, including inserting, modifying and updating the XML file?
Thanks in advance!
David
David Zhao
Drug Discovery IM&T
The R.W.Johnson PRI
3210 Merryfield Row
San Diego, CA 92121
Tel: (858) 784-3184