From biopython at maubp.freeserve.co.uk Sun Aug 1 06:01:37 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 1 Aug 2010 11:01:37 +0100
Subject: [BioSQL-l] migration to github
In-Reply-To: <04BBA390-6BC0-4700-8B14-812F6E2E4705@illinois.edu>
References: <22BC0098-7BEB-41E3-9EE6-D8987323CC24@drycafe.net> <04BBA390-6BC0-4700-8B14-812F6E2E4705@illinois.edu>
Message-ID:

On Sun, Aug 1, 2010 at 12:15 AM, Chris Fields wrote:
>
> On Jul 30, 2010, at 3:17 AM, Peter wrote:
>
>> On Fri, Jul 30, 2010 at 12:08 AM, Hilmar Lapp wrote:
>>>
>>> Finally, does anyone have a strong feeling about the capitalization of
>>> BioSQL on Github? All lowercase (github.com/biosql) or capitalized
>>> (github.com/BioSQL)?
>>
>> Personally I'd pick lowercase - it seems more commonly used
>> for repositories and usernames in general. In our case it also
>> avoided Biopython vs BioPython confusion. Curiously most but
>> not all of the BioPerl repositories are in lowercase...
>>
>> Peter
>
> Okay, organization and repo name are both now 'biosql'. No take-backs!

Thanks for sorting this out :)

> Re: upper case with bioperl repos, do you mean the Bio-* ones?
> The emphasis there that (1) they aren't part of bioperl core but are
> still part of the Bio namespace, and (2) the dist will match the actual
> namespace and the module name (Bio::FeatureIO, for instance),
> unlike BioPerl and the others, and (3) there is some precedent
> (Bio::Graphics being one). This simple thing makes it a lot easier
> for keeping track of names, and the module name can be used for
> CPAN installation, indexing, and documentation.

Not being familiar with the specifics it just looked inconsistent, but it sounds like there is a rational and practical scheme in place. Thanks for explaining things.

Regards,

Peter

From rmb32 at cornell.edu Sun Aug 1 15:17:14 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 01 Aug 2010 12:17:14 -0700
Subject: [BioSQL-l] GMOD Evo Hackathon Open Call for Participation
Message-ID: <4C55C83A.3060700@cornell.edu>

We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC.

This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research:

1. Visualization of comparative genomics data
2. Visualization of phylogenetic data and trees
3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed!

How To Apply:
Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25.

About GMOD:
GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well.

About Hackathons:
A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)
Robert Buels (SGN, Chado NatDiv)
Scott Cain (OICR, GMOD)
Dave Clements (NESCent, GMOD)
Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)
Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)

-----------------------------

About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation.

The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements.

The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion.

During the Event

Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest.

Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces.

Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon
NESCent: http://www.nescent.org/
GMOD: http://gmod.org
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application: http://bit.ly/gmodevohack

--
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback

From crackeur at comcast.net Mon Aug 16 21:49:29 2010
From: crackeur at comcast.net (Jimmy Zhang)
Date: Mon, 16 Aug 2010 18:49:29 -0700
Subject: [BioSQL-l] [ANN]VTD-XML 2.9
In-Reply-To: <4C55C83A.3060700@cornell.edu>
References: <4C55C83A.3060700@cornell.edu>
Message-ID: <257BAC75A5844DF5ADF581B97575D970@JimmyZhangPC>

VTD-XML 2.9, the next generation XML Processing API for SOA and Cloud computing, has been released. Please visit https://sourceforge.net/projects/vtd-xml/files/ to download the latest version.

* Strict Conformance
  #VTD-XML now fully conforms to XML namespace 1.0 spec
* Performance Improvement
  #Significantly improved parsing performance for small XML files
* Expand Core VTD-XML API
  #Adds getPrefixString(), and toNormalizedString2()
* Cutting/Splitting
  #Adds getSiblingElementFragment()
* A number of bug fixes and code enhancement including:
  #Fixes a bug for reading very large XML documents on some platforms
  #Fixes a bug in parsing processing instruction
  #Fixes a bug in outputAndReparse()

From rmb32 at cornell.edu Thu Aug 19 13:09:45 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 10:09:45 -0700
Subject: [BioSQL-l] reminder: Aug 25 deadline for GMOD Hackathon application
Message-ID: <4C6D6559.3080809@cornell.edu>

Hi all,

This is your one-week reminder: the deadline for open applications to the GMOD Evo hackathon is Wednesday, August 25th.

Rob

Links

Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon
NESCent: http://www.nescent.org/
GMOD: http://gmod.org
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application: http://bit.ly/gmodevohack

--
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback

From mmuratet at hudsonalpha.org Mon Aug 23 14:43:28 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Mon, 23 Aug 2010 13:43:28 -0500
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
Message-ID: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>

Greetings

I am working on assembling gene CDS sequences on a medium scale, e.g., for all S. aureus strains, and I'm trying to find a way to get gene names from biosql entries I created from Genbank files with load_seqdatabase.pl. I'm using a query like this:

SELECT
    c.seqfeature_id, b.strand,
    SUBSTR(a.seq, b.start_pos, b.end_pos - b.start_pos + 1) as seq
FROM
    biosequence a
JOIN
    seqfeature c
    ON (a.bioentry_id=c.bioentry_id)
JOIN
    location b
    ON (b.seqfeature_id=c.seqfeature_id)
WHERE
    c.type_term_id=12
AND
    c.bioentry_id=221

This seems to work OK to get the sequence with the provision that one needs to reverse complement the sequence if the strand is minus.

But I don't see anything in the schema that will allow me to identify the gene name or product from the seqfeature_id.

Is gene name or product in the schema somewhere and I've missed it?

Thanks

Mike

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806

From mmuratet at hudsonalpha.org Mon Aug 23 15:20:03 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Mon, 23 Aug 2010 14:20:03 -0500
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <4C72C744.7090501@bham.ac.uk>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org> <4C72C744.7090501@bham.ac.uk>
Message-ID:

On Aug 23, 2010, at 2:08 PM, Nick Loman wrote:

> Hi Michael
>
> You need a join on seqfeature_qualifier_value to get this detail.
> This table stores feature qualifiers as key/value pairs, with the
> corresponding key name ('name', 'product', etc.) belonging to the
> relation 'term', so you'll need to join on that too.

Hi Nick

Yes, that does the trick. I knew it would be something simple ;-)

Thanks

Mike

>
> HTH
>
> Cheers
>
> Nick
>
>
> Michael Muratet wrote:
>> Greetings
>>
>> I am working on assembling gene CDS sequences on a medium scale,
>> e.g., for all S. aureus strains, and I'm trying to find a way to
>> get gene names from biosql entries I created from Genbank files
>> with load_seqdatabase.pl. I'm using a query like this:
>>
>> SELECT
>> c.seqfeature_id, b.strand, SUBSTR(a.seq, b.start_pos,
>> b.end_pos- b.start_pos+1) as seq
>> FROM
>> biosequence a
>> JOIN
>> seqfeature c
>> ON (a.bioentry_id=c.bioentry_id)
>> JOIN
>> location b
>> ON (b.seqfeature_id=c.seqfeature_id)
>> WHERE
>> c.type_term_id=12
>> AND
>> c.bioentry_id=221
>>
>> This seems to work OK to get the sequence with the provision that
>> one needs to reverse complement the sequence if the strand is minus.
>>
>> But I don't see anything in the schema that will allow me to
>> identify the gene name or product from the seqfeature_id.
>>
>> Is gene name or product in the schema somewhere and I've missed it?
>>
>> Thanks
>>
>> Mike
>>
>>
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>>
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>>
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806

From n.j.loman at bham.ac.uk Mon Aug 23 15:08:52 2010
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Mon, 23 Aug 2010 20:08:52 +0100
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
Message-ID: <4C72C744.7090501@bham.ac.uk>

Hi Michael

You need a join on seqfeature_qualifier_value to get this detail. This table stores feature qualifiers as key/value pairs, with the corresponding key name ('name', 'product', etc.) belonging to the relation 'term', so you'll need to join on that too.

HTH

Cheers

Nick

Michael Muratet wrote:
> Greetings
>
> I am working on assembling gene CDS sequences on a medium scale, e.g.,
> for all S. aureus strains, and I'm trying to find a way to get gene
> names from biosql entries I created from Genbank files with
> load_seqdatabase.pl. I'm using a query like this:
>
> SELECT
> c.seqfeature_id, b.strand, SUBSTR(a.seq, b.start_pos, b.end_pos-
> b.start_pos+1) as seq
> FROM
> biosequence a
> JOIN
> seqfeature c
> ON (a.bioentry_id=c.bioentry_id)
> JOIN
> location b
> ON (b.seqfeature_id=c.seqfeature_id)
> WHERE
> c.type_term_id=12
> AND
> c.bioentry_id=221
>
> This seems to work OK to get the sequence with the provision that one
> needs to reverse complement the sequence if the strand is minus.
>
> But I don't see anything in the schema that will allow me to identify
> the gene name or product from the seqfeature_id.
>
> Is gene name or product in the schema somewhere and I've missed it?
>
> Thanks
>
> Mike
>
>
> Michael Muratet, Ph.D.
> Senior Scientist
> HudsonAlpha Institute for Biotechnology
> mmuratet at hudsonalpha.org
> (256) 327-0473 (p)
> (256) 327-0966 (f)
>
> Room 4005
> 601 Genome Way
> Huntsville, Alabama 35806
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

From hlapp at drycafe.net Tue Aug 24 22:47:44 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 24 Aug 2010 22:47:44 -0400
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <4C72C744.7090501@bham.ac.uk>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org> <4C72C744.7090501@bham.ac.uk>
Message-ID:

Yep - thanks for the helping out!

-hilmar

On Aug 23, 2010, at 3:08 PM, Nick Loman wrote:

> Hi Michael
>
> You need a join on seqfeature_qualifier_value to get this detail.
> This table stores feature qualifiers as key/value pairs, with the
> corresponding key name ('name', 'product', etc.) belonging to the
> relation 'term', so you'll need to join on that too.
>
> HTH
>
> Cheers
>
> Nick
>
>
> Michael Muratet wrote:
>> Greetings
>>
>> I am working on assembling gene CDS sequences on a medium scale,
>> e.g., for all S. aureus strains, and I'm trying to find a way to
>> get gene names from biosql entries I created from Genbank files
>> with load_seqdatabase.pl. I'm using a query like this:
>>
>> SELECT
>> c.seqfeature_id, b.strand, SUBSTR(a.seq, b.start_pos,
>> b.end_pos- b.start_pos+1) as seq
>> FROM
>> biosequence a
>> JOIN
>> seqfeature c
>> ON (a.bioentry_id=c.bioentry_id)
>> JOIN
>> location b
>> ON (b.seqfeature_id=c.seqfeature_id)
>> WHERE
>> c.type_term_id=12
>> AND
>> c.bioentry_id=221
>>
>> This seems to work OK to get the sequence with the provision that
>> one needs to reverse complement the sequence if the strand is minus.
>>
>> But I don't see anything in the schema that will allow me to
>> identify the gene name or product from the seqfeature_id.
>>
>> Is gene name or product in the schema somewhere and I've missed it?
>>
>> Thanks
>>
>> Mike
>>
>>
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>>
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>>
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From xupeng86 at gmail.com Tue Aug 24 23:13:04 2010
From: xupeng86 at gmail.com (=?GB2312?B?0OzF8w==?=)
Date: Wed, 25 Aug 2010 11:13:04 +0800
Subject: [BioSQL-l] BioSQL-l Digest, Vol 76, Issue 5
In-Reply-To:
References:
Message-ID:

Hi, everybody.
I'm trying to split the NCBI COG flat files into mysql database.
Anyone knows if there's already a universal schema that Bioperl can easily cope with ?
Thanks.
From hlapp at drycafe.net Tue Aug 24 23:15:18 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Tue, 24 Aug 2010 23:15:18 -0400 Subject: [BioSQL-l] BioSQL-l Digest, Vol 76, Issue 5 In-Reply-To: References: Message-ID: <4003E289-CBA6-405F-A1BA-505E718511B0@drycafe.net> BioSQL. Which is presumably why you posted here, right? -hilmar On Aug 24, 2010, at 11:13 PM, ?? wrote: > Hi, everybody. > I'm trying to split the NCBI COG flat files into mysql database. > Anyone knows if there's already a universal schema that Bioperl can > easily cope with ? > Thanks. > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From biopython at maubp.freeserve.co.uk Sun Aug 1 10:01:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 1 Aug 2010 11:01:37 +0100 Subject: [BioSQL-l] migration to github In-Reply-To: <04BBA390-6BC0-4700-8B14-812F6E2E4705@illinois.edu> References: <22BC0098-7BEB-41E3-9EE6-D8987323CC24@drycafe.net> <04BBA390-6BC0-4700-8B14-812F6E2E4705@illinois.edu> Message-ID: On Sun, Aug 1, 2010 at 12:15 AM, Chris Fields wrote: > > On Jul 30, 2010, at 3:17 AM, Peter wrote: > >> On Fri, Jul 30, 2010 at 12:08 AM, Hilmar Lapp wrote: >>> >>> >>> Finally, does anyone have a strong feeling about the capitalization of >>> BioSQL on Github? All lowercase (github.com/biosql) or capitalized >>> (github.com/BioSQL)? >> >> Personally I'd pick lowercase - it seems more commonly used >> for repositories and usernames in general. In our case it also >> avoided Biopython vs BioPython confusion. Curiously most but >> not all of the BioPerl repositories are in lowercase... >> >> Peter > > Okay, organization and repo name are both now 'biosql'. ?No take-backs! 
Thanks for sorting this out :) > Re: upper case with bioperl repos, do you mean the Bio-* ones? >?The emphasis there that (1) they aren't part of bioperl core but are > still part of the Bio namespace, and (2) the dist will match the actual > namespace and the module name (Bio::FeatureIO, for instance), > unlike BioPerl and the others, and (3) there is some precedent > (Bio::Graphics being one). ?This simple thing makes it a lot easier > for keeping track of names, and the module name can be used for > CPAN installation, indexing, and documentation. Not being familiar with the specifics it just looked inconsistent, but it sounds like there is a rational and practical scheme in place. Thanks for explaining things. Regards, Peter From rmb32 at cornell.edu Sun Aug 1 19:17:14 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Sun, 01 Aug 2010 12:17:14 -0700 Subject: [BioSQL-l] GMOD Evo Hackathon Open Call for Participation Message-ID: <4C55C83A.3060700@cornell.edu> We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC. This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research: 1. Visualization of comparative genomics data 2. Visualization of phylogenetic data and trees 3. Support for population diversity and phenotype data If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed! How To Apply: Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25. 
About GMOD: GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well. About Hackathons: A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. The mix of people will include domain experts and computer-savvy end-users. More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below. Sincerely, The GMOD EvoHack Organizing Committee (and project affiliations as relevant): Nicole Washington, Chair (LBNL, modENCODE, Phenote) Robert Buels (SGN, Chado NatDiv) Scott Cain (OICR, GMOD) Dave Clements (NESCent, GMOD) Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv) Sheldon McKay (University of Arizona, iPlant, GBrowse_syn) ----------------------------- About the GMOD Evo Hackathon Overview We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation. The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. 
The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements. The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components. Before the Event Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion. During the Event Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest. Deliverables / Event Results The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces. 
Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically. NESCent This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php). Links Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon NESCent: http://www.nescent.org/ GMOD: http://gmod.org Similar past NESCent events, see: http://hackathon.nescent.org/ GMOD hackathon application: http://bit.ly/gmodevohack -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From crackeur at comcast.net Tue Aug 17 01:49:29 2010 From: crackeur at comcast.net (Jimmy Zhang) Date: Mon, 16 Aug 2010 18:49:29 -0700 Subject: [BioSQL-l] [ANN]VTD-XML 2.9 In-Reply-To: <4C55C83A.3060700@cornell.edu> References: <4C55C83A.3060700@cornell.edu> Message-ID: <257BAC75A5844DF5ADF581B97575D970@JimmyZhangPC> VTD-XML 2.9, the next generation XML Processing API for SOA and Cloud computing, has been released. Please visit https://sourceforge.net/projects/vtd-xml/files/ to download the latest version. 
* Strict Conformance
  #VTD-XML now fully conforms to XML namespace 1.0 spec
* Performance Improvement
  #Significantly improved parsing performance for small XML files
* Expand Core VTD-XML API
  #Adds getPrefixString(), and toNormalizedString2()
* Cutting/Splitting
  #Adds getSiblingElementFragment()
* A number of bug fixes and code enhancement including:
  #Fixes a bug for reading very large XML documents on some platforms
  #Fixes a bug in parsing processing instruction
  #Fixes a bug in outputAndReparse()

From rmb32 at cornell.edu Thu Aug 19 17:09:45 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 10:09:45 -0700
Subject: [BioSQL-l] reminder: Aug 25 deadline for GMOD Hackathon application
Message-ID: <4C6D6559.3080809@cornell.edu>

Hi all,

This is your one-week reminder: the deadline for open applications to the GMOD Evo hackathon is Wednesday, August 25th.

Rob

========================================

We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC. This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research:

1. Visualization of comparative genomics data
2. Visualization of phylogenetic data and trees
3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed!

How To Apply: Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25.
About GMOD: GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well.

About Hackathons: A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)
Robert Buels (SGN, Chado NatDiv)
Scott Cain (OICR, GMOD)
Dave Clements (NESCent, GMOD)
Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)
Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)

-----------------------------
About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation. The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet.
The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements. The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion.

During the Event

Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest.

Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces.
Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon
NESCent: http://www.nescent.org/
GMOD: http://gmod.org
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application: http://bit.ly/gmodevohack

--
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback

From mmuratet at hudsonalpha.org Mon Aug 23 18:43:28 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Mon, 23 Aug 2010 13:43:28 -0500
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
Message-ID: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>

Greetings

I am working on assembling gene CDS sequences on a medium scale, e.g., for all S. aureus strains, and I'm trying to find a way to get gene names from biosql entries I created from Genbank files with load_seqdatabase.pl.
I'm using a query like this:

SELECT
  c.seqfeature_id, b.strand,
  SUBSTR(a.seq, b.start_pos, b.end_pos - b.start_pos + 1) as seq
FROM
  biosequence a
JOIN
  seqfeature c
  ON (a.bioentry_id=c.bioentry_id)
JOIN
  location b
  ON (b.seqfeature_id=c.seqfeature_id)
WHERE
  c.type_term_id=12
AND
  c.bioentry_id=221

This seems to work OK to get the sequence with the provision that one needs to reverse complement the sequence if the strand is minus.

But I don't see anything in the schema that will allow me to identify the gene name or product from the seqfeature_id.

Is gene name or product in the schema somewhere and I've missed it?

Thanks

Mike

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806

From mmuratet at hudsonalpha.org Mon Aug 23 19:20:03 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Mon, 23 Aug 2010 14:20:03 -0500
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <4C72C744.7090501@bham.ac.uk>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org> <4C72C744.7090501@bham.ac.uk>
Message-ID:

On Aug 23, 2010, at 2:08 PM, Nick Loman wrote:

> Hi Michael
>
> You need a join on seqfeature_qualifier_value to get this detail.
> This table stores feature qualifiers as key/value pairs, with the
> corresponding key name ('name', 'product', etc.) belonging to the
> relation 'term', so you'll need to join on that too.

Hi Nick

Yes, that does the trick. I knew it would be something simple ;-)

Thanks

Mike

>
> HTH
>
> Cheers
>
> Nick
>
>
> Michael Muratet wrote:
>> Greetings
>>
>> I am working on assembling gene CDS sequences on a medium scale,
>> e.g., for all S. aureus strains, and I'm trying to find a way to
>> get gene names from biosql entries I created from Genbank files
>> with load_seqdatabase.pl.
>> I'm using a query like this:
>>
>> SELECT
>> c.seqfeature_id, b.strand, SUBSTR(a.seq, b.start_pos,
>> b.end_pos- b.start_pos+1) as seq
>> FROM
>> biosequence a
>> JOIN
>> seqfeature c
>> ON (a.bioentry_id=c.bioentry_id)
>> JOIN
>> location b
>> ON (b.seqfeature_id=c.seqfeature_id)
>> WHERE
>> c.type_term_id=12
>> AND
>> c.bioentry_id=221
>>
>> This seems to work OK to get the sequence with the provision that
>> one needs to reverse complement the sequence if the strand is minus.
>>
>> But I don't see anything in the schema that will allow me to
>> identify the gene name or product from the seqfeature_id.
>>
>> Is gene name or product in the schema somewhere and I've missed it?
>>
>> Thanks
>>
>> Mike
>>
>>
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>>
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>>
>>
>>
>>
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806

From n.j.loman at bham.ac.uk Mon Aug 23 19:08:52 2010
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Mon, 23 Aug 2010 20:08:52 +0100
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
Message-ID: <4C72C744.7090501@bham.ac.uk>

Hi Michael

You need a join on seqfeature_qualifier_value to get this detail. This table stores feature qualifiers as key/value pairs, with the corresponding key name ('name', 'product', etc.) belonging to the relation 'term', so you'll need to join on that too.
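
The join described above could look roughly like the following. This is only a sketch, not the query anyone in this thread actually ran: it builds a toy in-memory SQLite database holding just the columns needed from the BioSQL tables seqfeature, seqfeature_qualifier_value and term. The sample rows, the term_id values, the qualifier names 'gene' and 'product', and the helper names qualifiers_for and demo_connection are all assumptions made for illustration; which qualifier names exist in a real database depends on the GenBank records that were loaded.

```python
import sqlite3

# Toy subset of the BioSQL schema: only the columns this example needs.
# A real BioSQL database defines more columns, keys and constraints.
SCHEMA = """
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id   INTEGER NOT NULL,
    type_term_id  INTEGER NOT NULL REFERENCES term(term_id)
);
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    term_id       INTEGER NOT NULL REFERENCES term(term_id),
    "rank"        INTEGER NOT NULL DEFAULT 0,
    value         TEXT NOT NULL
);
"""

# Qualifiers are key/value pairs: the value lives in
# seqfeature_qualifier_value, the key name in term.
QUERY = """
SELECT f.seqfeature_id, t.name AS qualifier, q.value
FROM seqfeature f
JOIN seqfeature_qualifier_value q ON q.seqfeature_id = f.seqfeature_id
JOIN term t ON t.term_id = q.term_id
WHERE f.bioentry_id = ?
  AND t.name IN ('gene', 'product')
ORDER BY f.seqfeature_id, t.name
"""


def qualifiers_for(conn, bioentry_id):
    """Return (seqfeature_id, qualifier name, value) rows for one bioentry."""
    return conn.execute(QUERY, (bioentry_id,)).fetchall()


def demo_connection():
    """Build an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical term ids; real ids differ from database to database.
    conn.executemany(
        "INSERT INTO term VALUES (?, ?)",
        [(12, "CDS"), (20, "gene"), (21, "product"), (22, "note")],
    )
    conn.execute("INSERT INTO seqfeature VALUES (1, 221, 12)")
    conn.executemany(
        "INSERT INTO seqfeature_qualifier_value VALUES (?, ?, ?, ?)",
        [
            (1, 20, 1, "dnaA"),
            (1, 21, 1, "chromosomal replication initiator protein DnaA"),
            (1, 22, 1, "made-up note, filtered out by the query"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in qualifiers_for(demo_connection(), 221):
        print(row)
```

Filtering on term.name rather than on a hard-coded term_id is a design choice of this sketch: the ids are assigned at load time, so a name-based filter is more likely to carry over between databases.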
HTH

Cheers

Nick


Michael Muratet wrote:
> Greetings
>
> I am working on assembling gene CDS sequences on a medium scale, e.g.,
> for all S. aureus strains, and I'm trying to find a way to get gene
> names from biosql entries I created from Genbank files with
> load_seqdatabase.pl. I'm using a query like this:
>
> SELECT
> c.seqfeature_id, b.strand, SUBSTR(a.seq, b.start_pos, b.end_pos-
> b.start_pos+1) as seq
> FROM
> biosequence a
> JOIN
> seqfeature c
> ON (a.bioentry_id=c.bioentry_id)
> JOIN
> location b
> ON (b.seqfeature_id=c.seqfeature_id)
> WHERE
> c.type_term_id=12
> AND
> c.bioentry_id=221
>
> This seems to work OK to get the sequence with the provision that one
> needs to reverse complement the sequence if the strand is minus.
>
> But I don't see anything in the schema that will allow me to identify
> the gene name or product from the seqfeature_id.
>
> Is gene name or product in the schema somewhere and I've missed it?
>
> Thanks
>
> Mike
>
>
> Michael Muratet, Ph.D.
> Senior Scientist
> HudsonAlpha Institute for Biotechnology
> mmuratet at hudsonalpha.org
> (256) 327-0473 (p)
> (256) 327-0966 (f)
>
> Room 4005
> 601 Genome Way
> Huntsville, Alabama 35806
>
>
>
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

From hlapp at drycafe.net Wed Aug 25 02:47:44 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 24 Aug 2010 22:47:44 -0400
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <4C72C744.7090501@bham.ac.uk>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org> <4C72C744.7090501@bham.ac.uk>
Message-ID:

Yep - thanks for helping out!

-hilmar

On Aug 23, 2010, at 3:08 PM, Nick Loman wrote:

> Hi Michael
>
> You need a join on seqfeature_qualifier_value to get this detail.
> This table stores feature qualifiers as key/value pairs, with the
> corresponding key name ('name', 'product', etc.)
> belonging to the relation 'term', so you'll need to join on that too.
>
> HTH
>
> Cheers
>
> Nick
>
>
> Michael Muratet wrote:
>> Greetings
>>
>> I am working on assembling gene CDS sequences on a medium scale,
>> e.g., for all S. aureus strains, and I'm trying to find a way to
>> get gene names from biosql entries I created from Genbank files
>> with load_seqdatabase.pl. I'm using a query like this:
>>
>> SELECT
>> c.seqfeature_id, b.strand, SUBSTR(a.seq, b.start_pos,
>> b.end_pos- b.start_pos+1) as seq
>> FROM
>> biosequence a
>> JOIN
>> seqfeature c
>> ON (a.bioentry_id=c.bioentry_id)
>> JOIN
>> location b
>> ON (b.seqfeature_id=c.seqfeature_id)
>> WHERE
>> c.type_term_id=12
>> AND
>> c.bioentry_id=221
>>
>> This seems to work OK to get the sequence with the provision that
>> one needs to reverse complement the sequence if the strand is minus.
>>
>> But I don't see anything in the schema that will allow me to
>> identify the gene name or product from the seqfeature_id.
>>
>> Is gene name or product in the schema somewhere and I've missed it?
>>
>> Thanks
>>
>> Mike
>>
>>
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>>
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>>
>>
>>
>>
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From xupeng86 at gmail.com Wed Aug 25 03:13:04 2010
From: xupeng86 at gmail.com (=?GB2312?B?0OzF8w==?=)
Date: Wed, 25 Aug 2010 11:13:04 +0800
Subject: [BioSQL-l] BioSQL-l Digest, Vol 76, Issue 5
In-Reply-To:
References:
Message-ID:

Hi, everybody.
I'm trying to split the NCBI COG flat files into a MySQL database.
Does anyone know if there's already a universal schema that BioPerl can easily cope with?
Thanks.

From hlapp at drycafe.net Wed Aug 25 03:15:18 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 24 Aug 2010 23:15:18 -0400
Subject: [BioSQL-l] BioSQL-l Digest, Vol 76, Issue 5
In-Reply-To:
References:
Message-ID: <4003E289-CBA6-405F-A1BA-505E718511B0@drycafe.net>

BioSQL. Which is presumably why you posted here, right?

-hilmar

On Aug 24, 2010, at 11:13 PM, ?? wrote:

> Hi, everybody.
> I'm trying to split the NCBI COG flat files into a MySQL database.
> Does anyone know if there's already a universal schema that BioPerl can
> easily cope with?
> Thanks.
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================