From biopython at maubp.freeserve.co.uk  Sun Aug  1 06:01:37 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 1 Aug 2010 11:01:37 +0100
Subject: [BioSQL-l] migration to github
In-Reply-To: <04BBA390-6BC0-4700-8B14-812F6E2E4705@illinois.edu>
References: <22BC0098-7BEB-41E3-9EE6-D8987323CC24@drycafe.net>
	<AANLkTi=Gfrv6LAxFUnnnefueUHb83_vbBzMbhvMHxaNc@mail.gmail.com>
	<04BBA390-6BC0-4700-8B14-812F6E2E4705@illinois.edu>
Message-ID: <AANLkTimUmE5pGW9NVT4ZotexAMzBkjirpya2CxYi_kvA@mail.gmail.com>

On Sun, Aug 1, 2010 at 12:15 AM, Chris Fields wrote:
>
> On Jul 30, 2010, at 3:17 AM, Peter wrote:
>
>> On Fri, Jul 30, 2010 at 12:08 AM, Hilmar Lapp wrote:
>>>
>>>
>>> Finally, does anyone have a strong feeling about the capitalization of
>>> BioSQL on Github? All lowercase (github.com/biosql) or capitalized
>>> (github.com/BioSQL)?
>>
>> Personally I'd pick lowercase - it seems more commonly used
>> for repositories and usernames in general. In our case it also
>> avoided Biopython vs BioPython confusion. Curiously most but
>> not all of the BioPerl repositories are in lowercase...
>>
>> Peter
>
> Okay, organization and repo name are both now 'biosql'.  No take-backs!

Thanks for sorting this out :)

> Re: upper case with bioperl repos, do you mean the Bio-* ones?
> The emphasis there is that (1) they aren't part of bioperl core but are
> still part of the Bio namespace, and (2) the dist will match the actual
> namespace and the module name (Bio::FeatureIO, for instance),
> unlike BioPerl and the others, and (3) there is some precedent
> (Bio::Graphics being one).  This simple thing makes it a lot easier
> for keeping track of names, and the module name can be used for
> CPAN installation, indexing, and documentation.

Not being familiar with the specifics it just looked inconsistent, but it
sounds like there is a rational and practical scheme in place. Thanks
for explaining things.

Regards,

Peter


From rmb32 at cornell.edu  Sun Aug  1 15:17:14 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 01 Aug 2010 12:17:14 -0700
Subject: [BioSQL-l] GMOD Evo Hackathon Open Call for Participation
Message-ID: <4C55C83A.3060700@cornell.edu>

We are seeking participants for the GMOD Tools for Evolutionary Biology 
Hackathon, held November 8-12, 2010 at the US National Evolutionary 
Synthesis Center (NESCent) in Durham, NC.

This hackathon targets three critical gaps in the capabilities of the 
GMOD toolbox that currently limit its utility for evolutionary research:

  1. Visualization of comparative genomics data
  2. Visualization of phylogenetic data and trees
  3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you 
are strongly encouraged to apply. Relevant areas of expertise include 
more than just software development: if you are a GMOD power user, 
visualization guru, domain expert (comparative, phylogenetics, 
population, ...), or documentation wizard, then your skills are needed!

How To Apply:

Fill out the online application form at http://bit.ly/gmodevohack. 
Applications are due August 25.

About GMOD:

GMOD is an intercompatible suite of open-source software components for 
storing, managing, analyzing, and visualizing genome-scale data. GMOD 
includes many widely-used software components: GBrowse and JBrowse, both 
genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a 
generic and modular database schema; CMap, a comparative map viewer; as 
well as many other components including Apollo, MAKER, BioMart, 
InterMine, and Galaxy. We hope to extend the functionality of existing 
GMOD components, and integrate new components as well.

About Hackathons:

A hackathon is an intense event at which a group of programmers with 
different backgrounds and skills collaborate hands-on and face-to-face 
to develop working code that is of utility to the community as a whole. 
The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures, 
and attendees, as well as URLs to the hackathon and related websites are 
included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as
relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)

Robert Buels (SGN, Chado NatDiv)

Scott Cain (OICR, GMOD)

Dave Clements (NESCent, GMOD)

Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)

Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)


-----------------------------

About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities 
of the Generic Model Organism Database (GMOD) toolbox that currently 
limit its utility for evolutionary research. Specifically, we will focus 
on tools for

   1) viewing comparative genomics data;
   2) visualizing phylogenomic data; and
   3) supporting population diversity data and phenotype annotation.

The event will be hosted at NESCent and bring together a group of 20 or 
more software developers, end-user representatives, and documentation 
experts who would otherwise not meet. The participants will include key 
developers of GMOD components that currently lack features critical for 
emerging evolutionary biology research, developers of informatics tools 
in evolutionary research that lack GMOD integration, and 
informatics-savvy biologists who can represent end-user requirements.

The event will provide a unique opportunity to infuse the GMOD developer 
community with a heightened awareness of unmet needs in evolutionary 
biology that GMOD components have the potential to fill, and for tool 
developers in evolutionary biology to better understand how best to 
extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well 
before the hackathon, on mailing lists, wiki pages, and conference calls 
set up among accepted attendees.  This advance work lays the foundation 
for participants to be productive from the very first day.  This also 
means that participants should be willing to contribute some time in 
advance of the hackathon itself to participate in this preparatory 
discussion.

During the Event

Typically, hackathon participants use the morning of the first day of 
the event to organize themselves into working groups of between 3 and 6 
people, each with a focused implementation objective.  Ideas and 
objectives are discussed, and attendees coalesce around the projects in 
which they have the most experience or interest.


Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully 
logged and documented on the GMOD wiki (http://gmod.org). Each working 
group during the event will typically have its own wiki page, linked 
from the main EvoHack page, where it documents its minutes and design 
notes, and provides links to the code and documentation it produces. 
Also, since GMOD and NESCent are both committed to open source 
principles, all code and documentation produced by participants during 
the hackathon must be published under an OSI-approved open source 
license. As contributions to existing GMOD tools, all hackathon products 
will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center 
(NESCent, http://www.nescent.org) through its Informatics Whitepapers 
program (http://www.nescent.org/informatics/whitepapers.php). NESCent 
promotes the synthesis of information, concepts and knowledge to address 
significant, emerging, or novel questions in evolutionary science and 
its applications. NESCent achieves this by supporting research and 
education across disciplinary, institutional, geographic, and 
demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal:
http://gmod.org/wiki/GMOD_Evo_Hackathon

NESCent: http://www.nescent.org/
GMOD: http://gmod.org/
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application:  http://bit.ly/gmodevohack

-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From crackeur at comcast.net  Mon Aug 16 21:49:29 2010
From: crackeur at comcast.net (Jimmy Zhang)
Date: Mon, 16 Aug 2010 18:49:29 -0700
Subject: [BioSQL-l] [ANN]VTD-XML 2.9
In-Reply-To: <4C55C83A.3060700@cornell.edu>
References: <4C55C83A.3060700@cornell.edu>
Message-ID: <257BAC75A5844DF5ADF581B97575D970@JimmyZhangPC>

VTD-XML 2.9, the next generation XML Processing API for SOA and Cloud computing, has been released. Please visit https://sourceforge.net/projects/vtd-xml/files/ to download the latest version.

* Strict Conformance
    - VTD-XML now fully conforms to the Namespaces in XML 1.0 spec
* Performance Improvement
    - Significantly improved parsing performance for small XML files
* Expand Core VTD-XML API
    - Adds getPrefixString() and toNormalizedString2()
* Cutting/Splitting
    - Adds getSiblingElementFragment()
* A number of bug fixes and code enhancements, including:
    - Fixes a bug for reading very large XML documents on some platforms
    - Fixes a bug in parsing processing instructions
    - Fixes a bug in outputAndReparse()


From rmb32 at cornell.edu  Thu Aug 19 13:09:45 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 10:09:45 -0700
Subject: [BioSQL-l] reminder: Aug 25 deadline for GMOD Hackathon application
Message-ID: <4C6D6559.3080809@cornell.edu>

Hi all,

This is your one-week reminder: the deadline for open applications to 
the GMOD Evo hackathon is Wednesday, August 25th.

Rob



From mmuratet at hudsonalpha.org  Mon Aug 23 14:43:28 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Mon, 23 Aug 2010 13:43:28 -0500
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
Message-ID: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>

Greetings

I am working on assembling gene CDS sequences on a medium scale, e.g.,  
for all S. aureus strains, and I'm trying to find a way to get gene  
names from biosql entries I created from Genbank files with  
load_seqdatabase.pl. I'm using a query like this:

SELECT
     c.seqfeature_id, b.strand,
     SUBSTR(a.seq, b.start_pos, b.end_pos - b.start_pos + 1) AS seq
FROM
     biosequence a
     JOIN
     seqfeature c
     ON (a.bioentry_id=c.bioentry_id)
     JOIN
     location b
     ON (b.seqfeature_id=c.seqfeature_id)
WHERE
     c.type_term_id=12
     AND
     c.bioentry_id=221

This seems to work OK to get the sequence, with the proviso that one  
needs to reverse complement the sequence if the strand is minus.
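
The reverse-complement step mentioned above could be sketched in Python
roughly as follows. This is only an illustration: oriented_sequence is a
hypothetical helper name (it is not part of BioSQL or BioPerl), and it
assumes that the strand column holds -1 for the minus strand and that the
sequence is plain DNA; characters other than A, C, G, T (such as N) are
passed through unchanged.

```python
# Hypothetical helper, not part of BioSQL or BioPerl.
# Translation table for complementing plain DNA; other characters
# (for example N) are left unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def oriented_sequence(seq, strand):
    """Return seq as-is, or reverse-complemented when strand is -1.

    Assumes strand is -1 for the minus strand; any other value
    (1, 0 or None) leaves the sequence untouched.
    """
    if strand == -1:
        # Complement each base, then reverse the whole string.
        return seq.translate(_COMPLEMENT)[::-1]
    return seq


print(oriented_sequence("ATGC", -1))
print(oriented_sequence("ATGC", 1))
```

Applied to each row returned by the query, this takes the value of the
seq column together with the value of the strand column.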

But I don't see anything in the schema that will allow me to identify  
the gene name or product from the seqfeature_id.

Is gene name or product in the schema somewhere and I've missed it?

Thanks

Mike


Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


From mmuratet at hudsonalpha.org  Mon Aug 23 15:20:03 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Mon, 23 Aug 2010 14:20:03 -0500
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <4C72C744.7090501@bham.ac.uk>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
	<4C72C744.7090501@bham.ac.uk>
Message-ID: <EA392433-C1B6-4A3E-93FD-2020B7835E55@hudsonalpha.org>


On Aug 23, 2010, at 2:08 PM, Nick Loman wrote:

> Hi Michael
>
> You need a join on seqfeature_qualifier_value to get this detail.  
> This table stores feature qualifiers as key/value pairs, with the  
> corresponding key name ('name', 'product', etc.) belonging to the  
> relation 'term', so you'll need to join on that too.

Hi Nick

Yes, that does the trick. I knew it would be something simple ;-)

Thanks

Mike



From n.j.loman at bham.ac.uk  Mon Aug 23 15:08:52 2010
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Mon, 23 Aug 2010 20:08:52 +0100
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
Message-ID: <4C72C744.7090501@bham.ac.uk>

Hi Michael

You need a join on seqfeature_qualifier_value to get this detail. This 
table stores feature qualifiers as key/value pairs, with the 
corresponding key name ('name', 'product', etc.) belonging to the 
relation 'term', so you'll need to join on that too.
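
As a rough sketch of that join, the Python snippet below builds a cut-down
stand-in for three of the tables in an in-memory SQLite database and runs
the query against it. Several things here are assumptions made for
illustration: the tables carry only a few of the columns the real BioSQL
schema has, the rows are made up, and the qualifier keys 'gene' and
'product' are the usual GenBank qualifier names, so which keys are
actually present depends on the data that was loaded. It also looks up
the feature type by term name ('CDS') rather than a hard-coded
type_term_id; in a full database a term name may occur in more than one
ontology, so a stricter query would filter on the ontology as well.

```python
import sqlite3

# Cut-down stand-in for three BioSQL tables; the real ones have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE seqfeature (seqfeature_id INTEGER PRIMARY KEY,
                         bioentry_id INTEGER, type_term_id INTEGER);
CREATE TABLE seqfeature_qualifier_value (seqfeature_id INTEGER,
                                         term_id INTEGER,
                                         rank INTEGER, value TEXT);
-- Made-up rows, for illustration only.
INSERT INTO term VALUES (12, 'CDS'), (30, 'gene'), (31, 'product');
INSERT INTO seqfeature VALUES (1001, 221, 12);
INSERT INTO seqfeature_qualifier_value VALUES
    (1001, 30, 1, 'dnaA'),
    (1001, 31, 1, 'chromosomal replication initiator protein DnaA');
""")

# Qualifier values are key/value pairs: the value lives in
# seqfeature_qualifier_value and the key name lives in term.
QUERY = """
SELECT f.seqfeature_id, t.name AS qualifier, q.value
FROM seqfeature f
JOIN term ft ON ft.term_id = f.type_term_id
JOIN seqfeature_qualifier_value q ON q.seqfeature_id = f.seqfeature_id
JOIN term t ON t.term_id = q.term_id
WHERE ft.name = 'CDS'
  AND f.bioentry_id = ?
  AND t.name IN ('gene', 'product')
ORDER BY f.seqfeature_id, t.name
"""

rows = conn.execute(QUERY, (221,)).fetchall()
for row in rows:
    print(row)
```

The SELECT statement itself should carry over to MySQL or PostgreSQL
largely unchanged, apart from the parameter placeholder style used by the
database driver.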

HTH

Cheers

Nick




From hlapp at drycafe.net  Tue Aug 24 22:47:44 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 24 Aug 2010 22:47:44 -0400
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <4C72C744.7090501@bham.ac.uk>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
	<4C72C744.7090501@bham.ac.uk>
Message-ID: <E28A4307-3196-47ED-8E24-23CCAB79EBF9@drycafe.net>

Yep - thanks for helping out!

	-hilmar

On Aug 23, 2010, at 3:08 PM, Nick Loman wrote:

> Hi Michael
>
> You need a join on seqfeature_qualifier_value to get this detail.  
> This table stores feature qualifiers as key/value pairs, with the  
> corresponding key name ('name', 'product', etc.) belonging to the  
> relation 'term', so you'll need to join on that too.
>
> HTH
>
> Cheers
>
> Nick

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From xupeng86 at gmail.com  Tue Aug 24 23:13:04 2010
From: xupeng86 at gmail.com (=?GB2312?B?0OzF8w==?=)
Date: Wed, 25 Aug 2010 11:13:04 +0800
Subject: [BioSQL-l] BioSQL-l Digest, Vol 76, Issue 5
In-Reply-To: <mailman.3.1282665603.7520.biosql-l@lists.open-bio.org>
References: <mailman.3.1282665603.7520.biosql-l@lists.open-bio.org>
Message-ID: <AANLkTikeipUG6+_pmEjUkpZQ-PdZr-aKWO5tbiU71acS@mail.gmail.com>

Hi, everybody.
I'm trying to split the NCBI COG flat files into a MySQL database.
Does anyone know if there is already a universal schema that BioPerl can
easily cope with?
Thanks.

From hlapp at drycafe.net  Tue Aug 24 23:15:18 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 24 Aug 2010 23:15:18 -0400
Subject: [BioSQL-l] BioSQL-l Digest, Vol 76, Issue 5
In-Reply-To: <AANLkTikeipUG6+_pmEjUkpZQ-PdZr-aKWO5tbiU71acS@mail.gmail.com>
References: <mailman.3.1282665603.7520.biosql-l@lists.open-bio.org>
	<AANLkTikeipUG6+_pmEjUkpZQ-PdZr-aKWO5tbiU71acS@mail.gmail.com>
Message-ID: <4003E289-CBA6-405F-A1BA-505E718511B0@drycafe.net>

BioSQL. Which is presumably why you posted here, right?

	-hilmar

On Aug 24, 2010, at 11:13 PM, Xu Peng wrote:

> Hi, everybody.
> I'm trying to split the NCBI COG flat files into a MySQL database.
> Does anyone know if there is already a universal schema that BioPerl can
> easily cope with?
> Thanks.

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================



Links

Main GMOD EvoHack page, and full proposal:
http://gmod.org/wiki/GMOD_Evo_Hackathon

NESCent: http://www.nescent.org/
GMOD: http://gmod.org/
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application:  http://bit.ly/gmodevohack

-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From crackeur at comcast.net  Tue Aug 17 01:49:29 2010
From: crackeur at comcast.net (Jimmy Zhang)
Date: Mon, 16 Aug 2010 18:49:29 -0700
Subject: [BioSQL-l] [ANN]VTD-XML 2.9
In-Reply-To: <4C55C83A.3060700@cornell.edu>
References: <4C55C83A.3060700@cornell.edu>
Message-ID: <257BAC75A5844DF5ADF581B97575D970@JimmyZhangPC>

VTD-XML 2.9, the next generation XML Processing API for SOA and Cloud computing, has been released. Please visit https://sourceforge.net/projects/vtd-xml/files/ to download the latest version.

* Strict Conformance
  - VTD-XML now fully conforms to the XML Namespaces 1.0 spec
* Performance Improvement
  - Significantly improved parsing performance for small XML files
* Expanded Core VTD-XML API
  - Adds getPrefixString() and toNormalizedString2()
* Cutting/Splitting
  - Adds getSiblingElementFragment()
* A number of bug fixes and code enhancements, including:
  - Fixes a bug in reading very large XML documents on some platforms
  - Fixes a bug in parsing processing instructions
  - Fixes a bug in outputAndReparse()


From rmb32 at cornell.edu  Thu Aug 19 17:09:45 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 10:09:45 -0700
Subject: [BioSQL-l] reminder: Aug 25 deadline for GMOD Hackathon application
Message-ID: <4C6D6559.3080809@cornell.edu>

Hi all,

This is your one-week reminder: the deadline for open applications to 
the GMOD Evo hackathon is Wednesday, August 25th.

Rob

========================================

We are seeking participants for the GMOD Tools for Evolutionary Biology
Hackathon, held November 8-12, 2010 at the US National Evolutionary
Synthesis Center (NESCent) in Durham, NC.

This hackathon targets three critical gaps in the capabilities of the
GMOD toolbox that currently limit its utility for evolutionary research:

  1. Visualization of comparative genomics data
  2. Visualization of phylogenetic data and trees
  3. Support for population diversity and phenotype data

If you are interested in these areas and have relevant expertise, you
are strongly encouraged to apply. Relevant areas of expertise include
more than just software development: if you are a GMOD power user,
visualization guru, domain expert (comparative, phylogenetics,
population, ...), or documentation wizard, then your skills are needed!

How To Apply:

Fill out the online application form at http://bit.ly/gmodevohack.
Applications are due August 25.

About GMOD:

GMOD is an intercompatible suite of open-source software components for
storing, managing, analyzing, and visualizing genome-scale data. GMOD
includes many widely-used software components: GBrowse and JBrowse, both
genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a
generic and modular database schema; CMap, a comparative map viewer; as
well as many other components including Apollo, MAKER, BioMart,
InterMine, and Galaxy. We hope to extend the functionality of existing
GMOD components, and integrate new components as well.

About Hackathons:

A hackathon is an intense event at which a group of programmers with
different backgrounds and skills collaborate hands-on and face-to-face
to develop working code that is of utility to the community as a whole.
The mix of people will include domain experts and computer-savvy end-users.

More details about the event, its motivation, organization, procedures,
and attendees, as well as URLs to the hackathon and related websites are
included below.

Sincerely,

The GMOD EvoHack Organizing Committee (and project affiliations as
relevant):

Nicole Washington, Chair (LBNL, modENCODE, Phenote)

Robert Buels (SGN, Chado NatDiv)

Scott Cain (OICR, GMOD)

Dave Clements (NESCent, GMOD)

Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv)

Sheldon McKay (University of Arizona, iPlant, GBrowse_syn)


-----------------------------

About the GMOD Evo Hackathon

Overview

We are organizing a hackathon to fill critical gaps in the capabilities
of the Generic Model Organism Database (GMOD) toolbox that currently
limit its utility for evolutionary research. Specifically, we will focus
on tools for

   1) viewing comparative genomics data;
   2) visualizing phylogenomic data; and
   3) supporting population diversity data and phenotype annotation.

The event will be hosted at NESCent and bring together a group of 20 or
more software developers, end-user representatives, and documentation
experts who would otherwise not meet. The participants will include key
developers of GMOD components that currently lack features critical for
emerging evolutionary biology research, developers of informatics tools
in evolutionary research that lack GMOD integration, and
informatics-savvy biologists who can represent end-user requirements.

The event will provide a unique opportunity to infuse the GMOD developer
community with a heightened awareness of unmet needs in evolutionary
biology that GMOD components have the potential to fill, and for tool
developers in evolutionary biology to better understand how best to
extend or integrate with already existing GMOD components.

Before the Event

Discussion of ideas and sometimes even design actually starts well
before the hackathon, on mailing lists, wiki pages, and conference calls
set up among accepted attendees.  This advance work lays the foundation
for participants to be productive from the very first day.  This also
means that participants should be willing to contribute some time in
advance of the hackathon itself to participate in this preparatory
discussion.

During the Event

Typically, hackathon participants use the morning of the first day of
the event to organize themselves into working groups of between 3 and 6
people, each with a focused implementation objective.  Ideas and
objectives are discussed, and attendees coalesce around the projects in
which they have the most experience or interest.


Deliverables / Event Results

The meeting's attendance, working groups, and outcomes will be fully
logged and documented on the GMOD wiki (http://gmod.org). Each working
group during the event will typically have its own wiki page, linked
from the main EvoHack page, where it documents its minutes and design
notes, and provides links to the code and documentation it produces.
Also, since GMOD and NESCent are both committed to open source
principles, all code and documentation produced by participants during
the hackathon must be published under an OSI-approved open source
license. As contributions to existing GMOD tools, all hackathon products
will most likely satisfy this requirement automatically.

NESCent

This event is sponsored by the US National Evolutionary Synthesis Center
(NESCent, http://www.nescent.org) through its Informatics Whitepapers
program (http://www.nescent.org/informatics/whitepapers.php). NESCent
promotes the synthesis of information, concepts and knowledge to address
significant, emerging, or novel questions in evolutionary science and
its applications. NESCent achieves this by supporting research and
education across disciplinary, institutional, geographic, and
demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal:
http://gmod.org/wiki/GMOD_Evo_Hackathon

NESCent: http://www.nescent.org/
GMOD: http://gmod.org/
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application:  http://bit.ly/gmodevohack

-- 
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback


From mmuratet at hudsonalpha.org  Mon Aug 23 18:43:28 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Mon, 23 Aug 2010 13:43:28 -0500
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
Message-ID: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>

Greetings

I am working on assembling gene CDS sequences on a medium scale, e.g.,
for all S. aureus strains, and I'm trying to find a way to get gene
names from BioSQL entries I created from GenBank files with
load_seqdatabase.pl. I'm using a query like this:

SELECT
     c.seqfeature_id, b.strand,
     SUBSTR(a.seq, b.start_pos, b.end_pos - b.start_pos + 1) AS seq
FROM
     biosequence a
     JOIN
     seqfeature c
     ON (a.bioentry_id=c.bioentry_id)
     JOIN
     location b
     ON (b.seqfeature_id=c.seqfeature_id)
WHERE
     c.type_term_id=12
     AND
     c.bioentry_id=221

This seems to work OK to get the sequence, with the proviso that one
needs to reverse-complement the sequence if the strand is minus.
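
The strand handling mentioned above could be done on the client side after
the query returns. What follows is only a minimal Python sketch, not part of
the original query. It assumes that location.strand holds 1 for the plus
strand and -1 for the minus strand, and the helper name oriented_sequence is
made up for illustration. Only A, C, G, T and N are complemented; any other
IUPAC ambiguity codes pass through unchanged.

```python
# Complement table for unambiguous DNA plus N (assumption: other IUPAC
# ambiguity codes are not needed and are left untouched).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def oriented_sequence(seq: str, strand: int) -> str:
    """Return seq unchanged for strand >= 0, reverse-complemented for -1.

    `oriented_sequence` is a hypothetical helper name; `strand` is assumed
    to follow the 1 / -1 convention of the location table.
    """
    if strand < 0:
        # Complement each base, then reverse the whole string.
        return seq.translate(_COMPLEMENT)[::-1]
    return seq


# The (strand, seq) pairs would come from the rows returned by the query.
print(oriented_sequence("ATGGCC", 1))
print(oriented_sequence("ATGGCC", -1))
```

Doing this in the client keeps the SQL portable, since string reversal and
character translation functions differ between database engines.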

But I don't see anything in the schema that will allow me to identify  
the gene name or product from the seqfeature_id.

Is gene name or product in the schema somewhere and I've missed it?

Thanks

Mike


Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


From mmuratet at hudsonalpha.org  Mon Aug 23 19:20:03 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Mon, 23 Aug 2010 14:20:03 -0500
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <4C72C744.7090501@bham.ac.uk>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
	<4C72C744.7090501@bham.ac.uk>
Message-ID: <EA392433-C1B6-4A3E-93FD-2020B7835E55@hudsonalpha.org>


On Aug 23, 2010, at 2:08 PM, Nick Loman wrote:

> Hi Michael
>
> You need a join on seqfeature_qualifier_value to get this detail.  
> This table stores feature qualifiers as key/value pairs, with the  
> corresponding key name ('name', 'product', etc.) belonging to the  
> relation 'term', so you'll need to join on that too.

Hi Nick

Yes, that does the trick. I knew it would be something simple ;-)

Thanks

Mike

>
> HTH
>
> Cheers
>
> Nick
>
>
> Michael Muratet wrote:
>> Greetings
>>
>> I am working on assembling gene CDS sequences on a medium scale,  
>> e.g.,  for all S. aureus strains, and I'm trying to find a way to  
>> get gene  names from biosql entries I created from Genbank files  
>> with  load_seqdatabase.pl. I'm using a query like this:
>>
>> SELECT
>>     c.seqfeature_id, b.strand,
>>     SUBSTR(a.seq, b.start_pos, b.end_pos - b.start_pos + 1) AS seq
>> FROM
>>     biosequence a
>>     JOIN
>>     seqfeature c
>>     ON (a.bioentry_id=c.bioentry_id)
>>     JOIN
>>     location b
>>     ON (b.seqfeature_id=c.seqfeature_id)
>> WHERE
>>     c.type_term_id=12
>>     AND
>>     c.bioentry_id=221
>>
>> This seems to work OK to get the sequence with the provision that  
>> one  needs to reverse complement the sequence if the strand is minus.
>>
>> But I don't see anything in the schema that will allow me to  
>> identify  the gene name or product from the seqfeature_id.
>>
>> Is gene name or product in the schema somewhere and I've missed it?
>>
>> Thanks
>>
>> Mike
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


From n.j.loman at bham.ac.uk  Mon Aug 23 19:08:52 2010
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Mon, 23 Aug 2010 20:08:52 +0100
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
Message-ID: <4C72C744.7090501@bham.ac.uk>

Hi Michael

You need a join on seqfeature_qualifier_value to get this detail. This 
table stores feature qualifiers as key/value pairs, with the 
corresponding key name ('name', 'product', etc.) belonging to the 
relation 'term', so you'll need to join on that too.
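
The join described above can be sketched roughly as follows. This is a toy
example run against an in-memory SQLite database, not a real BioSQL
instance: the three tables contain only the columns needed for the join, the
sample rows and term ids are made up, and the qualifier names 'gene' and
'product' are an assumption about what the loader stored from the GenBank
/gene and /product qualifiers. Check the term table of your own database for
the names actually present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy subset of the BioSQL tables; only the columns used in the join.
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id   INTEGER,
    type_term_id  INTEGER
);
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id INTEGER,
    term_id       INTEGER,
    rank          INTEGER,
    value         TEXT
);

-- Made-up sample rows for illustration, not real annotation.
INSERT INTO term VALUES (12, 'CDS');
INSERT INTO term VALUES (20, 'gene');
INSERT INTO term VALUES (21, 'product');
INSERT INTO seqfeature VALUES (1001, 221, 12);
INSERT INTO seqfeature_qualifier_value VALUES (1001, 20, 1, 'geneA');
INSERT INTO seqfeature_qualifier_value
    VALUES (1001, 21, 1, 'hypothetical protein');
""")

# Qualifier values are key/value pairs: the key is a row in term,
# the value lives in seqfeature_qualifier_value.
QUERY = """
SELECT f.seqfeature_id, t.name AS qualifier, q.value
FROM seqfeature f
JOIN seqfeature_qualifier_value q ON q.seqfeature_id = f.seqfeature_id
JOIN term t ON t.term_id = q.term_id
WHERE f.bioentry_id = ?
  AND t.name IN ('gene', 'product')
ORDER BY f.seqfeature_id, t.name
"""

rows = conn.execute(QUERY, (221,)).fetchall()
for row in rows:
    print(row)
```

Filtering on term.name rather than a hard-coded term_id avoids depending on
ids that differ from one database to the next.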

HTH

Cheers

Nick


Michael Muratet wrote:
> Greetings
>
> I am working on assembling gene CDS sequences on a medium scale, e.g.,  
> for all S. aureus strains, and I'm trying to find a way to get gene  
> names from biosql entries I created from Genbank files with  
> load_seqdatabase.pl. I'm using a query like this:
>
> SELECT
>      c.seqfeature_id, b.strand,
>      SUBSTR(a.seq, b.start_pos, b.end_pos - b.start_pos + 1) AS seq
> FROM
>      biosequence a
>      JOIN
>      seqfeature c
>      ON (a.bioentry_id=c.bioentry_id)
>      JOIN
>      location b
>      ON (b.seqfeature_id=c.seqfeature_id)
> WHERE
>      c.type_term_id=12
>      AND
>      c.bioentry_id=221
>
> This seems to work OK to get the sequence with the provision that one  
> needs to reverse complement the sequence if the strand is minus.
>
> But I don't see anything in the schema that will allow me to identify  
> the gene name or product from the seqfeature_id.
>
> Is gene name or product in the schema somewhere and I've missed it?
>
> Thanks
>
> Mike


From hlapp at drycafe.net  Wed Aug 25 02:47:44 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 24 Aug 2010 22:47:44 -0400
Subject: [BioSQL-l] Getting gene name, function etc. from biosql
In-Reply-To: <4C72C744.7090501@bham.ac.uk>
References: <803C0F6C-FD55-4AFE-9B7F-A0A749295E70@hudsonalpha.org>
	<4C72C744.7090501@bham.ac.uk>
Message-ID: <E28A4307-3196-47ED-8E24-23CCAB79EBF9@drycafe.net>

Yep - thanks for helping out!

	-hilmar

On Aug 23, 2010, at 3:08 PM, Nick Loman wrote:

> Hi Michael
>
> You need a join on seqfeature_qualifier_value to get this detail.  
> This table stores feature qualifiers as key/value pairs, with the  
> corresponding key name ('name', 'product', etc.) belonging to the  
> relation 'term', so you'll need to join on that too.
>
> HTH
>
> Cheers
>
> Nick
>
>
> Michael Muratet wrote:
>> Greetings
>>
>> I am working on assembling gene CDS sequences on a medium scale,  
>> e.g.,  for all S. aureus strains, and I'm trying to find a way to  
>> get gene  names from biosql entries I created from Genbank files  
>> with  load_seqdatabase.pl. I'm using a query like this:
>>
>> SELECT
>>     c.seqfeature_id, b.strand,
>>     SUBSTR(a.seq, b.start_pos, b.end_pos - b.start_pos + 1) AS seq
>> FROM
>>     biosequence a
>>     JOIN
>>     seqfeature c
>>     ON (a.bioentry_id=c.bioentry_id)
>>     JOIN
>>     location b
>>     ON (b.seqfeature_id=c.seqfeature_id)
>> WHERE
>>     c.type_term_id=12
>>     AND
>>     c.bioentry_id=221
>>
>> This seems to work OK to get the sequence with the provision that  
>> one  needs to reverse complement the sequence if the strand is minus.
>>
>> But I don't see anything in the schema that will allow me to  
>> identify  the gene name or product from the seqfeature_id.
>>
>> Is gene name or product in the schema somewhere and I've missed it?
>>
>> Thanks
>>
>> Mike

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From xupeng86 at gmail.com  Wed Aug 25 03:13:04 2010
From: xupeng86 at gmail.com (=?GB2312?B?0OzF8w==?=)
Date: Wed, 25 Aug 2010 11:13:04 +0800
Subject: [BioSQL-l] BioSQL-l Digest, Vol 76, Issue 5
In-Reply-To: <mailman.3.1282665603.7520.biosql-l@lists.open-bio.org>
References: <mailman.3.1282665603.7520.biosql-l@lists.open-bio.org>
Message-ID: <AANLkTikeipUG6+_pmEjUkpZQ-PdZr-aKWO5tbiU71acS@mail.gmail.com>

Hi, everybody.
I'm trying to split the NCBI COG flat files into a MySQL database.
Does anyone know if there's already a universal schema that BioPerl can
easily cope with?
Thanks.


From hlapp at drycafe.net  Wed Aug 25 03:15:18 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 24 Aug 2010 23:15:18 -0400
Subject: [BioSQL-l] BioSQL-l Digest, Vol 76, Issue 5
In-Reply-To: <AANLkTikeipUG6+_pmEjUkpZQ-PdZr-aKWO5tbiU71acS@mail.gmail.com>
References: <mailman.3.1282665603.7520.biosql-l@lists.open-bio.org>
	<AANLkTikeipUG6+_pmEjUkpZQ-PdZr-aKWO5tbiU71acS@mail.gmail.com>
Message-ID: <4003E289-CBA6-405F-A1BA-505E718511B0@drycafe.net>

BioSQL. Which is presumably why you posted here, right?

	-hilmar

On Aug 24, 2010, at 11:13 PM, ?? wrote:

> Hi, everybody.
> I'm trying to split the NCBI COG flat files into a MySQL database.
> Does anyone know if there's already a universal schema that BioPerl can
> easily cope with?
> Thanks.

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================