From hamish.mcwilliam at bioinfo-user.org.uk  Thu Jan 12 11:49:25 2012
From: hamish.mcwilliam at bioinfo-user.org.uk (Hamish McWilliam)
Date: Thu, 12 Jan 2012 16:49:25 +0000
Subject: [Open-bio-l] OBDA redux?
In-Reply-To: <CAKVJ-_4basnGg4evjCn9tgzX_-k7wMjzUd3SzkFtfEoC_sBitQ@mail.gmail.com>
References: <CAKVJ-_4MBG9mBKjgOugww4TX315oNKvrSMLt34jDC0Ns1Di=FA@mail.gmail.com>
	<CAEBF844.6B69%bonnal@ingm.org>
	<CAKVJ-_4Ek2Yr9oUcNSUj7KTJqL1TVP+wcOS+Xj5dQ+FFhT1-oQ@mail.gmail.com>
	<A2D1AFCA-9353-43B7-A458-B7DDB4001BE9@illinois.edu>
	<6A5077BE-11D6-4E00-8E04-BF3D790B02CB@illinois.edu>
	<CABqDwwLpJYjLkWE0xfnResLcL_GXBX83QgVXdmTJ1Nt5F6CoZA@mail.gmail.com>
	<CAKVJ-_4basnGg4evjCn9tgzX_-k7wMjzUd3SzkFtfEoC_sBitQ@mail.gmail.com>
Message-ID: <CABqDwwJNWWXCEYrsNeJPXUa_N3O69QstVeiPfqxVcU_CyWoo3Q@mail.gmail.com>

Hi Peter,

On 16 December 2011 12:11, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Dec 15, 2011 at 10:01 PM, Hamish McWilliam
> <hamish.mcwilliam at bioinfo-user.org.uk> wrote:
>>> Just a quick update on this: the old OBDA specs were still in CVS in
>>> the obda-specs module (the old obda site had the module wrong).
>>> I ran git cvsimport on that after I copied the CVS repo to my laptop,
>>> so it's now on github:
>>>
>>> https://github.com/OBF/OBDA
>>>
>>> We could probably work on updates from there.
>>
>> At the risk of derailing the current thread... a few comments on the
>> "modules" in the old OBDA:
>
> Well, given the broad title of OBDA redux, why not?

Exactly :-)

>> - BioCorba: while CORBA may live on in some embedded applications it
>> has mostly been replaced by SOAP and REST web services. I suspect
>> there are few copies of the BioCorba IDLs surviving today. Possibly of
>> historic interest, but since it doesn't actually include the IDLs it
>> is not really of any use.
>
> As far as I know, BioCorba is defunct.
>
>> - biofetch: originally implemented in EBI's dbfetch, also implemented
>> by BioRuby as biofetch which had a few extensions. EBI's dbfetch has
>> since been reimplemented and attempts to be compatible but only
>> provides partial support along with various extensions, including
>> those from BioRuby. See http://www.ebi.ac.uk/Tools/dbfetch/syntax.jsp.
>> I'm aware of client support in BioPerl, BioRuby and EMBOSS, not sure
>> of the current status for BioJava and BioPython.
>
> Current Biopython doesn't have anything for this, but I would probably
> want to implement this as a client not a server.

While there is an example implementation of a biofetch server in
BioPerl (http://search.cpan.org/~cjfields/BioPerl/examples/db/dbfetch),
it is the client implementations that have been the main focus in the
various projects. In BioPerl: Bio::Biblio, Bio::DB::BioFetch,
Bio::DB::EMBL, Bio::DB::RefSeq and Bio::DB::SwissProt use either
dbfetch or biofetch; in BioRuby: Bio::Fetch provides an interface to
biofetch servers, including the EBI's dbfetch.

>> - BioSQL: as you all know over at http://www.biosql.org/. The document
>> should probably be updated to point there.
>
> Agreed, done:
> https://github.com/OBF/OBDA/commit/5798f0b4a0e3b7fd0595e0ab3017d3afdda53549
>
>> - bioindex: the flat-file and BDB indexing formats. To which the
>> SQLite option will be added?
>
> Basically yes.
>
>> - naming: obsolete URN scheme. Various ontologies (e.g. EDAM) provide
>> possible replacements when required.
>
> This also has implications for the bioindex code as we need to
> specify the file format being indexed (e.g. FASTA or GenBank).

And possibly a layer of semantics for the database and data in the database.
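To make the SQLite option for bioindex more concrete, here is a minimal Python sketch of a flat-file index held in SQLite. The table names (`meta`, `records`) and functions are hypothetical and are not the OBDA bioindex schema. It records the indexed file format alongside each record's byte offset, following the point above that the format being indexed has to be specified.

```python
import io
import sqlite3


def build_fasta_index(handle, conn, file_format="fasta"):
    """Record the byte offset of every FASTA header in an SQLite database.

    Hypothetical schema for illustration (not the OBDA bioindex layout).
    `handle` must be a binary file-like object that supports tell()/seek().
    """
    conn.execute("CREATE TABLE meta (name TEXT PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, byte_offset INTEGER)")
    # Store the indexed file format so a reader knows which parser to use.
    conn.execute("INSERT INTO meta VALUES ('format', ?)", (file_format,))
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            # Use the first word after '>' as the record identifier.
            record_id = line[1:].split()[0].decode("ascii")
            conn.execute("INSERT INTO records VALUES (?, ?)", (record_id, offset))
    conn.commit()


def fetch_record(handle, conn, record_id):
    """Seek to an indexed record and return its raw text."""
    row = conn.execute(
        "SELECT byte_offset FROM records WHERE id = ?", (record_id,)
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    handle.seek(row[0])
    lines = [handle.readline()]
    # Copy lines until the next header or the end of the file.
    while True:
        line = handle.readline()
        if not line or line.startswith(b">"):
            break
        lines.append(line)
    return b"".join(lines).decode("ascii")


if __name__ == "__main__":
    data = io.BytesIO(b">seq1 first\nACGT\nGG\n>seq2\nTTTT\n")
    conn = sqlite3.connect(":memory:")
    build_fasta_index(data, conn)
    print(fetch_record(data, conn, "seq2"), end="")
```

Keeping only offsets in the database (rather than the records themselves) mirrors the flat-file/BDB approach: the original file stays the source of truth and the index just speeds up random access.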

>> - bioregistry: database discovery and meta-data. From having tried to
>> implement this, the bioregistry is too limited in the available
>> meta-data to be very useful, especially when it comes to data format
>> handling. Compare with the database definitions in EMBOSS
>> (http://emboss.sourceforge.net/docs/themes/Databases.html) and the
>> dbfetch meta-data
>> (http://www.ebi.ac.uk/Tools/dbfetch/syntax.jsp#Meta-information).

For the current EMBOSS documentation for the database definitions see
http://emboss.open-bio.org/html/adm/ch04s01.html.

> There was some partial code for this in Biopython, but it was
> deprecated and removed some time ago.

The bioregistry is conceptually quite useful: it offers a common
format for data services to advertise the data they provide and the
interfaces for accessing that data, which has obvious benefits for
client software. Each site describes its own services in a
standardized way, so clients and crawlers can discover the available
data sources at runtime without the problems inherent in centralized
repositories. However, the current specification is too limited,
since it does not allow data formats, or database and data semantics,
to be specified. A richer format, converging with the equivalent
configuration files in EMBOSS, could revive the concept and make
implementing client support worthwhile again.

>> - XEmbl: REST and SOAP access to EMBL-Bank entries in XML.
>> The EBI's XEmbl service was replaced by the dbfetch
>> (http://www.ebi.ac.uk/Tools/dbfetch/) and WSDbfetch
>> (http://www.ebi.ac.uk/Tools/webservices/services/dbfetch) services,
>> since these provide roughly the same functionality with wider data
>> format support.
>
> Presumably the XML format for EMBL is now one of the INSDC
> formats also used at the NCBI? In any case, that whole folder
> is purely describing an (obsolete) EBI service, so can we just
> delete it?

The XML formats were not described as part of the XEmbl specification,
but instead were external XML formats (BSML and Agave XML) which have
not been adopted. The current XML formats for the INSDC member
databases are in two categories:
1. INSD XML (http://insdc.org/xmlstatus.html)
2. Member database specific formats, for example ENA EMBL-Bank XML
(see http://www.ebi.ac.uk/ena/about/embl_bank_format).

The XEmbl service specification itself is obsolete and can be removed.

>> Since I've been attempting to get dbfetch to support the biofetch and
>> bioregistry specifications, my interest is much more at the web
>> service end of things. I can certainly see options for using the
>> current alternatives in dbfetch and EMBOSS to revise the
>> specifications for biofetch and bioregistry.
>>
>> Hamish
>
> How does biofetch/bioregistry compare to DAS?

biofetch specifies an HTTP GET-based interface to data resources. The
databases and data formats available depend on the specific
implementation, and will generally include the main distribution
formats for the database and commonly used formats for the specific
type of data involved. For example, EBI's dbfetch provides EMBL-Bank
data in:
- EMBL flatfile format
- EMBL XML
- INSD XML
- FASTA sequence format
- SeqXML
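As a rough illustration of the HTTP GET interface, the Python sketch below only builds a request URL. The parameter names `db`, `id`, `format` and `style`, the `default` format value and the comma-separated ID list follow common dbfetch usage but are assumptions to check against the dbfetch syntax page; the base URL is the dbfetch location given in the bioregistry example in this message, and the accession is a placeholder.

```python
from urllib.parse import urlencode

# dbfetch location taken from the bioregistry example in this thread.
DBFETCH_URL = "http://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def biofetch_url(db, ids, fmt="default", style="raw", base=DBFETCH_URL):
    """Build a biofetch-style GET request URL.

    The parameter names (db, id, format, style) and the comma-separated
    id list are assumptions to check against the dbfetch syntax page.
    """
    if isinstance(ids, str):
        ids = [ids]
    params = {"db": db, "id": ",".join(ids), "format": fmt, "style": style}
    # Keep commas literal so several IDs read naturally in the URL.
    return base + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    # 'J00231' is only a placeholder accession for the example.
    print(biofetch_url("embl", "J00231", fmt="embl"))
```

Fetching an entry is then a plain GET of that URL (for example with `urllib.request.urlopen`); the sketch deliberately stops short of network access so the URL construction can be checked on its own.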

bioregistry describes the databases available at a site, providing
details of how to talk to the data source and the parameters required
to access a specific database. For example, for EMBL-Bank via dbfetch:

[embl]
protocol=biofetch
location=http://www.ebi.ac.uk/Tools/dbfetch/dbfetch
dbname=embl
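Because the bioregistry entry above uses an INI-style layout, a client could read it with Python's standard `configparser`. This is a minimal sketch, assuming each section names a database and must carry at least `protocol` and `location` keys; the function name and that validation rule are illustrative, not part of the bioregistry specification.

```python
import configparser

REGISTRY_TEXT = """\
[embl]
protocol=biofetch
location=http://www.ebi.ac.uk/Tools/dbfetch/dbfetch
dbname=embl
"""


def load_registry(text):
    """Return {section: {key: value}} for a bioregistry-style config.

    Only '=' is treated as a delimiter, so URLs containing ':' are safe,
    and interpolation is disabled so '%' in URLs is left alone.
    """
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    registry = {}
    for name in parser.sections():
        entry = dict(parser[name])
        # Assumed minimum keys needed to contact a data source.
        missing = {"protocol", "location"} - set(entry)
        if missing:
            raise ValueError(f"[{name}] is missing {sorted(missing)}")
        registry[name] = entry
    return registry


if __name__ == "__main__":
    registry = load_registry(REGISTRY_TEXT)
    print(registry["embl"]["location"])
```

A client would then dispatch on the `protocol` value (here `biofetch`) to decide how to use `location` and `dbname`.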

DAS is a protocol and set of data formats focused on the delivery of
sequence and sequence feature data. A DAS server provides meta-data
about its capabilities and the data available through it, but knows
nothing about other DAS servers. The DAS Registry
(http://www.dasregistry.org/) addresses this limitation by providing
information about registered DAS servers, but it is centralized and
DAS-specific. Alternative registries (see
http://www.ebi.ac.uk/Tools/webservices/tutorials/05_registries)
address the service type limitation, but are still centralized
resources.

DAS and biofetch are complementary: DAS provides granularity and
mash-up capabilities, while biofetch provides the original and
commonly used data formats.

bioregistry appears to be unused currently, but it aims to provide a
format for sharing information about data services. Convergence of
this format with the database configurations in EMBOSS and with
service meta-data such as that provided by dbfetch would simplify both
client development and the maintenance of database configurations in
supporting systems.

> Separately, I suggest we rename the OBDA/preamble.txt
> file to README (or README.*) so it gets shown in GitHub,
> and then update it following this discussion with some
> context (like dates and the current status of the different parts).

Sounds good to me.

> We should probably make the old OBDA CVS read only now.

I assume a pointer to the new location on GitHub has been added to
the contents of the OBDA CVS, in which case making it read-only
would be sensible.

Hamish


From pedrolopes at ua.pt  Mon Jan 16 06:54:43 2012
From: pedrolopes at ua.pt (Pedro Lopes)
Date: Mon, 16 Jan 2012 11:54:43 +0000
Subject: [Open-bio-l] [SWAT4LS] Sponsorship Opportunity for International
 School on Semantic Web Applications & Tools for Life Sciences
Message-ID: <CAMLiNo5xYPCk3fazGj8PTc1-rNq0-aWyRwMiFWA3njqz400WxA@mail.gmail.com>

Dear sirs,

IEETA/University of Aveiro (http://www.ieeta.pt), in cooperation with the
SWAT4LS group (http://www.swat4ls.org/), will host the "International
School on Semantic Web Applications and Tools for Life Sciences" from
2 to 5 May 2012. This scientific event will focus on practical learning
of the new technologies and strategies associated with the Semantic
Web development paradigm, including ontology modeling and creation,
performance enhancements, data integration and Linked Data services,
amongst many others. The event will gather contributions and active
participation from research staff, international project leaders, private
companies and other experts in the field. With attendance limited to
40 seats, the Organizing Committee intends to offer an advanced learning
experience built on close interaction between the scientific and business
communities in an exclusive, informal setting.

The Organizing Committee would like to have your company's official
sponsorship, which would undoubtedly contribute to this event's credibility
as a discussion forum for the various stakeholders involved in the
Portuguese biomedical technology innovation scene. Your sponsorship can
take several forms, namely monetary support for meals (lunches, coffee
breaks, gala dinner) or for speakers (travel, accommodation). We can also
schedule a presentation slot for your company, where you can highlight
your products to a highly qualified and interested audience. Further
information about the available sponsorship options is in the attached
file (SWAT4LSAveiro_Sponsorship). If you sponsor this event, your
company's logo will be included in all event dissemination materials,
namely posters, the web site, mail and email, for which we ask for your
explicit authorization as well as your image usage rules.

All contacts to conference organizers should be forwarded to:
SWAT4LS Aveiro 2012
A/c Pedro Lopes
IEETA
Campus Universitário de Santiago
3810-193 Aveiro

email: schools at swat4ls.org
web: http://www.swat4ls.org/schools/aveiro2012/
tel. 234 370 500

We look forward to hearing from you,

Best regards,
Pedro
@pedrolopes <https://twitter.com/#!/pedrolopes>
Bioinformatics Research & Development | http://bioinformatics.ua.pt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SWAT4LSAveiro_Promo.pdf
Type: application/pdf
Size: 336956 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/open-bio-l/attachments/20120116/12a24aae/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SWAT4LSAveiro_Sponsorship.pdf
Type: application/pdf
Size: 327284 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/open-bio-l/attachments/20120116/12a24aae/attachment-0003.pdf>

From p.j.a.cock at googlemail.com  Fri Jan 20 05:46:18 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 20 Jan 2012 10:46:18 +0000
Subject: [Open-bio-l] NCBI adoption of AGP v2.0 and new qualifiers in
	GenBank/EMBL
Message-ID: <CAKVJ-_68TGz776oFFDtTNwtS3L6EDJjiW2cZDQN=-rTSo0_3OQ@mail.gmail.com>

Dear all,

I just spotted this via the @NCBI twitter feed,
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/agp_spec_change.shtml

In addition to the NCBI switch from AGP v1.1 to v2.0, the INSDC have
recently added a new feature key called "assembly_gap", and the
associated qualifiers "gap_type" and "linkage_evidence", to the INSDC
Feature Table Definitions.

Quoting from version 10.0, dated Dec 2011
http://www.insdc.org/documents/feature_table.html#7.2
> Feature Key           assembly_gap
>
>
> Definition            gap between two components of a CON record that is
>                       part of a genome assembly;
>
> Mandatory qualifiers  /estimated_length=unknown or <integer>
>                       /gap_type="TYPE"
>                       /linkage_evidence="TYPE" (Note: Mandatory only if the
>                       /gap_type is "within scaffold" or "repeat within
>                       scaffold".If there are multiple types of linkage_evidence
>                       they will appear as multiple /linkage_evidence="TYPE"
>                       qualifiers. For all other types of assembly_gap
>                       features, use of the /linkage_evidence qualifier is
>                       invalid.)
>
> Comment               the location span of the assembly_gap feature for an
>                       unknown gap is 100 bp, with the 100 bp indicated as
>                       100 "n"'s in sequence.
>

In other words, from 10 February 2012 the DDBJ, ENA and GenBank flat
files will start to use "assembly_gap" features to display information
derived from version 2.0 AGP files. This will probably affect the XML
variants as well.
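The qualifier rules quoted above are easy to check mechanically. A minimal Python sketch, assuming the qualifiers of one feature have already been parsed into a dict mapping each name to a list of values (the function name and input shape are illustrative). Treating /estimated_length and /gap_type as single-valued is an assumption, and the controlled vocabularies for gap_type and linkage_evidence are deliberately not checked.

```python
# gap_type values for which /linkage_evidence is mandatory, per the quoted text.
SCAFFOLD_GAP_TYPES = {"within scaffold", "repeat within scaffold"}


def check_assembly_gap(qualifiers):
    """Return a list of problems with an assembly_gap feature's qualifiers.

    `qualifiers` maps qualifier names to lists of string values, since
    /linkage_evidence may appear more than once.
    """
    problems = []
    lengths = qualifiers.get("estimated_length", [])
    if len(lengths) != 1:
        problems.append("exactly one /estimated_length is required")
    elif lengths[0] != "unknown" and not lengths[0].isdigit():
        problems.append("/estimated_length must be 'unknown' or an integer")
    gap_types = qualifiers.get("gap_type", [])
    if len(gap_types) != 1:
        problems.append("exactly one /gap_type is required")
        return problems
    evidence = qualifiers.get("linkage_evidence", [])
    if gap_types[0] in SCAFFOLD_GAP_TYPES and not evidence:
        # Mandatory for "within scaffold" and "repeat within scaffold".
        problems.append("/linkage_evidence is mandatory for this /gap_type")
    elif gap_types[0] not in SCAFFOLD_GAP_TYPES and evidence:
        # Invalid for every other type of assembly_gap.
        problems.append("/linkage_evidence is invalid for this /gap_type")
    return problems


if __name__ == "__main__":
    print(check_assembly_gap({"estimated_length": ["unknown"],
                              "gap_type": ["within scaffold"]}))
```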

Unless any of the parsers/writers for GenBank or EMBL flat files use a
whitelist approach (accepting only known feature keys and qualifiers),
the new feature key and qualifiers shouldn't cause a problem.
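To illustrate the non-whitelist approach, here is a deliberately simplified Python sketch of a GenBank-style feature-table reader that accepts any feature key and qualifier name, so a new key such as assembly_gap simply passes through. It assumes the usual fixed layout (feature key starting in column 6, qualifiers starting with "/" in column 22), ignores multi-line locations, and joins continuation lines with a single space; it is not how BioPerl, BioRuby or Biopython actually implement their parsers.

```python
import re

# Feature line: key starting in column 6, then the location (simplified).
FEATURE_RE = re.compile(r"^ {5}(\S+) +(\S+)$")
# Qualifier line: 21 spaces, then /name with an optional =value.
QUALIFIER_RE = re.compile(r"^ {21}/([^=\s]+)(?:=(.*))?$")


def _unquote(value):
    """Strip one pair of surrounding double quotes, if present."""
    if value is not None and len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def parse_features(lines):
    """Parse feature-table lines into (key, location, qualifiers) tuples.

    No whitelist: any feature key or qualifier name is kept as-is.
    Continuation lines are appended to the previous qualifier value
    with a single space (a simplification).
    """
    features = []
    for raw in lines:
        line = raw.rstrip("\n")
        feature = FEATURE_RE.match(line)
        qualifier = QUALIFIER_RE.match(line)
        if feature:
            features.append((feature.group(1), feature.group(2), []))
        elif qualifier and features:
            features[-1][2].append([qualifier.group(1), qualifier.group(2)])
        elif line.startswith(" " * 21) and features and features[-1][2]:
            last = features[-1][2][-1]
            last[1] = (last[1] or "") + " " + line.strip()
    return [
        (key, loc, [(name, _unquote(value)) for name, value in quals])
        for key, loc, quals in features
    ]


if __name__ == "__main__":
    q = " " * 21
    sample = [" " * 5 + "assembly_gap    1..100", q + "/estimated_length=unknown",
              q + '/gap_type="within scaffold"']
    print(parse_features(sample))
```

A whitelisting parser would instead reject or drop assembly_gap until its list of known keys was updated, which is exactly the failure mode mentioned above.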

Peter
