[BioRuby] NCBI adoption of AGP v2.0 and new qualifiers in GenBank/EMBL
Peter Cock
p.j.a.cock at googlemail.com
Fri Jan 20 10:46:18 UTC 2012
Dear all,
I just spotted this via the @NCBI Twitter feed:
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/agp_spec_change.shtml
In addition to the NCBI switch from AGP v1.1 to v2.0, the INSDC have
recently added a new feature type called "assembly_gap", and the
associated qualifiers "gap_type" and "linkage_evidence" to the INSDC
Feature Table Definitions.
Quoting from version 10.0, dated Dec 2011:
http://www.insdc.org/documents/feature_table.html#7.2
> Feature Key assembly_gap
>
>
> Definition gap between two components of a CON record that is
> part of a genome assembly;
>
> Mandatory qualifiers /estimated_length=unknown or <integer>
> /gap_type="TYPE"
> /linkage_evidence="TYPE" (Note: Mandatory only if the
> /gap_type is "within scaffold" or "repeat within
> scaffold". If there are multiple types of linkage_evidence
> they will appear as multiple /linkage_evidence="TYPE"
> qualifiers. For all other types of assembly_gap
> features, use of the /linkage_evidence qualifier is
> invalid.)
>
> Comment the location span of the assembly_gap feature for an
> unknown gap is 100 bp, with the 100 bp indicated as
> 100 "n"'s in sequence.
>
In short, from 10th Feb 2012 the DDBJ, ENA & GenBank flat files will start
to use "assembly_gap" features to display information derived from version
2.0 AGP files. This will probably affect the XML variants as well.
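
To make the qualifier rules quoted above concrete, here is a minimal Python
sketch of a check for a single assembly_gap feature. The function name, the
dict-of-lists input and the error strings are all my own assumptions for
illustration, not part of any existing parser, and the check covers only the
rules stated in the quote (it does not validate the TYPE values themselves).

```python
# Gap types for which the quoted definition makes /linkage_evidence mandatory.
LINKED_GAP_TYPES = {"within scaffold", "repeat within scaffold"}


def check_assembly_gap(qualifiers):
    """Return a list of problems found in one assembly_gap feature.

    `qualifiers` is a hypothetical representation: a dict mapping each
    qualifier name to a list of its string values.
    """
    problems = []

    lengths = qualifiers.get("estimated_length", [])
    if not lengths:
        problems.append("missing /estimated_length")
    elif lengths[0] != "unknown" and not lengths[0].isdigit():
        problems.append("/estimated_length must be unknown or an integer")

    gap_types = qualifiers.get("gap_type", [])
    if not gap_types:
        problems.append("missing /gap_type")
        return problems  # linkage rule cannot be checked without a gap type

    evidence = qualifiers.get("linkage_evidence", [])
    if gap_types[0] in LINKED_GAP_TYPES:
        # Mandatory here; several /linkage_evidence values are allowed.
        if not evidence:
            problems.append("missing /linkage_evidence")
    elif evidence:
        # For every other gap type the qualifier is invalid.
        problems.append("/linkage_evidence not allowed for this gap type")
    return problems
```

For example, a feature with /gap_type="within scaffold" and no
/linkage_evidence qualifier would be reported as incomplete, while one with
/gap_type="centromere" plus a /linkage_evidence qualifier would be reported
as invalid.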
Unless any of the parsers/writers for GenBank or EMBL flat files use a
whitelist approach, the new feature key and qualifiers shouldn't cause a problem.
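
As a rough illustration of the difference, the Python sketch below reads
feature keys from GenBank-style feature table lines, optionally rejecting
keys that are not on a whitelist. Everything here is hypothetical: the
function name, the deliberately short whitelist, and the example record
(including its coordinates and qualifier values) are made up for the
purpose, and real parsers handle many more cases.

```python
# Hypothetical, deliberately incomplete whitelist predating "assembly_gap".
KNOWN_KEYS = {"source", "gene", "CDS", "gap"}


def read_feature_keys(lines, whitelist=None):
    """Collect feature keys from GenBank-style feature table lines.

    A key line is assumed to have exactly 5 leading spaces followed by the
    key; qualifier lines are indented further and are skipped. If a
    `whitelist` is given, an unknown key raises ValueError.
    """
    keys = []
    for line in lines:
        if not line.startswith("     ") or line[5:6] in ("", " "):
            continue  # header, blank, qualifier or continuation line
        key = line.split()[0]
        if whitelist is not None and key not in whitelist:
            raise ValueError("unknown feature key: %s" % key)
        keys.append(key)
    return keys


# Made-up example record with an unknown-length gap shown as 100 bp.
example = [
    "FEATURES             Location/Qualifiers",
    "     source          1..300",
    "     assembly_gap    101..200",
    "                     /estimated_length=unknown",
    "                     /gap_type=\"within scaffold\"",
    "                     /linkage_evidence=\"paired-ends\"",
]

print(read_feature_keys(example))  # permissive: accepts the new key
try:
    read_feature_keys(example, whitelist=KNOWN_KEYS)
except ValueError as err:
    print("strict parser failed:", err)
```

A permissive reader passes the new key through untouched, whereas the
whitelist variant fails on the first "assembly_gap" feature until its list
of known keys is updated.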
Peter