From p.j.a.cock at googlemail.com Fri Jan 20 05:46:18 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 20 Jan 2012 10:46:18 +0000
Subject: [emboss-dev] NCBI adoption of AGP v2.0 and new qualifiers in GenBank/EMBL
Message-ID:

Dear all,

I just spotted this via the @NCBI twitter feed,
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/agp_spec_change.shtml

In addition to the NCBI switch from AGP v1.1 to v2.0, the INSDC have
recently added a new feature type called "assembly_gap", and the
associated qualifiers "gap_type" and "linkage_evidence" to the INSDC
Feature Table Definitions.

Quoting from version 10.0, dated Dec 2011
http://www.insdc.org/documents/feature_table.html#7.2

> Feature Key           assembly_gap
>
>
> Definition            gap between two components of a CON record that is
>                       part of a genome assembly;
>
> Mandatory qualifiers  /estimated_length=unknown or
>                       /gap_type="TYPE"
>                       /linkage_evidence="TYPE" (Note: Mandatory only if the
>                       /gap_type is "within scaffold" or "repeat within
>                       scaffold".If there are multiple types of linkage_evidence
>                       they will appear as multiple /linkage_evidence="TYPE"
>                       qualifiers. For all other types of assembly_gap
>                       features, use of the /linkage_evidence qualifier is
>                       invalid.)
>
> Comment               the location span of the assembly_gap feature for an
>                       unknown gap is 100 bp, with the 100 bp indicated as
>                       100 "n"'s in sequence.
>

i.e. DDBJ, ENA & GenBank flat-files will start to use the "assembly_gap"
features to display information derived from version 2.0 AGP files
from 10th Feb 2012. Probably this will affect the XML variants as well.

Unless any of the parsers/writers for GenBank or EMBL flat files use
a white list approach, the new feature key and qualifiers shouldn't
cause a problem.
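
To illustrate the white list concern, here is a minimal sketch in Python. It is
not taken from EMBOSS or any other real parser: the function name, the
KNOWN_KEYS set and the sample record are all made up for the example, and it
assumes the usual GenBank layout where a feature key starts in column 6 and
qualifier lines are indented further. A permissive reader passes the new key
through untouched, while a strict (white list) reader rejects it.

```python
import re

# Hypothetical white list, as a strict parser written before the
# Dec 2011 feature table might hold. Not taken from any real tool.
KNOWN_KEYS = {"source", "gene", "CDS", "gap"}

# Assumed GenBank-style layout: exactly five spaces, the feature key,
# whitespace, then the location. Qualifier lines are indented further
# and therefore do not match.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S+)")


def feature_keys(lines, strict=False):
    """Return the feature keys found in a block of feature table lines.

    With strict=True, any key missing from KNOWN_KEYS raises ValueError,
    which is the white list behaviour that would break on assembly_gap.
    """
    keys = []
    for line in lines:
        match = FEATURE_LINE.match(line)
        if match is None:
            continue  # qualifier or continuation line
        key = match.group(1)
        if strict and key not in KNOWN_KEYS:
            raise ValueError("unknown feature key: " + key)
        keys.append(key)
    return keys


# Made-up sample record for illustration only.
SAMPLE = [
    "     source          1..500",
    '                     /organism="Example organism"',
    "     assembly_gap    201..300",
    "                     /estimated_length=unknown",
    '                     /gap_type="within scaffold"',
    '                     /linkage_evidence="paired-ends"',
]

if __name__ == "__main__":
    print(feature_keys(SAMPLE))  # permissive: new key is accepted
    try:
        feature_keys(SAMPLE, strict=True)
    except ValueError as err:
        print("strict parser failed:", err)
```

A parser of the first kind needs no change for the new feature table; one of
the second kind would need "assembly_gap" (and the two new qualifiers) added
to its lists before 10th Feb 2012.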

Peter
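
For anyone writing a validator, the qualifier rule in the quoted definition can
be sketched roughly as below. This is only one reading of the quoted text, not
code from any INSDC tool: the function name and the (name, value) pair
representation are assumptions, and the gap_type "between scaffolds" in the
usage example is just an illustrative value other than the two named in the
quote.

```python
# gap_type values for which /linkage_evidence is mandatory, per the
# quoted feature table text.
LINKED_GAP_TYPES = {"within scaffold", "repeat within scaffold"}


def check_assembly_gap(qualifiers):
    """Check the qualifiers of one assembly_gap feature.

    qualifiers is a list of (name, value) pairs (a hypothetical
    representation chosen for this sketch). Returns a list of problem
    descriptions; an empty list means the rule is satisfied.
    """
    names = [name for name, _ in qualifiers]
    problems = []

    # Both of these are listed as mandatory in the quoted definition.
    for required in ("estimated_length", "gap_type"):
        if required not in names:
            problems.append("missing /" + required)

    gap_types = [value for name, value in qualifiers if name == "gap_type"]
    needs_evidence = any(g in LINKED_GAP_TYPES for g in gap_types)
    has_evidence = "linkage_evidence" in names

    if needs_evidence and not has_evidence:
        problems.append("missing /linkage_evidence")
    if has_evidence and not needs_evidence:
        problems.append("/linkage_evidence is invalid for this gap_type")
    return problems


if __name__ == "__main__":
    ok = [
        ("estimated_length", "unknown"),
        ("gap_type", "within scaffold"),
        ("linkage_evidence", "paired-ends"),
    ]
    bad = [
        ("estimated_length", "unknown"),
        ("gap_type", "between scaffolds"),
        ("linkage_evidence", "paired-ends"),
    ]
    print(check_assembly_gap(ok))
    print(check_assembly_gap(bad))
```

Multiple /linkage_evidence qualifiers on one feature are allowed by the quoted
text, which is why the sketch takes a list of pairs rather than a dictionary.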