[Biopython-dev] [Bug 3175] Caret in genbank files leads to GenBank Parser crash in Biopython 1.54
bugzilla-daemon at portal.open-bio.org
Thu Feb 10 14:05:33 UTC 2011
http://bugzilla.open-bio.org/show_bug.cgi?id=3175
------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2011-02-10 09:05 EST -------
(In reply to comment #7)
> NT_022184.15 is the record containing IGKV2-40 (and the associated caret) in
> my file. What I said about Nucleotide still applies, though.
>
Yes, you're right. My mistake, NT_015926.15 was the last good record.
Had you noticed that this is the last gene in this record? It runs right up to
the end of the sequence and beyond (the rightmost end is missing, i.e. the 5'
start of the gene, since it is on the reverse strand). From the FTP site:
LOCUS       NT_022184           68452323 bp    DNA     linear   CON 28-OCT-2010
DEFINITION  Homo sapiens chromosome 2 genomic contig, GRCh37.p2 reference
            primary assembly.
...
     gene            complement(68451760..>68452323)
                     /gene="IGKV2-40"
                     /gene_synonym="IGKV240; O11; O11a"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Curated Genomic."
                     /db_xref="GeneID:28916"
                     /db_xref="HGNC:5789"
                     /db_xref="IMGT/GENE-DB:IGKV2-40"
     V_segment       complement(68451760..68452073^68452074)
                     /gene="IGKV2-40"
                     /gene_synonym="IGKV240; O11; O11a"
                     /standard_name="IGKV2-40"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Curated Genomic."
                     /db_xref="GeneID:28916"
     CDS             complement(<68451760..68452072^68452073)
                     /gene="IGKV2-40"
                     /gene_synonym="IGKV240; O11; O11a"
                     /exception="rearrangement required for product"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Curated Genomic."
                     /codon_start=1
                     /db_xref="GeneID:28916"
                     /db_xref="IMGT/LIGM:IGKV2-40"
                     /db_xref="HGNC:5789"
                     /db_xref="IMGT/GENE-DB:IGKV2-40"
If we look at the record via Entrez,
http://www.ncbi.nlm.nih.gov/nuccore/NT_022184.15?report=gbwithparts
     gene            complement(68451760..>68452323)
                     /gene="IGKV2-40"
                     /gene_synonym="IGKV240; O11; O11a"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Curated Genomic."
                     /db_xref="GeneID:28916"
                     /db_xref="HGNC:5789"
                     /db_xref="IMGT/GENE-DB:IGKV2-40"
     V_segment       complement(68451760..68452074)
                     /gene="IGKV2-40"
                     /gene_synonym="IGKV240; O11; O11a"
                     /standard_name="IGKV2-40"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Curated Genomic."
                     /db_xref="GeneID:28916"
     CDS             complement(<68451760..68452073)
                     /gene="IGKV2-40"
                     /gene_synonym="IGKV240; O11; O11a"
                     /exception="rearrangement required for product"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Curated Genomic."
                     /codon_start=1
                     /db_xref="IMGT/LIGM:IGKV2-40"
                     /db_xref="GeneID:28916"
                     /db_xref="HGNC:5789"
                     /db_xref="IMGT/GENE-DB:IGKV2-40"
So this appears to have been updated to avoid the funny caret location,
but I think they made a mistake - surely the CDS should be
complement(68451760..>68452073) not complement(<68451760..68452073)
as stated?
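
To make the two candidate CDS locations easier to compare, here is a rough
Python sketch (standard library only, not Biopython's parser) that reports
which biological end a "<" or ">" marks as partial. It relies on the GenBank
convention that "<" goes with the lower coordinate and ">" with the higher
one, and that complement() swaps which of those is the 5' end. The function
name partial_ends and its return keys are made up for illustration, and it
only handles simple single-span locations.

```python
import re

# Simple "start..end" span with optional "<" / ">" partial markers.
_SPAN = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")


def partial_ends(location):
    """Report which biological ends of a simple location are partial.

    Hypothetical helper: handles only 'A..B' and 'complement(A..B)'.
    """
    reverse = location.startswith("complement(") and location.endswith(")")
    inner = location[len("complement("):-1] if reverse else location
    match = _SPAN.match(inner)
    if match is None:
        raise ValueError("unsupported location: %r" % location)
    left_open = match.group(1) == "<"   # extends past the lower coordinate
    right_open = match.group(3) == ">"  # extends past the higher coordinate
    if reverse:
        # On the reverse strand the higher coordinate is the 5' end.
        return {"five_prime_partial": right_open,
                "three_prime_partial": left_open}
    return {"five_prime_partial": left_open,
            "three_prime_partial": right_open}


for loc in ("complement(68451760..>68452323)",   # gene, as in the record
            "complement(<68451760..68452073)",   # CDS, as shown via Entrez
            "complement(68451760..>68452073)"):  # CDS, as suggested above
    print(loc, partial_ends(loc))
```

Read this way, the gene and the suggested CDS are both partial at the 5' end,
while the CDS as shown via Entrez is partial at the 3' end instead.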
Have you contacted the NCBI about this? If not, I will.
I believe that the caret location in the FTP GenBank file is invalid and
Biopython is right to reject it (but I would like to confirm this with the
NCBI). For now the simplest solution is for you to manually edit that feature.
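
If editing by hand is awkward for a file this size, something along these
lines might do as a stop-gap. It is only a sketch: it assumes the coordinates
shown via Entrez are the intended ones (i.e. "A..B^C" becomes "A..C"), which
the NCBI has not confirmed, and the helper name collapse_caret_end is made up
for this example. It uses only the standard library re module and works on
one line of text at a time.

```python
import re

# Matches a span whose end is written as "B^C", e.g.
# "68451760..68452073^68452074". A plain "B^C" location with no
# preceding "A.." is deliberately left alone.
_CARET_END = re.compile(r"(<?\d+\.\.)>?\d+\^(>?\d+)")


def collapse_caret_end(line):
    """Rewrite 'A..B^C' as 'A..C' (assumed fix, mirroring the Entrez view)."""
    return _CARET_END.sub(r"\1\2", line)


print(collapse_caret_end("V_segment complement(68451760..68452073^68452074)"))
print(collapse_caret_end("CDS complement(<68451760..68452072^68452073)"))
```

Applied to each line of the file before parsing, this would turn the two
problem locations into the forms shown via Entrez and leave other lines
unchanged.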
Thanks,
Peter