[EMBOSS] seqret/entret problems using acc from ensembl-embl
    Peter Rice 
    pmr at ebi.ac.uk
       
    Mon Nov 20 15:52:31 UTC 2006
    
    
  
Hi David,
> I think that the format used by Ensembl for assigning IDs and ACCs is
> causing the problems. For example the first entry from the flat file:
> 
> claudia at pc-31-18-86-200:~> head /local/bioinfo/db/ensembl/embl/Homo_sapiens.0.dat
> ID   1    standard; DNA; HTG; 970768 BP.
> XX
> AC   chromosome:NCBI36:1:1000001:1970768:1
> XX
> SV   chromosome:NCBI36:1:1000001:1970768:1
> XX
> DT   5-OCT-2006
> XX
> DE   Homo sapiens chromosome 1 NCBI36 partial sequence 1000001..1970768
> DE   annotated by Ensembl
> 
> I tried replacing the ":" character of the AC line with a "_" using sed,
> but after indexing I get the same error message with seqret or
> entret. Is there any length limit for IDs or ACCs in EMBOSS? Is there
> any workaround for this problem?
Those IDs are horrible and not really EMBL format: they are Ensembl
coordinate strings, and certainly not valid accession numbers.
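
As a rough illustration, the Python sketch below pulls the values off the AC lines of the entry quoted above and tests them against the classic accession shapes (one letter plus five digits, or two letters plus six digits). The pattern and the helper names `ac_values` and `looks_like_accession` are assumptions made for this example; this is not the check EMBOSS itself performs.

```python
import re

# Assumption: only the two classic accession shapes are accepted here,
# one letter + five digits (X56734) or two letters + six digits (AB123456).
ACCESSION_RE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")


def ac_values(lines):
    """Yield every value found on the AC lines of an EMBL-style entry."""
    for line in lines:
        if line.startswith("AC "):
            # An AC line may list several values separated by ';'.
            for token in line[2:].split(";"):
                token = token.strip()
                if token:
                    yield token


def looks_like_accession(value):
    """Return True if value matches one of the assumed accession shapes."""
    return ACCESSION_RE.match(value) is not None


# First lines of the entry quoted in the question.
entry = [
    "ID   1    standard; DNA; HTG; 970768 BP.",
    "XX",
    "AC   chromosome:NCBI36:1:1000001:1970768:1",
    "XX",
]

for ac in ac_values(entry):
    print(ac, looks_like_accession(ac))
```

Under this assumed pattern, the same string with every ":" replaced by "_" is rejected as well, since it still has neither the length nor the letter/digit layout of an accession number.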
We will add an ENSEMBL format for the next release... as a sequence format and 
as a format for dbiflat and the (preferred) dbxflat.
Hope that helps,
Peter
    
    