[Bioperl-l] Indexing CDS file

Heikki Lehvaslaiho heikki.lehvaslaiho at gmail.com
Wed Feb 11 12:44:08 UTC 2009


Dave,

Looks good. Are you going to make the changes to the EMBL parser?

   -Heikki

2009/2/11 Dave Messina <David.Messina at sbc.su.se>:
> Thanks, Heikki.
>
> I took a closer look at the EBI ftp site where Sviya and I got the file, and
> in their README (ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/README.txt) it
> says:
>
> PA line - contains the accession.version of the "parent" EMBL entry
>           (entry where the CDS is annotated)
>
>
> So, unfortunately they've decided that a CDS record, which has no accession
> of its own, doesn't get its parent's accession number, but gets to refer to
> its parent's accession number via the PA line.
>
> Furthermore, there's an
>
> OX line - contains the NCBI taxid for the organism; taxonomic data are taken
>           from the parent EMBL entries
>
> which is also not part of the formal spec (although this one is a more
> worthwhile addition, IMO).
>
> Sooooo, I think we'll need to add support for these.
>
> 'PA' seems easy enough -- the EMBL parser can look for it if there isn't an
> 'AC' line.
>
> As for 'OX', is there a standard slot for a taxonID in a RichSeq SeqFeature
> table? Coming from a Genbank record or a vanilla EMBL record, this is
> normally encoded as
>
> primary tag: source
> tag: db_xref
> value: taxon:9606
>
> right?
>
> Should we do the same if we're coming from an EMBL CDS entry, even though
> it's not actually in the feature table?
>
>
> Dave
>
>
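
For reference, the 'PA' fallback and the 'OX' to db_xref mapping discussed
above could be sketched roughly as follows. This is plain Python, not the
BioPerl EMBL parser, and it is only a sketch under assumptions: the function
name parse_cds_header, the sample record, and the "OX   NCBI_TaxID=<id>;"
line layout are illustrative and not taken from the thread.

```python
import re


def parse_cds_header(lines):
    """Return (accession, source_db_xref) for one EMBL-style record.

    Hypothetical helper: uses the 'AC' line if present and non-empty,
    otherwise falls back to the parent accession.version on the 'PA' line.
    An 'OX' taxid is reported the way a source feature would carry it,
    as a db_xref value of the form 'taxon:<id>'.
    """
    accession = None
    parent = None
    taxid = None
    for line in lines:
        # The two-letter line code is everything before the first space.
        code, _, rest = line.partition(" ")
        rest = rest.strip()
        if code == "AC" and accession is None:
            # The first accession on the line is the primary one.
            accession = rest.split(";")[0].strip() or None
        elif code == "PA" and parent is None:
            parent = rest.rstrip(";").strip() or None
        elif code == "OX":
            # Assumed layout: "OX   NCBI_TaxID=9606;"
            match = re.search(r"NCBI_TaxID=(\d+)", rest)
            if match:
                taxid = match.group(1)
        elif code == "//":
            break  # end of record
    if accession is None:
        accession = parent
    db_xref = f"taxon:{taxid}" if taxid else None
    return accession, db_xref


# Made-up record for illustration only; not copied from the EBI file.
record = [
    "ID   CAA00001; SV 1; linear; mRNA; STD; HUM; 300 BP.",
    "PA   X00001.1",
    "OX   NCBI_TaxID=9606;",
    "//",
]
accession, db_xref = parse_cds_header(record)
```

Keeping the taxid in the same "taxon:<id>" form means downstream code does
not need to care whether it came from the feature table or from an 'OX' line.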



-- 
    -Heikki
Heikki Lehvaslaiho - heikki lehvaslaiho gmail com
Sent from: Johannesburg Gauteng South Africa.


