[EMBOSS] Getting headers from Seqret
simon andrews (BI)
simon.andrews at bbsrc.ac.uk
Wed Aug 1 12:56:08 UTC 2001
[sent to Emboss mailing list]
Dear All,
I'm having trouble getting header information back through seqret, from a
database formatted using dbiflat against a GenBank flat file (RefSeq,
actually). I'm sure plenty of people must have done this before, but I've
read through the documentation and I can't see where I'm going wrong!
The database formatted OK, and I can fetch sequences back from it, but at
some point I will need to retrieve the entire header from the original file
to get at some of the extra information in there (feature tables, cross
references, authors, etc.). I've tried several different output USAs with
seqret, but the most I seem to get back is the name, accession number
and description.
I can't believe that this information is thrown away by seqret (it's still
there in the flat file after all), so how can I retrieve it?
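Since the header is still in the flat file, one stopgap outside EMBOSS is to read the entry straight from there. What follows is a minimal sketch, not an EMBOSS feature: the function name `find_genbank_entry` and the toy `SAMPLE` text (including the made-up `XX_000001` record) are assumptions for illustration. It relies only on each GenBank entry starting with a `LOCUS` line and ending with `//`, and on the LOCUS name matching the ID being looked up, as it does for NM_031360.

```python
import io


def find_genbank_entry(handle, name):
    """Return the full text of the first entry whose LOCUS name equals `name`.

    Returns None when no entry matches. `handle` is any iterable of lines,
    such as an open flat file.
    """
    entry = []
    wanted = False
    for line in handle:
        if line.startswith("LOCUS"):
            # A new entry begins: decide whether it is the one we want.
            fields = line.split()
            wanted = len(fields) > 1 and fields[1] == name
            entry = []
        if wanted:
            entry.append(line)
            if line.startswith("//"):
                # "//" terminates a GenBank entry.
                return "".join(entry)
    # Tolerate a final entry that lacks its "//" terminator.
    return "".join(entry) if wanted else None


# Toy input: the first record is invented, the second is cut down from the
# NM_031360 example in this message.
SAMPLE = (
    "LOCUS       XX_000001 12 bp mRNA\n"
    "DEFINITION  Made-up record used only to show that other entries are skipped.\n"
    "//\n"
    "LOCUS       NM_031360 1269 bp mRNA ROD 12-JUN-2001\n"
    "DEFINITION  Rattus norvegicus neutral sphingomyelinase (Smpd2), mRNA.\n"
    "ACCESSION   NM_031360\n"
    "VERSION     NM_031360.1 GI:14389300\n"
    "//\n"
)

if __name__ == "__main__":
    print(find_genbank_entry(io.StringIO(SAMPLE), "NM_031360"))
```

In real use the handle would be one of the `*.gbff` files opened with `open()`. This scans the file linearly, so it does not use the dbiflat index at all.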
Thanks for any help
Simon
[Potentially useful details follow]
----
Simon Andrews PhD
Bioinformatics Dept
The Babraham Institute
simon.andrews at bbsrc.ac.uk
+44 (0)1223 496463
##########################################################################
Emboss version = 2.0.0
Platform = DEC alpha (OSF1 v4.0)
My emboss.default entry for the database looks like this:
DB refseq [
type: N
method: emblcd
format: gb
dir: /usr/users/andrewss/Refseq/Genbank
file: "*.gbff"
release: "1.0"
comment: "Refseq Hum Mus Rat"
]
An example of the seqret output with a debug USA follows (note that the
Documentation field is suspiciously blank!):
Sequence output trace
=====================
Name: 'NM_031360'
Accession: 'NM_031360'
Description: 'Rattus norvegicus neutral sphingomyelinase (Smpd2), mRNA.'
Type: 'N'
Database: 'refseq'
Full name: ''
Date: ''
Usa: 'debug::test.seq'
Ufo: ''
Input format: 'gb'
Output format: 'debug'
Filename: 'test.seq'
Entryname: 'NM_031360'
File name: 'test.seq'
Extension: 'fasta'
Single: 'No'
Features: 'No'
Count: 'No'
Documentation:...
1 atgaagcaca acttttctct gcggctgagg gttttcaacc tcaactgctg 50
51 ggacatcccc tacctaagca agcatagggc cgaccgcatg aagcgcttgg 100
etc.
The extra stuff I'm after is this sort of thing:
LOCUS NM_031360 1269 bp mRNA ROD 12-JUN-2001
DEFINITION Rattus norvegicus neutral sphingomyelinase (Smpd2), mRNA.
ACCESSION NM_031360
VERSION NM_031360.1 GI:14389300
KEYWORDS .
SOURCE Norway rat.
ORGANISM Rattus norvegicus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
REFERENCE 1 (sites)
AUTHORS Mizutani,Y., Tamiya-Koizumi,K., Irie,F., Hirabayashi,Y., Miwa,M.
and Yoshida,S.
TITLE Cloning and expression of rat neutral sphingomyelinase:
enzymological characterization and identification of essential
histidine residues
JOURNAL Biochim. Biophys. Acta 1485 (2-3), 236-246 (2000)
MEDLINE 20292884
COMMENT PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence was derived from AB047002.1.
FEATURES Location/Qualifiers
source 1..1269
/organism="Rattus norvegicus"
/strain="Sprague-Dawley"
/db_xref="taxon:10116"
/chromosome="X"
/chromosome="14"
/chromosome="2"
/chromosome="3"
/chromosome="17"
/map="Xq28"
/map="14q"
/map="2 36.0 cM"
/map="Xq11.1"
/map="3"
/map="17q12-q21"
/sex="male"
/tissue_type="liver"
/clone_lib="rat liver lambda cDNA library
(STRATAGENE,#936513)"
gene 1..1269
/gene="Smpd2"
/note="EBS3; EBS4; K14; CK; MAGE5; MAGE10; Tdo; Araf"
/db_xref="LocusID:83537"
/db_xref="MGD:MGI:98246"
/db_xref="MIM:148066"
/db_xref="MIM:300340"
/db_xref="MIM:300343"
/db_xref="MIM:601443"
/db_xref="RATMAP:36372"
/db_xref="RGD:36372"
CDS 1..1269
/gene="Smpd2"
/note="lyso-platelet activating factor-phospholipase C;
cytokeratin 14; Raf related protein;
Synaptosomal-associated protein"
/codon_start=1
/db_xref="LocusID:83537"
/db_xref="MGD:MGI:98246"
/db_xref="MIM:148066"
/db_xref="MIM:300340"
/db_xref="MIM:300343"
/db_xref="MIM:601443"
/db_xref="RATMAP:36372"
/db_xref="RGD:36372"
/product="neutral sphingomyelinase"
/protein_id="NP_112650.1"
/db_xref="GI:14389301"
/translation="MKHNFSLRLRVFNLNCWDIPYLSKHRADRMKRLGDFLNLESFDL
ALLEEVWSEQDFQYLKQKLSLTYPDAHYFRSGIIGSGLCVFSRHPIQEIVQHVYTLNG
YPYKFYHGDWFCGKAVGLLVLHLSGLVLNAYVTHLHAEYSRQKDIYFAHRVAQAWELA
QFIHHTSKKANVVLLCGDLNMHPKDLGCCLLKEWTGLRDAFVETEDFKGSEDGCTMVP
KNCYVSQQDLGPFPFGVRIDYVLYKAVSGFHICCKTLKTTTGCDPHNGTPFSDHEALM
ATLCVKHSPPQEDPCSAHGSAERSALISALREARTELGRGIAQARWWAALFGYVMILG
LSLLVLLCVLAAGEEAREVAIMLWTPSVGLVLGAGAVYLFHKQEAKSLCRAQAEIQHV
LTRTTETQDLGSEPHPTHCRQQEADRAEEK"
misc_feature 91..837
/note="AP_endonucleas1; Region: AP endonuclease family 1"
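Once the full entry text is in hand, the cross references can be picked out of the feature table with a simple pattern match. This is a rough sketch under stated assumptions: `cross_references` is a hypothetical helper, not part of EMBOSS, and it treats the entry as plain text rather than parsing the feature table properly.

```python
import re

# Matches qualifiers such as /db_xref="taxon:10116" and captures the value.
DB_XREF = re.compile(r'/db_xref="([^"]+)"')


def cross_references(entry_text):
    """Return the distinct db_xref values, in order of first appearance."""
    seen = []
    for value in DB_XREF.findall(entry_text):
        if value not in seen:
            seen.append(value)
    return seen


if __name__ == "__main__":
    # A few qualifier lines copied from the entry above.
    lines = (
        '/db_xref="taxon:10116"\n'
        '/db_xref="LocusID:83537"\n'
        '/db_xref="MIM:148066"\n'
        '/db_xref="LocusID:83537"\n'
    )
    print(cross_references(lines))
```

Duplicates are dropped because the same cross reference is repeated under the gene and CDS features.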