EMBOSS 1.10.0
ableasby at hgmp.mrc.ac.uk
Sun Feb 18 16:07:32 UTC 2001
This release contains several new applications, some of which are still
under active development. We hope to make some of the data files
referred to below available on our ftp server soon.
MARSCAN
Matrix/scaffold attachment regions (MARs/SARs) are genomic elements
thought to delineate the structural and functional organisation of the
eukaryotic genome. Originally, MARs and SARs were identified through
their ability to bind to the nuclear matrix or scaffold. Binding
cannot be assigned to a unique sequence element, but is dispersed over
a region of several hundred base pairs. These elements are found
flanking a gene or a small cluster of genes and are often located in
the vicinity of cis-regulatory sequences. This has led to the
suggestion that they contribute to higher-order regulation of
transcription by defining the boundaries of independently controlled
chromatin domains. There is indirect evidence to support this notion.
In transgenic experiments, MARs/SARs dampen position effects by
shielding the transgene from the effects of the chromatin structure at
the site of integration. Furthermore, they may act as boundary
elements for enhancers, restricting their long-range effects to
promoters located in the same chromatin domain.
marscan finds a bipartite sequence element that is unique to a large
group of eukaryotic MARs/SARs. This MAR/SAR recognition signature
(MRS) comprises two individual sequence elements that are less than
200 bp apart and may be aligned on positioned nucleosomes in MARs. The
MRS can be used to correctly predict the position of MARs/SARs in
plants and animals from genomic DNA sequence information alone.
Experimental evidence from the analysis of >300 kb of sequence data
from several eukaryotic organisms shows that wherever an MRS is
observed in the DNA sequence, the corresponding genomic fragment is a
biochemically identifiable SAR.
The MRS is a bipartite sequence element that consists of two
individual sequences, of 8 bp (AATAAYAA) and 16 bp (AWWRTAANNWWGNNNC),
within 200 bp of each other. The patterns use IUPAC ambiguity codes
(Y = C or T, W = A or T, R = A or G, N = any base). One mismatch is
allowed in the 16 bp pattern. Each pattern can occur on either strand
of the DNA, independently of the other.
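The search described above can be sketched in C. This is a minimal illustration under stated assumptions, not marscan's actual implementation: the function name count_mrs is hypothetical, the input is assumed to be an uppercase A/C/G/T string, the 200 bp distance is assumed to be measured between the start positions of the two elements, and the 8 bp element is taken to require an exact match since only the 16 bp pattern is described as tolerating a mismatch. The "either strand" case is handled by also matching each pattern's reverse complement against the forward strand. Overlapping elements are not excluded; every qualifying pair is counted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Does the IUPAC code `code` admit the nucleotide `base`?
   Only the codes used by the MRS patterns (and their complements) are covered. */
static int iupac_match(char code, char base)
{
    switch (code) {
    case 'A': return base == 'A';
    case 'C': return base == 'C';
    case 'G': return base == 'G';
    case 'T': return base == 'T';
    case 'R': return base == 'A' || base == 'G';
    case 'Y': return base == 'C' || base == 'T';
    case 'W': return base == 'A' || base == 'T';
    case 'N': return base == 'A' || base == 'C' || base == 'G' || base == 'T';
    default:  return 0;
    }
}

/* Complement of an IUPAC code: A<->T, C<->G, R<->Y; W and N map to themselves. */
static char iupac_complement(char code)
{
    switch (code) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'R': return 'Y';
    case 'Y': return 'R';
    default:  return code; /* W, N */
    }
}

/* Write the reverse complement of the n-character pattern `pat` into `out`. */
static void reverse_complement(const char *pat, size_t n, char *out)
{
    for (size_t k = 0; k < n; k++)
        out[k] = iupac_complement(pat[n - 1 - k]);
    out[n] = '\0';
}

/* Mismatches between `pat` and seq[pos .. pos+n-1]; stops counting once
   the total exceeds max_mm, since the exact figure no longer matters. */
static int mismatches(const char *seq, size_t pos, const char *pat,
                      size_t n, int max_mm)
{
    int mm = 0;
    for (size_t k = 0; k < n; k++)
        if (!iupac_match(pat[k], seq[pos + k]) && ++mm > max_mm)
            break;
    return mm;
}

/* True if the pattern or its reverse complement matches at `pos`
   with at most max_mm mismatches, i.e. on either strand. */
static int strand_hit(const char *seq, size_t pos, const char *fwd,
                      const char *rev, size_t n, int max_mm)
{
    return mismatches(seq, pos, fwd, n, max_mm) <= max_mm ||
           mismatches(seq, pos, rev, n, max_mm) <= max_mm;
}

/* Hypothetical helper: count pairs of an exact 8 bp hit and a 16 bp hit
   (at most 1 mismatch) whose start positions are at most max_dist bp apart. */
size_t count_mrs(const char *seq, size_t max_dist)
{
    static const char p8[] = "AATAAYAA";
    static const char p16[] = "AWWRTAANNWWGNNNC";
    const size_t n8 = sizeof p8 - 1, n16 = sizeof p16 - 1;
    char r8[sizeof p8], r16[sizeof p16];
    size_t len, count = 0;

    assert(seq != NULL);
    len = strlen(seq);
    reverse_complement(p8, n8, r8);
    reverse_complement(p16, n16, r16);

    for (size_t i = 0; i + n8 <= len; i++) {
        if (!strand_hit(seq, i, p8, r8, n8, 0))
            continue;
        /* Only look for the 16 bp element inside the distance window. */
        size_t first = i > max_dist ? i - max_dist : 0;
        for (size_t j = first; j <= i + max_dist && j + n16 <= len; j++)
            if (strand_hit(seq, j, p16, r16, n16, 1))
                count++;
    }
    return count;
}
```

For example, count_mrs(seq, 200) returns the number of qualifying 8 bp/16 bp pairs in seq; a real tool would report their positions rather than a count. Restricting the inner scan to the distance window around each 8 bp hit keeps the work proportional to the number of 8 bp hits times the window size.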
Not all SARs contain an MRS. Analysis of >300 kb of genomic sequence
from a variety of eukaryotic organisms shows that the MRS faithfully
predicts 80% of MARs and SARs, suggesting that at least one other type
of MAR/SAR, lacking an MRS, may exist.
SCOPE
scope parses the scop classification file available at
http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?dir=lin and writes the
scop classification to an embl-like format file. This file
(Escop.dat) should be placed in the emboss/data directory.
NRSCOPE
nrscope parses the embl-like format scop classification file generated
by the EMBOSS application scope and writes a file of non-redundant
domains in the same format. The format of these files is explained in
the scope documentation. The current version of nrscope removes
redundancy at the level of the scop family, i.e. the entries retained
within any one family are non-redundant with respect to each other.
DOMAINER
domainer parses two kinds of input: an embl-like format scop
classification file generated by the EMBOSS applications scope or
nrscope, and clean protein coordinate files generated by the coorde
application (not currently in EMBOSS; email Jon Ison,
jison at hgmp.mrc.ac.uk). For each domain in the scop classification
file, it writes clean domain coordinate files in embl-like and pdb
formats. Each of these files contains coordinates for a single scop
domain.
STAMPS (under development)
stamps parses an embl-like format scop classification file generated
by the EMBOSS applications scope or nrscope, and calls stamp to
generate structural alignments for each scop family. It is still
under active development. To build it, run "make stamp" in the
applications directory; this creates "stamps".
Developers' Notes
1. Most C datatypes have changed in the libraries, as a prelude
to true 64-bit operation. Notably, ints are now "ajint"s and
longs are now "ajlong"s. Depending on the hardware, an ajint may be
the same size as an ajlong; however, an ajlong should be used
wherever a 64-bit integer might be needed.
2. The function ajFmtScanS has been added. It can be regarded as
the EMBOSS version of the C function sscanf and operates
similarly. It has several extensions; in particular, %S is used
for dynamically allocated string objects (AjPStr).
This function makes reading data files considerably easier, and
many applications will be rewritten to use it rather than
relying on tokenisation.
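To illustrate why the ajint/ajlong distinction in note 1 matters, here is a small C sketch. The typedefs ex_ajint and ex_ajlong are hypothetical stand-ins built on <stdint.h>, not the EMBOSS definitions, which the library sets per platform (on some hardware an ajint is as wide as an ajlong). The function record_offset is likewise hypothetical: it multiplies two 32-bit values whose product can exceed the 32-bit range, so it widens to the 64-bit type before multiplying.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins only: NOT the EMBOSS definitions of ajint and
   ajlong, which are chosen per platform in the library headers. */
typedef int32_t ex_ajint;
typedef int64_t ex_ajlong;

/* Byte offset of record `index` in a file of fixed-size records.
   The product can exceed 2^31 - 1, so one operand is cast to the
   64-bit type first and the multiplication is done in 64 bits. */
ex_ajlong record_offset(ex_ajint index, ex_ajint record_size)
{
    assert(index >= 0 && record_size >= 0);
    return (ex_ajlong) index * record_size;
}
```

Multiplying first and casting afterwards, as in (ex_ajlong)(index * record_size), would overflow in 32 bits before the widening took effect, which is the class of bug that using the wider type from the start avoids.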
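ajFmtScanS itself needs the EMBOSS library, so the sketch below uses the standard sscanf that note 2 compares it to, to show the style of parsing it enables: one format string reads a whole record instead of splitting the line into tokens and converting each one separately. The record layout and the names struct record and parse_record are hypothetical. With ajFmtScanS, the note indicates that %S would fill a dynamically allocated AjPStr, removing the fixed-size id buffer and its %31s width limit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout: an identifier, an integer and a float
   separated by whitespace, e.g. "entry1 42 1.25". */
struct record {
    char id[32];
    int number;
    double score;
};

/* Parse one line with a single sscanf call instead of tokenising it.
   Returns 1 if all three fields were converted, 0 otherwise. */
int parse_record(const char *line, struct record *rec)
{
    assert(line != NULL && rec != NULL);
    memset(rec, 0, sizeof *rec); /* start zeroed so unconverted fields are 0 */
    /* %31s bounds the write to id[32]; an AjPStr read via %S would
       instead grow as needed. */
    return sscanf(line, "%31s %d %lf", rec->id, &rec->number, &rec->score) == 3;
}
```

The same line handled by tokenisation would need a strtok-style loop plus a separate conversion and error check for each field; here a single return-value check covers all three.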
As usual, I've probably forgotten to mention some things, and my colleagues
will no doubt correct any oversights.
Alan