Version 4.1.0
	A new ACD attribute outputmodifier: "Y" identifies qualifiers that
	cause the kinds of output changes that can break parsers. An
	obvious example is the -html qualifier in many of the utility
	programs. This attribute is a warning to wrapper developers and
	maintainers that they may want to fix the value of this qualifier
	and not allow users to change it. In some cases (as with toggle
	qualifiers) it may be useful to wrap each possible value
	separately. For example, tfm can run as an HTML version (-html)
	and a text version (-nohtml -nomore).
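
	A wrapper could handle such a qualifier by exposing one entry per
	fixed value. The sketch below is only an illustration: the table
	name, the variant names and build_command are hypothetical, and it
	assumes the program name is passed to tfm as the first parameter.
	Only -html, -nohtml and -nomore come from the note above.

```python
# Hypothetical wrapper table: one wrapped "tool" per fixed value of an
# output-modifying qualifier, so users cannot switch the output format.
TFM_VARIANTS = {
    "tfm_html": ["-html"],
    "tfm_text": ["-nohtml", "-nomore"],
}


def build_command(variant, program_name):
    """Return the argument list for one wrapped variant of tfm."""
    fixed = TFM_VARIANTS[variant]
    return ["tfm", program_name] + fixed


print(build_command("tfm_text", "seqret"))
```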

	Backtranseq now keeps stop positions in the sequence and replaces
	them with the most common stop codon. Previous releases converted
	stops to 'X' and back translated them as 'NNN'.
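
	The new behaviour can be pictured with a small sketch. It is not
	the backtranseq code: the codon counts are made-up numbers rather
	than a real codon usage table, and the function names are
	hypothetical.

```python
# Toy codon usage counts (made-up numbers, not a real EMBOSS table).
CODON_COUNTS = {
    "M": {"ATG": 100},
    "K": {"AAA": 40, "AAG": 60},
    "*": {"TAA": 50, "TAG": 20, "TGA": 30},
}


def most_common_codon(residue):
    """Pick the highest-count codon; unknown residues become NNN."""
    codons = CODON_COUNTS.get(residue)
    if not codons:
        return "NNN"
    return max(codons, key=codons.get)


def back_translate(protein):
    """Back translate residue by residue, keeping stop positions."""
    return "".join(most_common_codon(res) for res in protein.upper())
```

	With this toy table a stop is written as its most frequent codon
	instead of being turned into NNN.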

	Reading sequences in NBRF (or PIR) format now only removes one '*'
	from the end, allowing protein sequences to end with a stop codon.
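
	As a rough illustration of the rule (the function name is
	hypothetical and this is not the EMBOSS reader), only the final
	terminator character is dropped:

```python
def strip_nbrf_terminator(seq):
    """Remove only the final '*' that terminates an NBRF/PIR sequence."""
    return seq[:-1] if seq.endswith("*") else seq
```

	A sequence stored as "MKV**" therefore keeps one '*' as its stop.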

	Reading NBRF format sequences in FASTA format was retaining a ';'
	in front of the sequence ID. This is now fixed.

	Pattern files and regular expression files now use the -pformat
	and -pname associated qualifiers which were ignored when they
	first appeared in 4.0.0. Pattern file formats are "fasta" for the
	original format in 4.0.0 with FASTA style identifiers, and
	"simple" for files with a single pattern on each line. The format
	defaults to testing the first character for a '>'. The pattern
	name is used to set a name of "name1", "name2" and so on if no
	name is in the FASTA file. By default patterns are called
	"pattern1" and regular expressions are called "regex1".
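
	The format guess and the default naming might look like the sketch
	below. It assumes one pattern per line in both formats, and the
	function and parameter names are hypothetical stand-ins for the
	-pformat and -pname qualifiers.

```python
def read_patterns(lines, pformat=None, pname="pattern"):
    """Parse pattern lines into (name, pattern) pairs.

    pformat is "fasta", "simple" or None (guess from the first character).
    """
    lines = [ln.strip() for ln in lines if ln.strip()]
    if pformat is None:
        pformat = "fasta" if lines and lines[0].startswith(">") else "simple"

    patterns = []
    if pformat == "simple":
        for ln in lines:
            patterns.append(("%s%d" % (pname, len(patterns) + 1), ln))
        return patterns

    name = None
    for ln in lines:
        if ln.startswith(">"):
            # An empty identifier line leaves the name unset.
            name = ln[1:].strip() or None
        else:
            if name is None:
                name = "%s%d" % (pname, len(patterns) + 1)
            patterns.append((name, ln))
            name = None
    return patterns
```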

	Added a new function to read from a buffered file and trim
	newlines. It was not needed before because input functions were
	doing their own trimming.

	Valgrind memory leak tests now cover all QA tests. The command
	line is captured and used to generate test cases. Script
	valgrind.pl knows about the few cases that need input files copied
	and preprocesses them by name. A few tests can be flagged as
	ignored. This is intended for tests known to run for a very long
	time under valgrind. Memory leaks are fixed for all programs in
	the main EMBOSS package and for the most used ones in the EMBRASSY
	packages.

	A new environment variable ACDCOMMANDLINELOG takes a filename as
	its value. This saves the command line equivalent of a program
	run, converting user responses to prompts into their command line
	equivalents. A number of bugs in command line saving for report
	headers were identified and fixed.

	Two string functions had their names reversed. ajStrRemoveWhite
	removes all white space from a string; ajStrRemoveWhiteExcess
	removes white space from the ends and replaces internal white
	space with single spaces. When function names were standardized
	these names were swapped. Because function calls were converted
	automatically, EMBOSS code worked as before, but developers will
	have noticed that the functions did not behave as expected. This
	is now corrected, and all existing calls in the EMBOSS code have
	been checked and converted.
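
	The two intended behaviours, written as Python equivalents for
	comparison. These are illustrations of the descriptions above, not
	the AJAX implementations, and the Python names are hypothetical.

```python
def remove_white(text):
    """Remove all white space (as described for ajStrRemoveWhite)."""
    return "".join(text.split())


def remove_white_excess(text):
    """Trim the ends and collapse internal runs to single spaces
    (as described for ajStrRemoveWhiteExcess)."""
    return " ".join(text.split())
```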
	
	Showseq with a sequence end position now stops output at the end
	of the user-specified range. Previous releases printed the whole
	of the line with the last base/residue.
	
	SRS servers use "gid" as the field name for GI numbers. The field
	name has been changed to allow GI searches with local SRS and
	remote SRSWWW access to Genbank.

	A new configure option for developers --enable-devwarnings
	turns on many more warning messages from the gcc compiler. Not all
	warnings are useful - the less useful gcc options are documented
	(and commented out) in the configure.in file devwarnings section.
	Warnings include missing function prototypes, signed/unsigned
	comparisons, potential loss of precision in casts, and use of
	global names (index for example) as variables.

	Function names in ajseqwrite.c have been standardised. Old names are
	still accepted but are marked as "deprecated" and will generate
	warnings with the gcc compiler (see ajstr below). Other compilers
	will see no difference.

	Edialign is a new application, a port of the DIALIGN2 program by
	B. Morgenstern, using an ACD file written by Guy Bottu.  It takes
	as input nucleic acid or protein sequences and produces as output
	a multiple sequence alignment. The sequences need not be similar
	over their complete length, since the program constructs
	alignments from gapfree pairs of similar segments of the
	sequences.
	
	Wordfinder is a new application to find word-based matches of
	limited size. It is based on code from supermatcher. The inputs
	are reversed so the query sequence set (unaligned) is compared to
	a streamed database of sequences. (Supermatcher should perhaps
	have its inputs in this order too). Limits are provided for the
	length of the word match and the length of the alignment. The
	default gap penalties are also increased to limit the gaps allowed
	in alignment.

	Word-based algorithms found too many matches where both sequences
	contain runs of X (protein) or N (nucleotide). These are now
	ignored when building the word table.

	Word-based algorithms complained if a sequence was shorter than
	the wordsize. This was a problem for database searches with some
	short sequences present. They now run silently and simply return
	no word matches.
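
	Both word table changes above can be sketched together. This is a
	simplified model, not the EMBOSS word library: it assumes a "run"
	means a word made up entirely of the unknown character, and the
	function name is hypothetical.

```python
def build_word_table(seq, wordsize, unknown="N"):
    """Map each word to its start positions, skipping runs of the
    unknown character ("N" for nucleotide, "X" for protein)."""
    table = {}
    seq = seq.upper()
    # range() is empty when the sequence is shorter than wordsize,
    # so short sequences simply give an empty table.
    for pos in range(len(seq) - wordsize + 1):
        word = seq[pos:pos + wordsize]
        if word == unknown * wordsize:
            continue
        table.setdefault(word, []).append(pos)
    return table
```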

	The EMBL format sequence entry parser was able to read swissprot
	sequence data, but not the feature table. Efficiency improvements
	to set the sequence type to nucleotide for EMBL entries showed
	that swissprot entries were being read by the EMBL parser. A test
	for swissprot protein information on the ID line should redirect
	these entries to the swissprot parser. In previous releases the
	sequence type was not set, so there was no problem with the
	sequence type, although feature lines may not have been readable
	from swissprot format flat files. Database definitions specify the
	swiss or embl format so they are not affected.

	Large sequences were running very slowly. This was traced to the
	way sequence types are tested using regular expressions processed
	by calls to the PCRE library. These calls were replaced by simple
	string functions as they are only testing that a sequence is
	entirely composed of characters from an allowed set. An
	additional speedup was achieved by defining only upper case
	characters as required (almost halving the number of tests) and
	testing the upper case version of the sequence characters.
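
	The idea of the replacement test, as a sketch. The allowed
	character set shown here is an assumption for illustration and
	the names are hypothetical; the real sets are defined in the
	EMBOSS sequence type code.

```python
# Assumed nucleotide alphabet, upper case only.
NUCLEOTIDE_CHARS = frozenset("ACGTUNRYKMSWBDHV-")


def is_all_allowed(seq, allowed=NUCLEOTIDE_CHARS):
    """True if every character, upper-cased, is in the allowed set."""
    for ch in seq:
        if ch.upper() not in allowed:
            return False
    return True
```

	Each character needs one set lookup, with no regular expression
	engine involved.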

	Sequence translation in the reverse direction adds extra amino
	acids for partial codons. In the forward direction the overhang
	was miscalculated so these codons were missed. No users have
	complained, probably because in most cases they are translated as
	'X' (it needs a 4-base wobble in the code to convert the first 2
	bases of a codon into a single amino acid).

	Sequence translation was relatively slow, at least on very large
	sequences. Profiling with gprof indicated some changes to reduce
	the number of string handling calls (each was very fast, but
	there was a very large number of calls). The internal tables were
	resized (from 15 elements to 16) for more efficient mapping.

	Parsing NCBI format ID lines saves the database. This is available
	for writing NCBI formatted output ID lines, but is not to be used
	in reporting the USA.

	Added "refseq" as a sequence and feature format. Initially a
	simple alias of GenBank but we may let them diverge later.

	REFSEQ entries have their own idea of what a ProteinID in the
	feature table looks like, as they use REFSEQP protein IDs.
	Validation now allows the third character to be an underscore.

	Large numbers of database files could make the dbi indexing
	programs (dbiflat, dbifasta, dbigcg, dbiblast) fail at the sort
	merge stage when the index files are combined. The sort merge is
	now in 2 steps to limit the number of open files required in the
	system sort utility.
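
	A two-step merge can be sketched as follows. In-memory lists stand
	in for the sorted index files, and the function name and max_open
	parameter are hypothetical; the dbi programs use the system sort
	utility rather than this code.

```python
import heapq


def merge_in_two_steps(sorted_runs, max_open=3):
    """Merge sorted runs in two passes; the first pass reads at most
    max_open runs at a time."""
    # Step 1: merge the runs in batches of at most max_open.
    intermediate = []
    for start in range(0, len(sorted_runs), max_open):
        batch = sorted_runs[start:start + max_open]
        intermediate.append(list(heapq.merge(*batch)))
    # Step 2: merge the (far fewer) intermediate results.
    return list(heapq.merge(*intermediate))
```

	The second pass reads one input per batch, so with N runs it needs
	roughly N / max_open inputs open instead of N.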

	Added a script emblsplit.pl to split EMBL and UniProt database files
	into 2Gbyte chunks.

	The -sid qualifier now overwrites the sequence id if used. The
	-sid value will be used for creating the output filename and for
	reporting the sequence identifier in output files. When there is
	more than one input sequence, the same ID is currently used for
	all of them. We may change this in future to generate new IDs
	from this base name.

	New sequence format gifasta is the same as "ncbi" but uses the GI
	number as the identifier. Because the output is the same for both
	formats we have to require -sformat gifasta to be on the
	commandline. The default for such files will remain "ncbi" as the
	automatically processed format. On output if there is no GI number
	a dummy value of "000000" is currently used.

	coderet now writes non-coding sequence to a new output file.

	New feature function ajFeatLocMark marks selected features as
	lower case. Used by coderet to report non-coding regions.

	The help output now correctly reports output sequence default
	filenames.

	Phylip input distance matrices now allow integer values to be
	treated as reals, although there is a possible confusion over
	integer replicate values so the use of a trailing ".0" is strongly
	recommended.
	
	Sequences with NCBI deflines and no ID after the final "|" were
	using the version part of the seqversion ("1" from "AB123456.1")
	instead of the "AB123456" part to set the ID.
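
	The corrected choice of ID can be illustrated with a simplified
	parser. It assumes a non-empty defline with "|" separated fields
	and is not the EMBOSS NCBI parser; the function name is
	hypothetical.

```python
def id_from_defline(defline):
    """Pick a sequence ID from an NCBI-style defline."""
    token = defline.lstrip(">").split()[0]
    fields = token.split("|")
    if fields[-1]:
        # An ID is present after the final "|".
        return fields[-1]
    # No ID: use the accession part of the seqversion, not the version.
    seqversion = fields[-2]
    return seqversion.split(".")[0]
```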

	Graph titles were not standard on the general "graph" type output,
	but are consistent for xygraph outputs. A new attribute gdesc
	defines a prefix for graph titles which can be appended to by the
	calling program, usually with a description of the input (sequence
	USA, input filename). A new call ajGraphSetTitlePlus defines the
	text to add to the gdesc as "[gdesc] of [text]". All graphs were
	standardized except pepinfo which has 10 subplot titles already in
	the intended format. This will be corrected later to have standard
	main titles and shorter subplot titles.

	The version of plplot we use has a bug in calculating character
	sizes where the origin in user units is not the default of
	(0,0). This has been fixed in the plgchrW and plstrlW functions in
	the copy that is included with EMBOSS.

	Dreg and preg ignored sequence begin and end positions. Both
	programs now use the embpatlist function calls to process sequence
	ranges.

	Fuzznuc, fuzzpro and fuzztran lost the ability to use the sequence
	begin and end positions when we switched to pattern lists. This
	has been restored in the pattern list processing code.
	
	The logfile caused a file close error if it was read only (because
	it had not been successfully opened). Opening the logfile now
	tests the file is writable and ignores logging for a read-only file.
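
	The same defensive pattern, sketched outside EMBOSS. The function
	name is hypothetical and the AJAX file functions are not used
	here.

```python
import os


def open_logfile(path):
    """Return a file object open for appending, or None to skip logging."""
    if os.path.exists(path) and not os.access(path, os.W_OK):
        return None
    try:
        return open(path, "a")
    except OSError:
        return None
```

	Callers then close the log only when a file object was returned.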

	More case-sensitive sequence comparison and matching functions
	added to be consistent about providing both versions.

	A few sequence databases have no accession number. For these a new
	database attribute hasaccession: "N" in emboss.default prevents
	EMBOSS trying to search the ACC field in addition to the ID field.

	A few databases with duplicate IDs should be treated as
	case-sensitive. The original example was a pdbprot database,
	containing FASTA format sequences of individual chains from PDB
	entries. In PDB, the entry itself is a 4-character string, and the
	chain is a single character A through Z. When an entry has more
	than 26 chains, the next 26 are labelled a through z. Pdbprot
	appends these as _A, _B, etc. PDBPROT is available from some
	public SRS servers - see the official list at
	http://downloads.lionbio.co.uk/publicsrs.html.
	This is resolved by adding a new database attribute caseidmatch in
	emboss.default. A value of "Y" will force EMBOSS to exactly match
	the case of the whole ID. This is done by post-processing and
	rejecting entries with an ID that fails to match.
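
	The post-processing step amounts to a filter over the entries
	returned by the usual case-insensitive lookup. A minimal sketch,
	with a hypothetical function name and made-up example IDs:

```python
def filter_case_exact(entries, wanted_id, caseidmatch=False):
    """entries: (id, sequence) pairs from a case-insensitive lookup."""
    if not caseidmatch:
        return list(entries)
    # Reject entries whose ID differs from the query in case.
    return [(eid, seq) for eid, seq in entries if eid == wanted_id]
```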
	
	The run date included in report output has changed format to have
	the day first and to lose the leading zero when the day is 1st to
	9th of the month.

	Program cpgplot can run on more than one input sequence, but the
	plot failed on the second sequence. Fixing this required adding a
	new function ajGraphDataReplaceI to replace the 1st, 2nd, 3rd,
	etc. subgraph. Some memory cleanup was also added to remove
	the replaced graph data objects.

	Programs pepwindow and pepwindowall can now process any
	protein sequence. In previous versions pepwindow was restricted to
	pureprotein (no ambiguity codes) while pepwindowall accepted any
	protein sequence (it has to handle gaps) but was using a score of
	zero for unknown amino acid residues. Changed so that missing amino
	acid values can be filled in using Dayhoff frequency weighted
	averages for B, J and Z and an overall average for X, J and O.

	Program octanol can accept any protein sequence. Interpolated
	values are used for B, Z and J. An average over all values is used
	for X and also for O and U where there is no data. Interpolations
	and averages used the Dayhoff amino acid frequencies.
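
	A frequency weighted average for an ambiguity code can be computed
	as below. B stands for D or N, Z for E or Q, and J for I or L. The
	numbers are illustrative only, not real Dayhoff frequencies or
	scale values, and the function name is hypothetical.

```python
def weighted_average(values, freqs, residues):
    """Frequency-weighted mean of values over the given residues."""
    total = sum(freqs[r] for r in residues)
    return sum(values[r] * freqs[r] for r in residues) / total


# Illustrative numbers only, not real Dayhoff frequencies or scale values.
freqs = {"D": 0.06, "N": 0.04}
values = {"D": -3.0, "N": -2.0}
b_value = weighted_average(values, freqs, "DN")
```

	The more frequent residue pulls the value for the ambiguity code
	towards its own value.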

	Program iep can accept any protein sequence. Ambiguity codes B and
	Z are resolved by converting to the carboxylic acid (D or E) or
	amide (N or Q) according to the Dayhoff amino acid frequencies,
	giving a consistent value for any input protein.

	Sequence set type testing was checking whether the seqset is
	defined as protein but ignoring the type of the first
	sequence. This is now fixed.

	Program tfm looked in an obsolete install directory with the -html
	option. It now finds the embassy package name from the installed
	ACD file and then finds the installed HTML file. If EMBOSS has not
	been installed, it also searches the original source files.

	Modified NCBI/FASTA format to preserve the database name from the
	NCBI style ID. The database name is reported in one of the many
	and varied NCBI syntax variants, depending on whether there is a
	version or accession number, and whether there is an EMBOSS
	database name also involved (for example, an entry in a file
	indexed with dbxfasta or dbifasta).

	Modified "pearson" sequence format to keep the FASTA file ID
	complete. For historical reasons GCG-style dbname:id syntax was
	still having the db part trimmed. This will still be trimmed from
	fasta or ncbi format.

	The report for digest has Cterm and Nterm columns capitalised to
	match the rest of the report. Sequence ranges now give correct
	cterm and nterm results.

	The list file Cut.index for codon usage tables was changed to
	remove old file names (commented out list at the end) and to
	remove underscores from the species names.

	Programs water, needle, merger and prophet calculate an internal
	path size from the lengths of the input sequences. For sequences
	that are too long, a fatal error is produced. But if the sequences
	are extremely long, the test failed and the program gave a
	segmentation fault. This fix tests in a different way that will
	catch all cases. (added as a fix to 4.0.0)
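
	The note does not say how the new test works. One common way to
	make such a size check safe is to divide instead of multiply, as
	sketched here. The 32-bit limit and both function names are
	assumptions for illustration.

```python
INT_MAX = 2**31 - 1  # assumed limit: a signed 32-bit size


def wrapped_product(len_a, len_b):
    """Mimic a 32-bit unsigned multiply, which silently wraps around."""
    return (len_a * len_b) & 0xFFFFFFFF


def path_fits(len_a, len_b, limit=INT_MAX):
    """Check len_a * len_b <= limit using division, so nothing can wrap."""
    if len_a == 0 or len_b == 0:
        return True
    return len_a <= limit // len_b
```

	For two sequences of 70000 bases the wrapped product looks small
	enough to pass a naive test, while the division-based test rejects
	the pair.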

	The new MRS access method used a general search. This gave strange
	results when the ID or accession appeared in any other entry. It
	appears that MRS can search for id or accession only. This worked
	on the main MRS server at least. (added as a fix to 4.0.0)

	New database access methods MRS and DBFETCH need to be explicitly
	turned on so that showdb can report them. (added as a fix to
	4.0.0)

	When deleting the last line of buffered input, failed to reset the
	pointer to the last buffered line. This only affected debug
	traces. Unfortunately, the ajFileBuffClear function does call the
	debug trace. In practice we have only seen this bug when
	processing sequence data in EMBL format from an MRS server. (added
	as a fix to 4.0.0)

	Pattern and regular expression searches failed to correctly
	reverse a nucleotide sequence. The change is to use
	ajSeqReverseForce (always reverses the sequence provided) instead
	of ajSeqReverseDo (which only reverses if the reverse flag is
	set). (added as a fix to 4.0.0)
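
	The difference between the two calls, modelled on plain strings.
	This is a sketch of the behaviour described above, not the AJAX
	functions, which operate on sequence objects; the Python names are
	hypothetical and only unambiguous bases are handled.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_force(bases):
    """Always return the reverse complement."""
    return bases.translate(_COMPLEMENT)[::-1]


def reverse_do(bases, reverse_flag):
    """Return the reverse complement only if the reverse flag is set."""
    return reverse_force(bases) if reverse_flag else bases
```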

	Reports in list format failed to write a usable USA for "asis"
	sequence input, and incorrectly reported reverse strand nucleotide
	features. (added as a fix to 4.0.0)

	The list files Matrices.nucleotide, Matrices.protein and
	Matrices.proteinstructure now have comment headers explaining
	their format.  Fixed issues with nucleotide features in the
	reverse direction in reports. The start/end positions were stored
	the wrong way around and then reversed again when reported in one
	of the report formats. However, reporting as EMBL features showed
	the incorrect storage. ajFeatNewII now checks start/end and
	reverses the feature if start is greater than end. ajFeatNewIIRev
	sets the reverse strand and also checks that the start position is
	greater than (or equal to) the end position. (added as a fix to
	4.0.0)

	To reduce the size of very large reports, for example when fuzznuc
	or fuzzpro run over very large databases, new qualifiers are added
	to report output. -rmaxseq gives the maximum hits for any one
	sequence, -rmaxall gives the total maximum number of hits. The
	report tail contains a record of the number of hits reported and
	found. The qualifiers are intended for web interfaces to control
	the maximum output they need to report. When the maximum hits
	figure is reached, ajReportWrite returns false so that programs
	can terminate at that point. (added as a fix to 4.0.0)
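
	How a calling program can use the return value is sketched below
	with a toy report class. The class, its attributes and its write
	method are hypothetical stand-ins for the report object and
	ajReportWrite; only the two limits and the reported/found counts
	come from the note above.

```python
class Report:
    """Toy report that stops accepting hits once a limit is reached."""

    def __init__(self, maxseq=None, maxall=None):
        self.maxseq = maxseq
        self.maxall = maxall
        self.found = 0
        self.reported = 0

    def write(self, hits):
        """Record the hits for one sequence; return False once full."""
        self.found += len(hits)
        keep = hits if self.maxseq is None else hits[:self.maxseq]
        if self.maxall is not None:
            keep = keep[:max(self.maxall - self.reported, 0)]
        self.reported += len(keep)
        return self.maxall is None or self.reported < self.maxall
```

	A program looping over a database would stop reading sequences as
	soon as write returns False.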

	Reports now write a header and tail when closed, to make sure that
	all programs will write something to the report file. The default
	header contains the command line provenance, the tail contains the
	number of sequences and hits. (added as a fix to 4.0.0)