Article
Organization
Home »
Linoleum Source Code »
Dictionary (Radix Tree) »
Source Code Library
Scope
This is the source code of the Dictionary (Radix Tree) library, implemented in the
Linoleum programming language.
Author
Herbert Glarner
Source Code
(Dictionary "DIC.txt" 1.0 - Copyright 2006 Herbert Glarner)
(*****************************************************************************)
(=========================================================================)
( Copyright Herbert Glarner [2006]
Mail: herbert.glarner@bluewin.ch
Free usage granted for all purposes, including commercial, provided
that above credits are given.
Suggestions for performance improvement welcome.
Though thoroughly tested, no liability for any damage usage may cause.)
(=========================================================================)
(=========================================================================)
(Implements a Dictionary in RAM.)
(The dedicated RAM can be dynamic or static. The required memory is
managed completely by the DSA Library.
Insertion and retrieval of keys is managed in this library with the
implementation of a Radix Tree with variable-length nodes. The user's
payload for such a node is variable-length data, realized as an independent
memory chunk unidirectionally associated from said key.
Lexicographical order of the keys can be ensured by the BST library, which
converts keys such, that their symbols are readable from left to right in
the appropriate symbol width. This DIC library expects the keys already
being in lexicographical order, left-aligned within units. Apply BST
outside of this library on your keys before adding new entries, if you
want to take advantage of lexicographical order.
A single node consists of 5 units and a key of variable length, 1 unit
for 1...32 significant bits, 2 units for 33...64 significant bits etc.:
+0 +1 +2 +3 +4 +5 +n
+-----+--------+------+-------+------+-------/ /--+
| Len | Parent | Left | Right | Data | Key / / |
+-----+--------+------+-------+------+-----/ /----+
Len actually consists of 2 fields: the 2 MSBs 31:30 are used for branch
prediction, the remaining 30 bits 29:0 contain the length of the key,
expressed in bits, not bytes or units, providing a theoretical max
limit of 2^30 = 1 GBits = 128 MBytes = 32 MUnits per single key, a
value which you most likely never will consider as a key.
Parent, Left and Right contain addresses to other nodes within the
Radix Tree. They may contain -1 = no entry. The only node having -1
in the Parent field is the root node. Left and Right may contain -1
if there are no children. Note, that a Radix Node usually does not
have just one child: there are either 2 children or no children at
all. However, if a parent is also a leaf, as is the case in e.g.:
2 nodes with keys 'herb' and 'herbert', then also just one child
may be present. Internally, the /DSA/ library, which is used by this
DIC library to dynamically manage memory, maintains a header prior
each memory chunk, but it does not expect us to deal with the
addresses or contents of those: the stored addresses point to the
DSA 'user data' after the DSA header, i.e. to the individual radix
tree nodes' start addresses.
Data points to another memory chunk managed by the DSA library. This
is the payload of the associated radix node. It can have any arbitrary
length. Here, the length is measured in units at 4 bytes each,as usual.
The length is not stored explicitely in the DIC library, because it
can be assumed, that the user knows what data he is going to retrieve
when accessing a node he once stored. However, if such an information
is needed, there are 2 possibilities: Firstly, the DSA library has a
function named 'Dsa Size' which tells the length of the memory chunk in
question; and secondly, the user can provide the 'record length' in the
user data chunk itself. The Data field can contain -1, indicating that
this node is no leaf entry, but exists only as an internal node with a
common prefix of its children's keys. Note, that radix trees may be a
leaf /and/ have children, which btw is the only situation in which a
node can have just one child.
Key is a bitstream. It contains the /left-aligned/ key in as many
significant bits as are indicated by the Len field. Note, that on
Little Endian systems such as Intel, characters are not normally
aligned such that they form a left-aligned bitstream. For example do
the 6 ASCII characters of the word 'Yvonne' appear in 2 consecutive
units as 'novY' and '00en'. To make this a left-aligned bitstream, use
the BST library on the key before storing the entry. BST will convert
any input-width into a bitstream, resulting in the desired 'Yvon' and
'ne00' for given example. Up to 32 bits can be coded within each unit.
There are as many units as needed to accomodate the whole key, up to a
theoretical 32 Mega-Units for a single key. As node coalescing and node
splitting occurs during deletions of existing nodes and insertions of
new nodes, superfluous units are returned to the memory or missing units
are requested from memory, resp., in order to ensure a minimal internal
memory fragmentation.
WARNING: NEVER STORE NODE ADDRESSES! They may change during insertions,
deletions and execution of some special routines.
The object's data as pointed to by the radix node's Data field is an
individual memory chunk of variable length itself, consisting of as many
units as requested by the user. It does not carry any additional information
relevant to this library, especially is there no header or similar data
structure. However, it is a memory chunk managed by the /DSA/ library, which
means, that the DSA library itself /does/ prefix the memory chunk with a
DSA header.
+0 +n
+--------------/ /-+
| Object / / |
+------------/ /---+
Outsourcing the management information into radix nodes and keeping the
object data separate has the double advantage of facilitating fragmentation
handling and of moving around less data less times. Basically, the object
data chunk exists at the place of creation for as long as the object
exists, whereas the usually short radix node may be moved around during
this library's splitting and coalescing processes.
This library supports a programmer's First/Next loops with the methods
'Dic Get First' and 'Dic Get Next'. 'Dic Get Next' works whenever a method
has accessed a valid node, e.g. after 'Dic Find Entry', enabling fast
retrieval of such queries like 'all entries which start with G' [or more
accurately, as keys are bitstreams: '... which start with the bits 01000111'].
Two identical data sets always will construct the same radix tree. This
means, that /logically/ it does not matter if the input keys are lexico-
graphically sorted or are fed completely at random: at last the tree is the
same, when all keys exist in both sets, no matter their original feeding
position. However, there is a /physical/ difference: nodes will be created
as keys are fed, and if fed with random keys, lexicographically close keys
may appear in physical RAM in nodes far apart from each other. This causes
a lot of hardware cache misses when access is to be in lexicographical order,
and consequently a really significant performance loss. On the author's
machine, on which this library was developed and tested, a 3,066 MHz
Celeron D with a 16 KiB L1 cache with 32 Bytes wide cache lines, First/Next
loops needed 183% of the time when a 173,526 words long wordlist was input
in random order, compared to the 100% for the lexicographically presorted
variant - despite the fact that the trees were absolutely identical.
For this reason, the optimization method 'Dic Linearize' is provided,
speeding up accesses by making sure that the hardware cache has a lot of
work to do. The First/Next loop over the random feed was sped up from the
slow 183% to a fast 43%, nearly quadruple the speed. Of course this suggests,
that also lexicographically sorted inputs will benefit, although at a smaller
rate: in the tested case it doubled the access speed from 100% to 42%. -
Read additional information in the header of routine 'Dic Linearize'.)
(=========================================================================)
(=========================================================================)
(Routines summary:
Name of routine Short description)
(-------------------------------------------------------------------------)
(
======================================================================
PUBLIC ROUTINES:
----------------------------------------------------------------------
[Dedicated Ram Area Handling]
Dic Create Do this first to tell about memory
Dic Destroy Do this when the dictionary is not
needed any longer, returns RAM to OS
----------------------------------------------------------------------
[Working with keys]
Dic Add Entry Inserting a new entry into the dic
Dic Find Entry Returns the address of the object data
associated with the provided key
Dic Remove Entry Removing an existing entry from the dic
----------------------------------------------------------------------
[Working lexicographically, for sequential access loops in alphabetical
order etc.]
Dic Get First Gets key+data of the lex. first entry
Dic Get Next Gets key+data of the lex. next entry
----------------------------------------------------------------------
[Special purpose routines]
Dic Linearize Use after adding large unsorted batch,
speeds up accesses greatly, especially
with First/Next loops
Dic Get Key Returns the key of a valid node address
READ WARNINGS about storing node addr.
======================================================================
======================================================================
INTERNAL ROUTINES:
----------------------------------------------------------------------
Usage of all Internal Routines is discouraged.
Dic Locate Attempts to locate a searched node
Dic Reduce Key Shifts out n bits of a bitstream
Dic Modify Key Replaces n bits within a bitstream
Dic Resize Node Assigns a node new storage
======================================================================
)
(=========================================================================)
(=========================================================================)
(Memory requirements:
9 units = 36 bytes for 9 variables
5 units = 20 bytes for a workspace holding the radix tree's root header
Calculation of required space when allocating memory:
Per entry: The user data units
plus 4 units DSA management
plus 1 node for the leaf
plus 0...1 nodes as internal nodes, depends on key structure
Per node: 5 header units
plus 4 units DSA management
1...n key units depending on the key size,
with n typically in the range of 1-2 units for ASCII files
If removing entries, allow some space for unavoidable fragmentation,
although DSA does a good job in keeping it minimized; also consider
some units for the DSA's AVL management: 6 units per memory chunk with
a distinct length different from others {same lengths are free}.
'Dsa Fragmentation' may return valuable data for analysis.
)
(=========================================================================)
(=========================================================================)
(Version history:
08 Dec 2006: 1.0 Released
)
(=========================================================================)
(*****************************************************************************)
"libraries"
(=========================================================================)
(DSA Dynamic Storage Allocator)
(-------------------------------------------------------------------------)
(The DSA library manages a fragmentation reduced block of memory for the
purposes of this DIC library. It firstly stores the nodes of a Radix Tree;
and secondly the object data itself.
http://herbert.wikispaces.com/DSA
)
(-------------------------------------------------------------------------)
DSA; (Requires version 1.1 or higher)
(=========================================================================)
(*****************************************************************************)
"directors"
program name = { DIC };
unit = 32;
(*****************************************************************************)
"constants" (Convention: "DIC ...", UPPER CASE)
(=========================================================================)
(General constants)
(-------------------------------------------------------------------------)
DIC WIDTH OF BITS IN UNIT = 5; (2^5=32, for number of bits 0...31)
DIC BITS IN UNIT = 32; (Number of bits in a unit)
DIC ALL BITS SET = minus 1; (FFFF FFFFh in 32 bit environments)
(-------------------------------------------------------------------------)
(The following constants all evaluate to 31 in 32-bit-environments, but
they are used in different contexts and thus have different names to enhance
source code legibility.)
DIC MASK NUM BITS IN UNIT = 11111b; (5 LSBs counting bits in a unit)
DIC ROUNDING BITS TO UNIT = 31; (bits + 31, then > 5 = units)
DIC MAX BIT INDEX IN UNIT = 31; (Bit number 31 is the MSB)
(=========================================================================)
(=========================================================================)
(Error codes)
(-------------------------------------------------------------------------)
DIC ERR NONE = 0; (No error)
(-------------------------------------------------------------------------)
(From 'Dic Create')
DIC ERR DIC EXISTING = 1; (Destroy existing dictionary first)
DIC ERR DSA ERROR = 2; (Details available in [dsa error].)
(-------------------------------------------------------------------------)
(From 'Dic Destroy')
(DIC ERR DSA ERROR = 2;) (Details available in [dsa error].)
DIC ERR DIC NOT EXISTING = 3; ('Dic Create' not done/not successful)
(-------------------------------------------------------------------------)
(From 'Dic Add Entry')
(DIC ERR DSA ERROR = 2;) (Details available in [dsa error].)
(DIC ERR DIC NOT EXISTING = 3;) ('Dic Create' not done/not successful)
DIC ERR INVALID KEY LENGTH = 4; (Must be 1...2^30-1 Bits)
DIC ERR INVALID OBJECT LENGTH = 5; (Must be 1...2^30-1 Units)
DIC ERR KEY NOT UNIQUE = 6; (Attempt to provide already exist. key)
(-------------------------------------------------------------------------)
(From 'Dic Find Entry')
(DIC ERR DIC NOT EXISTING = 3;) ('Dic Create' not done/not successful)
(DIC ERR INVALID KEY LENGTH = 4;) (Must be 1...2^30-1 Bits)
DIC ERR KEY NOT EXISTING = 7; (Attempt to find not existing key)
(-------------------------------------------------------------------------)
(From 'Dic Get First')
(DIC ERR DIC NOT EXISTING = 3;) ('Dic Create' not done/not successful)
DIC ERR NO ENTRIES = 8; (Not a single key in the dictionary)
(-------------------------------------------------------------------------)
(From 'Dic Get Next')
(DIC ERR DIC NOT EXISTING = 3;) ('Dic Create' not done/not successful)
DIC ERR END OF DICTIONARY = 9; (No more entries)
(-------------------------------------------------------------------------)
(From 'Dic Remove Entry')
(DIC ERR DSA ERROR = 2;) (Details available in [dsa error].)
(DIC ERR DIC NOT EXISTING = 3;) ('Dic Create' not done/not successful)
(DIC ERR INVALID KEY LENGTH = 4;) (Must be 1...2^30-1 Bits)
(DIC ERR KEY NOT EXISTING = 7;) (Attempt to find not existing key)
(-------------------------------------------------------------------------)
(From 'Dic Get Key')
(DIC ERR DIC NOT EXISTING = 3;) ('Dic Create' not done/not successful)
DIC ERR NODE IS NO LEAF = 10; (Can only be applied on leaves)
(=========================================================================)
(=========================================================================)
(Radix Node record)
(-------------------------------------------------------------------------)
(Each node consists of a fixed length record of 5 units plus a variable
length key field. The offsets of the fields within a node:)
DIC HEADER SIZE = 5; (Excluding the key field)
DIC LEN = 0; (Radix node's key length in BITS)
(-----------------------------------------------------------------)
(DIC LEN can not be 0 in radix trees: such a node would be coalesced
with its parent. The only exception is the root, whose only purpose
is to hold the initial left/right branches as per the first bit
of the key bitstreams. Note, that DIC LEN actually consists of two
bit fields.)
DIC LEN BITS = 3FFF FFFFh; (All bits except the 2 MSBs)
DIC BRANCH PREDICTOR = C000 0000h; (Only the 2 MSBs)
(-----------------------------------------------------------------)
(The following constants all deal with the two uppermost bits)
DIC BRANCH PRED COUNTER = 30; (The whole counter in bit 31:30)
DIC BRANCH PRED DIRECTION = 31; (Taking direction is in bit 31)
DIC BRANCH PRED STRONG LEFT = 0000 0000h; (Count 00b)
DIC BRANCH PRED WEAK LEFT = 4000 0000h; (Count 01b)
DIC BRANCH PRED WEAK RIGHT = 8000 0000h; (Count 10b)
DIC BRANCH PRED STRONG RIGHT = C000 0000h; (Count 11b)
(-----------------------------------------------------------------)
DIC PARENT = 1; (address of parent if any, or -1)
DIC LEFT = 2; (address of left child if any, or -1)
DIC RIGHT = 3; (address of right child if any, or -1)
DIC DATA = 4; (address of object data if any, or -1)
(DIC PARENT, DIC LEFT, DIC RIGHT and DIC DATA may contain -1 as
an indicator that there is no valid pointer.)
DIC NO ADDRESS = DIC ALL BITS SET;
DIC KEY = 5; (First unit with key, LEFT ALIGNED BITS)
(Note, that the key may exceed 32 bits, in which case two or more
consecutive units can be found. How many significant bits there
are is told by the DIC LEN field.)
(=========================================================================)
(=========================================================================)
(Status flags)
(-------------------------------------------------------------------------)
(The variable [dic status] will hold some flags about the current status
of the dictionary.)
DIC STATUS CREATED = 0000 0001h; (set when 'Dic Create' was successful)
DIC STATUS LEFT = 0000 0002h; (set in 'Dic Locate' when Left Child)
DIC STATUS FOUND = 0000 0004h; (set when a searched entry was found)
DIC STATUS SPLIT = 0000 0008h; (set when insertion requires splitting)
DIC STATUS NEW LEAF= 0000 0010h; (set when 1st split part is new leaf)
DIC STATUS PREDICT RIGHT = 0000 0020h; (Bit number 5)
DIC STATUS PREDICT RIGHT POS = 5; (Shift value to right-align)
(Theoretically, the other bits within the [dic status] variable are
free to be used. However, usage is discouraged.)
(=========================================================================)
(*****************************************************************************)
"variables" (Convention: "dic ...", lower case)
(=========================================================================)
(Error condition as per DIC ERR XXX constants.)
(-------------------------------------------------------------------------)
dic error = 0;
(=========================================================================)
(=========================================================================)
(Some flags about the current status of the library. The bits are given in
the DIC STATUS XXX constants.)
(-------------------------------------------------------------------------)
dic status = 0;
(=========================================================================)
(=========================================================================)
(The 'Dic Get Next/Prev' Routines need to know, what leaf we accessed last,
in order to execute efficiently. This node's address is held in the
variable [dic last leaf address]. Also held in here is the length of the
leaf's key, in bits.)
(-------------------------------------------------------------------------)
dic last leaf address = 0;
dic last key length = 0;
(=========================================================================)
(=========================================================================)
(Used to temporarily hold the number of identical bits in keys spanning
more than 32 bits.)
(-------------------------------------------------------------------------)
dic ident bits = 0;
(=========================================================================)
(=========================================================================)
(Statistical counts.)
(-------------------------------------------------------------------------)
dic number of nodes = 0; (All nodes, whether internal or leaves)
dic number of leaves = 0; (Data chunks, one per leaf)
(-------------------------------------------------------------------------)
(Do not change the order of the following 2 variables, nor do insert any
other variable between them: 'dic number incorrect pred' is treated as
offset +1 unit relative to 'dic number correct pred'.)
dic number correct pred = 0;
dic number incorrect pred = 0;
(=========================================================================)
(*****************************************************************************)
"workspace" (Convention: "dic ...", lower case)
(=========================================================================)
(The radix tree's root node never has a key, thus its key length is 0 bit
and therefore it consists only of the DIC HEADER SIZE management units. The
4 fields PARENT, LEFT, RIGHT and DATA all are set to -1 initially. PARENT
remains on -1 during the whole existence of the radix tree: this indicates
that it is the root node. Initialization is done in the 'Dic Create'
routine. The KEY field is not needed.)
(-------------------------------------------------------------------------)
dic root = DIC HEADER SIZE; (No DIC KEY field for the root)
(=========================================================================)
(*****************************************************************************)
"programme" (Convention: "Dic ...", Mixed case)
(=========================================================================)
(This is a library only, it can not be run stand-alone. See "DIC Test.txt"
for a test setup.)
(-------------------------------------------------------------------------)
end;
(=========================================================================)
(*****************************************************************************)
"Dic Create"
(=========================================================================)
(Initializes a Dictionary. To free the allocated resources, use the
function 'Dic Destroy' when not needing the dictionary any longer.)
(-------------------------------------------------------------------------)
(Input: D: Number of dedicated memory units at 4 Bytes each, max 1G-1
Output: -
Calculation of required space when allocating memory:
Per entry: The user data units
plus 4 units DSA management
plus 1 node for the leaf
plus 0...1 nodes as internal nodes, depends on key structure
Per node: 5 header units
plus 4 units DSA management
1...n key units depending on the key size,
with n typically in the range of 1-2 units for ASCII files
If removing entries, allow some space for unavoidable fragmentation,
although DSA does a good job in keeping it minimized; also consider
some units for the DSA's AVL management: 6 units per memory chunk with
a distinct length different from others {same lengths are free}.
'Dsa Fragmentation' may return valuable data for analysis.
All registers unchanged.
Can cause errors. If the routine fails, the error code is in [dic error].
It can contain:
DIC ERR DIC EXISTING A dictionary exists already, use 'Dic Destroy'
DIC ERR DSA ERROR Call to 'Dsa Create' was not successful
)
(=========================================================================)
(VERIFICATION)
(No dictionary may exist yet. A dictionary already exists, when the flag
DIC STATUS CREATED is set in the [dic status] variable. If it does exist
already, use 'Dic Destroy' first before assigning a new dictionary.)
? [dic status] + DIC STATUS CREATED -> Dic Create Already Existing;
(=========================================================================)
(PROCESS)
(Either this routine exits with an error, in which case the tree is
invalid, or the routine succeeds and provides a successful initialization
flag in the status word.)
[dic status] = 0; (Not created yet, all flags clear)
(-------------------------------------------------------------------------)
(Allocate the designated RAM via the DSA library.)
(D = D;)
=> Dsa Create;
? failed -> Dic Create Dsa Error; (Details are in [dsa error])
(-------------------------------------------------------------------------)
(The radix tree's root node is held in workspace. It never has a key,
therefore it does not need any KEY field. All pointers are set to -1 and
remain so, until the first entry is added.)
[dic root plus DIC LEN] = 0; (0 bits key length)
[dic root plus DIC PARENT] = DIC NO ADDRESS;
[dic root plus DIC LEFT] = DIC NO ADDRESS;
[dic root plus DIC RIGHT] = DIC NO ADDRESS;
[dic root plus DIC DATA] = DIC NO ADDRESS;
(-------------------------------------------------------------------------)
(Statistics)
[dic number of nodes] = 0; (Counts all the nodes in the radix tree)
[dic number of leaves] = 0; (Counts all data chunks, 1 per leaf)
(-------------------------------------------------------------------------)
(Internal flag to tell that the dictionary was successfully initialized.
All other flags are cleared.)
[dic status] = DIC STATUS CREATED;
(-> Dic Create Exit OK;)
(=========================================================================)
(EXIT)
("Dic Create Exit OK")
[dic error] = AVL ERR NONE;
end;
(-------------------------------------------------------------------------)
"Dic Create Exit Failed"
fail;
(=========================================================================)
(ERRORS)
"Dic Create Already Existing"
(Error Code of library call is in [dsa error]. In [dic error] we just tell
that the underlying reason can be found there.)
[dic error] = DIC ERR DIC EXISTING;
-> Dic Create Exit Failed;
(-------------------------------------------------------------------------)
"Dic Create Dsa Error"
(Error Code of library call is in [dsa error]. In [dic error] we just tell
that the underlying reason can be found there.)
[dic error] = DIC ERR DSA ERROR; (Check [dic error] for reason)
-> Dic Create Exit Failed;
(=========================================================================)
(*****************************************************************************)
"Dic Destroy"
(=========================================================================)
(Destroys a Dictionary environment formerly created by 'Dic Create'. A call
to this routine returns the RAM area dedicated to the dictionary back to
the OS.)
(-------------------------------------------------------------------------)
(Input: -
Output: -
All registers unchanged. In fact, none is used.
May return errors, check [dic error] after 'failed'. It may contain:
DIC ERR DIC NOT EXISTING 'Dic Create' was not used/successful
DIC ERR DSA ERROR Call to 'Dsa Destroy' was not successful
Return with 'ok' guarantees that the requested RAM has been given back to
the OS. Returning with an error means that the RAM is still alocated; if
it ever was created, that is.
)
(=========================================================================)
(VERIFICATION)
(Make sure, that there was a former successful call to 'Dic Create'. If
there was none, or if it was not successful, we really should not reset
the RAM to an aribitrary content and better leave it as it is.)
? [dic status] - DIC STATUS CREATED -> Dic Destroy No Dictionary;
(=========================================================================)
(PROCESS)
(During 'Dic Create' the DSA routine 'Dsa Create' was called, which saved
the old content of [RAM Top] into the variable [dsa old ram top]. It is
reset to this value now, discarding the whole content which may be in
there.)
=> Dsa Destroy;
? failed -> Dic Destroy Dsa Error;
(-------------------------------------------------------------------------)
(Statistics)
[dic number of nodes] = 0;
[dic number of leaves] = 0;
(Reflect, that we have no dictionary anymore by clearing the status bit.
This makes it ready for another call to 'Dic Create'. We can XOR here
because it is sure that the flag currently is set: if tit was not we would
have left already with an error.)
[dic status] # DIC STATUS CREATED;
(-> Dic Destroy Exit OK;)
(=========================================================================)
("Dic Destroy Exit OK")
[dic error] = DIC ERR NONE;
end;
"Dic Destroy Exit Failed"
fail;
(=========================================================================)
(ERRORS)
"Dic Destroy No Dictionary"
[dic error] = DIC ERR DIC NOT EXISTING;
-> Dic Destroy Exit Failed;
(-------------------------------------------------------------------------)
"Dic Destroy Dsa Error"
[dic error] = DIC ERR DSA ERROR; (Check [dic error] for reason)
-> Dic Destroy Exit Failed;
(=========================================================================)
(*****************************************************************************)
"Dic Add Entry"
(=========================================================================)
(Adds an entry to the dictionary. Entries consist of a key of any length,
expressed in BITS, and object data of any length, expressed in UNITS. The
key is treated separately from the data. Both use an individual memory
chunk, allocated by the DSA. The data chunk is referenced by a radix node,
which latter also maintains several pointers to other nodes, altogether
forming a Radix Tree with bit based left-aligned bitstreams. To allow
traversal in lexicographical order, make sure to use the BST library on the
key buffer /PRIOR/ to calling this function.)
(-------------------------------------------------------------------------)
(Input: A: Address of first unit with the Key
The key is a left-aligned bitstream of B bits.
WARNING: This buffer belongs to the DIC library, not to
the client's application. The key in this buffer will be
destroyed. If the client needs the key after this call,
make sure that this buffer only points to a /copy/ of the
original key!
B: Length of key in BITS, must be > 0
Empty keys are not allowed: the radix tree's root node
never points to a data chunk.
C: Address of first Unit with the User Data
This typically points to a user-defined object or record:
this library does not care what is in this record. It is
/not/ the passed address which is integrated into the
dictionary, but the whole object's content in the specified
size.
D: Length of data in UNITS at 4 bytes each, must be > 0
Output: -
All registers unchanged.
Can cause errors. If the routine fails, the error code is in [dic error].
It can contain:
DIC ERR DIC NOT EXISTING 'Dsa Create' not called/not successful
DIC ERR DSA ERROR Call to 'Dsa Alloc' was not successful
DIC ERR KEY NOT UNIQUE An entry with such a key already exists
DIC ERR INVALID KEY LENGTH Must be 1...2^30-1 Bits
DIC ERR INVALID OBJECT LENGTH Must be 1...2^30-1 Units
)
(=========================================================================)
(ENTRY)
A -->; B -->; C -->; D -->; E -->;
(=========================================================================)
(VERIFICATION)
(The dictionary must have been successfully initialized with 'Dic Create'.
If this was not the case, error DIC ERR DIC NOT EXISTING is communicated.)
? [dic status] - DIC STATUS CREATED -> Dic Add Entry No Dictionary;
(-------------------------------------------------------------------------)
(The key length must be larger than 0 bits and it may not exceed a length
codeable in 30 bits, i.e. 2^30-1. Also, the data object must be at least
1 unit long and it may not exceed the DSA limit of 1 GU-1.)
? B = 0 -> Dic Add Entry Invalid Key Length;
? D = 0 -> Dic Add Entry Invalid Object Length;
? B + C000 0000h -> Dic Add Entry Invalid Key Length;
? D + C000 0000h -> Dic Add Entry Invalid Object Length;
(=========================================================================)
(PROCESS)
(First the userdata is stored into a new memory chunk: this requires little
work and we only need to carry on a pointer to the address where the data
was stored instead of the 2 registers C and D. Note, however, that this is
executed speculatively: should it turn out later, that such a key already
exists, then this memory chunk will be deallocated again.)
(D = D;) (Number of units)
=> Dsa Alloc; (Delivers address of chunk in E)
? failed -> Dic Add Entry No Data Chunk Created;
[dic number of leaves] + 1; (Statistics)
(In E is the address of the new userdata chunk. From address C we copy D
units to address E. Since D is not stored anywhere in the Radix tree, and
because it was made sure that D is larger than 0, we can use it as a
counter to copy the units, while incrementing the source and target
addresses. However, we need the original E back when done, because it
is the address which needs to be referred to by the new radix node later
on. Also, we will load it back to register C when done, not to E, as we
will need E multiple times to pass arguments to other routines.)
E -->; (Save data address, later in C)
"Dic Add Entry Copy Data" (Loop start)
[E] = [C]; (Copy a unit of user data)
C + 1; E + 1; (Advance source and target)
D ^ Dic Add Entry Copy Data; (Copy all user data units)
<-- C; (Restore data address in C, not E)
(-------------------------------------------------------------------------)
(C contains the address of the new user data chunk. Registers D and E are
free to be used. This was planned so, that we conveniently can allocate the
second memory chunk to store the radix node for the key. In register D, we
need to provide the node's size in units. In B, we have the key length in
bits; furthermore we have DIC HEADER SIZE units of node management data.
However, we can not simply calculate the needed length yet, because in
radix trees we store partial keys only. To find out, what portion still
needs to be stored, we need to follow the tree as far as we can. We begin
with the root, which is held in workspace. Recall that the root never has
a key, thus its key length is 0 bit and therefore it consists only of the
DIC HEADER SIZE management units. What we basically attempt is to locate
the key. If we can not find it, we are ready to insert it at the position
where the search stopped.)
A -->; (Save address to key buffer)
(A = A;) (Address of start of key)
(B = B;) (Key length in bits)
=> Dic Locate; (Future parent returned in E)
('Dic Locate' can not cause errors, don't query 'ok'. Note, that above
'Dic Locate' reduces the key so much as it could be traced down the radix
tree. This is perfect for this library, but it also means, that the original
key is of no more use for the client.)
(If we could find the key, the caller provided a non-unique key to be added
once again. This is a usage error: only unique keys are allowed when
inserting an entry. Since we are not expecting the user to do this
intentionally, we speculatively already took the opportunity to create the
data chunk for the future node. Since there will be no such additional node,
we also must remove the alerady created data chunk in such a case. Its
address is still in register C. The following conditional jump to the given
address handles the user data deallocation.)
? [dic status] + DIC STATUS FOUND -> Dic Add Entry Key Already Existing;
(-------------------------------------------------------------------------)
(The key does not exist yet. This means, that we can insert it as a child
of the already figured out future parent node. This parent node's address
is in E. It can be the root node or any other node several levels down.
We also know already, which child our new node needs to be: 'Dic Locate'
did that job. If the status flag DIC STATUS LEFT is set, it must become the
parents left child, else its right child. Such a child may already exist.
The partial key was already well defined by 'Dic Locate': It is the number
of bits which remain as per register B, and they are left-aligned starting
at unit 'A'. A currently is on the stack.)
(If the flag DIC STATUS NEW LEAF is set, then an additional node is required:
this flag signals, that the first part of the 2 splits is to become the new
key, rather than merely being a non-leaf node with a common prefix. E.g.:
insertion of a new key 'magic' while a key 'magical' already exists.)
? [dic status] + DIC STATUS NEW LEAF -> Dic Add Entry No New Node;
(-> Dic Add Entry New Node;)
(-------------------------------------------------------------------------)
("Dic Add Entry New Node") (Typically the majority case)
(The future parent is in E. Allocate memory space for the new node. We do
this regardless of the fact that the future parent E might additionally be
split into two parts. The split will be done after the addition of the new
key. Such a split would be indicated by a set flag DIC STATUS SPLIT. That
flag would have been set already by above 'Dic Locate'.)
D = B; (Number of Bits for partial key)
E -->; (Save parent address)
D + DIC ROUNDING BITS TO UNIT; (Rounding up to whole units)
D > DIC WIDTH OF BITS IN UNIT; (Number of Units for partial key)
D -->; (Need that number back later)
D + DIC HEADER SIZE; (Plus header fields of radix node)
=> Dsa Alloc; (Address of created chunk in E)
? failed -> Dic Add Entry Child Not Created;
[dic number of nodes] + 1; (Statistics)
(-------------------------------------------------------------------------)
(In E is the address of the new child node, on the stack the address of
this child's parent. The Data chunk's address still is in C. The number of
bits for the new child's partial key is still in B. The new child has no
branch prediction bits yet, thus we have to initialize the counter. Since
it has no children yet, we usually can't predict if we would take the
node's left or right branch as soon as such children come into existence:
the chance is 50:50 for randomly distributed binary keys. However, since
our 2 bit prediction counter forces us to choose a side, we take into
consideration, that we typically might process character keys such as ASCII
codes. Because this new node is a leaf, it must end with the last bits of
such a character. If it ever has to have own children, such children would
most likely start out with a 0 bit, because all ASCII characters start with
a 0 [not ANSI though, but even then the vast part consist of the lower
ASCII codes]. Thus, not completely arbitrarily we choose to initialize the
counter as 'weakly taken left'. Tests show, that this leads to 69.59%
correct predictions. Note, that 'strong left' for ASCII files would lead
to a slightly better hit rate, namely 69.63%, but for real binary files it
would mean a possibly heavy misprediction rate, because in case of right
branches, i.e. when continuing with a 1 bit, we would need to be proven
wrong twice before we would predict correctly. And just for completeness
the other data for ASCII files: Weak right = 67.65%, strong right = 67.44%.
Why bother, 2% difference does not seem much? Well, each percent typically
translates to hundreds of thousands or millions of avoided cache misses:
recall that /one/ instruction in 'Dic Locate' accounts for 50% of the whole
procedure's runtime due to cache misses, if left unattended.)
[E plus DIC LEN] = DIC BRANCH PRED WEAK LEFT; (Init pred. counter)
[E plus DIC LEFT] = DIC NO ADDRESS; (No own children yet)
[E plus DIC LEN] | B; (Partial key length)
[E plus DIC RIGHT] = DIC NO ADDRESS;(No own children yet)
<-- B; (Number of units for partial key)
<-- D; (Parent address)
[E plus DIC DATA] = C; (Data chunk's address)
[E plus DIC PARENT] = D; (Link child to parent)
(-------------------------------------------------------------------------)
(All headers of both radix nodes were updated, with the exception of the
parent linking to the new child, which won't be done before a possible
parent split was handled. - The other missing thing now is the remaining
partial key for which the child was created. The number of required UNITS
is in B, it is 1 or larger; the address where the key starts is in A.
Adding the node's header length to E results in the address to which B units
from source A have to be copied.)
C = A; (Potential result from 'Dic Locate')
<-- A; (Address of new key)
E -->; (Save parent address)
E + DIC KEY; (Destination of partial key)
"Dic Add Entry Copy Key" (Loop typically executed only once)
[E] = [A]; (Copy unit)
A + 1; E + 1; (Next addresses)
B ^ Dic Add Entry Copy Key; (Until all units were copied)
<-- E; (Restore parent address)
(-------------------------------------------------------------------------)
(DIC STATUS LEFT tells, if the new child is the parent's left or right
child. However, this link may not be updated, before a possible split was
handled: in case of a split, we would potentially overwrite an already
existing child, which would need to be inherited to the second part of
the parent split.)
(With this exception, the new child E is complete. As for the issue of the
parental node possibly required to be split into 2 nodes: If this is the
case, the flag DIC STATUS SPLIT would be set. If it is not, we can proceed
with linking the new child properly to its parent.)
? [dic status] - DIC STATUS SPLIT -> Dic Add Entry No Split;
(-------------------------------------------------------------------------)
(The parent node D of the new child E must be split into 2 nodes.)
(To be able to do the correct links between the new child and the parent,
we need their addresses back when the split was done.)
D -->; E -->;
(The split occurs after C bits of the actual parent key. The parent stays
in its current position, but its key length needs to be reduced, causing a
temporary loss of information, as only so many bits as indicated are
treated as significant. The lost bits will then become the key for the new
node.)
A = [D plus DIC LEN]; (Get old parent length)
B = A; (Save old parent length...)
A & DIC BRANCH PREDICTOR; (Keep branch prediction)
B & DIC LEN BITS; (...without branch prediction bits)
A | C; (Provide new parent length)
[D plus DIC LEN] = A; (Write back new length, old pred.)
(Now an additional node is created. It is to become the parent's other
child and takes the second part of the actual parent's key.)
A = D; (Save parent's address)
D = B; (Old parent length, must be in D)
D + DIC ROUNDING BITS TO UNIT; (Rounding up to whole units)
D > DIC WIDTH OF BITS IN UNIT; (Number of bits in key)
D + DIC HEADER SIZE; (Number of Units for split node)
=> Dsa Alloc; (Address of new chunk in E)
? failed -> Dic Add Entry Split Not Created;
[dic number of nodes] + 1; (Statistics)
(The split node's parent is also the parent for the new node.)
[E plus DIC PARENT] = A; (New node's parent is A)
(The split parent inherits its parent's children.)
[E plus DIC LEFT] = [A plus DIC LEFT];
[E plus DIC RIGHT] = [A plus DIC RIGHT];
(This means for the newly linked children, that their parent changes to
the split parent. Note, however, that such children do not neccessarily
need to exist, and if they do, there can be one or two of them: we must
test both individually and adjust them accordingly.)
C = [E plus DIC LEFT];
? C = DIC NO ADDRESS -> Dic Add Entry No Left Child;
[C plus DIC PARENT] = E;
"Dic Add Entry No Left Child"
C = [E plus DIC RIGHT];
? C = DIC NO ADDRESS -> Dic Add Entry No Right Child;
[C plus DIC PARENT] = E;
"Dic Add Entry No Right Child"
(The parent becomes a 'leaf-less' internal node now with the only role of
holding a common partial key prefix.)
[E plus DIC DATA] = [A plus DIC DATA]; (Data address from parent)
A -->; (Moved up for better performance)
E -->; (Moved up for better performance)
[A plus DIC DATA] = DIC NO ADDRESS; (Parent is no leaf anymore)
(The parent's key unit/s are copied to the new node's key unit/s now.
In B is the parent's old key length in bits.)
(A -->;) (Still need both node addresses...)
(E -->;) (...Both done above already)
C = B; (Save parent's old key length)
E + DIC KEY; (Destination of parent key)
B + DIC ROUNDING BITS TO UNIT; (Rounding up to whole units)
A + DIC KEY; (Source of parent key)
B > DIC WIDTH OF BITS IN UNIT; (Number of units for the key)
"Dic Add Entry Copy Parent Key" (Loop typically executed just once)
[E] = [A]; (Copy unit)
A + 1; E + 1; (Next addresses)
B ^ Dic Add Entry Copy Parent Key; (Until all copied)
<-- E; (Restore first of two addresses)
B = C; (Belongs to following code block)
<-- A; (Restore second of two addresses)
(Now finally tell the effective key length in bits of the new split
node.)
(B = C;) (Old parent's length, done above)
D = [A plus DIC LEN]; (Length of Split parent's parent)
D & DIC LEN BITS; (Without prediction bits)
[E plus DIC LEN] & DIC BRANCH PREDICTOR; (Keep pred. bits)
B - D; (Remaining effective key length)
[E plus DIC LEN] | B; (Store effective key length)
(Because the new node's key still is the parent's old key, but the parent
now manages the first [A plus DIC LEN] bits of it, the key of the new
node must be reduced by these bits.)
B = C; (OLD Length of key in bits)
A = E; (Address of new node)
(The key length from [A plus DIC LEN] without prediction bits is still
in D. We need it in C now. It is the number of bits to shift left.)
C = D; (Need A's key length in C)
A + DIC KEY; (Address of start of key bits)
=> Dic Reduce Key; (Adjust the new node's key)
(Last thing missing is the link from the parent to the new node. While
the flag DIC STATUS LEFT instructed us how to deal with the new node for
the new partial key, we need to insert the pointer in the other child
field for the split node.)
D = [E plus DIC PARENT]; (Address of parent)
D + DIC LEFT; (Assume new node is left child)
? [dic status] - DIC STATUS LEFT -> Dic Add Entry Link Split;
D + DIC RIGHT minus DIC LEFT; (Admit new node is right child)
"Dic Add Entry Link Split" (In D is address of child field)
[D] = E; (Bidirectionally linked now)
(Split done. Getting back addresses for relink)
<-- E; <-- D;
(-> Dic Add Entry No Split;)
(-------------------------------------------------------------------------)
"Dic Add Entry No Split" (Also jumped to if no split needed)
(DIC STATUS LEFT tells, if the new child is the parent's left or right
child. This link was not updated before a possible parental split was
handled, which is the case now, if it was needed at all.)
D + DIC RIGHT; (Assume new child is D's right child)
? [dic status] - DIC STATUS LEFT -> Dic Add Entry Link Parent To Child;
D + DIC LEFT minus DIC RIGHT; (Admit it's D's left child)
"Dic Add Entry Link Parent To Child" (Field addr DIC LEFT/DIC RIGHT in D)
[D] = E; (Link parent to new left child)
(-> Dic Add Entry Exit OK;) (Done)
(=========================================================================)
(EXIT. Located atypically early in the routine to enable fall-through for
the major case.)
"Dic Add Entry Exit OK"
[dic last leaf address] = E; (Remember this leaf for 'Dic Get Next')
<-- E; <-- D; <-- C; <-- B; <-- A;
[dic last key length] = B; (Also remember its key length in bits)
[dic error] = DIC ERR NONE;
end;
"Dic Add Entry Exit Failed"
<-- E; <-- D; <-- C; <-- B; <-- A;
fail;
(=========================================================================)
(Typically, the following sections are the minority case. They were moved
to this location to grant the majority case fall-through conditions, taking
advantage of the hardware branch prediction mechanisms.)
"Dic Add Entry No New Node" (Typically the minority case)
(We want to split the current node E after B bits. The first part is to
become the new key's node. We create a new node now, which is to hold the
second part of the split E.)
D = [E plus DIC LEN]; (Actual key length of parent, BITS)
D & DIC LEN BITS; (The 2 MSBs are not part of DIC LEN)
<-- A; (Clean up stack)
D + DIC ROUNDING BITS TO UNIT; (Rounding up to get number of units)
E -->; (Save parent, future first part)
D > DIC WIDTH OF BITS IN UNIT; (Number of units for key)
D + DIC HEADER SIZE; (Number of Units for split node)
=> Dsa Alloc; (Address of future 2nd part in E)
? failed -> Dic Add Entry No Second Part Created;
[dic number of nodes] + 1; (Statistics)
(The second part of the node to split was allocated. Its address is in E.
The address of the parent, the future first part, is on the stack.)
A = D; (Save key length, in UNITS)
<-- D; (Future first part into D, not E)
A - DIC HEADER SIZE; (Key without header units)
(The future first part = the already existing node is in D now, not in E
anymore. In E is the new node, it's the second part. The second part E
inherits from the first part D: its key length including branch prediction
bits; its children, if any; its data pointer; and its key, although the key
along with the key length will be reduced later on.)
[E plus DIC LEN] = [D plus DIC LEN]; (Also inherits branch predictor)
[E plus DIC LEFT] = [D plus DIC LEFT];
[E plus DIC RIGHT] = [D plus DIC RIGHT];
[E plus DIC DATA] = [D plus DIC DATA];
(In A is the length of the key in units. It is 1 or larger. We use it as
a counter to copy the key.)
D -->; E -->; (Need addresses back)
"Dic Add Entry Copy Second Key" (Loop typically executed just once)
[E plus DIC KEY] = [D plus DIC KEY];
D + 1; E + 1;
A ^ Dic Add Entry Copy Second Key;
<-- E; <-- D; (Restore addresses)
(The data to be inherited was transferred from first part D to second part
E. After linking the second part to the first part, all fields of the second
part have a content, although the key still needs to be reduced.)
[E plus DIC PARENT] = D; (2nd part's parent is the 1st part)
(Now the first part can be updated. Its key is reduced to the length of
bits common to both the first part and the new key. This value is in B.
Since B can not be longer than before, we don't need to check for overflow
into the branch prediction bits. However, we can't just assign because of
the branch prediction bits and need to OR them into their place.)
[D plus DIC LEN] & DIC BRANCH PREDICTOR;
[D plus DIC LEN] | B; (Set 1st part's new key length)
(The first part's data pointer needs to refer to the memory chunk that was
created initially: its address still is in C.)
[D plus DIC DATA] = C; (New data chunk for first part)
(The first part's parent needs not to be modified. The only update left
to do is to link the appropriate child field to the second part. For this
we need to examine the bit B+1: but because we need to reduce the second
part's key anyway, this process conveniently allows us to examine the first
remaining bit /after reduction/, and so this final task is delayed. Let's
reduce the second part's key now, which still is identical to the original
key before splitting.)
A = E; (Start of second part)
C = B; (Shifting out length of 1st part)
A + DIC KEY; (Address of key start in 2nd part)
B = [E plus DIC LEN]; (Current length of key)
B & DIC LEN BITS; (Without branch predictor)
=> Dic Reduce Key;
(Now the effective key length of the second part is the original key
length minus the bits shifted out. The branch prediction bits remain
untouched: they still are valid for this node, only that the node is a
level deeper now.)
A = [E plus DIC LEN];
A & DIC LEN BITS;
[E plus DIC LEN] & DIC BRANCH PREDICTOR;
A - C;
[E plus DIC LEN] | A; (New key length 1st part)
(And now the MSB of the key field of the 2nd part tells us, which child
of the first part D the 2nd part E has to become: if clear, its left child,
else its right child. This also sets the first branch prediction into the
same direction, with a count of 01b weakly left or 10b weakly right.)
C = D; (Address of first part)
[D plus DIC LEN] & DIC LEN BITS; (Clear prediction counter)
[D plus DIC LEFT] = DIC NO ADDRESS; (One of these remains -1...)
C + DIC RIGHT; (Assume: E is right child)
[D plus DIC LEN] | DIC BRANCH PRED WEAK RIGHT; (Weakly right)
[D plus DIC RIGHT] = DIC NO ADDRESS; (...the other gets the 2nd part)
? [E plus DIC KEY] + 8000 0000h -> Dic Add Entry Link Second Part;
C + DIC LEFT minus DIC RIGHT; (Admit: E is left child)
[D plus DIC LEN] - DIC BRANCH PRED WEAK LEFT; (Admit: weakly left)
(Above '-' is correct: We assumed DIC BRANCH PRED WEAK RIGHT = 80000000h
first, now subtracting DIC BRANCH PRED WEAK LEFT = 40000000h leaves us with
DIC BRANCH PRED WEAK LEFT = 40000000h.)
"Dic Add Entry Link Second Part" (C has the addr to be modified)
[C] = E; (Link child to correct field)
(The second part E has inherited the children of the first part D. These
children of E, if any, still point to their former parent D and must be
corrected to point to E. Note, that there may be any number of children:
none, one or two, so we must test them individually.)
A = [E plus DIC LEFT];
B = [E plus DIC RIGHT];
? A = DIC NO ADDRESS -> Dic Add Entry No Left Kid;
[A plus DIC PARENT] = E;
"Dic Add Entry No Left Kid"
? B = DIC NO ADDRESS -> Dic Add Entry No Right Kid;
[B plus DIC PARENT] = E;
"Dic Add Entry No Right Kid"
(All done. The original node was split into 2 parts. The first part points
to the new key now, the second to the original data, but with a reduced
key and as a child of the first part. We can leave.)
-> Dic Add Entry Exit OK; (Done)
(=========================================================================)
(ERRORS)
"Dic Add Entry No Dictionary"
[dic error] = DIC ERR DIC NOT EXISTING;
-> Dic Add Entry Exit Failed;
(-------------------------------------------------------------------------)
"Dic Add Entry Invalid Key Length"
[dic error] = DIC ERR INVALID KEY LENGTH;
-> Dic Add Entry Exit Failed;
(-------------------------------------------------------------------------)
"Dic Add Entry Invalid Object Length"
[dic error] = DIC ERR INVALID OBJECT LENGTH;
-> Dic Add Entry Exit Failed;
(-------------------------------------------------------------------------)
"Dic Add Entry No Data Chunk Created"
[dic error] = DIC ERR DSA ERROR; (Check [dic error] for reason)
-> Dic Add Entry Exit Failed;
(-------------------------------------------------------------------------)
"Dic Add Entry Key Already Existing"
<-- A; (Restore stack before leaving)
[dic error] = DIC ERR KEY NOT UNIQUE;
(We could find the key, the caller provided a non-unique key to be added
once again. This is a usage error: only unique keys are allowed when
inserting an entry. Since we are not expecting the user to do this
intentionally, we speculatively already took the opportunity to create the
data chunk for the future node. Since there will be no such additional node,
we also must remove the speculatively created data chunk. Its address is
still in register C.)
E = C; (The data chunk's address)
=> Dsa Free; (Remove the storage)
(If above fails, then we have a more serious problem that just a non-
unique key. In this case we change the error code to a fatal error.)
? failed -> Dic Add Entry Data Not Removed;
[dic number of leaves] - 1; (Statistics)
-> Dic Add Entry Exit Failed;
(-------------------------------------------------------------------------)
(Jumped to from 'Dic Add Entry Key Already Existing' if not able to
remove a speculatively added data chunk. If it's convenient for the client
to be able to distinguish between a DSA error for this special case, then
a distinct error code may be implemented here. The author does not need
this distinction.)
"Dic Add Entry Data Not Removed"
[dic error] = DIC ERR DSA ERROR; (Check [dic error] for reason)
-> Dic Add Entry Exit Failed;
(-------------------------------------------------------------------------)
"Dic Add Entry No Second Part Created"
<-- E; (Clean up stack before leaving)
[dic error] = DIC ERR DSA ERROR; (Check [dic error] for reason)
-> Dic Add Entry Exit Failed;
(-------------------------------------------------------------------------)
"Dic Add Entry Child Not Created"
<-- D; <-- E; <-- A; (Clean up stack before leaving)
[dic error] = DIC ERR DSA ERROR; (Check [dic error] for reason)
-> Dic Add Entry Exit Failed;
(-------------------------------------------------------------------------)
"Dic Add Entry Split Not Created"
[dic error] = DIC ERR DSA ERROR; (Check [dic error] for reason)
-> Dic Add Entry Exit Failed;
(=========================================================================)
(*****************************************************************************)
"Dic Find Entry"
(=========================================================================)
(Finds an entry in the dictionary by its key. Entries consist of a key of
any length, and object data of any length. The key is fed to the routine as
a left-aligned bitstream, and the pointer to the userdata is returned, if
an entry with the provided key exists.)
(-------------------------------------------------------------------------)
(Input: A: Address to a buffer with the first unit of the key: the
caller has to provide this key, it is what we search for.
WARNING: This buffer belongs to the DIC library, not to
the client's application. The key in this buffer will be
destroyed. If the client needs the key after this call,
make sure that this buffer only points to a /copy/ of the
original key!
B: Length of key in BITS
Output: C: If 'ok', DIC STATUS FOUND is set, and C contains:
Address of first Unit with the object Data. Note that
the DIC library does not store the size of this data.
If you don't know what you retrieve, code it in the
data object's structure, or use 'Dsa Size' on this
address to retrieve the data chunk's size.
If 'failed', DIC STATUS FOUND is clear, and C contains:
DIC NO ADDRESS = minus 1 = DIC ALL BITS SET. Don't
treat this like an address. Always check for errors.
Check the status flag DIC STATUS FOUND in [dic status] to examine the
result.
All registers except C unchanged.
Can cause errors. If the routine fails, the error code is in [dic error].
It can contain:
DIC ERR DIC NOT EXISTING
DIC ERR INVALID KEY LENGTH
DIC ERR KEY NOT EXISTING
)
(=========================================================================)
(ENTRY)
E -->;
(=========================================================================)
(VERIFICATION)
(The dictionary must have been successfully initialized with 'Dic Create'.
If this was not the case, error DIC ERR DIC NOT EXISTING is communicated.)
? [dic status] - DIC STATUS CREATED -> Dic Find Entry No Dictionary;
(-------------------------------------------------------------------------)
(The key length must be larger than 0 bits and it may not exceed a length
codeable in 30 bits, i.e. 2^30-1.)
? B = 0 -> Dic Find Entry Invalid Key Length;
? B + C000 0000h -> Dic Find Entry Invalid Key Length;
(=========================================================================)
(PROCESS)
('Dic Locate' modifies A and B, but we need to return them unmodified.)
A -->; B -->; (Saving key data)
(A = A;) (Address of start of key)
(B = B;) (Key length in bits)
=> Dic Locate; (Radix node returned in E)
('Dic Locate' can not cause errors, don't query 'ok'.)
<-- B; <-- A; (Retrieving key data)
(-------------------------------------------------------------------------)
(If we could not find the key, the caller provided a not existing key.
This would be a usage error: only nodes with existing keys can be
retrieved. Whether the entry was found or not is indicated by the status
flag DIC STATUS FOUND.)
? [dic status] - DIC STATUS FOUND -> Dic Find Entry Node Not Found;
(-------------------------------------------------------------------------)
(In E is the address of a found radix node with the searched key. Its
DIC DATA field contains the address to the memory chunk with the userdata.
If the content of this field is -1, however, then a radix node with this
key exists as an internal node only and not as a leaf. As a consequence,
'Not Found' will be the result: such a key never was added to the
dictionary and the hit was a mere coincidence.)
C = [E plus DIC DATA]; (May be -1)
? C = DIC NO ADDRESS -> Dic Find Entry Node Not Found;
(-------------------------------------------------------------------------)
(Remember this node to continue from here with the very fast 'Dic Get Next'
routine, if lexicographical continuation is desired. This allows for fast
queries like «all entries starting with 'Herb'».)
[dic last leaf address] = E; (Remember this leaf for Prev/Next)
[dic last key length] = B; (Also remember its key length in bits)
(-> Dic Find Entry Exit OK;)
(=========================================================================)
(EXIT)
"Dic Find Entry Exit OK"
<-- E;
[dic error] = DIC ERR NONE;
end;
(-------------------------------------------------------------------------)
"Dic Find Entry Exit Failed"
<-- E;
C = DIC NO ADDRESS; ('Result')
fail;
(=========================================================================)
(ERRORS)
"Dic Find Entry No Dictionary"
[dic error] = DIC ERR DIC NOT EXISTING;
-> Dic Find Entry Exit Failed;
(-------------------------------------------------------------------------)
"Dic Find Entry Invalid Key Length"
[dic error] = DIC ERR INVALID KEY LENGTH;
-> Dic Find Entry Exit Failed;
(-------------------------------------------------------------------------)
"Dic Find Entry Node Not Found"
[dic error] = DIC ERR KEY NOT EXISTING;
-> Dic Find Entry Exit Failed;
(=========================================================================)
(*****************************************************************************)
"Dic Remove Entry"
(=========================================================================)
(Removes an entry from the dictionary. Entries consist of a key of any
length, and object data of any length. The key is fed to the routine and
all associated memory chunks are freed, if an entry with the provided key
exists.)
(-------------------------------------------------------------------------)
(Input: A: Address to a buffer with the first unit of the key: the
caller has to provide this key, it is the memory associated
with it that is removed.
WARNING: This buffer belongs to the DIC library, not to
the client's application. The key in this buffer will be
destroyed. If the client needs the key after this call,
make sure that this buffer only points to a /copy/ of the
original key!
B: Length of key in BITS
Output: None
All registers unchanged.
Can cause errors. If the routine fails, the error code is in [dic error].
It can contain:
DIC ERR DIC NOT EXISTING
DIC ERR INVALID KEY LENGTH
DIC ERR KEY NOT EXISTING
DIC ERR DSA ERROR
)
(=========================================================================)
(ENTRY)
A -->; B -->; C -->; D -->; E -->;
(=========================================================================)
(VERIFICATION)
(The dictionary must have been successfully initialized with 'Dic Create'.
If this was not the case, error DIC ERR DIC NOT EXISTING is communicated.)
? [dic status] - DIC STATUS CREATED -> Dic Remove Entry No Dictionary;
(-------------------------------------------------------------------------)
(The key length must be larger than 0 bits and it may not exceed a length
codeable in 30 bits, i.e. 2^30-1.)
? B = 0 -> Dic Remove Entry Invalid Key Length;
? B + C000 0000h -> Dic Remove Entry Invalid Key Length;
(=========================================================================)
(PROCESS)
('Dic Locate' modifies A and B, but we want them unmodified.)
A -->; B -->; (Saving key data)
(A = A;) (Address of start of key)
(B = B;) (Key length in bits)
=> Dic Locate; (Radix node returned in E)
('Dic Locate' can not cause errors, don't query 'ok'.)
<-- B; <-- A; (Retrieving key data)
(-------------------------------------------------------------------------)
(If we could not find the key, the caller provided a not existing key.
This would be a usage error: only existing keys can be removed. Whether the
entry was found or not is indicated by the status flag DIC STATUS FOUND.)
? [dic status] - DIC STATUS FOUND -> Dic Remove Entry Node Not Found;
(In E is the address of a found radix node. Its DIC DATA field contains the
address to the memory chunk with the userdata. If the content of this
field is -1, then a radix node with the provided key exists, but as an
internal node only and not as a leaf. Such a key never was intended and
that we found the node is a mere coincidence due to internal organization.
As a consequence, 'Not Found' will be the result.)
C = [E plus DIC DATA]; (Address of userdata area)
? C = DIC NO ADDRESS -> Dic Remove Entry Node Not Found;
(-------------------------------------------------------------------------)
(In E is the address of the node to be removed. In C is the address of its
user data area. The allocated memory chunk for the userdata is /not/ yet
removed, though. Consider the following scenario: 1. We remove the data
chunk, it succeeds. 2. We remove the node, it does /not/ succeed. We would
be left with a node without appropriate data. However, if we instead first
remove the node, and /then/ the data, the situation looks better. Even if
removing the data chunk should fail eventually, no harm is done besides of
having allocated and unreferenced memory; as annoying as this might be, it
does not harm the application using this library and in any case is the
better situation when compared with the first scenario.)
(-------------------------------------------------------------------------)
(Removing node E can not be done just so by deallocationg the used memory:
For one there are pointers which need to be reset first, but probably more
importantly the DIC KEY field contains DIC LEN bits which are part of a
bitstream forming a key. Sometimes the bits of the node to be removed need
to be coalesced with other nodes, which then inherit the relevant bits
of the key, but there are other times, when the node may not be removed at
all, because of acting as an internal node anyway. The following sections
cope with these issues.)
(We never can coalesce E with its /parent/. There are two different cases
to consider: - Either E is the parent's only child, in which case it at
first glance should be possible to coalesce with it, at least when the node
to be removed has own children, but then the question arises: why does the
parent exist at all? The only possible reason is, that the parent is not
just an internal node, but also a leaf. Retaining the key and the key
length is crucial when we want to be able to find that parental leaf again,
so we may not coalesce. - Or E is just one of two children of its parent.
In this case the parent either marks the end of the common prefix for both
children and the children nodes exist as the parent's continuations, or the
parent is a leaf, or both: in all three cases we may not coalesce E with the
parent and must leave it alone. - However, with this it is not said, that
the node is never coalesced: if the parent has TWO children, of which E is
one, AND if E has no children itself, AND if the parent is not a leaf, THEN
we can coalesce the OTHER child with the parent and E can leave the stage.
Checking for this situation now.)
D = [E plus DIC PARENT]; (Address of E's parent)
(ORing both children returns -1 only if at least 1 child does not exist.
Well, admittedly there is another possibility: complementing addresses,
but this would imply a set bit 31 for one of these addresses, which would
indicate a RAM location above 2 GUnits = above 8 GB, which is not possible
to be handled by a 32 bit OS. So -1 after ORing means: E is the only child.
ORing is done instead of an 'If block' because it's faster.)
A = [D plus DIC LEFT];
A | [D plus DIC RIGHT]; (A is -1 if there is no other child)
? A = DIC NO ADDRESS -> Dic Remove Entry No Parent Coalesc;
(E's parent D has two children, of which E is one. One of the 3 conditions
for a parental merge is fulfilled. If the parent is a leaf, no coalescing
between D and E's sibling is possible, though.)
? [D plus DIC DATA] != DIC NO ADDRESS
-> Dic Remove Entry No Parent Coalesc;
(Two of the 3 conditions for a parental merge are fulfilled. E's parent D
has two children, of which E is one. D is just an internal node, not also
a leaf. This internal node is not required any longer, IF the branch
leading to E terminates with E, i.e. if E has no children itself. To save
superfluous jumps, we directly jump to the appropriate labels instead of
jumping to 'Dic Remove Entry No Parent Coalesc' as done above, where we
would compare what we can compare now.)
? [E plus DIC LEFT] != DIC NO ADDRESS
-> Dic Remove Entry Has Left Child;
? [E plus DIC RIGHT] != DIC NO ADDRESS
-> Dic Remove Entry Right Child Only;
(All 3 conditions for a parental merge are fulfilled.)
(-------------------------------------------------------------------------)
(Start of handling the parental merge case)
(Parent D and E's sibling can be coalesced to a single node. Then we can
remove the storage allocation for both children, plus E's userdata chunk,
and are done.)
(We need the address of E's sibling. It must be one of the two children of
parent D, and it is not E. We store its address into A.)
A = [D plus DIC RIGHT]; (Assume sibling is D's right child)
? E = [D plus DIC LEFT] -> Dic Remove Entry Got Sibling;
A = [D plus DIC LEFT]; (Admit E's sibling is D's left chd)
"Dic Remove Entry Got Sibling" (In A is E's sibling, a child of D)
(A will now be merged into D. Before concatenating the two partial keys,
we need to make sure, that there is enough space for the longer key. This
code fragment does not calculate the number of units needed yet. It merely
tests if the rarer case occurs /that/ we need to make the key field larger.)
B = [D plus DIC LEN]; (Current key length in bits)
C = [A plus DIC LEN]; (Prefetching, in case of fall-through)
B & DIC MASK NUM BITS IN UNIT; (Bits used in last key unit)
(The last instruction also removed the branch prediction bits and leaves
us with a result of 0...31. 0 does not mean that no bit was used in the last
key unit: it rather means that all 32 bits were filling up the last key unit.
Whatever the number of bits to be added, we must make the key field larger.)
C & DIC LEN BITS; (Prefetching, in case of fall-through)
? B = 0 -> Dic Remove Entry Make Larger; (All bits used in last unit)
(C = [A plus DIC LEN];) (Already done above)
(C & DIC LEN BITS;) (Already done above)
B + C; (New number of bits in last unit)
(If the current number of bits in the last unit plus the number of bits
to be added still fit into the last key unit, i.e. if the result is 32
or less, then we don't need an additional unit.)
? B <= DIC BITS IN UNIT -> Dic Remove Entry Merge To Parent; (Has space)
"Dic Remove Entry Make Larger"
(We need to make E larger to be able to hold the new key. Now we need the
real number of units. Since we don't store the actual length, we need to
calculate it.)
B = [D plus DIC LEN]; (Current key length in bits)
C = [A plus DIC LEN]; (Additional bits to add)
B & DIC LEN BITS; (Without branch prediction bits)
C & DIC LEN BITS; (Without branch prediction bits)
B + C; (Total number of all bits)
(Calculate number of units of new radix node)
B + DIC ROUNDING BITS TO UNIT; (Rounding up to whole units)
B > DIC WIDTH OF BITS IN UNIT; (Number of units to fit all bits)
B + DIC HEADER SIZE; (The node's header units)
(Node D needs to be removed and a new larger node E with a total size of
B units is to inherit all its data.)
D <> E; (Parent node to be resized in E)
D <> B; (Units to allocate in D)
=> Dic Resize Node; (Provides a new E !)
? failed -> Dic Remove Entry Exit Failed; ([dic error] already loaded)
(NOTE: the routine 'Dic Resize Node' changed the address of E, but also
returned it, so that we can continue to work seamlessly as if nothing had
happened. The tree integrity was maintained in 'Dic Resize Node', i.e.
all pointers which needed adjustment /were/ adjusted. Also, the routine
'Dic Resize Node' will actualize the address of node [dic last leaf address]
if needed, so that 'Dic Linearize' even can be used within 'Dic Get First'/
'Dic Get Next' loops.)
D <> B;
D <> E;
(Registers as before. D: Parent into which to merge, but with a new address
now; E: Node to be removed; A: Sibling to merge into parent D.)
"Dic Remove Entry Merge To Parent"
(Concatenate the key of E's sibling, whose node address is in A, to the
common parent D's key.)
E -->; (Save address of node to remove)
B = [D plus DIC LEN]; (Retain first part of bitstream)
C = [A plus DIC LEN]; (Add this number of bits)
B & DIC LEN BITS; (Without branch prediction bits)
C & DIC LEN BITS; (Without branch prediction bits)
E = A; (Need Input buffer in E)
A = D; (Need Output buffer in A)
E + DIC KEY; (Input buffer with new bits)
A + DIC KEY; (Output buffer of expanded key)
=> Dic Modify Key; (Concatenate the compound key)
A - DIC KEY; (Restore start address of node)
E - DIC KEY; (Restore start address of node)
D = [A plus DIC LEN]; (Length of common parent)
D & DIC LEN BITS; (Without branch prediction bits)
[A plus DIC LEN] & DIC BRANCH PREDICTOR; (Keep branch prediction)
D + C; (New total length)
[A plus DIC LEN] | D; (Store new total length)
<-- D; (Restore address of node to remove)
(Note, that all involved nodes are somewhere else now: In D the node to be
removed, in A the common parent, and in E is D's sibling which is being
merged with A.)
(Now parent A inherits the sibling E's own children, if any. A's parent
remains the same. The length and key were just updated. The data field,
if any, is inherited from E as well.)
[A plus DIC LEFT] = [E plus DIC LEFT]; (May be -1)
[A plus DIC RIGHT] = [E plus DIC RIGHT]; (May be -1)
[A plus DIC DATA] = [E plus DIC DATA]; (Also data may be -1)
(The inherited children, if any, still point to their old parent E. They
must be corrected to point to A now. There may be 0, 1 or 2 children.)
B = [A plus DIC LEFT];
? B = DIC NO ADDRESS -> Dic Remove Entry Left Inherit Ok;
[B plus DIC PARENT] = A; (Left child's new parent is A)
"Dic Remove Entry Left Inherit Ok"
B = [A plus DIC RIGHT];
? B = DIC NO ADDRESS -> Dic Remove Entry Right Inherit Ok;
[B plus DIC PARENT] = A; (Right child's new parent is A)
"Dic Remove Entry Right Inherit Ok"
(By now, the former children of A, i.e. the node D which was initially
requested to be removed, and also node E, the sibling which was merged
with the common parent, both are completely isolated and not reachable
anymore from anywhere in the tree. They both can safely be removed. Also,
D's userdata area can be removed, but not E's userdata area, which is now
linked to A, whether or not E had such a userdata area.)
(One of the two nodes must be removed here, the other node and a data
segment can be removed by jumping down to 'Dic Remove Entry Remove Node',
provided that the node's address is in A and the userdata area address
in C. Since the storage deallocator requires the address to be passed
in E, where the address of one of the two nodes already resides, we
remove node E first.)
(E = E;) (Address of the merged sibling)
=> Dsa Free; (Remove the storage)
? failed -> Dic Remove Entry Sibling Not Removed;
[dic number of nodes] - 1; (Statistics)
(Loading the addresses of the remaining node and its userdata area into the
correct registers to be processable by aforementioned code block.)
A = D; (Address of other node to remove)
C = [D plus DIC DATA]; (Address of userdata area to remove)
-> Dic Remove Entry Remove Node; (Deallocate both memory chunks)
(End of handling the parental merge case)
(-------------------------------------------------------------------------)
"Dic Remove Entry No Parent Coalesc"
(If E has just one child, we can coalesce that child with E, since E's
status changes from 'Leaf' to 'Internal Node' and an internal node is not
needed when there is just one child. If E has TWO children, then E
represents the common prefix for both children and it in general may NOT be
removed: it merely loses its 'Leaf' status; however, if E's parent is an
internal node only, AND E is the only child of that parent, then E can be
merged into E's parent. If E has no children at all, it can be removed
altogether and the according parent pointer to it can be set to -1.)
? [E plus DIC LEFT] = DIC NO ADDRESS -> Dic Remove Entry No Left Child;
(-> Dic Remove Entry Has Left Child;)
(-------------------------------------------------------------------------)
"Dic Remove Entry Has Left Child"
(E has a left child. Does it also have a right child?)
? [E plus DIC RIGHT] = DIC NO ADDRESS -> Dic Remove Entry Left Child Only;
(-> Dic Remove Entry Two Children;)
(-------------------------------------------------------------------------)
(Start of handling for nodes with two children.)
("Dic Remove Entry Two Children")
(The leaf to be removed is E. Since it has two children, E also is a common
prefix to these children, i.e. the node itself may not be removed and merely
changes its staus from leaf to internal node. If, however, E's parent is
just an internal node and not also a leaf, AND E is the only child of its
parent, then E /could/ be coalesced with its parent. However, such a case
can not arise, as explained below step by step.)
(E's userdata chunk's address is in C. It is not needed any longer. E is
not treated as a leaf any longer, only as an internal node.)
E -->; (Need E back)
E = C; (Need address of data chunk in E)
=> Dsa Free; (Remove the storage)
? failed -> Dic Remove Entry Own Data Not Removed;
[dic number of leaves] - 1; (Statistics)
<-- E; (Address of node to be removed)
[E plus DIC DATA] = DIC NO ADDRESS; (Internal node only now, not leaf)
(Excurse: To see why there never is the need to coalesce, follow the
explanations. The commented code is just there to explain what we would do
as per comments. Leave them commented, please, they are not needed.)
(Step 1. If E's parent is a leaf, then E continues to act as an internal
node with a common prefix to its two children. In this case, we are done
and can leave aready. The address of the parent is in D.)
(? [D plus DIC DATA] != DIC NO ADDRESS -> Dic Remove Entry Exit OK;)
(Step 2. Still here? Then we have the situation, that E and its parent D
both are just internal nodes. Let's find out, if E is the only child of D.
If this is not the case, i.e. if D has two children, then both D and E
still are required to exist. In this case we can leave. ORing both children
returns -1 only if at least one child does not exist.)
(A = [D plus DIC LEFT];
A | [D plus DIC RIGHT];
? A != DIC NO ADDRESS -> Dic Remove Entry Exit OK;)
(Step 3. Still here? Then E's parent D has just one child, which obviously
must be E. Since E and D both are just internal nodes, they /could/ be
merged now: no need to keep them both. BUT: this would also mean, that the
parent D was a node with just one child, namely E, all the time. Such a
case can not arise, as D would have been fused together with E as soon as
that situation came into existence. - As a consequence, we can spare all
the tests elaborated above: all possible cases unconditionally lead to an
ok exit, which is what we do now - uncommented ;)
-> Dic Remove Entry Exit OK;
(End of handling for nodes with two children.)
(-------------------------------------------------------------------------)
"Dic Remove Entry No Left Child"
(No left child, but possibly a right child?)
? [E plus DIC RIGHT] = DIC NO ADDRESS -> Dic Remove Entry No Children;
(-> Dic Remove Entry Right Child Only;)
(-------------------------------------------------------------------------)
"Dic Remove Entry Right Child Only"
(E's right child can be coalesced with E. Beneath correcting pointers to
each other to maintain tree integrity, we need to chain together the two
partial keys: The E+DICLEN bits from E+DIC KEY are the first part, and to
this the child's bits need to be concatenated as the second part. Get the
child's address.)
A = [E plus DIC RIGHT]; (Address of E's only child)
-> Dic Remove Entry Coalesce Only Child;
(-------------------------------------------------------------------------)
"Dic Remove Entry Left Child Only"
(E's left child can be coalesced with E. Beneath correcting pointers to
each other to maintain tree integrity, we need to chain together the two
partial keys: The E+DIC LEN bits from E+DIC KEY are the first part, and to
this the child's bits need to be concatenated as the second part. Get the
child's address.)
A = [E plus DIC LEFT]; (Address of E's only child)
(-> Dic Remove Entry Coalesce Only Child;)
(-------------------------------------------------------------------------)
(Start of handling for nodes with just one child.)
"Dic Remove Entry Coalesce Only Child"
(A is the address of E's only child. It is easier to actually remove the
child A and retain E as the coalesced node, because replacing bits is
easier and faster than actually inserting them which would require shifts
over possibly many units. Since this library never communicates the
positions of the nodes, we don't expect the user to store such addresses.
It's actually discouraged anyway. There's only one function working with
such addresses, 'Dic Get Key'.)
(Before concatenating the two partial keys, we need to make sure, that
there is enough space for the longer key.)
D = [E plus DIC LEN]; (Current key length in bits)
B = [A plus DIC LEN]; (Prefetch for fall-through)
D & DIC MASK NUM BITS IN UNIT; (Bits used in last key unit)
(In D is 0...31 now. If D is 0, this means that we used up ALL bits of the
last unit and need a larger key in any case to accomodate new bits.)
B & DIC LEN BITS; (Prefetch for fall-through)
? D = 0 -> Dic Remove Entry Larger Key;
(D has the number 1 to 31 bits occupied by the last unit, to which some
more bits will be added.)
(B = [A plus DIC LEN];) (Additional bits to add, done above)
(B & DIC LEN BITS;) (Without branch prediction bits)
D + B; (Number of bits of longer key)
? D <= DIC BITS IN UNIT -> Dic Remove Entry Combine Keys; (Has space)
"Dic Remove Entry Larger Key"
(We need to make E larger to be able to hold the new key. Now we need the
real number of required units. We still don't know the actual length,
as we don't save it, but we know already that we need more space.)
D = [E plus DIC LEN]; (Current key length in bits)
B = [A plus DIC LEN]; (Additional bits to add)
D & DIC LEN BITS; (Without branch prediction bits)
B & DIC LEN BITS; (Without branch prediction bits)
D + B; (Number of bits for new key)
(Calculate number of units)
D + DIC ROUNDING BITS TO UNIT; (Rounding up to whole units)
D > DIC WIDTH OF BITS IN UNIT; (Number of units to fit all bits)
D + DIC HEADER SIZE; (The node's header units)
(Note, that we actually remove the child A later on. However, coalescing
is done into node E, and it is said E that needs to be resized, i.e.
removed and a re-assigned as a new larger node E with a total size of D
units, which is to inherit all its data.)
(D = D;) (Units to allocate)
(E = E;) (Node to be resized)
=> Dic Resize Node; (Provides a new E !)
? failed -> Dic Remove Entry Exit Failed; ([dic error] already loaded)
(NOTE: the routine 'Dic Resize Node' changed the address of E, but also
returned it, so that we can continue to work seamlessly as if nothing had
happened. The tree integrity was maintained, i.e. all pointers which
needed adjustment /were/ adjusted.)
"Dic Remove Entry Combine Keys"
(Add child's key to actual node's key.)
C -->; (Save address of userdata area)
A <> E; (Swap addresses of A and E)
B = [A plus DIC LEN]; (Retain first part of bitstream)
C = [E plus DIC LEN]; (Add this number of bits)
B & DIC LEN BITS; (Without branch prediction bits)
C & DIC LEN BITS; (Without branch prediction bits)
A + DIC KEY; (Output buffer of expanded key)
E + DIC KEY; (Input buffer with new bits)
=> Dic Modify Key; (Concatenate the compound key)
A - DIC KEY; (Restore start address)
E - DIC KEY; (Restore start address)
B = [A plus DIC LEN];
A <> E; (A: Child, E: Actual Node)
B & DIC LEN BITS;
[E plus DIC LEN] & DIC BRANCH PREDICTOR;
B + C; (New total length)
[E plus DIC LEN] | B;
<-- C; (Restore address of userdata area)
(Now E inherits A's own children, if any. E's parent remains the same.
The length and key was just updated. The data field, if any, is inherited
from A as well.)
[E plus DIC LEFT] = [A plus DIC LEFT]; (May be -1)
[E plus DIC RIGHT] = [A plus DIC RIGHT]; (May be -1)
[E plus DIC DATA] = [A plus DIC DATA]; (May be -1 as well)
(Note, that above overwrites the original userdata address, which was not
physically removed yet. However, we still have it present in C.)
(The inherited children, if any, still point to their old parent A. They
must be corrected to point to E now.)
B = [E plus DIC LEFT];
? B = DIC NO ADDRESS -> Dic Remove Entry Left Child Ok;