Dynamic Storage Allocation Source Code Library


URI:

http://herbert.gandraxa.com/herbert/dsa_lib.asp

Link template:   

<a href="http://herbert.gandraxa.com/herbert/dsa_lib.asp">Dynamic Storage Allocation Source Code Library</a>


Link symbols:   

Local LinkOn current page | DocumentOn this site | External PageOn external site | WikipediaWikipedia article | Compressed ArchiveZIP archive | PDF documentPDF | E-MailE-Mail


Article

Organization

DocumentHome » DocumentLinoleum Source Code » DocumentDynamic Storage Allocation » Source Code Library

Scope

This is the source code of the Dynamic Storage Allocation library, implemented in the WikipediaLinoleum programming language.

Author

DocumentHerbert Glarner


Source Code

(Dynamic Storage Allocation "DSA.txt" 1.2 - Copyright 2006 Herbert Glarner)

(IMPORTANT: Includes the 'AVL' library. It must be version 1.2 or higher: the
first release 1.0 will cause an error, because it lacked support for free
usage of unused flag bits. AVL 1.1 did contain a bug, causing a wrong tree
balance in special circumstances. Details see in that library's description.
The current AVL version as per time of releasing is 1.3, including several
performance optimizations.)

(*****************************************************************************)

    (=========================================================================)
    (   Copyright Herbert Glarner [2006]
        Mail: herbert.glarner@bluewin.ch

        Free usage granted for all purposes, including commercial, provided
        that above credits are given.

        Suggestions for performance improvement welcome.
        No liability for any damage usage may cause.   )
    (=========================================================================)


    (=========================================================================)
    (Implements Dynamic Storage Allocation for a dedicated Heap Storage.
    The dedicated RAM is allocated dynamically. Consumption of RAM is
    bidirectional: from top to bottom grows a fragmention-free AVL tree
    controlling free memory sizes, from bottom to top grow the chunks used by
    the caller, separated by free chunks causing fragmentation. In between
    lies the memory pool from where chunks are assigned, when they can not be
    granted otherwise. Fragmentation is reduced by immediate coalescing.
    Reusage of coalesced areas is done by exact-fit, if possible, else by
    best-fit splitting.

        +-----------------------+ [avl tree top] never changes after 'Create'
        | AVL Controller        |
        |                       |
        |                       |
        +-----------------------+ [avl tree extent] Grows downwards
        | Memory Pool           |    ^
        | [always free]         |    | When these two would overlap
        |                       |    | we are 'Out of Memory'
        |                       |    v
        +-----------------------+ [avl tree bottom] Grows upwards during
        | Memory Chunks         |                   allocations
        | [Allocated and free   |
        | chunks]               |
        +-----------------------+ [dsa ram bottom]  never changes after 'Create'

        [dsa ram bottom] is a variable pertaining to this library. It stores
        permanently the low boundary of the RAM having been dedicated for DSA
        purposes.

        [avl tree top], [avl tree extent] and [avl tree bottom] all belong
        to the AVL library. Whereas the first is static and never changes,
        the other two are subject to changes. [avl tree extent] is controlled
        by the AVL library itself, but [avl tree bottom] is manipulated by this
        DSA library: it tells the AVL library, how far down the AVL Tree may
        grow, and this is determined by the memory chunks being defined.)
    (=========================================================================)


    (=========================================================================)
    (The AVL tree's entries carry 3 pieces of information each:

        Size:   Information:            Naming in this Lib:

        30 bit  Key                     SIZE
        30 bit  Start chunk chain       DATA
        30 bit  End chunk chain         AUX

    30 bit? Well yes, we manage addresses in the 32-bit-room with this library.
    With 32 bit 2^32 = 4 GB can be addressed. But Linoleum addresses do not
    address individual bytes, but units of 4 bytes each, and these units are
    numbered from 0 on. Thus, even if we really had and would address 4 GB, we
    number them from 0 to 1 GigaUnit-1. To store these numbers, 30 suffice.

    And why is this important?

    An AVL node effectively has a 32 bit Key and a 32 bit Data field, so this
    obviously is not enough for all the information. However, there's also the
    Flags field: as long as we do not interfer with the 4 flags managed by the
    AVL library itself, we can use this field's remaining 28 bits.

    Now the 28 extra bits are not quite enough to store an additional 30-bit-
    address. Therefore we store the 2 missing bits somewhere else. This place
    is the 2 most significant bits {MSB} of the node's DATA field, of which we
    only use the 30 least significant bits {LSB} anyway, for exactly the same
    reason.

    In this library, we call the 30 LSB of the Data field 'DATA', and the
    combination of the 2 MSB of the Data field plus the 28 MSB of the Flags
    field 'AUX'.

    The 3 fields of an AVL node map this way:

        32 bit                  32 bit                  32 bit
        +-------------------+   +-------------------+   +-------------------+
        | AVL KEY           |   | AVL DATA          |   | AVL FLAGS         |
        +-------------------+   +-------------------+   +-------------------+
        Determines the nodes    The lower 30 LSB are    The 4 LSB are used by
        key. It is identical    the 'DATA' field. The   the AVL library. The
        with the chunks sizes   2 MSB are the upper     28 MSB are the lower
        which are managed by    compenent of AUX.       compenent of AUX.
        this node.

        2 bit  30 bit           2 bit  30 bit           28 bit         4 bit
        +-----+-------------+   +-----+-------------+   +--------------+----+
        | '00'| KEY         |   | AUX1| DATA        |   | AUX2         |XXXX|
        +-----+-------------+   +-----+-------------+   +--------------+----+
                                                        NEVER touch the 4 LSB
                                                        bits.

                                AUX1 and AUX2 put together form the AUX field.

        2 bit  30 bit                 30 bit                2 bit  28 bit
          +-------------+             +-------------+       +-----+---------+
        00| KEY         |           00| DATA        |     00| AUX1| AUX2    |
          +-------------+             +-------------+       +-----+---------+
        The 2 MSB are               The 2 MSB are         The 2 MSB are
        always clear.               always clear.         always clear.
    )
    (=========================================================================)


    (=========================================================================)
    (Routines summary:
        Name of routine                 Short description)
    (-------------------------------------------------------------------------)
    (   ----------------------------------------------------------------------
        PUBLIC ROUTINES:
        ----------------------------------------------------------------------

            ------------------------------------------------------------------
            [Dedicated Ram Area Handling]
            Dsa Create                  Do this first to tell about memory
            Dsa Destroy                 Do this last to clean up memory
            ------------------------------------------------------------------
            [Allocating and deallocating chunks]
            Dsa Alloc                   Allocate a memory chunk by size
            Dsa Free                    Free an allocated chunk by address
            ------------------------------------------------------------------
            [Working with allocated chunks]
            Dsa Clear                   Write NULL into all usable units
            Dsa Size [1]                Get the size of an allocated chunk
            ******************************************************************
            UPDATE 1.2 START
            [Special purpose routines, newly introduced with update 1.2]
            Dsa Fragmentation           Analyzes fragmentation
            Dsa Cache Alignment         Analyzes cache alignment
            UPDATE 1.2 END
            ******************************************************************

            ------------------------------------------------------------------
            [1] However, why would you not know the size of the object you
                once allocated? If this can be the case for some reason,
                consider to store the length as a part of the record/object/
                structure for which the memory chunk is allocated.
            ------------------------------------------------------------------

        ----------------------------------------------------------------------
        INTERNAL ROUTINES:
        ----------------------------------------------------------------------
            Dsa Add to Node
            Dsa Bypass Chunk
            Dsa Make Top Chunk
            Dsa Read Node Aux
            Dsa Remove From Node
            Dsa Write Node Aux
        ----------------------------------------------------------------------
        Usage of Internal Routines from the outside world is strongly
        discouraged: Most change some chunks pointers in a way granting a nice
        crash. The only 'undangerous' one is 'Dsa Read Node Aux'.
        ----------------------------------------------------------------------
    )
    (=========================================================================)


    (=========================================================================)
    (Memory requirements:
        As much as is assigned with 'Dsa Create', plus 8 units for variables
        and an additional 6 units to save the library's own AVL data set.
    )
    (=========================================================================)


    (=========================================================================)
    (Version history:
        23 Jun 2006: 1.0 Released
        25 Jul 2006: 1.1 Released. Encapsulates all AVL data used by this lib
                     such, that an indipendent AVL data set outside of this lib
                     is not affected, thus enabling the parallel usage of an
                     independent AVL tree in co-existence with this library.
                     Or in short: Just use the external AVL tree as usual, this
                     library's AVL tree is transparent and causes no conflicts.
        29 Nov 2006: 1.2 Released.
                     STATISTICS. [dsa num alloc] and [dsa num free] added to
                     count the total number of both the successfull allocations
                     and deallocations for statistical purposes.
                     FLOW CONTROL. [dsa control] and DSA CONTROL XXX constants
                     control behaviour of the DSA library for fine-tuning issues
                     related to faster cache access.
                     NEW ANALYZE FUNCTIONS. The routines 'Dsa Fragmentation'
                     and 'Dsa Cache Alignment' allow to analyze how the storage
                     is treated by the DSA library.
                     OPTIMIZATIONS. Careful code review regarding common branch
                     prediction mechanisms and instruction scheduling led to
                     modifications in several dozen places.
    )
    (=========================================================================)

(*****************************************************************************)
"libraries"

    (=========================================================================)
    (AVL Tree)
    (-------------------------------------------------------------------------)
    (The AVL library provides a controller to quickly find free memory chunks
    for re-usage, thus avoiding memory fragmentation as much as possible. It
    sits on top of the dedicated RAM. The source of the AVL Library can be
    found at
        http://herbert.wikispaces.com/AVL+Tree
    )
    (-------------------------------------------------------------------------)
        AVL;                            (Requires version 1.2 or higher)
    (=========================================================================)

(*****************************************************************************)
"directors"
        program name = { DSA };
        unit = 32;

(*****************************************************************************)
"constants"                             (Convention: "DSA ...", UPPER CASE)

    (=========================================================================)
    (Error codes)
    (-------------------------------------------------------------------------)
        DSA ERR NONE = 0;               (No error)
    (-------------------------------------------------------------------------)
    (From 'Dsa Create')
        DSA ERR DSA EXISTS = 1;         (Use 'Dsa Destroy' to get rid of it)
        DSA ERR LIMIT EXCEEDED = 2;     (We can't allocate 1 GUnit or more)
        DSA ERR NOT ENOUGH SPACE = 3;   (Not even space for a single node)
        DSA ERR REQUESTED TOO MUCH = 4; (Attempt to dedicate too much RAM)
        DSA ERR AVL INIT FAILED = 5;    (AVL tree could not be initialized)
    (-------------------------------------------------------------------------)
    (From 'Dsa Destroy')
        DSA ERR DSA NOT EXISTING = 6;   ('Dsa Create' was not used/successful)
        DSA ERR SPACE NOT RETURNED = 7; (RAM not given back to the OS)
    (-------------------------------------------------------------------------)
    (From 'Dsa Alloc')
        ( DSA ERR LIMIT EXCEEDED = 2;)  (We can't request 1 GUnit or more)
        ( DSA ERR DSA NOT EXISTING = 6;)('Dsa Create' was not used/successful)
        DSA ERR OUT OF MEMORY = 8;      (Request can not be fulfilled)
        DSA ERR AVL ERROR = 9;          (Details are in [avl error])
    (-------------------------------------------------------------------------)
    (From 'Dsa Free')
        ( DSA ERR LIMIT EXCEEDED = 2;)  (We can't free addresses >= 1 GUnit)
        ( DSA ERR DSA NOT EXISTING = 6;)('Dsa Create' was not used/successful)
        ( DSA ERR OUT OF MEMORY = 8;)   (Request can not be fulfilled)
        ( DSA ERR AVL ERROR = 9;)       (Details are in [avl error])
        DSA ERR NO ALLOCATED CHUNK = 10;(Attempt to free a not allocated chunk)
    (-------------------------------------------------------------------------)
    (From 'Dsa Size')
        ( DSA ERR LIMIT EXCEEDED = 2;)  (Such an address was never allocated)
        ( DSA ERR DSA NOT EXISTING = 6;)('Dsa Create' was not used/successful)
        ( DSA ERR NO ALLOCATED CHUNK = 10;)(Such an address was never allocated)
    (-------------------------------------------------------------------------)
    (From 'Dsa Clear')
        ( DSA ERR LIMIT EXCEEDED = 2;)  (Such an address was never allocated)
        ( DSA ERR DSA NOT EXISTING = 6;)('Dsa Create' was not used/successful)
        ( DSA ERR NO ALLOCATED CHUNK = 10;)(Such an address was never allocated)
    (-------------------------------------------------------------------------)
    (From internal routine 'Dsa Make Top Chunk', passed to 'Dsa Alloc')
        ( DSA ERR OUT OF MEMORY = 8;)   (Request can not be fulfilled)
    (-------------------------------------------------------------------------)
    (From internal routine 'Dsa Add To Node', passed to 'Dsa Alloc' or
    'Dsa Free', resp.)
        ( DSA ERR OUT OF MEMORY = 8;)   (Request can not be fulfilled)
        ( DSA ERR AVL ERROR = 9;)       (Details are in [avl error])
    (-------------------------------------------------------------------------)
    (From internal routine 'Dsa Remove From Node', passed to 'Dsa Alloc' or
    'Dsa Free', resp.)
        ( DSA ERR AVL ERROR = 9;)       (Details are in [avl error])
    (-------------------------------------------------------------------------)
    (From internal routine 'Dsa Bypass Chunk', passed to 'Dsa Free'.
    If appearing, these would be programming errors, which need to be fixed, as
    they would compromise the library's functionality. However, multi-billion
    tests with random length allocations and deallocations did not cause such
    inconsistencies. If they happen to you, a notification would be much
    appreciated.)
        DSA ERR INTERNAL 1 = 11;        (Comments see 'Dsa Bypass Chunk')
        DSA ERR INTERNAL 2 = 12;        (Comments see 'Dsa Bypass Chunk')
        DSA ERR INTERNAL 3 = 13;        (Comments see 'Dsa Bypass Chunk')
    (*************************************************************************)
    (UPDATE 1.2 START)
    (From 'Dsa Cache Alignment')
        ( DSA ERR DSA NOT EXISTING = 6;)('Dsa Create' was not used/successful)
        DSA ERR INV CACHE LINE SIZE = 14;  (Cache line size must be power of 2)
    (UPDATE 1.2 END)
    (*************************************************************************)
    (=========================================================================)

    (=========================================================================)
    (Memory chunk header, just before the user data. In essence, the 4 units
    provide a quadruply linked list. The first two chain memory chunks to each
    other as they appear in physical memory. There is just one such chain
    encomprising all allocated and free memory chunks. Memory chunks can be
    coalesced or split, which leads to insertion and removal of these pointers.
    The other two chain memory chunks of same size, but only for free chunks.
    Their start and end is referenced by an AVL Node, whose key is the chunk
    size. Thus, depending on the different 'object' sizes an application
    manages, there may be just a few or also many individual chains. The first
    of the 4 pointers also contains a flag in its MSB, telling if this chunk
    currently is free or allocated. If it is free, it is part of a size chain.)
    (-------------------------------------------------------------------------)
        DSA HEADER SIZE = 4;
            DSA CHUNK PREV = 0;         (Pointer to previous memory chunk)
                (If the following flag is set in CHUNK PREV, then the chunk is
                allocated and in use; else it is a free memory chunk.)
                DSA ALLOCATED = 8000 0000h;
            DSA CHUNK NEXT = 1;         (Pointer to next memory chunk)
            DSA SIZE PREV = 2;          (Pointer to prev. chunk with same size)
            DSA SIZE NEXT = 3;          (Pointer to next chunk with same size)
                (In all 4 pointers, the following '31-bit value' -1 can appear
                as an indication, that this is the start or end, resp., of the
                actual chain.)
                DSA CHAIN END = 7FFF FFFFh;
    (=========================================================================)

    (=========================================================================)
    (The variable [dsa status] will hold two flags about the current status
    of the Dynamic Storage Allocator.)
    (-------------------------------------------------------------------------)
    (The following flag is set when 'Dsa Create' was successful.)
        DSA STATUS INIT      = 0000 0001h;
    (-------------------------------------------------------------------------)
    (UPDATE 1.1 START)
    (This flag is set, when the library currently uses its own AVL data set.
    If it is clear, then we would access the AVL data set valid outside of
    this library: without switching to the local set a crash is guaranteed.)
        DSA STATUS LOCAL SET = 0000 0002h;
    (UPDATE 1.1 END)
    (-------------------------------------------------------------------------)
    (Theoretically, the other bits within the [dsa status] variable are
    free to be used. However, usage is discouraged.)
    (=========================================================================)

    (*************************************************************************)
    (UPDATE 1.2 START)
    (The variable [dsa control] will hold flags enabling the caller to control
    certain runtime behaviours of the Dynamic Storage Allocator.)
    (-------------------------------------------------------------------------)
    (If the following flag is set, memory always is allocated from the memory
    pool and never reused from free chunks. In certain circumstances, this can
    be exploited to make advantage of the L1 data cache.)
        DSA CONTROL NO REUSE = 0000 0001h;
    (-------------------------------------------------------------------------)
    (Theoretically, the other bits within the [dsa status] variable are
    free to be used. However, usage is discouraged.)
    (UPDATE 1.2 END)
    (*************************************************************************)

(*****************************************************************************)
"variables"                             (Convention: "dsa ...", lower case)

    (=========================================================================)
    (Error condition as per DSA ERR XXX constants.)
    (-------------------------------------------------------------------------)
        dsa error = 0;
    (=========================================================================)

    (=========================================================================)
    (Flags about the current status of the library. The bits are given in
    the DSA STATUS XXX constants.)
    (-------------------------------------------------------------------------)
        dsa status = 0;
    (=========================================================================)

    (=========================================================================)
    (The AVL Controller's nodes must be stored somewhere. The variables of the
    AVL library itself tell what is stored where. They are:
        avl tree top                    exclusive the given address
        avl tree extent                 bottommost node of the AVL controller
    The AVL library variable [avl tree bottom] holds the maximum extent
    towards the bottom to which the AVL tree may grow. Since this is not a
    static value, but changes with ongoing allocations and deallocations, it
    is programmatically manipulated within this library, as soon as such is
    required. However, it still is neccessary, that we store the physically
    bottommost address of the dedicated RAM somewhere, and the following
    variable is this place. Initially, it is identical with [avl tree bottom].)
    (-------------------------------------------------------------------------)
        dsa ram bottom = 0;
    (=========================================================================)

    (=========================================================================)
    (When usage of this library is not needed any longer, the memory dedicated
    to it should be given back to the OS by calling 'Dsa Destroy'. For this
    reason, the original content of [RAM Top] when calling 'Dsa Create' must
    be restored. To be restored, it must be saved somewhere. This is done in
    this variable during creation.)
    (-------------------------------------------------------------------------)
        dsa old ram top = 0;            (Only valid if DSA STATUS INIT is set)
    (=========================================================================)

    (=========================================================================)
    (This variable stores the header address of the physically last memory
    chunk below [avl tree bottom]. A content of -1 signals, that there is no
    such chunk yet/anymore.)
    (-------------------------------------------------------------------------)
        dsa last chunk = minus 1;
    (=========================================================================)

    (*************************************************************************)
    (UPDATE 1.1 START)

    (This library uses an own AVL data set completely independent from whatever
    the AVL variables may contain outside of this library. The original content
    is saved in these variables when a DSA routine is called. It is restored
    to the original values when the called DSA routine is left.)
    (-------------------------------------------------------------------------)
        dsa avl error = 0;
        dsa avl status = 0;
        dsa avl tree top = 0;
        dsa avl tree bottom = 0;
        dsa avl tree root = 0;
        dsa avl tree extent = 0;
    (UPDATE 1.1 END)
    (*************************************************************************)

    (*************************************************************************)
    (UPDATE 1.2 START)

    (For statistical reasons, the total number of successful allocations as
    well as the total number of succesful deallocations are counted. These
    variables can be accessed from the outside.)
    (-------------------------------------------------------------------------)
        dsa num alloc = 0;
        dsa num free = 0;
    (UPDATE 1.2 END)
    (*************************************************************************)

    (*************************************************************************)
    (UPDATE 1.2 START)

    (Takes flags to control certain runtime behaviour aspects. The according
    DSA CONTROL XXX flags can be set or cleared by the caller.)
    (-------------------------------------------------------------------------)
        dsa control = 0;
    (UPDATE 1.2 END)
    (*************************************************************************)

(*****************************************************************************)
"workspace"                             (Convention: "dsa ...", lower case)

    (None)

(*****************************************************************************)
"programme"                             (Convention: "Dsa ...", Mixed case)

    (=========================================================================)
    (This is a library only, it can not be run stand-alone. See "DSA Test.txt"
    for a test setup.)
    (-------------------------------------------------------------------------)
        end;
    (=========================================================================)

(*****************************************************************************)
"Dsa Create"

    (=========================================================================)
    (Initializes a DSA environment. A call to this routine is required, before
    any allocation or deallocation takes place. When the DSA is not needed
    any longer, destroy it by calling 'Dsa Destroy': this will free the memory
    dedicated to this library and return it to the OS.)
    (-------------------------------------------------------------------------)
    (Input:     D:  Number of dedicated memory units at 4 Bytes each, max 1G-1
    Output:     -

    All registers unchanged.

    May return errors, check [dsa error] after 'failed'. It may contain:
        DSA ERR DSA EXISTS              Use 'Dsa Destroy' to get rid of it
        DSA ERR LIMIT EXCEEDED          We can't allocate 1 GUnit or more
        DSA ERR NOT ENOUGH SPACE        Not even space for a single node
        DSA ERR REQUESTED TOO MUCH      Attempt to dedicate too much RAM
        DSA ERR AVL INIT FAILED         AVL tree could not be initialized

    Return with 'ok' guarantees that the requested RAM has been allocated.
    Returning with an error guarantees that no additional RAM was allocated.
    )
    (=========================================================================)
    (ENTRY)
        D -->; E -->;
        => Dsa Local Avl Data;          (Switch to local AVL Data)
    (=========================================================================)
    (VERIFICATION)

    (A new storage area can not be allocated as long as an existing one still
    is in place. In this case, destroy the old one first by calling 'Dsa
    Destroy'.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? [dsa status] + DSA STATUS INIT -> Dsa Create Old Dsa Exists;
    (-------------------------------------------------------------------------)
    (Check, that the number of requested memory units is below 1 G-Units. This
    equals 4 GB. If it is above, 32bit-OSs can't handle it.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? D + C000 0000h -> Dsa Create Limit Exceeded;

    (UPDATE 1.2. TECHNICAL NOTE: It is not beneficial to align the upper
    boundary to the size of a cache line, which on Pentiums usually is 32 bytes
    = 8 units: the by far more important thing is to keep that AVL tree's nodes
    tightly together without any gaps, and being completely fragmentation-free
    this is ensured by the AVL tree library itself.)
    (-------------------------------------------------------------------------)
    (Check, that we have at least the space for a single node. The size of a
    node is stored in the AVL library's constant AVL NODE SIZE. If we don't, we
    can't even create a root for the AVL tree.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? D < AVL NODE SIZE -> Dsa Create Not Enough Space;
    (=========================================================================)
    (PROCESS)

    (Formally, the boundaries are ok. Attempt to create dynamic space to be
    used by this library. However, because we need to restore the current
    RAM pointer later on when calling 'Dsa Destroy', we must store its
    content for later retrieval.)
        E = [RAM Top];                  (To become bottom of dedicated RAM)
        [dsa old ram top] = E;          (Restored to this with 'Dsa Destroy')
        D + E;                          (To become top of dedicated RAM)
        [RAM Top] = D;                  (Tell the kernel about new top value)
        isocall;                        (Attempt to set dedicated RAM)
    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? failed -> Dsa Create Ram Not Allocated;

    (Further error exists within this routine from now on first need to set
    back the just allocated RAM.)
    (-------------------------------------------------------------------------)
    (Store the physical bottom of the memory area being managed by this lib.)
        [dsa ram bottom] = E;       (Library may access this, but not below)
    (-------------------------------------------------------------------------)
    (Now create the still empty AVL Controller.)
        (D = D;)                    (Top of RAM)
        (E = E;)                    (Bottom of RAM)
        => Avl Init;                (Create AVL Controller)
    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? failed -> Dsa Create Avl Controller Failed;
    (-------------------------------------------------------------------------)
    (Everything was fine. Set internal flag to tell that a dedicated memory
    area was successfully created and the AVL controller initialized. All other
    flags remain clear. The DSA is ready to be used now.)
        [dsa status] | DSA STATUS INIT;     (Only flag set is DSA STATUS INIT)
        (-> Dsa Create Exit OK;)
    (=========================================================================)
    (EXIT)

    "Dsa Create Exit OK"
        => Dsa Global Avl Data;         (Switch to original AVL Data)
        <-- E; <-- D;
        [dsa error] = DSA ERR NONE;
        end;
    (-------------------------------------------------------------------------)
    "Dsa Create Exit Failed"
        => Dsa Global Avl Data;         (Switch to original AVL Data)
        <-- E; <-- D;
        fail;
    (=========================================================================)
    (UPDATE 1.2 START. Error handlers moved down from inline handling for
    Branch Prediction Optimization.)

    "Dsa Create Old Dsa Exists"
        [dsa error] = DSA ERR DSA EXISTS;
        -> Dsa Create Exit Failed;

    "Dsa Create Limit Exceeded"
        [dsa error] = DSA ERR LIMIT EXCEEDED;
        -> Dsa Create Exit Failed;

    "Dsa Create Not Enough Space"
        [dsa error] = DSA ERR NOT ENOUGH SPACE;
        -> Dsa Create Exit Failed;

    "Dsa Create Ram Not Allocated"
        [dsa error] = DSA ERR REQUESTED TOO MUCH;
        -> Dsa Create Exit Failed;

    "Dsa Create Avl Controller Failed"
    (The AVL controller could not be created. Reset memory, it's of no use
    when we return with an error.)
        [dsa error] = DSA ERR AVL INIT FAILED;
        [RAM Top] = [dsa old ram top];
        -> Dsa Create Exit Failed;

    (UPDATE 1.2 END)
    (=========================================================================)

(*****************************************************************************)
"Dsa Destroy"

    (=========================================================================)
    (Destroys a DSA environment formerly created by 'Dsa Create'. A call to
    this routine returns the formerly dedicated RAM area to the OS.)
    (-------------------------------------------------------------------------)
    (Input:     -
    Output:     -

    All registers unchanged. In fact, none is used.

    May return errors, check [dsa error] after 'failed'. It may contain:
        DSA ERR DSA NOT EXISTING        'Dsa Create' was not used/successful
        DSA ERR SPACE NOT RETURNED      RAM not given back to the OS

    Return with 'ok' guarantees that the requested RAM has been given back to
    the OS.
    Returning with an error guarantees that the RAM is still alocated; if it
    ever was created, that is.
    )
    (=========================================================================)
    (VERIFICATION)

    (Make sure, that there was a former successful call to 'Dsa Create'. If
    there was none, or if it was not successful, we really should not reset
    the RAM to an aribitrary content and leave it as it is.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? [dsa status] - DSA STATUS INIT -> Dsa Destroy No Dsa;
    (=========================================================================)
    (PROCESS)

    (During 'Dsa Create', the old content of [RAM Top] was saved into the
    variable [dsa old ram top]. It is reset to this value now, discarding the
    whole content which may be in there.)
        [RAM Top] = [dsa old ram top];
        isocall;
    (I would not know why the OS would not want to have memory back, but
    anyway, if it fails, we return with an error.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? failed -> Dsa Destroy Memory Not Returned;
    (-------------------------------------------------------------------------)
    (Reflect, that we have no more DSA area now by clearing the status bit.
    This makes it ready for another call to 'Dsa Create'.)
        [dsa status] # DSA STATUS INIT;
    (=========================================================================)
    (EXIT)

    "Dsa Destroy Exit OK"
        [dsa error] = DSA ERR NONE;
        end;
    "Dsa Destroy Exit Failed"
        fail;
    (=========================================================================)
    (UPDATE 1.2 START. Error handlers moved down from inline handling for
    Branch Prediction Optimization.)

    "Dsa Destroy No Dsa"
        [dsa error] = DSA ERR DSA NOT EXISTING;
        -> Dsa Destroy Exit Failed;
    "Dsa Destroy Memory Not Returned"
        [dsa error] = DSA ERR SPACE NOT RETURNED;
        -> Dsa Destroy Exit Failed;

    (UPDATE 1.2 END)
    (=========================================================================)

(*****************************************************************************)
"Dsa Alloc"

    (=========================================================================)
    (Allocates a memory chunk of desired size in units, returning the address
    of the first unit of the userdata area. Make no assumptions about where
    this address is: it can be a re-used memory area, which was deallocated
    quite a while ago. Never write content beyond the limits of the returned
    area: this in any case leads to destroyed pointers with a guaranteed
    failure.)
    (-------------------------------------------------------------------------)
    (Input:     D:  Number of units at 4 Bytes each, theoretical max 1G-1
    Output:     E:  Address of allocated chunk, if no error occured. Don't use
                    this value if there was an error.

    NOTE: It is possible to disallow reusage of deallocated memory by setting
    the flag DSA CONTROL NO REUSE in the variable [dsa control]. Use this with
    caution, as allocations will be done exclusively from the memory pool, thus
    increasing fragmentation . Usage is recommended only for such chunks which
    must be held closely together to take advantage of the processor's cache
    mechanisms.

    All registers except E unchanged.

    May return errors, check [dsa error] after 'failed'. It may contain:
        DSA ERR LIMIT EXCEEDED          We can't request 1 GUnit or more
        DSA ERR DSA NOT EXISTING        'Dsa Create' was not used/successful
        DSA ERR OUT OF MEMORY           Request can not be fulfilled
        DSA ERR AVL ERROR               Details are in [avl error]
    )
    (=========================================================================)
    (ENTRY)
        A -->; B -->; C -->; D -->;
        => Dsa Local Avl Data;          (Switch to local AVL Data)
    (=========================================================================)
    (VERIFICATION)

    (Make sure, that there was a former successful call to 'Dsa Create'. If
    there was none, or if it was not successful, an error is returned.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? [dsa status] - DSA STATUS INIT -> Dsa Alloc Dsa Not Existing;
    (-------------------------------------------------------------------------)
    (Check, that the number of requested memory units is below 1 G-Units. This
    equals 4 GB. If it is 1 GU or above, a 32 bit OS can't handle it.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? D + C000 0000h -> Dsa Alloc Maximum Exceeded;
    (=========================================================================)
    (PROCESS)

    (If there is not a single AVL node, there surely is nothing we can reuse.
    There is no AVL node, if there is no root. There is no root, if the AVL
    library's variable [avl status] has the AVL STATUS ROOT flag clear. This is
    the case just after Init, but also when the last still allocated memory
    chunk was returned to the pool, possibly after coalescing with a lower
    already free chunk. In this case, allocate from memory pool.)
        ? [avl status] - AVL STATUS ROOT -> Dsa Alloc From Pool;

    (UPDATE 1.2. If the [dsa control] variable has a set DSA CONTROL NO REUSE
    flag, then new chunks are always allocated from the low end of the memory
    pool. This can be exploited to organize {or reorganize} data such, that it
    is better suited for caches. WARNING: if used excessively, chances are that
    we run out of memory soon, even when there would be plenty of space, since
    no deallocated chunks are reused in this mode.)
        ? [dsa control] + DSA CONTROL NO REUSE -> Dsa Alloc From Pool;
        (-> Dsa Alloc Reuse;)
    (-------------------------------------------------------------------------)
    ("Dsa Alloc Reuse")

    (There is at least one node in the AVL tree. In D is the requested number
    of units. Look for exact fit first, i.e. if a node with key D exists.)

    (We want the address of the node, for one to access its AUX field if we
    find it, and for the other to invoke a faster 'abbreviated' find successor
    algorithm to find a larger free chunk which can be split. There are nodes,
    else we would not be here, thus we may use the AVL-internal routine
    'Avl Locate Node'. If the routine finds such a node, it will set the flag
    AVL STATUS FOUND in the AVL library's status variable [avl status]. If such
    a node does not exist, the flag will be clear.)
        E = D;                              (Node key is the size)
        => Avl Locate Node;                 (Node address in B if found)

    (UPDATE 1.2. Branch prediction optimization. Typically it is more likely
    that we have an excact fit once an AVL tree exists, because 'objects' of a
    specific 'class' usually won't change their size in time. Therefore we let
    an exact match preferably fall through and branch only when no such size
    exists.)
        ? [avl status] - AVL STATUS FOUND -> Dsa Alloc No Exact Fit;
        (-> Dsa Alloc Exact Fit;)
    (-------------------------------------------------------------------------)
    (UPDATE 1.2: This code section was moved to here from further below for
    branch prediction optimization.)

    ("Dsa Alloc Exact Fit")

    (The number of requested units is in D and E. We found a node which manages
    at least one chunk of exactly that requested size. Its address is in B.
    The node's fields DATA and AUX hold the two ends of the size chain, i.e.
    both point to a free chunk. We want to re-use the chunk with the lower
    address, keeping high addresses free as long as possible. This is part of
    the strategy to return upper addresses as soon as possible to the memory
    pool. During deallocation, we made sure, that lowest addresses were placed
    at the start of the size chain. Therefore, it is from this end that we
    re-allocate the chunk. Get its header address.)
        A = [B plus AVL DATA];          (DATA contains the chain start)
        A & 3FFF FFFFh;                 (The 2 MSB belong to AUX)

    (The chunk A will become allocated. There is no need anymore to have it
    managed by the size node B and via it by the chunks' size chain. Remove
    if from the chain. Note, that this may lead to removing size node B.)
        (A = A;)                            (Chunk to be un-managed)
        => Dsa Remove From Node;
        ? failed -> Dsa Alloc Exit Failed;  (May return with AVL error)

    (Flag the chunk to be allocated now. Then return the userdata address of
    chunk A to the caller. Done.)
        [A plus DSA CHUNK PREV] | DSA ALLOCATED;
        E = A;                          (Header address of re-allocated chunk)
        E + DSA HEADER SIZE;            (Userdata address)
        -> Dsa Alloc Exit OK;           (Done)
    (-------------------------------------------------------------------------)
    "Dsa Alloc No Exact Fit"

    (The number of requested units is in D and E. We found no node which
    manages exactly this requested size. But B nevertheless points to a node:
    it would be the parent of the node with the requested size, if such a node
    would be inserted. The Avl routine even tells us, which child of B such a
    node would become: this was coded into the status flag AVL STATUS LEFT.
    These results can be exploited for an abbreviated 'find successor'
    algorithm which needs to be invoked now to possibly locate a splittable
    node.)

    (If the requested size was smaller than B, then it would be attached as
    its left child, if larger then as its right child. If its on the left,
    then B can or can not have a right subtree already, but it certainly can
    not have a left child yet. Since B's right subtree must contain nodes
    which all are larger than B, and since B itself must be larger than the
    requested size, and because there is no left subtree of B which could
    provide smaller keys, B already /is/ the node with the smallest key larger
    than the requested size, i.e. the successor of the hypothetically inserted
    node.)
        (DO NOT DO THIS YET. Continue reading for the reason.)
        (? [avl status] + AVL STATUS LEFT -> Dsa Alloc Split Chunk;)

    (If the node would be appended as B's right child, then B is smaller.
    If B's parent has B as its right child as well, than the parent is even
    smaller than B. This is valid for all ancestors of B all the way up to the
    root, as long as they have their child leading to B on the right. Should
    such an ancestor have its child on the left, though, then that ancestor is
    larger than B and all the ancestors in between the two nodes. Thus it
    suffices to follow B's ancestors up until we find one having a key = size
    larger than what we requested. This is done in max. lg n loop iterations,
    where n are all existing nodes. Thus it is fast. If there is no such node,
    then the requested size is larger than every existing node. In that case we
    need to allocate from the pool. We can start the loop with B's parent,
    because we know already, that B is too small. However, if B already is the
    root, i.e. has no parent, we can immediately proceed with allocation from
    the pool.)

    (HOWEVER: Before searching, it is valuable to think about the use we make of
    a possibly found node. A such manages at least one free chunk. One of these
    chunks is split into 2 parts: one of these parts is going to be re-allocated
    and the other becomes a shorter still free chunk. In other words, we split
    one chunk into two. But two chunks require 2 headers, whereas one chunk
    has only one. Now we can potentially run into a situation, that we find a
    larger node, which still is not large enough to hold the additional header.
    This of course is not a node we can split, when the second free part would
    not even be able to hold the complete header. But also if it was just able
    to hold the header we can not really make use of it: after all, what sense
    does a header make 'managing' 0 userdata units? An example: Say, headers
    are 4 units. We want to assign 1000 units. There is no node managing 1000
    units. But the successor node is a node managing 1004 units. Now, although
    the node says it manages a chunk with 1004 units, the chunk actually
    occupies 1008 units due to its header. When we re-allocate the requested
    1000 units, we occupy 1004 due to the header. That leaves 4 units free for
    the second still free chunk, but 4 minus /its/ header leaves 0 units to the
    user. No user will allocate 0 units, and so we have created a logical
    fragmentation: 4 units managing nothing, but still blocked from further
    usage. - For this reason, we actually search for a node, which is large
    enough to provide an additional header and at least 1 userdata unit. The
    registers D and E both have the requested units. The original request is
    left in D, the key for the node we effectively search is calculated into E.)
        E + DSA HEADER SIZE;            (Space for additional header)
        E +;                            (Space for at least 1 userdata unit)

    (Now this has quite severe implications. Especially is it possible, that a
    Node with size E effectively exists. We have no guarantee that we by
    coincidence hit it with outlined algorithm. We may in fact be in a
    completely wrong part of the AVL tree. We /can/ find the node with key>=E,
    but this potentially involves climbing up the tree and descending on
    another part and definitely is not the fastest possible search. It in fact
    is more efficient to simply search again, but this time for node E. If
    there is such a node, we nevertheless jump to the splitter, and not to an
    exact match, since it is none relative to D. If we don't find such a node,
    we distinguish again between the left and the right side. If left, we found
    the node to be split. If right, /then/ we can search for a node > E by
    examining its parents, as outlined above.)

    (But since we have already a B, we take advantage of it. If by coincidence
    a new node with key E would be inserted to the left of node B, /and/ B's
    key still is equal to or larger than E, then a new search would provide no
    other result into B. In this case we omit the new search and take the fast
    lane.)

    (UPDATE 1.2. Analysis: Typically, both conditional branches are fall-through,
    with estimated percentages of 75% and 99.8%, resp.)
        ? [avl status] - AVL STATUS LEFT -> Dsa Alloc Find Node; (to the right)
        ? [B plus AVL KEY] >= E -> Dsa Alloc Split Chunk;   (left, and size ok)

    (Search for the required minimum size to allow for an additional header and
    at least 1 userdata unit.)
    "Dsa Alloc Find Node"
        (E = E;)                        (Key is the adjusted minimum size)
        => Avl Locate Node;             (Node address in B if found)

    (If node found, or if not found but potential left child, then we split
    one of the chunks managed by node B. Note, that we /split/ even if such a
    node was found: we were not searching for an exact size D anymore, but
    for at least a size E which /allows/ meaningful splitting.)

    (UPDATE 1.2. Analysis: Typically we don't find such a node for the vast
    majority of all cases and thus fall through.)
        ? [avl status] + AVL STATUS FOUND plus AVL STATUS LEFT
                                            -> Dsa Alloc Split Chunk;

    (Search for a node being larger than the adjusted minimum size as
    described above. Being an AVL tree, this requires maximum 1.45 lg n
    iterations, with n being all nodes in the tree. If we tested the root, then
    such a node does not exist, which means, that the requested size is larger
    than every free memory area; in that case we need to allocate from the
    memory pool.)

    (UPDATE 1.2. Analysis: Climbing up to the root is a rare event and thus we
    fall through. Iteration of the loop is depending on the tree's height.
    Since it's an AVL tree, it's height is generally not tall: typically 3
    iterations are common. Because it's backwards, the conditional jump is
    predicted taken, which suits us well.)

    "Dsa Alloc Search Node"                                 (Loop start)
        ? [B plus AVL PARENT] = minus 1 -> Dsa Alloc From Pool;     (Is root)
        B = [B plus AVL PARENT];                            (B gets B's parent)
        ? [B plus AVL KEY] < E -> Dsa Alloc Search Node;    (Still too small)
        (-> Dsa Alloc Split Chunk;)                         (Found larger)
    (-------------------------------------------------------------------------)
    (UPDATE 1.2. Additional code flow comment: All calls having entered the
    previous code section at 'Dsa Alloc No Exact Fit' meet here again.)

    "Dsa Alloc Split Chunk"

    (The number of requested units is in D. We found no node which manages
    exactly this requested size, but a node with the smallest size E larger
    than what is needed to allow for a meaningful splitting is in B. This node
    manages at least one free chunk with that too large size, of which we need
    to split one into two parts. One of these two parts becomes the new
    allocated chunk, and the other remains a free chunk, with a smaller size
    than before, though.)

    (The first decision is, from which one of the two ends the node B points to
    we detach the chunk to be re-used. The start of the chain in the node's
    field DATA points to a comparably low memory address: this was made sure
    when the chunk was deallocated. The end in the AUX field tendentially
    points to higher addresses. In order to allow the memory pool to recollect
    as many of the chunks at higher addresses as possible, we leave the upper
    ones untouched.)

    (Save number of requested units: we need register D.)
        C = D;

    (The start of the size chain is in the node's DATA field. Read it.)
        D = [B plus AVL DATA];          (The node's data field)
        D & 3FFF FFFFh;                 (The 2 MSB belong to its AUX field)

    (Having the chunk's address, we are faced with a second decision: which
    side of the chunk do we declare to be the allocated one and which part the
    free one? Both ends of this chunk can not border to a free chunk: if they
    did, we would have missed a coasceling possibility. But we made sure, that
    we did not miss any such possibility, thus we don't need to re-check for
    coalescing once again. This could tempt us to think that it does not
    matter, which part we declare to take a certain role, but it does. If we
    declared the upper part to be the free one, no matter how small it is,
    we provide better possibilities for the memory pool to reintegrate that
    part somewhen. Would we re-allocate from the chunk's upper part, then its
    allocated part somewhen may share its upper border with the pool, blocking
    it from coalescing with the lower free chunk. - For this reason, we make
    the chunk's lower end the allocated part, and its upper part the free
    chunk.)

    (In any case does the found chunk D need no more size management, at least
    not for its current size. This may lead to removing the size node B all-
    together.)
        A = D;                              (Chunk to be un-managed)
        => Dsa Remove From Node;            (May remove the node)
        ? failed -> Dsa Alloc Exit Failed;  (May return with AVL error)

    (Node B may not exist anymore now. This is the case, when chunk D was the
    only chunk of this size.)

    (Time to split the chunk D into two chunks now. This is done by simply
    inserting a header after the designated allocated part and setting the
    chunk chain pointers accordingly. The effective size of the 'lower'
    allocated chunk is the requested size plus the size of its already
    existing header. Added to the address of the chunk which is being split
    we get the header address of the new upper chunk.)
        A = C;                          (Requested units)
        A + D;                          (Header address of new 'upper' chunk)
        A + DSA HEADER SIZE;            (Size of 'lower' allocated chunk)

    (Adjusting the chunk chain pointers.)
    (*************************************************************************)
    (UPDATE 1.2 START. Instruction scheduling optimization. We had 3 Write/Read
    stalls and 1 Write/Write stall. Execution of the 5 instructions required
    3.5 cycles. Was:)
                                        (U     ) (Previous instruction in U)
        (E = [D plus DSA CHUNK NEXT];)  ( V    ) (Address of next chunk)
        ([A plus DSA CHUNK NEXT] = E;)  (U  WRS) (New upper chk to next chk)
        ([E plus DSA CHUNK PREV] = A;)  (U  WRS) (Points back to new upper chk)
        ([D plus DSA CHUNK NEXT] = A;)  ( V    ) (Lower chk points to upper chk)
        ([A plus DSA CHUNK PREV] = D;)  (U  WRS) (And upper chk to lower)
                                        (U  WWS) (Next instruction in U again)
    (New: executes in 3 cycles, despite using an additional instruction; and we
    have no stalls anymore. Register B can be used, we don't need it anymore.)
                                        (U ) (Previous instruction in U)
        E = [D plus DSA CHUNK NEXT];    ( V) (Address of next chunk)
        B = A;                          (U ) (Make B synonymous to A)
        [A plus DSA CHUNK NEXT] = E;    ( V) (New upper chk to next chk)
        [E plus DSA CHUNK PREV] = B;    (U ) (Points back to new upper chk)
        [B plus DSA CHUNK PREV] = D;    ( V) (Upper chk points to lower chk)
        [D plus DSA CHUNK NEXT] = A;    (U ) (Lower chk points to upper chk)
                                        ( V) (Next instruction in V)
    (UPDATE 1.2 END)
    (*************************************************************************)

    (The upper chunk's size chain pointers are not defined yet, since we did
    not yet execute node management for it. However, that does not neccessarily
    mean, that there is no size node for that size. We do size managemnent
    for A in a few lines, but need to initialize it.)
        [A plus DSA SIZE PREV] = DSA CHAIN END;
    (UPDATE 1.2. Instruction scheduling optimization.)
        [D plus DSA CHUNK PREV] | DSA ALLOCATED;    (Moved from below)
        [A plus DSA SIZE NEXT] = DSA CHAIN END;

    (The lower chunk needs to be treated as allocated from now on. It is re-
    used.)
    (UPDATE 1.2. Moved to above.)
        ([D plus DSA CHUNK PREV] | DSA ALLOCATED;)  (Moved to above)

    (Because D is allocated now, it was removed from any size chain it may have
    been integrated into. Its size pointers must have been set to DSA CHAIN END
    already in above 'Dsa Remove From Node'.)
        ([D plus DSA SIZE PREV] = DSA CHAIN END;)   (Already done)
        ([D plus DSA SIZE NEXT] = DSA CHAIN END;)   (Already done)

    (Also, we provided A for E's CHUNK PREV field, replacing the former back
    link to D. That back link to D had the DSA ALLOCATED flag, but A of course
    had not. Since E is the chunk next to 'old' D, now split into new D and A,
    it always was and must remain allocated. We need to mask in DSA ALLOCATED
    also for E now, or E would wrongly be regarded as a deallocated chunk.)
        [E plus DSA CHUNK PREV] | DSA ALLOCATED;

    (Into A's CHUNK PREV field we provided D. Since D was a free chunk, and A
    is the remaining free part, the DSA ALLOCATED was not set for A, and also
    may not be set. We can leave it as it is.)

    (Now 'old' chunk D was split into chunks D and A. What's still missing is
    the size chain update for the /free/ chunk A. Chunk D may not be
    integrated into the size chain, because it was and still is allocated and
    not free. A's header address needs to be provided to the routine
    'Dsa Add To Node' in E, and its /usable/ size needs to be calculated into
    A. Note, that it was made sure, that this usable size is at least 1.)
        E <> A;                         (E: Header address; A: Next chunk)
        A - E;                          (Total size upper chunk, incl. header)
        A - DSA HEADER SIZE;            (Usable size of chunk)
        => Dsa Add To Node;             (May create a new node)
    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? failed -> Dsa Alloc Upper Split Not Added;

    (Now return the userdata address of the re-used lower chunk to the caller.
    The header address is still in D, but needs to be returned in E.)
        E = D;                          (Header address reallocated chunk)
        E + DSA HEADER SIZE;            (Userdata address)
        -> Dsa Alloc Exit OK;           (Done)
    (-------------------------------------------------------------------------)
    (UPDATE 1.2. Code flow comment. With the introduction of the [dsa control]
    variable and its flag DSA CONTROL NO REUSE, we may jump to this code section
    right from the routine's start, should that flag be set.)

    "Dsa Alloc From Pool"

    (There is no suitable node in the AVL tree. Allocate from the memory pool
    between the AVL tree and the memory chunks, if there is enough free space
    to accomodate the request. In D is the requested number of units.)
        (D = D;)                            (Size of future memory chunk)
        => Dsa Make Top Chunk;              (Header Address new chunk in E)
        ? failed -> Dsa Alloc Exit Failed;  ([dsa error] already loaded)
        E + DSA HEADER SIZE;                (Userdata address allocated chunk)
        (-> Dsa Alloc Exit OK;)             (Done)
    (=========================================================================)
    (EXIT)

    "Dsa Alloc Exit OK"

    (UPDATE 1.2. For statistical purposes.)
        [dsa num alloc] +;              (Counting successful allocations)
        => Dsa Global Avl Data;         (Switch to original AVL Data)
        <-- D; <-- C; <-- B; <-- A;
        [dsa error] = DSA ERR NONE;
        end;
    (-------------------------------------------------------------------------)
    "Dsa Alloc Exit Failed"
        => Dsa Global Avl Data;         (Switch to original AVL Data)
        <-- D; <-- C; <-- B; <-- A;
        fail;
    (=========================================================================)
    (UPDATE 1.2 START. Error handlers moved down from inline handling for
    Branch Prediction Optimization.)

    "Dsa Alloc Dsa Not Existing"
        [dsa error] = DSA ERR DSA NOT EXISTING;
        -> Dsa Alloc Exit Failed;
    "Dsa Alloc Maximum Exceeded"
        [dsa error] = DSA ERR LIMIT EXCEEDED;
        -> Dsa Alloc Exit Failed;
    "Dsa Alloc Upper Split Not Added"
        [dsa error] = DSA ERR AVL ERROR;    (In [avl error] is the reason)
        -> Dsa Alloc Exit Failed;

    (UPDATE 1.2 END)
    (=========================================================================)

(*****************************************************************************)
"Dsa Free"

    (=========================================================================)
    (Deallocates a memory chunk at a valid address. Valid addresses are those
    which were returned by the 'Dsa Alloc' routine. Deallocation includes
    immediate coalescing on both sides, possibly returning large upper parts
    back to the memory pool. Size nodes are updated to accurately reflect free
    chunks available for re-use. Chunk and size pointers are updated to ensure
    fast best-fit re-allocation.)
    (-------------------------------------------------------------------------)
    (Input:     E:  Deallocate memory chunk allocated at address E
    Output:     -

    All registers unchanged.

    May return errors, check [dsa error] after 'failed'. It may contain:
        DSA ERR LIMIT EXCEEDED          We can't free addresses >= 1 GUnit
        DSA ERR DSA NOT EXISTING        'Dsa Create' was not used/successful
        DSA ERR OUT OF MEMORY           Request can not be fulfilled
        DSA ERR AVL ERROR               Details are in [avl error]
        DSA ERR NO ALLOCATED CHUNK      Attempt to free a not allocated chunk
    )
    (=========================================================================)
    (ENTRY)
        A -->; B -->; C -->; D -->; E -->;
        => Dsa Local Avl Data;          (Switch to local AVL Data)
    (=========================================================================)
    (VERIFICATION)

    (Make sure, that there was a former successful call to 'Dsa Create'. If
    there was none, or if it was not successful, an error is returned.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? [dsa status] - DSA STATUS INIT -> Dsa Free Dsa Not Existing;
    (-------------------------------------------------------------------------)
    (We work with the header address, not the provided userdata address.)
        E - DSA HEADER SIZE;
    (-------------------------------------------------------------------------)
    (Valid addresses are between [dsa ram bottom] incl. and [avl tree bottom]
    excl. If the caller wants to deallocate beyond these limits, it may be
    everything but an allocated chunk.)
        ? E >= [avl tree bottom] -> Dsa Free Maximum Failure;

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? E < [dsa ram bottom] -> Dsa Free Maximum Failure;

    ("Dsa Free Address Valid")
    (-------------------------------------------------------------------------)
    (The field CHUNK PREV of the actual chunk must have its MSB set. If it has
    not, then this is not a validly allocated chunk. It's a marginal test only,
    but after all it's the callers responsibility to call this routine with
    valid addresses, which were returned from 'Dsa Alloc'.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? [E plus DSA CHUNK PREV] - DSA ALLOCATED -> Dsa Free Is Not Allocated;

    ("Dsa Free Is Allocated")
    (=========================================================================)
    (PROCESS)

    (The allocation flag can be removed now.)
        [E plus DSA CHUNK PREV] # DSA ALLOCATED;
    (-------------------------------------------------------------------------)
    (Check, if we can return the actual chunk to the memory pool. This is the
    case, when the actual chunk is the topmost chunk. It is the topmost chunk,
    when CHUNK NEXT contains DSA CHAIN END.)

    (UPDATE 1.2. Branch prediction optimization. Typically it is rare to
    deallocate the topmost chunk, so fall-through for not topmost chunks is
    preferred.)
        ? [E plus DSA CHUNK NEXT] = DSA CHAIN END -> Dsa Free Topmost Chunk;
    (-------------------------------------------------------------------------)
    ("Dsa Free Not Topmost Chunk")

    (The actual chunk E is not the topmost chunk. Check, if there are neigh-
    boring chunks, which are already deallocated. If there are, they are
    coalesced with the actual one.)

    (First check the chunk above the actual one. Such a chunk must exist, or
    we would not have entered this code section 'Dsa Free Not Topmost Chunk'.
    If it still is allocated, i.e. in use, we don't coalesce it with the actual
    chunk.)
        A = [E plus DSA CHUNK NEXT];        (Check next chunk above the actual)

    (UPDATE 1.2. Code flow comment: there is no clear preference for either
    branch, so we can't really exploit branch prediction; too much depends on
    actual circumstances. It fits the logical code flow more to fall-through
    when we coalesce, i.e. when the above chunk is not allocated anymore.)
        ? [A plus DSA CHUNK PREV] + DSA ALLOCATED -> Dsa Free No Next Coalesc;
    (-------------------------------------------------------------------------)
    (The next 'upper' chunk is free. It can be coalesced with the actual
    chunk. Registration of the new free size is done later, when we jump to
    label 'Dsa Free No Prev Coalesc', after probably also coalescing with the
    previous 'lower' chunk. This section's tasks are: to completely
    remove the already free 'upper' chunk from the size chain by shortcutting
    its up to two neighbors, if needed adjusting the size node's end entries;
    to remove the /actual/ chunk from the chunk chain by shortcutting its
    neighbors; and lastly ensuring an unchanged address in E before we continue
    with checking for bottom coalescing.)

    (Eliminate the previous 'upper' chunk A from the size chain. This may
    lead to removing the size node.)
        (A = A;)                            (Address of chunk to be unlinked)
        => Dsa Remove From Node;            (Don't keep it listed anylonger)
        ? failed -> Dsa Free Exit Failed;   (Error code already loaded)

    (Eliminate the same upper chunk A from the chunk chain. This is contrary
    to bottom coalescing, where the actual chunk E is detached.)
        E -->;                              (Need actual chunk back)
        E = A;                              (Address of chunk to be unlinked)
        => Dsa Bypass Chunk;
        <-- E;                              (Restore actual chunk)
        ? failed -> Dsa Free Exit Failed;   (Error code already loaded)

    (After coalescing with the next 'upper' chunk, check for further coalescing
    with the previous 'lower' chunk.)
        (-> Dsa Free No Next Coalesc;)
    (-------------------------------------------------------------------------)
    (UPDATE 1.2. Code flow comment: Whether coalesced with the upper chunk or
    not, both branches meet here again.)

    "Dsa Free No Next Coalesc"              (Also jumped to if upper not free)

    (Check, if there is a previous 'lower' chunk. Such a chunk does not
    neccessarily exist, namely not then, when the actual chunk is the bottom-
    most, which is a rare event. If it exists, check if still allocated. If so,
    we can't coalesce it with the actual chunk.)
        ? E = [dsa ram bottom] -> Dsa Free No Prev Coalesc;
        A = [E plus DSA CHUNK PREV];        (Check previous chunk)

    (UPDATE 1.2. Code flow comment: as above, there is no clear preference for
    either branch, so we can't exploit branch prediction. It fits the logical
    code flow more to fall-through when we coalesce, i.e. when the lower chunk
    is not allocated anymore.)
        ? [A plus DSA CHUNK PREV] + DSA ALLOCATED -> Dsa Free No Prev Coalesc;
    (-------------------------------------------------------------------------)
    (The previous 'lower' chunk is free. It can be coalesced with the actual
    chunk. Registration of the new free size is done later, when we jump to
    label 'Dsa Free No Prev Coalesc'. This section's tasks are: to completely
    remove the already free 'lower' chunk from the size chain by shortcutting
    its up to two neighbors, if needed adjusting the size node's end entries;
    to remove the /actual/ chunk from the chunk chain by shortcutting its
    neighbors; and to provide a new lower address into E for a proper con-
    tinuation.)

    (Eliminate the previous 'lower' chunk A from the size chain. This may
    lead to removing the size node.)
        (A = A;)                            (Address of chunk to be unlinked)
        => Dsa Remove From Node;            (Don't keep it listed anylonger)
        ? failed -> Dsa Free Exit Failed;   (Error code already loaded)

    (Eliminate the /actual/ chunk E from the chunk chain. This is different
    from top coalescing: here in bottom coalescing, E is the upper of the
    two chunks; there it was the lower of the two.)
        (E = E;)                            (Address of chunk to be unlinked)
        => Dsa Bypass Chunk;
        ? failed -> Dsa Free Exit Failed;   (Error code already loaded)

    (Now the actual free chunk E can not be found anymore, but A still is in
    the chunk chain, although not in any size chain anymore. From now on, we
    treat the two chunks as a single coalesced chunk at the new address A.)
        E = A;                              (Address of larger actual chunk)
        (-> Dsa Free No Prev Coalesc;)      (Node management new size)
    (-------------------------------------------------------------------------)
    (UPDATE 1.2. Code flow comment: Whether coalesced with the lower chunk or
    not, both branches meet here again.)

    "Dsa Free No Prev Coalesc"              (Also jumped to if lower not free)

    (Coalescing was done, if any. In any case does E have the header address of
    the, possibly doubly coalesced, free chunk, i.e. it may be lower than the
    original actual chunk, when there was bottom coalescing or double
    coalescing. If there was top coalescing or no coalescing at all, E still
    will point to the actual chunk. Both chunk pointers were modified, if
    coalescings were applied, so that the size of the now possibly larger chunk
    can be calculated. We do this now. We need the chunk's size as a key to the
    size node.)

    (Calculate the size of the memory chunk to be deallocated. This is done
    by subtracting the actual chunk's header address from the next chunk's
    header address. An upper chunk must exist or we would have had no need to
    check for coalescings in the first place, but would have returned the
    chunk already to the memory pool.)
        A = [E plus DSA CHUNK NEXT];        (A will take chunk's size)
        A - E;                              (Chunk's size in A, incl header)
        A - DSA HEADER SIZE;                (Size excluding the header)

    (Doing the node management for this free chunk. This includes updating
    an existing or creating a new size node, resp., and update of both pointers
    of the size chain for all involved chunks.)
        (E = E;)                            (Header address of free chunk)
        (A = A;)                            (Usable size of free chunk)
        => Dsa Add To Node;
    (May return with an error. If there was one, the error code is already in
    [dsa error].)
        ? failed -> Dsa Free Exit Failed;   (Error code already loaded)

    (UPDATE 1.2. Code flow optimization. The vast majority of all calls to
    'Dsa Free' took this branch, thus we save an unconditional jump to the
    ordinary end of the routine by relocating the end from the routine's bottom
    to here at the cost of a not so intuitive code legibility.)
        (-> Dsa Free Exit OK;)              (New: fall through)
    (=========================================================================)
    (EXIT)

    (UPDATE 1.2 START: Code section moved from routine's end to take advantage
    from fall-through saving a jump for the vast majority of all cases.)

    "Dsa Free Exit OK"                      (Also jumped to from below)
        => Dsa Global Avl Data;             (Switch to original AVL Data)

    (UPDATE 1.2. For statistical purposes.)
        [dsa num free] +;                   (Counting successful deallocations)
        <-- E; <-- D; <-- C; <-- B; <-- A;
        [dsa error] = DSA ERR NONE;
        end;

    (UPDATE 1.2 END)
    (=========================================================================)
    (UPDATE 1.2. Code flow comment: This typically is a rare case.)

    "Dsa Free Topmost Chunk"                (Typically rare case)

    (This is the topmost chunk. It can be returned to the memory pool. If
    there is a free chunk already just below the actual chunk, that one can be
    returned as well, and since it is managed by a size node, the size node
    also needs to be reduced/removed. First we handle the actual chunk's
    previous chunk. See if it exists, and if so, if it was deallocated
    already.)
        A = [E plus DSA CHUNK PREV];        (Header address previous chunk)

    (UPDATE 1.2. Code flow comment: Both conditional branches typically not
    taken.)
        ? [A plus DSA CHUNK PREV] + DSA ALLOCATED -> Dsa Free Return To Pool;
        ? A = DSA CHAIN END -> Dsa Free Return To Pool;

    (A previous 'lower' chunk exists and was already deallocated. It can be
    returned to the memory pool as well. However, first we need to actualize
    the relevant size node.)

    (Eliminate the previous 'lower' chunk A from the size chain. This may
    lead to removing the size node.)
        (A = A;)                            (Address of chunk to be unlinked)
        => Dsa Remove From Node;            (Don't keep it listed anylonger)
        ? failed -> Dsa Free Exit Failed;   (Error code already loaded)

    (Because the actual chunk E still exists, the removal of chunk A from the
    size chain did not also remove it from the chunk chain, i.e. its CHUNK NEXT
    field still points to E. Likewise does E's CHUNK PREV field still refer
    to A. However, we do not need to adjust any of these links: A's predecessor
    will be made the end of the chunk chain below, so that neither A nor E
    can be found anymore. If there is no such predecessor, then the whole
    dedicated memory will become free once again.)

    (All memory at and above the just removed previous 'lower' chunk can be
    returned to the memory pool now.)
        E = A;
        (-> Dsa Free Return To Pool;)
    (-------------------------------------------------------------------------)
    (UPDATE 1.2. Code flow comment: Whether there was an already free chunk
    below the actual chunk or not, both branches meet here again.)

    "Dsa Free Return To Pool"

    (The area at E and above can be returned to the memory pool. Should a
    previous chunk have existed, which was already deallocated, then all needed
    node chain updates and possibly even a node removal were already done. In
    this case, E points to the header of that previous chunk, and not to the
    actual chunk anymore.)
        [avl tree bottom] = E;              (Memory pool grows downwards)

    (If E is the bottommost chunk now, then all the memory is available once
    again and not a single chunk remains allocated.)

    (UPDATE 1.2. Branch prediction optimization. E typically is not the bottom-
    most chunk, thus we change the instructions to take advantage of the branch
    prediction mechanism.)
        ? E = [dsa ram bottom] -> Dsa Free Bottommost Chunk;

    ("Dsa Free Not Bottommost")

    (E's previous chunk, which must be an allocated chunk, will become the end
    of the chunk chain now. It also will be registered as the last so far
    allocated chunk.)
        E = [E plus DSA CHUNK PREV];        (Previous chunk is new last chunk)

    (Because E before the allocation was a free chunk, its CHUNK PREV field
    can not have the DSA ALLOCATED flag. Thus, after allocation E never
    has it as well and we need not to clear it explicitely. However, after the
    allocation, E's previous chunk is /always/ an allocated chunk and thus must
    have the DSA ALLOCATED flag.)
        (E & FFFF FFFFh minus DSA ALLOCATED;)   (Not needed)

    (End the chunk chain at that previous chunk E.)
        [E plus DSA CHUNK NEXT] = DSA CHAIN END;

    (Register as the last allocated chunk.)
        [dsa last chunk] = E;               (New last allocated)
        -> Dsa Free Exit OK;                (Done. Target is above.)
    (-------------------------------------------------------------------------)
    (UPDATE 1.2 START. This code section was moved from further above for
    branch prediction optimization. It handles a rare case.)

    "Dsa Free Bottommost Chunk"

    (The chunk chain fields both have DSA CHAIN END already. Tell, that there
    is no last allocated chunk anymore.)
        [dsa last chunk] = minus 1;         (No last allocated chunk)
        -> Dsa Free Exit OK;                (Done. Target is above.)

    (UPDATE 1.2 END)
    (=========================================================================)
    (EXIT)

    (UPDATE 1.2: The ordinary exit label 'Dsa Free Exit OK' which could be
    expected here was moved further up to take advantage of natural code flow
    and save an unconditional jump for the vast majority of all cases.)
    (-------------------------------------------------------------------------)
    "Dsa Free Exit Failed"
        => Dsa Global Avl Data;             (Switch to original AVL Data)
        <-- E; <-- D; <-- C; <-- B; <-- A;
        fail;
    (=========================================================================)
    (UPDATE 1.2 START. Error handlers moved down from inline handling for
    Branch Prediction Optimization.)

    "Dsa Free Dsa Not Existing"
        [dsa error] = DSA ERR DSA NOT EXISTING;
        -> Dsa Free Exit Failed;
    "Dsa Free Maximum Failure"
        [dsa error] = DSA ERR LIMIT EXCEEDED;
        -> Dsa Free Exit Failed;
    "Dsa Free Is Not Allocated"
        [dsa error] = DSA ERR NO ALLOCATED CHUNK;
        -> Dsa Free Exit Failed;

    (UPDATE 1.2 END)
    (=========================================================================)

(*****************************************************************************)
"Dsa Size"

    (=========================================================================)
    (Returns the size of an allocated chunk.)
    (-------------------------------------------------------------------------)
    (Input:     E:  Address of allocated chunk
    Output:     D:  Size of userdata area in units, excl. header

    All registers unchanged.

    May return errors, check [dsa error] after 'failed'. It may contain:
        DSA ERR LIMIT EXCEEDED          Such an address was never allocated
        DSA ERR DSA NOT EXISTING        'Dsa Create' was not used/successful
        DSA ERR NO ALLOCATED CHUNK      Such an address was never allocated
    )
    (=========================================================================)
    (ENTRY)
        E -->;
        => Dsa Local Avl Data;          (Switch to local AVL Data)
    (=========================================================================)
    (VERIFICATION)

    (Make sure, that there was a former successful call to 'Dsa Create'. If
    there was none, or if it was not successful, an error is returned.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? [dsa status] - DSA STATUS INIT -> Dsa Size Dsa Not Existing;
    (-------------------------------------------------------------------------)
    (Addresses of 1GUnits = 4 GB or more can not be handled, thus were never
    returned by 'Dsa Alloc'. In this case the caller provides invalid input.
    Tell him if so.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? E + C000 0000h -> Dsa Size Limit Exceeded;
    (-------------------------------------------------------------------------)
    (We work with the header address, not the provided userdata address.)
        E - DSA HEADER SIZE;
    (-------------------------------------------------------------------------)
    (The field CHUNK PREV of the actual chunk must have its MSB set. If it has
    not, then this is not a validly allocated chunk. It's a marginal test only,
    but after all it's the callers responsibility to call this routine with
    valid addresses.)

    (UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
        ? [E plus DSA CHUNK PREV] - DSA ALLOCATED -> Dsa Size Not Allocated;
    (=========================================================================)
    (PROCESS)

    (The difference between the next 'upper' chunk, if any, and the own
    address is occupied by this chunk, including header. If this is the top
    chunk, then we take the upper address from [avl tree bottom].)
        D = [E plus DSA CHUNK NEXT];
        ? D != DSA CHAIN END -> Dsa Size Got Upper;
        D = [avl tree bottom];
    "Dsa Size Got Upper"
        D - E;                          (Total size of this chunk)
        D - DSA HEADER SIZE;            (User size of this chunk)
    (=========================================================================)
    (EXIT)

    "Dsa Size Exit OK"
        => Dsa Global Avl Data;         (Switch to original AVL Data)
        <-- E;
        [dsa error] = DSA ERR NONE;
        end;
    (-------------------------------------------------------------------------)
    "Dsa Size Exit Failed"
        => Dsa Global Avl Data;         (Switch to original AVL Data)
        <-- E;
        fail;
    (=========================================================================)
    (UPDATE 1.2 START. Error handlers moved down from inline handling for
    Branch Prediction Optimization.)

    "Dsa Size Dsa Not Existing"
        [dsa error] = DSA ERR DSA NOT EXISTING;
        -> Dsa Size Exit Failed;
    "Dsa Size Limit Exceeded"
        [dsa error] = DSA ERR LIMIT EXCEEDED;
        -> Dsa Size Exit Failed;
    "Dsa Size Not Allocated"
        [dsa error] = DSA ERR NO ALLOCATED CHUNK;
        -> Dsa Size Exit Failed;

    (UPDATE 1.2 END)
    (=========================================================================)

(*****************************************************************************)
"Dsa Clear"

    (=========================================================================)
    (Clears the userarea of an allocated chunk by writing NULL into all the
    units.)
    (-------------------------------------------------------------------------)
    (Input:     E:  Address of allocated chunk
    Output:     -

    All registers unchanged.

    May return errors, check [dsa error] after 'failed'. It may contain:
        DSA ERR LIMIT EXCEEDED          Such an address was never allocated
        DSA ERR DSA NOT EXISTING        'Dsa Create' was not used/successful
        DSA ERR NO ALLOCATED CHUNK      Such an address was never allocated
    )
    (-------------------------------------------------------------------------)
        D -->; E -->;
    (-------------------------------------------------------------------------)
    (The initial validity tests and the first task of retrieving the size are
    absolutely identical to the public routine 'Dsa Size'.)
        => Dsa Size;                        (User size in D if not failed)
        ? failed -> Dsa Clear Exit Failed;  (Code already in [dsa status])
    (-------------------------------------------------------------------------)
    (In E is still the userdata address, i.e. the address of the first unit to
    be nulled. D has the length of the userdata area. The area from D on,
    including it, is nulled out now, in forward direction, because the hardware
    works faster in that direction.)
    "Dsa Clear Write Null"
        [E] = 0;                        (Write 0 into unit)
        E +;                            (Position to next unit to be nulled)
        D ^ Dsa Clear Write Null;       (Do for the whole size)
        (-> Dsa Clear Exit OK;)
    (-------------------------------------------------------------------------)
    ("Dsa Clear Exit OK")
        <-- E; <-- D;
        [dsa error] = DSA ERR NONE;
        end;
    "Dsa Clear Exit Failed"
        <-- E; <-- D;
        fail;
    (=========================================================================)

(*****************************************************************************)
"Dsa Fragmentation"                     (Introduced in update 1.2)

    (=========================================================================)
    (MEMORY ANALYSE ROUTINE. Returns data about the actual memory fragmentation
    in the dedicated RAM.)
    (-------------------------------------------------------------------------)
    (Input:     -
    Output:     A:  Free and reusable RAM below the memory pool, in units
                B:  Totally allocated and free RAM below the memory pool, units
                C:  Size of memory pool with free RAM, in units

    The percentage of the fragmentation can be obtained after a call to this
    routine with 'A * 100' and 'A / B'. The also free RAM in the memory pool
    is returned in C, but included neither in A nor in B. For the percentage of
    the overall fragmentation within the provided memory thus calculate
    A * 100 / {B + C}.

    NOTE: Don't use too frequent. For performance reasons, the number of free
    units is not updated live: Updating this number is not just an issue of a
    few additions and subtractions; it would involve its own share of logic
    due to coalescing and splitting operations. Since such a life value isn't
    a vital datum in general, at least not for the author, the according logic
    was not implemented, in favour of an overall higher performance. However,
    to satisfy the sporadic desire for such data, this function was implemented
    in version 1.2. Be careful when using it, though: the function needs to
    step through all chained memory chunks, which easily can count into the
    millions.

    All registers except A, B and C unchanged.
    May return errors, check [dsa error] after 'failed'. It may contain:
        DSA ERR DSA NOT EXISTING        'Dsa Create' was not used/successful
    )
    (=========================================================================)
    (ENTRY)
        D -->; E -->;
        => Dsa Local Avl Data;          (Switch to local AVL Data)
    (=========================================================================)
    (VERIFICATION)

    (Make sure, that there was a former successful call to 'Dsa Create'. If
    there was none, or if it was not successful, an error is returned.)
        ? [dsa status] - DSA STATUS INIT -> Dsa Fragmentation Dsa Not Existing;
    (=========================================================================)
    (PROCESS)

    (Into register A the number of free units is counted. We start with 0.)
        A = 0;
    (-------------------------------------------------------------------------)
    (Make sure that there is an AVL root managing free memory chunks. If there
    is none, then there are no free memory chunks and thus no fragmented space.
    This is not an error: it merely means, that there are no reusable memory
    chunks at the moment, but we still can provide output data: B and C will
    be calculated after jumping.)
        ? [avl status] - AVL STATUS ROOT -> Dsa Fragmentation All Counted;
    (-------------------------------------------------------------------------)
    (The AVL tree consists of nodes, each maintaining a list of free chunks of
    same size. The node's key is this size. We need to traverse the whole tree.
    It does not matter which tree traversal method is used, because we don't
    need the information in any particular order here: we just need to make
    sure, that each node is visited once. We arbitrarily use pre-order: each
    node is traversed before its children, and of the children, the left subtree
    is visited before the right one.)
        E = [avl tree root];                (The root address is flexible)
        -> Dsa Fragmentation Follow Chunks; (Process the root)

    "Dsa Fragmentation Follow Node"
    (Follow left child, if any.)
        E = [D plus AVL LEFT];              (If existing, process the node)
        ? E != minus 1 -> Dsa Fragmentation Follow Chunks;
    (D has no left child. Does it have a right one? If so, follow it.)

    "Dsa Fragmentation Follow Right"
        E = [D plus AVL RIGHT];             (If existing, process the node)
        ? E != minus 1 -> Dsa Fragmentation Follow Chunks;
    (D has neither a left nor a right child. We go back up.)

    "Dsa Fragmentation Follow Parent"
        C = D;                              (Need for comparison)
        D = [D plus AVL PARENT];            (Need parent address in D)

    (If we attempted to get the root's parent, we are done.)
        ? D = minus 1 -> Dsa Fragmentation All Counted;

    (If C was D's left child, we follow D's right child now, if any.)
        ? C = [D plus AVL LEFT] -> Dsa Fragmentation Follow Right;

    (C was D's right child. We need to go up the tree once again.)
        -> Dsa Fragmentation Follow Parent;
    (-------------------------------------------------------------------------)
    "Dsa Fragmentation Follow Chunks"

    (In E is the address of an AVL node. The field AVL KEY contains this node's
    key. The key is the free memory chunk's size: it needs to be added once for
    each existing chunk. Use a register to store the key for faster addition.)
        C = [E plus AVL KEY];

    (The AVL node's fields AVL DATA and AVL FLAGS contain the node's payload.
    Actually, only the 30 LSBs of the field AVL DATA contain the actual data:
    the 2 MSBs are bits 29-28 of the virtual AUX field, of which the 28 LSBs
    can be found in the 28 MSBs of the field AVL FLAGS. Both DATA and AUX have
    a size of 30 bits each. They both store addresses: DATA to the start of a
    memory chunk chain, and AUX to its end. This chain lists memory chunks which
    are free to be reused. Each entry points to a chunk with the same size, as
    indicated in the key.)
        B = [E plus AVL DATA];              (2 MSBs belong to virtual AUX)
        B & 3FFF FFFFh;                     (30 LSBs contain start of chain)

    (In B is the address of a free memory chunk. It points to that chunk's
    header. The header's field DSA SIZE NEXT points to the next free memory
    chunk of same size, or it contains the constant DSA CHAIN END to indicate
    the end of the chain. Whichever is the case: we need to count this chunk's
    size. The size was obtained from the actual AL node's header and was stored
    into register C.)
    "Dsa Fragmentation Count Units"
        A + C;                              (Count free space, in units)

    (If this chunk is the last chunk of size C, its field DSA SIZE NEXT
    contains the value DSA CHAIN END. In this case we can stop to follow the
    chain and continue with the next AVL node. Else it contains the address of
    another free chunk, which needs to be made the actual one: then it is
    counted as well and examined anew for the end of the chain.)
        B = [B plus DSA SIZE NEXT];         (Next chunk in chain, if any)
        ? B != DSA CHAIN END -> Dsa Fragmentation Count Units;

    (The last chunk of size C has been counted. We proceed with the next node
    in the AVL tree.)
        D = E;                              (From here we access the next node)
        -> Dsa Fragmentation Follow Node;
    (-------------------------------------------------------------------------)
    (To here we jump when all free space was counted. The sum is in register A.
    This value alone does not tell too much, though, when we want to know about
    fragmentation. What we need is a relation free space : total space. Now it
    is clear, that the memory pool itself always is a large block of free space:
    It does make no sense to include its size in the free area. However, it may
    be of interest for the caller, and so we provide its size separately in C.
    More interestingly, the variable [avl tree bottom] contains the bottom end
    of the memory pool. This means, that the topmost allocated memory chunk
    ends just the unit before that. Subtracting this value from [dsa ram bottom]
    tells us, how many units have been used so far. The result B does include
    the  deallocated memory chunks which we just summed up into A. This is a
    most useful value, as after the division A / B the caller knows the exact
    fragmentation below the memory pool. However, we don't do this division in
    this routine, as it would mean the loss of two potentially interesting
    values. If the caller does not need them, it is easy for him to do the
    division himself.)

    "Dsa Fragmentation All Counted"
        B = [avl tree bottom];          (Bottom of memory pool)
        C = [avl tree extent];          (Top of memory pool)
        B - [dsa ram bottom];           (Size of allocated and free chunks)
        C - [avl tree bottom];          (Size of memory pool)
    (=========================================================================)
    (EXIT)

    "Dsa Fragmentation Exit Ok"
        => Dsa Global Avl Data;         (Switch to original AVL Data)
        <-- E; <-- D;
        [dsa error] = DSA ERR NONE;
        end;
    "Dsa Fragmentation Exit Failed"
        => Dsa Global Avl Data;         (Switch to original AVL Data)
        <-- E; <-- D;
        fail;
    (=========================================================================)
    (ERRORS)

    "Dsa Fragmentation Dsa Not Existing"
        [dsa error] = DSA ERR DSA NOT EXISTING;
        -> Dsa Fragmentation Exit Failed;
    (=========================================================================)

(*****************************************************************************)
"Dsa Cache Alignment"                   (Introduced in update 1.2)

    (=========================================================================)
    (MEMORY ANALYSE ROUTINE. For fast memory access some opimizations can be
    done. As per Intel documentation, one such optimization is to make sure,
    that allocated memory chunks fit into a dedicated cache line of the L1
    cache and don't leap into a second line. The size of a cache line can vary
    greatly in different CPU designs: between 8 and 512 Bytes. Most Pentium
    CPUs have a cache line of 32 bytes = 8 units, Pentium 4 processors have a
    cache line size of 64 bytes = 16 units. Feed this routine with your cache 
    line size to see the potential optimization. The cache line size can be 
    obtained from your CPU documentation or from various Internet sources;
    you might want to check out wikipedia.)
    (-------------------------------------------------------------------------)
    (Input:     A:  Cache line size in UNITS
                    for Pentiums 8 or 16 units = 32 or 64 bytes
    Output:     C:  Number of totally allocated memory chunks
                D:  Number of small memory chunks with a size <= A
                E:  Number of small memory chunks completely in a cache line

    D is the total potential: to make full use of the cache, they all should be
    in the same memory segment of A units, without crossing borders into the
    next memory segment. The number E is the number of chunks which are aligned
    properly. Consequently, D-E memory chunks are /not/ aligned properly, even
    if they could be. If D-E is high in relation to D, you might want to
    consider optimization of your data structure alignment. However, read the
    following TECHNICAL NOTES on potential drawbacks.)
    (-------------------------------------------------------------------------)
    (TECHNICAL NOTES: An opimization attempt aligned data such, that a reusable
    empty filler chunk was inserted between the current topmost chunk and the
    effectively required chunk. The filler chunk had a variable length and could
    encomprise more than 1 cache line. This guaranteed a 100% cache line align-
    ment for all allocated chunks. However, due to the 4 units long DSA header,
    with a cache line of 32 Bytes as in Pentiums, at most 4 additional units of
    payload could be included into the same cache line, which is by far not
    enough if the payload is a data structure itself, as is the case for example
    with the dictionary library 'DIC' having a header of 4 units plus 1 to
    typically max 2 units of key length. This was addressed by putting the DSA
    header into the last 4 units of a cache line, allowing the user-record to
    start at a cache line in its own right. When traversing the dictionary in
    the DIC library's First/Next loops, i.e. lexicographically, no access to the
    DSA header is needed, resulting in exactly one cache line per entry, namely
    the DIC data structure plus its payload, the key. However, this proved to be
    not as optimal as was without employing this mechanism: firstly, the filler
    chunks, although reusable, occupy storage themselves, namely for the DSA
    header plus the reusable units to align to the desired position; the AVL
    data is neglicable. And secondly and more importantly: If storing dictionary
    entries just sequentially without caring for cache line boundaries, some
    units of the next entry are already present, although not complete. With an
    average key length of approx. 1.1 units in the DIC library, a chunk for a
    DIC entry including the DSA header is 4+4+1.1 = 9.1 units in size, and thus
    is not fitting in a cache line completely. But immediately after them the
    start of the next entry begins, avoiding the need to access another cache
    line to get its start. This propagates onwards for all entries, if arranged
    sequentially as is done with 'Dic Linearize'. When aligning at cache line
    boundaries, though, this is not the case. There we need in /any/ case to load
    another cache line to access a new entry's start. Overall, this optimization
    attempt provided a performance /loss/ of 7% for DIC library's First/Next
    loops, and 8% for accesses via the entry's key. For this reason, that parti-
    cular optimization was discarded altogether. - CONCLUSION: Despite Intel
    documentation pointing out to always align data structures at cache line
    boundaries, it in fact is better to have data lined up linearily /without
    any filler gaps/, even if this means to cross cache line boundaries. This
    is true for all cases which require multiple accesses to a data structure,
    like e.g. all binary tree types. However, Intel's recommendation should be
    followed for direct-access-structures, like index tabl