Article
Organization
Home »
Linoleum Source Code »
Dynamic Storage Allocation »
Source Code Library
Scope
This is the source code of the Dynamic Storage Allocation library, implemented in the
Linoleum programming language.
Author
Herbert Glarner
Source Code
(Dynamic Storage Allocation "DSA.txt" 1.2 - Copyright 2006 Herbert Glarner)
(IMPORTANT: Includes the 'AVL' library. It must be version 1.2 or higher: the
first release 1.0 will cause an error, because it lacked support for free
usage of unused flag bits. AVL 1.1 did contain a bug, causing a wrong tree
balance in special circumstances. Details see in that library's description.
The current AVL version as per time of releasing is 1.3, including several
performance optimizations.)
(*****************************************************************************)
(=========================================================================)
( Copyright Herbert Glarner [2006]
Mail: herbert.glarner@bluewin.ch
Free usage granted for all purposes, including commercial, provided
that above credits are given.
Suggestions for performance improvement welcome.
No liability for any damage usage may cause. )
(=========================================================================)
(=========================================================================)
(Implements Dynamic Storage Allocation for a dedicated Heap Storage.
The dedicated RAM is allocated dynamically. Consumption of RAM is
bidirectional: from top to bottom grows a fragmention-free AVL tree
controlling free memory sizes, from bottom to top grow the chunks used by
the caller, separated by free chunks causing fragmentation. In between
lies the memory pool from where chunks are assigned, when they can not be
granted otherwise. Fragmentation is reduced by immediate coalescing.
Reusage of coalesced areas is done by exact-fit, if possible, else by
best-fit splitting.
+-----------------------+ [avl tree top] never changes after 'Create'
| AVL Controller |
| |
| |
+-----------------------+ [avl tree extent] Grows downwards
| Memory Pool | ^
| [always free] | | When these two would overlap
| | | we are 'Out of Memory'
| | v
+-----------------------+ [avl tree bottom] Grows upwards during
| Memory Chunks | allocations
| [Allocated and free |
| chunks] |
+-----------------------+ [dsa ram bottom] never changes after 'Create'
[dsa ram bottom] is a variable pertaining to this library. It stores
permanently the low boundary of the RAM having been dedicated for DSA
purposes.
[avl tree top], [avl tree extent] and [avl tree bottom] all belong
to the AVL library. Whereas the first is static and never changes,
the other two are subject to changes. [avl tree extent] is controlled
by the AVL library itself, but [avl tree bottom] is manipulated by this
DSA library: it tells the AVL library, how far down the AVL Tree may
grow, and this is determined by the memory chunks being defined.)
(=========================================================================)
(=========================================================================)
(The AVL tree's entries carry 3 pieces of information each:
Size: Information: Naming in this Lib:
30 bit Key SIZE
30 bit Start chunk chain DATA
30 bit End chunk chain AUX
30 bit? Well yes, we manage addresses in the 32-bit-room with this library.
With 32 bit 2^32 = 4 GB can be addressed. But Linoleum addresses do not
address individual bytes, but units of 4 bytes each, and these units are
numbered from 0 on. Thus, even if we really had and would address 4 GB, we
number them from 0 to 1 GigaUnit-1. To store these numbers, 30 suffice.
And why is this important?
An AVL node effectively has a 32 bit Key and a 32 bit Data field, so this
obviously is not enough for all the information. However, there's also the
Flags field: as long as we do not interfer with the 4 flags managed by the
AVL library itself, we can use this field's remaining 28 bits.
Now the 28 extra bits are not quite enough to store an additional 30-bit-
address. Therefore we store the 2 missing bits somewhere else. This place
is the 2 most significant bits {MSB} of the node's DATA field, of which we
only use the 30 least significant bits {LSB} anyway, for exactly the same
reason.
In this library, we call the 30 LSB of the Data field 'DATA', and the
combination of the 2 MSB of the Data field plus the 28 MSB of the Flags
field 'AUX'.
The 3 fields of an AVL node map this way:
32 bit 32 bit 32 bit
+-------------------+ +-------------------+ +-------------------+
| AVL KEY | | AVL DATA | | AVL FLAGS |
+-------------------+ +-------------------+ +-------------------+
Determines the nodes The lower 30 LSB are The 4 LSB are used by
key. It is identical the 'DATA' field. The the AVL library. The
with the chunks sizes 2 MSB are the upper 28 MSB are the lower
which are managed by compenent of AUX. compenent of AUX.
this node.
2 bit 30 bit 2 bit 30 bit 28 bit 4 bit
+-----+-------------+ +-----+-------------+ +--------------+----+
| '00'| KEY | | AUX1| DATA | | AUX2 |XXXX|
+-----+-------------+ +-----+-------------+ +--------------+----+
NEVER touch the 4 LSB
bits.
AUX1 and AUX2 put together form the AUX field.
2 bit 30 bit 30 bit 2 bit 28 bit
+-------------+ +-------------+ +-----+---------+
00| KEY | 00| DATA | 00| AUX1| AUX2 |
+-------------+ +-------------+ +-----+---------+
The 2 MSB are The 2 MSB are The 2 MSB are
always clear. always clear. always clear.
)
(=========================================================================)
(=========================================================================)
(Routines summary:
Name of routine Short description)
(-------------------------------------------------------------------------)
( ----------------------------------------------------------------------
PUBLIC ROUTINES:
----------------------------------------------------------------------
------------------------------------------------------------------
[Dedicated Ram Area Handling]
Dsa Create Do this first to tell about memory
Dsa Destroy Do this last to clean up memory
------------------------------------------------------------------
[Allocating and deallocating chunks]
Dsa Alloc Allocate a memory chunk by size
Dsa Free Free an allocated chunk by address
------------------------------------------------------------------
[Working with allocated chunks]
Dsa Clear Write NULL into all usable units
Dsa Size [1] Get the size of an allocated chunk
******************************************************************
UPDATE 1.2 START
[Special purpose routines, newly introduced with update 1.2]
Dsa Fragmentation Analyzes fragmentation
Dsa Cache Alignment Analyzes cache alignment
UPDATE 1.2 END
******************************************************************
------------------------------------------------------------------
[1] However, why would you not know the size of the object you
once allocated? If this can be the case for some reason,
consider to store the length as a part of the record/object/
structure for which the memory chunk is allocated.
------------------------------------------------------------------
----------------------------------------------------------------------
INTERNAL ROUTINES:
----------------------------------------------------------------------
Dsa Add to Node
Dsa Bypass Chunk
Dsa Make Top Chunk
Dsa Read Node Aux
Dsa Remove From Node
Dsa Write Node Aux
----------------------------------------------------------------------
Usage of Internal Routines from the outside world is strongly
discouraged: Most change some chunks pointers in a way granting a nice
crash. The only 'undangerous' one is 'Dsa Read Node Aux'.
----------------------------------------------------------------------
)
(=========================================================================)
(=========================================================================)
(Memory requirements:
As much as is assigned with 'Dsa Create', plus 8 units for variables
and an additional 6 units to save the library's own AVL data set.
)
(=========================================================================)
(=========================================================================)
(Version history:
23 Jun 2006: 1.0 Released
25 Jul 2006: 1.1 Released. Encapsulates all AVL data used by this lib
such, that an indipendent AVL data set outside of this lib
is not affected, thus enabling the parallel usage of an
independent AVL tree in co-existence with this library.
Or in short: Just use the external AVL tree as usual, this
library's AVL tree is transparent and causes no conflicts.
29 Nov 2006: 1.2 Released.
STATISTICS. [dsa num alloc] and [dsa num free] added to
count the total number of both the successfull allocations
and deallocations for statistical purposes.
FLOW CONTROL. [dsa control] and DSA CONTROL XXX constants
control behaviour of the DSA library for fine-tuning issues
related to faster cache access.
NEW ANALYZE FUNCTIONS. The routines 'Dsa Fragmentation'
and 'Dsa Cache Alignment' allow to analyze how the storage
is treated by the DSA library.
OPTIMIZATIONS. Careful code review regarding common branch
prediction mechanisms and instruction scheduling led to
modifications in several dozen places.
)
(=========================================================================)
(*****************************************************************************)
"libraries"
(=========================================================================)
(AVL Tree)
(-------------------------------------------------------------------------)
(The AVL library provides a controller to quickly find free memory chunks
for re-usage, thus avoiding memory fragmentation as much as possible. It
sits on top of the dedicated RAM. The source of the AVL Library can be
found at
http://herbert.wikispaces.com/AVL+Tree
)
(-------------------------------------------------------------------------)
AVL; (Requires version 1.2 or higher)
(=========================================================================)
(*****************************************************************************)
"directors"
program name = { DSA };
unit = 32;
(*****************************************************************************)
"constants" (Convention: "DSA ...", UPPER CASE)
(=========================================================================)
(Error codes)
(-------------------------------------------------------------------------)
DSA ERR NONE = 0; (No error)
(-------------------------------------------------------------------------)
(From 'Dsa Create')
DSA ERR DSA EXISTS = 1; (Use 'Dsa Destroy' to get rid of it)
DSA ERR LIMIT EXCEEDED = 2; (We can't allocate 1 GUnit or more)
DSA ERR NOT ENOUGH SPACE = 3; (Not even space for a single node)
DSA ERR REQUESTED TOO MUCH = 4; (Attempt to dedicate too much RAM)
DSA ERR AVL INIT FAILED = 5; (AVL tree could not be initialized)
(-------------------------------------------------------------------------)
(From 'Dsa Destroy')
DSA ERR DSA NOT EXISTING = 6; ('Dsa Create' was not used/successful)
DSA ERR SPACE NOT RETURNED = 7; (RAM not given back to the OS)
(-------------------------------------------------------------------------)
(From 'Dsa Alloc')
( DSA ERR LIMIT EXCEEDED = 2;) (We can't request 1 GUnit or more)
( DSA ERR DSA NOT EXISTING = 6;)('Dsa Create' was not used/successful)
DSA ERR OUT OF MEMORY = 8; (Request can not be fulfilled)
DSA ERR AVL ERROR = 9; (Details are in [avl error])
(-------------------------------------------------------------------------)
(From 'Dsa Free')
( DSA ERR LIMIT EXCEEDED = 2;) (We can't free addresses >= 1 GUnit)
( DSA ERR DSA NOT EXISTING = 6;)('Dsa Create' was not used/successful)
( DSA ERR OUT OF MEMORY = 8;) (Request can not be fulfilled)
( DSA ERR AVL ERROR = 9;) (Details are in [avl error])
DSA ERR NO ALLOCATED CHUNK = 10;(Attempt to free a not allocated chunk)
(-------------------------------------------------------------------------)
(From 'Dsa Size')
( DSA ERR LIMIT EXCEEDED = 2;) (Such an address was never allocated)
( DSA ERR DSA NOT EXISTING = 6;)('Dsa Create' was not used/successful)
( DSA ERR NO ALLOCATED CHUNK = 10;)(Such an address was never allocated)
(-------------------------------------------------------------------------)
(From 'Dsa Clear')
( DSA ERR LIMIT EXCEEDED = 2;) (Such an address was never allocated)
( DSA ERR DSA NOT EXISTING = 6;)('Dsa Create' was not used/successful)
( DSA ERR NO ALLOCATED CHUNK = 10;)(Such an address was never allocated)
(-------------------------------------------------------------------------)
(From internal routine 'Dsa Make Top Chunk', passed to 'Dsa Alloc')
( DSA ERR OUT OF MEMORY = 8;) (Request can not be fulfilled)
(-------------------------------------------------------------------------)
(From internal routine 'Dsa Add To Node', passed to 'Dsa Alloc' or
'Dsa Free', resp.)
( DSA ERR OUT OF MEMORY = 8;) (Request can not be fulfilled)
( DSA ERR AVL ERROR = 9;) (Details are in [avl error])
(-------------------------------------------------------------------------)
(From internal routine 'Dsa Remove From Node', passed to 'Dsa Alloc' or
'Dsa Free', resp.)
( DSA ERR AVL ERROR = 9;) (Details are in [avl error])
(-------------------------------------------------------------------------)
(From internal routine 'Dsa Bypass Chunk', passed to 'Dsa Free'.
If appearing, these would be programming errors, which need to be fixed, as
they would compromise the library's functionality. However, multi-billion
tests with random length allocations and deallocations did not cause such
inconsistencies. If they happen to you, a notification would be much
appreciated.)
DSA ERR INTERNAL 1 = 11; (Comments see 'Dsa Bypass Chunk')
DSA ERR INTERNAL 2 = 12; (Comments see 'Dsa Bypass Chunk')
DSA ERR INTERNAL 3 = 13; (Comments see 'Dsa Bypass Chunk')
(*************************************************************************)
(UPDATE 1.2 START)
(From 'Dsa Cache Alignment')
( DSA ERR DSA NOT EXISTING = 6;)('Dsa Create' was not used/successful)
DSA ERR INV CACHE LINE SIZE = 14; (Cache line size must be power of 2)
(UPDATE 1.2 END)
(*************************************************************************)
(=========================================================================)
(=========================================================================)
(Memory chunk header, just before the user data. In essence, the 4 units
provide a quadruply linked list. The first two chain memory chunks to each
other as they appear in physical memory. There is just one such chain
encomprising all allocated and free memory chunks. Memory chunks can be
coalesced or split, which leads to insertion and removal of these pointers.
The other two chain memory chunks of same size, but only for free chunks.
Their start and end is referenced by an AVL Node, whose key is the chunk
size. Thus, depending on the different 'object' sizes an application
manages, there may be just a few or also many individual chains. The first
of the 4 pointers also contains a flag in its MSB, telling if this chunk
currently is free or allocated. If it is free, it is part of a size chain.)
(-------------------------------------------------------------------------)
DSA HEADER SIZE = 4;
DSA CHUNK PREV = 0; (Pointer to previous memory chunk)
(If the following flag is set in CHUNK PREV, then the chunk is
allocated and in use; else it is a free memory chunk.)
DSA ALLOCATED = 8000 0000h;
DSA CHUNK NEXT = 1; (Pointer to next memory chunk)
DSA SIZE PREV = 2; (Pointer to prev. chunk with same size)
DSA SIZE NEXT = 3; (Pointer to next chunk with same size)
(In all 4 pointers, the following '31-bit value' -1 can appear
as an indication, that this is the start or end, resp., of the
actual chain.)
DSA CHAIN END = 7FFF FFFFh;
(=========================================================================)
(=========================================================================)
(The variable [dsa status] will hold two flags about the current status
of the Dynamic Storage Allocator.)
(-------------------------------------------------------------------------)
(The following flag is set when 'Dsa Create' was successful.)
DSA STATUS INIT = 0000 0001h;
(-------------------------------------------------------------------------)
(UPDATE 1.1 START)
(This flag is set, when the library currently uses its own AVL data set.
If it is clear, then we would access the AVL data set valid outside of
this library: without switching to the local set a crash is guaranteed.)
DSA STATUS LOCAL SET = 0000 0002h;
(UPDATE 1.1 END)
(-------------------------------------------------------------------------)
(Theoretically, the other bits within the [dsa status] variable are
free to be used. However, usage is discouraged.)
(=========================================================================)
(*************************************************************************)
(UPDATE 1.2 START)
(The variable [dsa control] will hold flags enabling the caller to control
certain runtime behaviours of the Dynamic Storage Allocator.)
(-------------------------------------------------------------------------)
(If the following flag is set, memory always is allocated from the memory
pool and never reused from free chunks. In certain circumstances, this can
be exploited to make advantage of the L1 data cache.)
DSA CONTROL NO REUSE = 0000 0001h;
(-------------------------------------------------------------------------)
(Theoretically, the other bits within the [dsa status] variable are
free to be used. However, usage is discouraged.)
(UPDATE 1.2 END)
(*************************************************************************)
(*****************************************************************************)
"variables" (Convention: "dsa ...", lower case)
(=========================================================================)
(Error condition as per DSA ERR XXX constants.)
(-------------------------------------------------------------------------)
dsa error = 0;
(=========================================================================)
(=========================================================================)
(Flags about the current status of the library. The bits are given in
the DSA STATUS XXX constants.)
(-------------------------------------------------------------------------)
dsa status = 0;
(=========================================================================)
(=========================================================================)
(The AVL Controller's nodes must be stored somewhere. The variables of the
AVL library itself tell what is stored where. They are:
avl tree top exclusive the given address
avl tree extent bottommost node of the AVL controller
The AVL library variable [avl tree bottom] holds the maximum extent
towards the bottom to which the AVL tree may grow. Since this is not a
static value, but changes with ongoing allocations and deallocations, it
is programmatically manipulated within this library, as soon as such is
required. However, it still is neccessary, that we store the physically
bottommost address of the dedicated RAM somewhere, and the following
variable is this place. Initially, it is identical with [avl tree bottom].)
(-------------------------------------------------------------------------)
dsa ram bottom = 0;
(=========================================================================)
(=========================================================================)
(When usage of this library is not needed any longer, the memory dedicated
to it should be given back to the OS by calling 'Dsa Destroy'. For this
reason, the original content of [RAM Top] when calling 'Dsa Create' must
be restored. To be restored, it must be saved somewhere. This is done in
this variable during creation.)
(-------------------------------------------------------------------------)
dsa old ram top = 0; (Only valid if DSA STATUS INIT is set)
(=========================================================================)
(=========================================================================)
(This variable stores the header address of the physically last memory
chunk below [avl tree bottom]. A content of -1 signals, that there is no
such chunk yet/anymore.)
(-------------------------------------------------------------------------)
dsa last chunk = minus 1;
(=========================================================================)
(*************************************************************************)
(UPDATE 1.1 START)
(This library uses an own AVL data set completely independent from whatever
the AVL variables may contain outside of this library. The original content
is saved in these variables when a DSA routine is called. It is restored
to the original values when the called DSA routine is left.)
(-------------------------------------------------------------------------)
dsa avl error = 0;
dsa avl status = 0;
dsa avl tree top = 0;
dsa avl tree bottom = 0;
dsa avl tree root = 0;
dsa avl tree extent = 0;
(UPDATE 1.1 END)
(*************************************************************************)
(*************************************************************************)
(UPDATE 1.2 START)
(For statistical reasons, the total number of successful allocations as
well as the total number of succesful deallocations are counted. These
variables can be accessed from the outside.)
(-------------------------------------------------------------------------)
dsa num alloc = 0;
dsa num free = 0;
(UPDATE 1.2 END)
(*************************************************************************)
(*************************************************************************)
(UPDATE 1.2 START)
(Takes flags to control certain runtime behaviour aspects. The according
DSA CONTROL XXX flags can be set or cleared by the caller.)
(-------------------------------------------------------------------------)
dsa control = 0;
(UPDATE 1.2 END)
(*************************************************************************)
(*****************************************************************************)
"workspace" (Convention: "dsa ...", lower case)
(None)
(*****************************************************************************)
"programme" (Convention: "Dsa ...", Mixed case)
(=========================================================================)
(This is a library only, it can not be run stand-alone. See "DSA Test.txt"
for a test setup.)
(-------------------------------------------------------------------------)
end;
(=========================================================================)
(*****************************************************************************)
"Dsa Create"
(=========================================================================)
(Initializes a DSA environment. A call to this routine is required, before
any allocation or deallocation takes place. When the DSA is not needed
any longer, destroy it by calling 'Dsa Destroy': this will free the memory
dedicated to this library and return it to the OS.)
(-------------------------------------------------------------------------)
(Input: D: Number of dedicated memory units at 4 Bytes each, max 1G-1
Output: -
All registers unchanged.
May return errors, check [dsa error] after 'failed'. It may contain:
DSA ERR DSA EXISTS Use 'Dsa Destroy' to get rid of it
DSA ERR LIMIT EXCEEDED We can't allocate 1 GUnit or more
DSA ERR NOT ENOUGH SPACE Not even space for a single node
DSA ERR REQUESTED TOO MUCH Attempt to dedicate too much RAM
DSA ERR AVL INIT FAILED AVL tree could not be initialized
Return with 'ok' guarantees that the requested RAM has been allocated.
Returning with an error guarantees that no additional RAM was allocated.
)
(=========================================================================)
(ENTRY)
D -->; E -->;
=> Dsa Local Avl Data; (Switch to local AVL Data)
(=========================================================================)
(VERIFICATION)
(A new storage area can not be allocated as long as an existing one still
is in place. In this case, destroy the old one first by calling 'Dsa
Destroy'.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? [dsa status] + DSA STATUS INIT -> Dsa Create Old Dsa Exists;
(-------------------------------------------------------------------------)
(Check, that the number of requested memory units is below 1 G-Units. This
equals 4 GB. If it is above, 32bit-OSs can't handle it.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? D + C000 0000h -> Dsa Create Limit Exceeded;
(UPDATE 1.2. TECHNICAL NOTE: It is not beneficial to align the upper
boundary to the size of a cache line, which on Pentiums usually is 32 bytes
= 8 units: the by far more important thing is to keep that AVL tree's nodes
tightly together without any gaps, and being completely fragmentation-free
this is ensured by the AVL tree library itself.)
(-------------------------------------------------------------------------)
(Check, that we have at least the space for a single node. The size of a
node is stored in the AVL library's constant AVL NODE SIZE. If we don't, we
can't even create a root for the AVL tree.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? D < AVL NODE SIZE -> Dsa Create Not Enough Space;
(=========================================================================)
(PROCESS)
(Formally, the boundaries are ok. Attempt to create dynamic space to be
used by this library. However, because we need to restore the current
RAM pointer later on when calling 'Dsa Destroy', we must store its
content for later retrieval.)
E = [RAM Top]; (To become bottom of dedicated RAM)
[dsa old ram top] = E; (Restored to this with 'Dsa Destroy')
D + E; (To become top of dedicated RAM)
[RAM Top] = D; (Tell the kernel about new top value)
isocall; (Attempt to set dedicated RAM)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? failed -> Dsa Create Ram Not Allocated;
(Further error exists within this routine from now on first need to set
back the just allocated RAM.)
(-------------------------------------------------------------------------)
(Store the physical bottom of the memory area being managed by this lib.)
[dsa ram bottom] = E; (Library may access this, but not below)
(-------------------------------------------------------------------------)
(Now create the still empty AVL Controller.)
(D = D;) (Top of RAM)
(E = E;) (Bottom of RAM)
=> Avl Init; (Create AVL Controller)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? failed -> Dsa Create Avl Controller Failed;
(-------------------------------------------------------------------------)
(Everything was fine. Set internal flag to tell that a dedicated memory
area was successfully created and the AVL controller initialized. All other
flags remain clear. The DSA is ready to be used now.)
[dsa status] | DSA STATUS INIT; (Only flag set is DSA STATUS INIT)
(-> Dsa Create Exit OK;)
(=========================================================================)
(EXIT)
"Dsa Create Exit OK"
=> Dsa Global Avl Data; (Switch to original AVL Data)
<-- E; <-- D;
[dsa error] = DSA ERR NONE;
end;
(-------------------------------------------------------------------------)
"Dsa Create Exit Failed"
=> Dsa Global Avl Data; (Switch to original AVL Data)
<-- E; <-- D;
fail;
(=========================================================================)
(UPDATE 1.2 START. Error handlers moved down from inline handling for
Branch Prediction Optimization.)
"Dsa Create Old Dsa Exists"
[dsa error] = DSA ERR DSA EXISTS;
-> Dsa Create Exit Failed;
"Dsa Create Limit Exceeded"
[dsa error] = DSA ERR LIMIT EXCEEDED;
-> Dsa Create Exit Failed;
"Dsa Create Not Enough Space"
[dsa error] = DSA ERR NOT ENOUGH SPACE;
-> Dsa Create Exit Failed;
"Dsa Create Ram Not Allocated"
[dsa error] = DSA ERR REQUESTED TOO MUCH;
-> Dsa Create Exit Failed;
"Dsa Create Avl Controller Failed"
(The AVL controller could not be created. Reset memory, it's of no use
when we return with an error.)
[dsa error] = DSA ERR AVL INIT FAILED;
[RAM Top] = [dsa old ram top];
-> Dsa Create Exit Failed;
(UPDATE 1.2 END)
(=========================================================================)
(*****************************************************************************)
"Dsa Destroy"
(=========================================================================)
(Destroys a DSA environment formerly created by 'Dsa Create'. A call to
this routine returns the formerly dedicated RAM area to the OS.)
(-------------------------------------------------------------------------)
(Input: -
Output: -
All registers unchanged. In fact, none is used.
May return errors, check [dsa error] after 'failed'. It may contain:
DSA ERR DSA NOT EXISTING 'Dsa Create' was not used/successful
DSA ERR SPACE NOT RETURNED RAM not given back to the OS
Return with 'ok' guarantees that the requested RAM has been given back to
the OS.
Returning with an error guarantees that the RAM is still alocated; if it
ever was created, that is.
)
(=========================================================================)
(VERIFICATION)
(Make sure, that there was a former successful call to 'Dsa Create'. If
there was none, or if it was not successful, we really should not reset
the RAM to an aribitrary content and leave it as it is.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? [dsa status] - DSA STATUS INIT -> Dsa Destroy No Dsa;
(=========================================================================)
(PROCESS)
(During 'Dsa Create', the old content of [RAM Top] was saved into the
variable [dsa old ram top]. It is reset to this value now, discarding the
whole content which may be in there.)
[RAM Top] = [dsa old ram top];
isocall;
(I would not know why the OS would not want to have memory back, but
anyway, if it fails, we return with an error.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? failed -> Dsa Destroy Memory Not Returned;
(-------------------------------------------------------------------------)
(Reflect, that we have no more DSA area now by clearing the status bit.
This makes it ready for another call to 'Dsa Create'.)
[dsa status] # DSA STATUS INIT;
(=========================================================================)
(EXIT)
"Dsa Destroy Exit OK"
[dsa error] = DSA ERR NONE;
end;
"Dsa Destroy Exit Failed"
fail;
(=========================================================================)
(UPDATE 1.2 START. Error handlers moved down from inline handling for
Branch Prediction Optimization.)
"Dsa Destroy No Dsa"
[dsa error] = DSA ERR DSA NOT EXISTING;
-> Dsa Destroy Exit Failed;
"Dsa Destroy Memory Not Returned"
[dsa error] = DSA ERR SPACE NOT RETURNED;
-> Dsa Destroy Exit Failed;
(UPDATE 1.2 END)
(=========================================================================)
(*****************************************************************************)
"Dsa Alloc"
(=========================================================================)
(Allocates a memory chunk of desired size in units, returning the address
of the first unit of the userdata area. Make no assumptions about where
this address is: it can be a re-used memory area, which was deallocated
quite a while ago. Never write content beyond the limits of the returned
area: this in any case leads to destroyed pointers with a guaranteed
failure.)
(-------------------------------------------------------------------------)
(Input: D: Number of units at 4 Bytes each, theoretical max 1G-1
Output: E: Address of allocated chunk, if no error occured. Don't use
this value if there was an error.
NOTE: It is possible to disallow reusage of deallocated memory by setting
the flag DSA CONTROL NO REUSE in the variable [dsa control]. Use this with
caution, as allocations will be done exclusively from the memory pool, thus
increasing fragmentation . Usage is recommended only for such chunks which
must be held closely together to take advantage of the processor's cache
mechanisms.
All registers except E unchanged.
May return errors, check [dsa error] after 'failed'. It may contain:
DSA ERR LIMIT EXCEEDED We can't request 1 GUnit or more
DSA ERR DSA NOT EXISTING 'Dsa Create' was not used/successful
DSA ERR OUT OF MEMORY Request can not be fulfilled
DSA ERR AVL ERROR Details are in [avl error]
)
(=========================================================================)
(ENTRY)
A -->; B -->; C -->; D -->;
=> Dsa Local Avl Data; (Switch to local AVL Data)
(=========================================================================)
(VERIFICATION)
(Make sure, that there was a former successful call to 'Dsa Create'. If
there was none, or if it was not successful, an error is returned.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? [dsa status] - DSA STATUS INIT -> Dsa Alloc Dsa Not Existing;
(-------------------------------------------------------------------------)
(Check, that the number of requested memory units is below 1 G-Units. This
equals 4 GB. If it is 1 GU or above, a 32 bit OS can't handle it.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? D + C000 0000h -> Dsa Alloc Maximum Exceeded;
(=========================================================================)
(PROCESS)
(If there is not a single AVL node, there surely is nothing we can reuse.
There is no AVL node, if there is no root. There is no root, if the AVL
library's variable [avl status] has the AVL STATUS ROOT flag clear. This is
the case just after Init, but also when the last still allocated memory
chunk was returned to the pool, possibly after coalescing with a lower
already free chunk. In this case, allocate from memory pool.)
? [avl status] - AVL STATUS ROOT -> Dsa Alloc From Pool;
(UPDATE 1.2. If the [dsa control] variable has a set DSA CONTROL NO REUSE
flag, then new chunks are always allocated from the low end of the memory
pool. This can be exploited to organize {or reorganize} data such, that it
is better suited for caches. WARNING: if used excessively, chances are that
we run out of memory soon, even when there would be plenty of space, since
no deallocated chunks are reused in this mode.)
? [dsa control] + DSA CONTROL NO REUSE -> Dsa Alloc From Pool;
(-> Dsa Alloc Reuse;)
(-------------------------------------------------------------------------)
("Dsa Alloc Reuse")
(There is at least one node in the AVL tree. In D is the requested number
of units. Look for exact fit first, i.e. if a node with key D exists.)
(We want the address of the node, for one to access its AUX field if we
find it, and for the other to invoke a faster 'abbreviated' find successor
algorithm to find a larger free chunk which can be split. There are nodes,
else we would not be here, thus we may use the AVL-internal routine
'Avl Locate Node'. If the routine finds such a node, it will set the flag
AVL STATUS FOUND in the AVL library's status variable [avl status]. If such
a node does not exist, the flag will be clear.)
E = D; (Node key is the size)
=> Avl Locate Node; (Node address in B if found)
(UPDATE 1.2. Branch prediction optimization. Typically it is more likely
that we have an excact fit once an AVL tree exists, because 'objects' of a
specific 'class' usually won't change their size in time. Therefore we let
an exact match preferably fall through and branch only when no such size
exists.)
? [avl status] - AVL STATUS FOUND -> Dsa Alloc No Exact Fit;
(-> Dsa Alloc Exact Fit;)
(-------------------------------------------------------------------------)
(UPDATE 1.2: This code section was moved to here from further below for
branch prediction optimization.)
("Dsa Alloc Exact Fit")
(The number of requested units is in D and E. We found a node which manages
at least one chunk of exactly that requested size. Its address is in B.
The node's fields DATA and AUX hold the two ends of the size chain, i.e.
both point to a free chunk. We want to re-use the chunk with the lower
address, keeping high addresses free as long as possible. This is part of
the strategy to return upper addresses as soon as possible to the memory
pool. During deallocation, we made sure, that lowest addresses were placed
at the start of the size chain. Therefore, it is from this end that we
re-allocate the chunk. Get its header address.)
A = [B plus AVL DATA]; (DATA contains the chain start)
A & 3FFF FFFFh; (The 2 MSB belong to AUX)
(The chunk A will become allocated. There is no need anymore to have it
managed by the size node B and via it by the chunks' size chain. Remove
if from the chain. Note, that this may lead to removing size node B.)
(A = A;) (Chunk to be un-managed)
=> Dsa Remove From Node;
? failed -> Dsa Alloc Exit Failed; (May return with AVL error)
(Flag the chunk to be allocated now. Then return the userdata address of
chunk A to the caller. Done.)
[A plus DSA CHUNK PREV] | DSA ALLOCATED;
E = A; (Header address of re-allocated chunk)
E + DSA HEADER SIZE; (Userdata address)
-> Dsa Alloc Exit OK; (Done)
(-------------------------------------------------------------------------)
"Dsa Alloc No Exact Fit"
(The number of requested units is in D and E. We found no node which
manages exactly this requested size. But B nevertheless points to a node:
it would be the parent of the node with the requested size, if such a node
would be inserted. The Avl routine even tells us, which child of B such a
node would become: this was coded into the status flag AVL STATUS LEFT.
These results can be exploited for an abbreviated 'find successor'
algorithm which needs to be invoked now to possibly locate a splittable
node.)
(If the requested size was smaller than B, then it would be attached as
its left child, if larger then as its right child. If its on the left,
then B can or can not have a right subtree already, but it certainly can
not have a left child yet. Since B's right subtree must contain nodes
which all are larger than B, and since B itself must be larger than the
requested size, and because there is no left subtree of B which could
provide smaller keys, B already /is/ the node with the smallest key larger
than the requested size, i.e. the successor of the hypothetically inserted
node.)
(DO NOT DO THIS YET. Continue reading for the reason.)
(? [avl status] + AVL STATUS LEFT -> Dsa Alloc Split Chunk;)
(If the node would be appended as B's right child, then B is smaller.
If B's parent has B as its right child as well, than the parent is even
smaller than B. This is valid for all ancestors of B all the way up to the
root, as long as they have their child leading to B on the right. Should
such an ancestor have its child on the left, though, then that ancestor is
larger than B and all the ancestors in between the two nodes. Thus it
suffices to follow B's ancestors up until we find one having a key = size
larger than what we requested. This is done in max. lg n loop iterations,
where n are all existing nodes. Thus it is fast. If there is no such node,
then the requested size is larger than every existing node. In that case we
need to allocate from the pool. We can start the loop with B's parent,
because we know already, that B is too small. However, if B already is the
root, i.e. has no parent, we can immediately proceed with allocation from
the pool.)
(HOWEVER: Before searching, it is valuable to think about the use we make of
a possibly found node. A such manages at least one free chunk. One of these
chunks is split into 2 parts: one of these parts is going to be re-allocated
and the other becomes a shorter still free chunk. In other words, we split
one chunk into two. But two chunks require 2 headers, whereas one chunk
has only one. Now we can potentially run into a situation, that we find a
larger node, which still is not large enough to hold the additional header.
This of course is not a node we can split, when the second free part would
not even be able to hold the complete header. But also if it was just able
to hold the header we can not really make use of it: after all, what sense
does a header make 'managing' 0 userdata units? An example: Say, headers
are 4 units. We want to assign 1000 units. There is no node managing 1000
units. But the successor node is a node managing 1004 units. Now, although
the node says it manages a chunk with 1004 units, the chunk actually
occupies 1008 units due to its header. When we re-allocate the requested
1000 units, we occupy 1004 due to the header. That leaves 4 units free for
the second still free chunk, but 4 minus /its/ header leaves 0 units to the
user. No user will allocate 0 units, and so we have created a logical
fragmentation: 4 units managing nothing, but still blocked from further
usage. - For this reason, we actually search for a node, which is large
enough to provide an additional header and at least 1 userdata unit. The
registers D and E both have the requested units. The original request is
left in D, the key for the node we effectively search is calculated into E.)
E + DSA HEADER SIZE; (Space for additional header)
E +; (Space for at least 1 userdata unit)
(Now this has quite severe implications. Especially is it possible, that a
Node with size E effectively exists. We have no guarantee that we by
coincidence hit it with outlined algorithm. We may in fact be in a
completely wrong part of the AVL tree. We /can/ find the node with key>=E,
but this potentially involves climbing up the tree and descending on
another part and definitely is not the fastest possible search. It in fact
is more efficient to simply search again, but this time for node E. If
there is such a node, we nevertheless jump to the splitter, and not to an
exact match, since it is none relative to D. If we don't find such a node,
we distinguish again between the left and the right side. If left, we found
the node to be split. If right, /then/ we can search for a node > E by
examining its parents, as outlined above.)
(But since we have already a B, we take advantage of it. If by coincidence
a new node with key E would be inserted to the left of node B, /and/ B's
key still is equal to or larger than E, then a new search would provide no
other result into B. In this case we omit the new search and take the fast
lane.)
(UPDATE 1.2. Analysis: Typically, both conditional branches are fall-through,
with estimated percentages of 75% and 99.8%, resp.)
? [avl status] - AVL STATUS LEFT -> Dsa Alloc Find Node; (to the right)
? [B plus AVL KEY] >= E -> Dsa Alloc Split Chunk; (left, and size ok)
(Search for the required minimum size to allow for an additional header and
at least 1 userdata unit.)
"Dsa Alloc Find Node"
(E = E;) (Key is the adjusted minimum size)
=> Avl Locate Node; (Node address in B if found)
(If node found, or if not found but potential left child, then we split
one of the chunks managed by node B. Note, that we /split/ even if such a
node was found: we were not searching for an exact size D anymore, but
for at least a size E which /allows/ meaningful splitting.)
(UPDATE 1.2. Analysis: Typically we don't find such a node for the vast
majority of all cases and thus fall through.)
? [avl status] + AVL STATUS FOUND plus AVL STATUS LEFT
-> Dsa Alloc Split Chunk;
(Search for a node being larger than the adjusted minimum size as
described above. Being an AVL tree, this requires maximum 1.45 lg n
iterations, with n being all nodes in the tree. If we tested the root, then
such a node does not exist, which means, that the requested size is larger
than every free memory area; in that case we need to allocate from the
memory pool.)
(UPDATE 1.2. Analysis: Climbing up to the root is a rare event and thus we
fall through. Iteration of the loop is depending on the tree's height.
Since it's an AVL tree, it's height is generally not tall: typically 3
iterations are common. Because it's backwards, the conditional jump is
predicted taken, which suits us well.)
"Dsa Alloc Search Node" (Loop start)
? [B plus AVL PARENT] = minus 1 -> Dsa Alloc From Pool; (Is root)
B = [B plus AVL PARENT]; (B gets B's parent)
? [B plus AVL KEY] < E -> Dsa Alloc Search Node; (Still too small)
(-> Dsa Alloc Split Chunk;) (Found larger)
(-------------------------------------------------------------------------)
(UPDATE 1.2. Additional code flow comment: All calls having entered the
previous code section at 'Dsa Alloc No Exact Fit' meet here again.)
"Dsa Alloc Split Chunk"
(The number of requested units is in D. We found no node which manages
exactly this requested size, but a node with the smallest size E larger
than what is needed to allow for a meaningful splitting is in B. This node
manages at least one free chunk with that too large size, of which we need
to split one into two parts. One of these two parts becomes the new
allocated chunk, and the other remains a free chunk, with a smaller size
than before, though.)
(The first decision is, from which one of the two ends the node B points to
we detach the chunk to be re-used. The start of the chain in the node's
field DATA points to a comparably low memory address: this was made sure
when the chunk was deallocated. The end in the AUX field tendentially
points to higher addresses. In order to allow the memory pool to recollect
as many of the chunks at higher addresses as possible, we leave the upper
ones untouched.)
(Save number of requested units: we need register D.)
C = D;
(The start of the size chain is in the node's DATA field. Read it.)
D = [B plus AVL DATA]; (The node's data field)
D & 3FFF FFFFh; (The 2 MSB belong to its AUX field)
(Having the chunk's address, we are faced with a second decision: which
side of the chunk do we declare to be the allocated one and which part the
free one? Both ends of this chunk can not border to a free chunk: if they
did, we would have missed a coasceling possibility. But we made sure, that
we did not miss any such possibility, thus we don't need to re-check for
coalescing once again. This could tempt us to think that it does not
matter, which part we declare to take a certain role, but it does. If we
declared the upper part to be the free one, no matter how small it is,
we provide better possibilities for the memory pool to reintegrate that
part somewhen. Would we re-allocate from the chunk's upper part, then its
allocated part somewhen may share its upper border with the pool, blocking
it from coalescing with the lower free chunk. - For this reason, we make
the chunk's lower end the allocated part, and its upper part the free
chunk.)
(In any case does the found chunk D need no more size management, at least
not for its current size. This may lead to removing the size node B all-
together.)
A = D; (Chunk to be un-managed)
=> Dsa Remove From Node; (May remove the node)
? failed -> Dsa Alloc Exit Failed; (May return with AVL error)
(Node B may not exist anymore now. This is the case, when chunk D was the
only chunk of this size.)
(Time to split the chunk D into two chunks now. This is done by simply
inserting a header after the designated allocated part and setting the
chunk chain pointers accordingly. The effective size of the 'lower'
allocated chunk is the requested size plus the size of its already
existing header. Added to the address of the chunk which is being split
we get the header address of the new upper chunk.)
A = C; (Requested units)
A + D; (Header address of new 'upper' chunk)
A + DSA HEADER SIZE; (Size of 'lower' allocated chunk)
(Adjusting the chunk chain pointers.)
(*************************************************************************)
(UPDATE 1.2 START. Instruction scheduling optimization. We had 3 Write/Read
stalls and 1 Write/Write stall. Execution of the 5 instructions required
3.5 cycles. Was:)
(U ) (Previous instruction in U)
(E = [D plus DSA CHUNK NEXT];) ( V ) (Address of next chunk)
([A plus DSA CHUNK NEXT] = E;) (U WRS) (New upper chk to next chk)
([E plus DSA CHUNK PREV] = A;) (U WRS) (Points back to new upper chk)
([D plus DSA CHUNK NEXT] = A;) ( V ) (Lower chk points to upper chk)
([A plus DSA CHUNK PREV] = D;) (U WRS) (And upper chk to lower)
(U WWS) (Next instruction in U again)
(New: executes in 3 cycles, despite using an additional instruction; and we
have no stalls anymore. Register B can be used, we don't need it anymore.)
(U ) (Previous instruction in U)
E = [D plus DSA CHUNK NEXT]; ( V) (Address of next chunk)
B = A; (U ) (Make B synonymous to A)
[A plus DSA CHUNK NEXT] = E; ( V) (New upper chk to next chk)
[E plus DSA CHUNK PREV] = B; (U ) (Points back to new upper chk)
[B plus DSA CHUNK PREV] = D; ( V) (Upper chk points to lower chk)
[D plus DSA CHUNK NEXT] = A; (U ) (Lower chk points to upper chk)
( V) (Next instruction in V)
(UPDATE 1.2 END)
(*************************************************************************)
(The upper chunk's size chain pointers are not defined yet, since we did
not yet execute node management for it. However, that does not neccessarily
mean, that there is no size node for that size. We do size managemnent
for A in a few lines, but need to initialize it.)
[A plus DSA SIZE PREV] = DSA CHAIN END;
(UPDATE 1.2. Instruction scheduling optimization.)
[D plus DSA CHUNK PREV] | DSA ALLOCATED; (Moved from below)
[A plus DSA SIZE NEXT] = DSA CHAIN END;
(The lower chunk needs to be treated as allocated from now on. It is re-
used.)
(UPDATE 1.2. Moved to above.)
([D plus DSA CHUNK PREV] | DSA ALLOCATED;) (Moved to above)
(Because D is allocated now, it was removed from any size chain it may have
been integrated into. Its size pointers must have been set to DSA CHAIN END
already in above 'Dsa Remove From Node'.)
([D plus DSA SIZE PREV] = DSA CHAIN END;) (Already done)
([D plus DSA SIZE NEXT] = DSA CHAIN END;) (Already done)
(Also, we provided A for E's CHUNK PREV field, replacing the former back
link to D. That back link to D had the DSA ALLOCATED flag, but A of course
had not. Since E is the chunk next to 'old' D, now split into new D and A,
it always was and must remain allocated. We need to mask in DSA ALLOCATED
also for E now, or E would wrongly be regarded as a deallocated chunk.)
[E plus DSA CHUNK PREV] | DSA ALLOCATED;
(Into A's CHUNK PREV field we provided D. Since D was a free chunk, and A
is the remaining free part, the DSA ALLOCATED was not set for A, and also
may not be set. We can leave it as it is.)
(Now 'old' chunk D was split into chunks D and A. What's still missing is
the size chain update for the /free/ chunk A. Chunk D may not be
integrated into the size chain, because it was and still is allocated and
not free. A's header address needs to be provided to the routine
'Dsa Add To Node' in E, and its /usable/ size needs to be calculated into
A. Note, that it was made sure, that this usable size is at least 1.)
E <> A; (E: Header address; A: Next chunk)
A - E; (Total size upper chunk, incl. header)
A - DSA HEADER SIZE; (Usable size of chunk)
=> Dsa Add To Node; (May create a new node)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? failed -> Dsa Alloc Upper Split Not Added;
(Now return the userdata address of the re-used lower chunk to the caller.
The header address is still in D, but needs to be returned in E.)
E = D; (Header address reallocated chunk)
E + DSA HEADER SIZE; (Userdata address)
-> Dsa Alloc Exit OK; (Done)
(-------------------------------------------------------------------------)
(UPDATE 1.2. Code flow comment. With the introduction of the [dsa control]
variable and its flag DSA CONTROL NO REUSE, we may jump to this code section
right from the routine's start, should that flag be set.)
"Dsa Alloc From Pool"
(There is no suitable node in the AVL tree. Allocate from the memory pool
between the AVL tree and the memory chunks, if there is enough free space
to accomodate the request. In D is the requested number of units.)
(D = D;) (Size of future memory chunk)
=> Dsa Make Top Chunk; (Header Address new chunk in E)
? failed -> Dsa Alloc Exit Failed; ([dsa error] already loaded)
E + DSA HEADER SIZE; (Userdata address allocated chunk)
(-> Dsa Alloc Exit OK;) (Done)
(=========================================================================)
(EXIT)
"Dsa Alloc Exit OK"
(UPDATE 1.2. For statistical purposes.)
[dsa num alloc] +; (Counting successful allocations)
=> Dsa Global Avl Data; (Switch to original AVL Data)
<-- D; <-- C; <-- B; <-- A;
[dsa error] = DSA ERR NONE;
end;
(-------------------------------------------------------------------------)
"Dsa Alloc Exit Failed"
=> Dsa Global Avl Data; (Switch to original AVL Data)
<-- D; <-- C; <-- B; <-- A;
fail;
(=========================================================================)
(UPDATE 1.2 START. Error handlers moved down from inline handling for
Branch Prediction Optimization.)
"Dsa Alloc Dsa Not Existing"
[dsa error] = DSA ERR DSA NOT EXISTING;
-> Dsa Alloc Exit Failed;
"Dsa Alloc Maximum Exceeded"
[dsa error] = DSA ERR LIMIT EXCEEDED;
-> Dsa Alloc Exit Failed;
"Dsa Alloc Upper Split Not Added"
[dsa error] = DSA ERR AVL ERROR; (In [avl error] is the reason)
-> Dsa Alloc Exit Failed;
(UPDATE 1.2 END)
(=========================================================================)
(*****************************************************************************)
"Dsa Free"
(=========================================================================)
(Deallocates a memory chunk at a valid address. Valid addresses are those
which were returned by the 'Dsa Alloc' routine. Deallocation includes
immediate coalescing on both sides, possibly returning large upper parts
back to the memory pool. Size nodes are updated to accurately reflect free
chunks available for re-use. Chunk and size pointers are updated to ensure
fast best-fit re-allocation.)
(-------------------------------------------------------------------------)
(Input: E: Deallocate memory chunk allocated at address E
Output: -
All registers unchanged.
May return errors, check [dsa error] after 'failed'. It may contain:
DSA ERR LIMIT EXCEEDED We can't free addresses >= 1 GUnit
DSA ERR DSA NOT EXISTING 'Dsa Create' was not used/successful
DSA ERR OUT OF MEMORY Request can not be fulfilled
DSA ERR AVL ERROR Details are in [avl error]
DSA ERR NO ALLOCATED CHUNK Attempt to free a not allocated chunk
)
(=========================================================================)
(ENTRY)
A -->; B -->; C -->; D -->; E -->;
=> Dsa Local Avl Data; (Switch to local AVL Data)
(=========================================================================)
(VERIFICATION)
(Make sure, that there was a former successful call to 'Dsa Create'. If
there was none, or if it was not successful, an error is returned.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? [dsa status] - DSA STATUS INIT -> Dsa Free Dsa Not Existing;
(-------------------------------------------------------------------------)
(We work with the header address, not the provided userdata address.)
E - DSA HEADER SIZE;
(-------------------------------------------------------------------------)
(Valid addresses are between [dsa ram bottom] incl. and [avl tree bottom]
excl. If the caller wants to deallocate beyond these limits, it may be
everything but an allocated chunk.)
? E >= [avl tree bottom] -> Dsa Free Maximum Failure;
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? E < [dsa ram bottom] -> Dsa Free Maximum Failure;
("Dsa Free Address Valid")
(-------------------------------------------------------------------------)
(The field CHUNK PREV of the actual chunk must have its MSB set. If it has
not, then this is not a validly allocated chunk. It's a marginal test only,
but after all it's the callers responsibility to call this routine with
valid addresses, which were returned from 'Dsa Alloc'.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? [E plus DSA CHUNK PREV] - DSA ALLOCATED -> Dsa Free Is Not Allocated;
("Dsa Free Is Allocated")
(=========================================================================)
(PROCESS)
(The allocation flag can be removed now.)
[E plus DSA CHUNK PREV] # DSA ALLOCATED;
(-------------------------------------------------------------------------)
(Check, if we can return the actual chunk to the memory pool. This is the
case, when the actual chunk is the topmost chunk. It is the topmost chunk,
when CHUNK NEXT contains DSA CHAIN END.)
(UPDATE 1.2. Branch prediction optimization. Typically it is rare to
deallocate the topmost chunk, so fall-through for not topmost chunks is
preferred.)
? [E plus DSA CHUNK NEXT] = DSA CHAIN END -> Dsa Free Topmost Chunk;
(-------------------------------------------------------------------------)
("Dsa Free Not Topmost Chunk")
(The actual chunk E is not the topmost chunk. Check, if there are neigh-
boring chunks, which are already deallocated. If there are, they are
coalesced with the actual one.)
(First check the chunk above the actual one. Such a chunk must exist, or
we would not have entered this code section 'Dsa Free Not Topmost Chunk'.
If it still is allocated, i.e. in use, we don't coalesce it with the actual
chunk.)
A = [E plus DSA CHUNK NEXT]; (Check next chunk above the actual)
(UPDATE 1.2. Code flow comment: there is no clear preference for either
branch, so we can't really exploit branch prediction; too much depends on
actual circumstances. It fits the logical code flow more to fall-through
when we coalesce, i.e. when the above chunk is not allocated anymore.)
? [A plus DSA CHUNK PREV] + DSA ALLOCATED -> Dsa Free No Next Coalesc;
(-------------------------------------------------------------------------)
(The next 'upper' chunk is free. It can be coalesced with the actual
chunk. Registration of the new free size is done later, when we jump to
label 'Dsa Free No Prev Coalesc', after probably also coalescing with the
previous 'lower' chunk. This section's tasks are: to completely
remove the already free 'upper' chunk from the size chain by shortcutting
its up to two neighbors, if needed adjusting the size node's end entries;
to remove the /actual/ chunk from the chunk chain by shortcutting its
neighbors; and lastly ensuring an unchanged address in E before we continue
with checking for bottom coalescing.)
(Eliminate the previous 'upper' chunk A from the size chain. This may
lead to removing the size node.)
(A = A;) (Address of chunk to be unlinked)
=> Dsa Remove From Node; (Don't keep it listed anylonger)
? failed -> Dsa Free Exit Failed; (Error code already loaded)
(Eliminate the same upper chunk A from the chunk chain. This is contrary
to bottom coalescing, where the actual chunk E is detached.)
E -->; (Need actual chunk back)
E = A; (Address of chunk to be unlinked)
=> Dsa Bypass Chunk;
<-- E; (Restore actual chunk)
? failed -> Dsa Free Exit Failed; (Error code already loaded)
(After coalescing with the next 'upper' chunk, check for further coalescing
with the previous 'lower' chunk.)
(-> Dsa Free No Next Coalesc;)
(-------------------------------------------------------------------------)
(UPDATE 1.2. Code flow comment: Whether coalesced with the upper chunk or
not, both branches meet here again.)
"Dsa Free No Next Coalesc" (Also jumped to if upper not free)
(Check, if there is a previous 'lower' chunk. Such a chunk does not
neccessarily exist, namely not then, when the actual chunk is the bottom-
most, which is a rare event. If it exists, check if still allocated. If so,
we can't coalesce it with the actual chunk.)
? E = [dsa ram bottom] -> Dsa Free No Prev Coalesc;
A = [E plus DSA CHUNK PREV]; (Check previous chunk)
(UPDATE 1.2. Code flow comment: as above, there is no clear preference for
either branch, so we can't exploit branch prediction. It fits the logical
code flow more to fall-through when we coalesce, i.e. when the lower chunk
is not allocated anymore.)
? [A plus DSA CHUNK PREV] + DSA ALLOCATED -> Dsa Free No Prev Coalesc;
(-------------------------------------------------------------------------)
(The previous 'lower' chunk is free. It can be coalesced with the actual
chunk. Registration of the new free size is done later, when we jump to
label 'Dsa Free No Prev Coalesc'. This section's tasks are: to completely
remove the already free 'lower' chunk from the size chain by shortcutting
its up to two neighbors, if needed adjusting the size node's end entries;
to remove the /actual/ chunk from the chunk chain by shortcutting its
neighbors; and to provide a new lower address into E for a proper con-
tinuation.)
(Eliminate the previous 'lower' chunk A from the size chain. This may
lead to removing the size node.)
(A = A;) (Address of chunk to be unlinked)
=> Dsa Remove From Node; (Don't keep it listed anylonger)
? failed -> Dsa Free Exit Failed; (Error code already loaded)
(Eliminate the /actual/ chunk E from the chunk chain. This is different
from top coalescing: here in bottom coalescing, E is the upper of the
two chunks; there it was the lower of the two.)
(E = E;) (Address of chunk to be unlinked)
=> Dsa Bypass Chunk;
? failed -> Dsa Free Exit Failed; (Error code already loaded)
(Now the actual free chunk E can not be found anymore, but A still is in
the chunk chain, although not in any size chain anymore. From now on, we
treat the two chunks as a single coalesced chunk at the new address A.)
E = A; (Address of larger actual chunk)
(-> Dsa Free No Prev Coalesc;) (Node management new size)
(-------------------------------------------------------------------------)
(UPDATE 1.2. Code flow comment: Whether coalesced with the lower chunk or
not, both branches meet here again.)
"Dsa Free No Prev Coalesc" (Also jumped to if lower not free)
(Coalescing was done, if any. In any case does E have the header address of
the, possibly doubly coalesced, free chunk, i.e. it may be lower than the
original actual chunk, when there was bottom coalescing or double
coalescing. If there was top coalescing or no coalescing at all, E still
will point to the actual chunk. Both chunk pointers were modified, if
coalescings were applied, so that the size of the now possibly larger chunk
can be calculated. We do this now. We need the chunk's size as a key to the
size node.)
(Calculate the size of the memory chunk to be deallocated. This is done
by subtracting the actual chunk's header address from the next chunk's
header address. An upper chunk must exist or we would have had no need to
check for coalescings in the first place, but would have returned the
chunk already to the memory pool.)
A = [E plus DSA CHUNK NEXT]; (A will take chunk's size)
A - E; (Chunk's size in A, incl header)
A - DSA HEADER SIZE; (Size excluding the header)
(Doing the node management for this free chunk. This includes updating
an existing or creating a new size node, resp., and update of both pointers
of the size chain for all involved chunks.)
(E = E;) (Header address of free chunk)
(A = A;) (Usable size of free chunk)
=> Dsa Add To Node;
(May return with an error. If there was one, the error code is already in
[dsa error].)
? failed -> Dsa Free Exit Failed; (Error code already loaded)
(UPDATE 1.2. Code flow optimization. The vast majority of all calls to
'Dsa Free' took this branch, thus we save an unconditional jump to the
ordinary end of the routine by relocating the end from the routine's bottom
to here at the cost of a not so intuitive code legibility.)
(-> Dsa Free Exit OK;) (New: fall through)
(=========================================================================)
(EXIT)
(UPDATE 1.2 START: Code section moved from routine's end to take advantage
from fall-through saving a jump for the vast majority of all cases.)
"Dsa Free Exit OK" (Also jumped to from below)
=> Dsa Global Avl Data; (Switch to original AVL Data)
(UPDATE 1.2. For statistical purposes.)
[dsa num free] +; (Counting successful deallocations)
<-- E; <-- D; <-- C; <-- B; <-- A;
[dsa error] = DSA ERR NONE;
end;
(UPDATE 1.2 END)
(=========================================================================)
(UPDATE 1.2. Code flow comment: This typically is a rare case.)
"Dsa Free Topmost Chunk" (Typically rare case)
(This is the topmost chunk. It can be returned to the memory pool. If
there is a free chunk already just below the actual chunk, that one can be
returned as well, and since it is managed by a size node, the size node
also needs to be reduced/removed. First we handle the actual chunk's
previous chunk. See if it exists, and if so, if it was deallocated
already.)
A = [E plus DSA CHUNK PREV]; (Header address previous chunk)
(UPDATE 1.2. Code flow comment: Both conditional branches typically not
taken.)
? [A plus DSA CHUNK PREV] + DSA ALLOCATED -> Dsa Free Return To Pool;
? A = DSA CHAIN END -> Dsa Free Return To Pool;
(A previous 'lower' chunk exists and was already deallocated. It can be
returned to the memory pool as well. However, first we need to actualize
the relevant size node.)
(Eliminate the previous 'lower' chunk A from the size chain. This may
lead to removing the size node.)
(A = A;) (Address of chunk to be unlinked)
=> Dsa Remove From Node; (Don't keep it listed anylonger)
? failed -> Dsa Free Exit Failed; (Error code already loaded)
(Because the actual chunk E still exists, the removal of chunk A from the
size chain did not also remove it from the chunk chain, i.e. its CHUNK NEXT
field still points to E. Likewise does E's CHUNK PREV field still refer
to A. However, we do not need to adjust any of these links: A's predecessor
will be made the end of the chunk chain below, so that neither A nor E
can be found anymore. If there is no such predecessor, then the whole
dedicated memory will become free once again.)
(All memory at and above the just removed previous 'lower' chunk can be
returned to the memory pool now.)
E = A;
(-> Dsa Free Return To Pool;)
(-------------------------------------------------------------------------)
(UPDATE 1.2. Code flow comment: Whether there was an already free chunk
below the actual chunk or not, both branches meet here again.)
"Dsa Free Return To Pool"
(The area at E and above can be returned to the memory pool. Should a
previous chunk have existed, which was already deallocated, then all needed
node chain updates and possibly even a node removal were already done. In
this case, E points to the header of that previous chunk, and not to the
actual chunk anymore.)
[avl tree bottom] = E; (Memory pool grows downwards)
(If E is the bottommost chunk now, then all the memory is available once
again and not a single chunk remains allocated.)
(UPDATE 1.2. Branch prediction optimization. E typically is not the bottom-
most chunk, thus we change the instructions to take advantage of the branch
prediction mechanism.)
? E = [dsa ram bottom] -> Dsa Free Bottommost Chunk;
("Dsa Free Not Bottommost")
(E's previous chunk, which must be an allocated chunk, will become the end
of the chunk chain now. It also will be registered as the last so far
allocated chunk.)
E = [E plus DSA CHUNK PREV]; (Previous chunk is new last chunk)
(Because E before the allocation was a free chunk, its CHUNK PREV field
can not have the DSA ALLOCATED flag. Thus, after allocation E never
has it as well and we need not to clear it explicitely. However, after the
allocation, E's previous chunk is /always/ an allocated chunk and thus must
have the DSA ALLOCATED flag.)
(E & FFFF FFFFh minus DSA ALLOCATED;) (Not needed)
(End the chunk chain at that previous chunk E.)
[E plus DSA CHUNK NEXT] = DSA CHAIN END;
(Register as the last allocated chunk.)
[dsa last chunk] = E; (New last allocated)
-> Dsa Free Exit OK; (Done. Target is above.)
(-------------------------------------------------------------------------)
(UPDATE 1.2 START. This code section was moved from further above for
branch prediction optimization. It handles a rare case.)
"Dsa Free Bottommost Chunk"
(The chunk chain fields both have DSA CHAIN END already. Tell, that there
is no last allocated chunk anymore.)
[dsa last chunk] = minus 1; (No last allocated chunk)
-> Dsa Free Exit OK; (Done. Target is above.)
(UPDATE 1.2 END)
(=========================================================================)
(EXIT)
(UPDATE 1.2: The ordinary exit label 'Dsa Free Exit OK' which could be
expected here was moved further up to take advantage of natural code flow
and save an unconditional jump for the vast majority of all cases.)
(-------------------------------------------------------------------------)
"Dsa Free Exit Failed"
=> Dsa Global Avl Data; (Switch to original AVL Data)
<-- E; <-- D; <-- C; <-- B; <-- A;
fail;
(=========================================================================)
(UPDATE 1.2 START. Error handlers moved down from inline handling for
Branch Prediction Optimization.)
"Dsa Free Dsa Not Existing"
[dsa error] = DSA ERR DSA NOT EXISTING;
-> Dsa Free Exit Failed;
"Dsa Free Maximum Failure"
[dsa error] = DSA ERR LIMIT EXCEEDED;
-> Dsa Free Exit Failed;
"Dsa Free Is Not Allocated"
[dsa error] = DSA ERR NO ALLOCATED CHUNK;
-> Dsa Free Exit Failed;
(UPDATE 1.2 END)
(=========================================================================)
(*****************************************************************************)
"Dsa Size"
(=========================================================================)
(Returns the size of an allocated chunk.)
(-------------------------------------------------------------------------)
(Input: E: Address of allocated chunk
Output: D: Size of userdata area in units, excl. header
All registers unchanged.
May return errors, check [dsa error] after 'failed'. It may contain:
DSA ERR LIMIT EXCEEDED Such an address was never allocated
DSA ERR DSA NOT EXISTING 'Dsa Create' was not used/successful
DSA ERR NO ALLOCATED CHUNK Such an address was never allocated
)
(=========================================================================)
(ENTRY)
E -->;
=> Dsa Local Avl Data; (Switch to local AVL Data)
(=========================================================================)
(VERIFICATION)
(Make sure, that there was a former successful call to 'Dsa Create'. If
there was none, or if it was not successful, an error is returned.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? [dsa status] - DSA STATUS INIT -> Dsa Size Dsa Not Existing;
(-------------------------------------------------------------------------)
(Addresses of 1GUnits = 4 GB or more can not be handled, thus were never
returned by 'Dsa Alloc'. In this case the caller provides invalid input.
Tell him if so.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? E + C000 0000h -> Dsa Size Limit Exceeded;
(-------------------------------------------------------------------------)
(We work with the header address, not the provided userdata address.)
E - DSA HEADER SIZE;
(-------------------------------------------------------------------------)
(The field CHUNK PREV of the actual chunk must have its MSB set. If it has
not, then this is not a validly allocated chunk. It's a marginal test only,
but after all it's the callers responsibility to call this routine with
valid addresses.)
(UPDATE 1.2. Branch prediction optimization. Handler at bottom of routine.)
? [E plus DSA CHUNK PREV] - DSA ALLOCATED -> Dsa Size Not Allocated;
(=========================================================================)
(PROCESS)
(The difference between the next 'upper' chunk, if any, and the own
address is occupied by this chunk, including header. If this is the top
chunk, then we take the upper address from [avl tree bottom].)
D = [E plus DSA CHUNK NEXT];
? D != DSA CHAIN END -> Dsa Size Got Upper;
D = [avl tree bottom];
"Dsa Size Got Upper"
D - E; (Total size of this chunk)
D - DSA HEADER SIZE; (User size of this chunk)
(=========================================================================)
(EXIT)
"Dsa Size Exit OK"
=> Dsa Global Avl Data; (Switch to original AVL Data)
<-- E;
[dsa error] = DSA ERR NONE;
end;
(-------------------------------------------------------------------------)
"Dsa Size Exit Failed"
=> Dsa Global Avl Data; (Switch to original AVL Data)
<-- E;
fail;
(=========================================================================)
(UPDATE 1.2 START. Error handlers moved down from inline handling for
Branch Prediction Optimization.)
"Dsa Size Dsa Not Existing"
[dsa error] = DSA ERR DSA NOT EXISTING;
-> Dsa Size Exit Failed;
"Dsa Size Limit Exceeded"
[dsa error] = DSA ERR LIMIT EXCEEDED;
-> Dsa Size Exit Failed;
"Dsa Size Not Allocated"
[dsa error] = DSA ERR NO ALLOCATED CHUNK;
-> Dsa Size Exit Failed;
(UPDATE 1.2 END)
(=========================================================================)
(*****************************************************************************)
"Dsa Clear"
(=========================================================================)
(Clears the userarea of an allocated chunk by writing NULL into all the
units.)
(-------------------------------------------------------------------------)
(Input: E: Address of allocated chunk
Output: -
All registers unchanged.
May return errors, check [dsa error] after 'failed'. It may contain:
DSA ERR LIMIT EXCEEDED Such an address was never allocated
DSA ERR DSA NOT EXISTING 'Dsa Create' was not used/successful
DSA ERR NO ALLOCATED CHUNK Such an address was never allocated
)
(-------------------------------------------------------------------------)
D -->; E -->;
(-------------------------------------------------------------------------)
(The initial validity tests and the first task of retrieving the size are
absolutely identical to the public routine 'Dsa Size'.)
=> Dsa Size; (User size in D if not failed)
? failed -> Dsa Clear Exit Failed; (Code already in [dsa status])
(-------------------------------------------------------------------------)
(In E is still the userdata address, i.e. the address of the first unit to
be nulled. D has the length of the userdata area. The area from D on,
including it, is nulled out now, in forward direction, because the hardware
works faster in that direction.)
"Dsa Clear Write Null"
[E] = 0; (Write 0 into unit)
E +; (Position to next unit to be nulled)
D ^ Dsa Clear Write Null; (Do for the whole size)
(-> Dsa Clear Exit OK;)
(-------------------------------------------------------------------------)
("Dsa Clear Exit OK")
<-- E; <-- D;
[dsa error] = DSA ERR NONE;
end;
"Dsa Clear Exit Failed"
<-- E; <-- D;
fail;
(=========================================================================)
(*****************************************************************************)
"Dsa Fragmentation" (Introduced in update 1.2)
(=========================================================================)
(MEMORY ANALYSE ROUTINE. Returns data about the actual memory fragmentation
in the dedicated RAM.)
(-------------------------------------------------------------------------)
(Input: -
Output: A: Free and reusable RAM below the memory pool, in units
B: Totally allocated and free RAM below the memory pool, units
C: Size of memory pool with free RAM, in units
The percentage of the fragmentation can be obtained after a call to this
routine with 'A * 100' and 'A / B'. The also free RAM in the memory pool
is returned in C, but included neither in A nor in B. For the percentage of
the overall fragmentation within the provided memory thus calculate
A * 100 / {B + C}.
NOTE: Don't use too frequent. For performance reasons, the number of free
units is not updated live: Updating this number is not just an issue of a
few additions and subtractions; it would involve its own share of logic
due to coalescing and splitting operations. Since such a life value isn't
a vital datum in general, at least not for the author, the according logic
was not implemented, in favour of an overall higher performance. However,
to satisfy the sporadic desire for such data, this function was implemented
in version 1.2. Be careful when using it, though: the function needs to
step through all chained memory chunks, which easily can count into the
millions.
All registers except A, B and C unchanged.
May return errors, check [dsa error] after 'failed'. It may contain:
DSA ERR DSA NOT EXISTING 'Dsa Create' was not used/successful
)
(=========================================================================)
(ENTRY)
D -->; E -->;
=> Dsa Local Avl Data; (Switch to local AVL Data)
(=========================================================================)
(VERIFICATION)
(Make sure, that there was a former successful call to 'Dsa Create'. If
there was none, or if it was not successful, an error is returned.)
? [dsa status] - DSA STATUS INIT -> Dsa Fragmentation Dsa Not Existing;
(=========================================================================)
(PROCESS)
(Into register A the number of free units is counted. We start with 0.)
A = 0;
(-------------------------------------------------------------------------)
(Make sure that there is an AVL root managing free memory chunks. If there
is none, then there are no free memory chunks and thus no fragmented space.
This is not an error: it merely means, that there are no reusable memory
chunks at the moment, but we still can provide output data: B and C will
be calculated after jumping.)
? [avl status] - AVL STATUS ROOT -> Dsa Fragmentation All Counted;
(-------------------------------------------------------------------------)
(The AVL tree consists of nodes, each maintaining a list of free chunks of
same size. The node's key is this size. We need to traverse the whole tree.
It does not matter which tree traversal method is used, because we don't
need the information in any particular order here: we just need to make
sure, that each node is visited once. We arbitrarily use pre-order: each
node is traversed before its children, and of the children, the left subtree
is visited before the right one.)
E = [avl tree root]; (The root address is flexible)
-> Dsa Fragmentation Follow Chunks; (Process the root)
"Dsa Fragmentation Follow Node"
(Follow left child, if any.)
E = [D plus AVL LEFT]; (If existing, process the node)
? E != minus 1 -> Dsa Fragmentation Follow Chunks;
(D has no left child. Does it have a right one? If so, follow it.)
"Dsa Fragmentation Follow Right"
E = [D plus AVL RIGHT]; (If existing, process the node)
? E != minus 1 -> Dsa Fragmentation Follow Chunks;
(D has neither a left nor a right child. We go back up.)
"Dsa Fragmentation Follow Parent"
C = D; (Need for comparison)
D = [D plus AVL PARENT]; (Need parent address in D)
(If we attempted to get the root's parent, we are done.)
? D = minus 1 -> Dsa Fragmentation All Counted;
(If C was D's left child, we follow D's right child now, if any.)
? C = [D plus AVL LEFT] -> Dsa Fragmentation Follow Right;
(C was D's right child. We need to go up the tree once again.)
-> Dsa Fragmentation Follow Parent;
(-------------------------------------------------------------------------)
"Dsa Fragmentation Follow Chunks"
(In E is the address of an AVL node. The field AVL KEY contains this node's
key. The key is the free memory chunk's size: it needs to be added once for
each existing chunk. Use a register to store the key for faster addition.)
C = [E plus AVL KEY];
(The AVL node's fields AVL DATA and AVL FLAGS contain the node's payload.
Actually, only the 30 LSBs of the field AVL DATA contain the actual data:
the 2 MSBs are bits 29-28 of the virtual AUX field, of which the 28 LSBs
can be found in the 28 MSBs of the field AVL FLAGS. Both DATA and AUX have
a size of 30 bits each. They both store addresses: DATA to the start of a
memory chunk chain, and AUX to its end. This chain lists memory chunks which
are free to be reused. Each entry points to a chunk with the same size, as
indicated in the key.)
B = [E plus AVL DATA]; (2 MSBs belong to virtual AUX)
B & 3FFF FFFFh; (30 LSBs contain start of chain)
(In B is the address of a free memory chunk. It points to that chunk's
header. The header's field DSA SIZE NEXT points to the next free memory
chunk of same size, or it contains the constant DSA CHAIN END to indicate
the end of the chain. Whichever is the case: we need to count this chunk's
size. The size was obtained from the actual AL node's header and was stored
into register C.)
"Dsa Fragmentation Count Units"
A + C; (Count free space, in units)
(If this chunk is the last chunk of size C, its field DSA SIZE NEXT
contains the value DSA CHAIN END. In this case we can stop to follow the
chain and continue with the next AVL node. Else it contains the address of
another free chunk, which needs to be made the actual one: then it is
counted as well and examined anew for the end of the chain.)
B = [B plus DSA SIZE NEXT]; (Next chunk in chain, if any)
? B != DSA CHAIN END -> Dsa Fragmentation Count Units;
(The last chunk of size C has been counted. We proceed with the next node
in the AVL tree.)
D = E; (From here we access the next node)
-> Dsa Fragmentation Follow Node;
(-------------------------------------------------------------------------)
(To here we jump when all free space was counted. The sum is in register A.
This value alone does not tell too much, though, when we want to know about
fragmentation. What we need is a relation free space : total space. Now it
is clear, that the memory pool itself always is a large block of free space:
It does make no sense to include its size in the free area. However, it may
be of interest for the caller, and so we provide its size separately in C.
More interestingly, the variable [avl tree bottom] contains the bottom end
of the memory pool. This means, that the topmost allocated memory chunk
ends just the unit before that. Subtracting this value from [dsa ram bottom]
tells us, how many units have been used so far. The result B does include
the deallocated memory chunks which we just summed up into A. This is a
most useful value, as after the division A / B the caller knows the exact
fragmentation below the memory pool. However, we don't do this division in
this routine, as it would mean the loss of two potentially interesting
values. If the caller does not need them, it is easy for him to do the
division himself.)
"Dsa Fragmentation All Counted"
B = [avl tree bottom]; (Bottom of memory pool)
C = [avl tree extent]; (Top of memory pool)
B - [dsa ram bottom]; (Size of allocated and free chunks)
C - [avl tree bottom]; (Size of memory pool)
(=========================================================================)
(EXIT)
"Dsa Fragmentation Exit Ok"
=> Dsa Global Avl Data; (Switch to original AVL Data)
<-- E; <-- D;
[dsa error] = DSA ERR NONE;
end;
"Dsa Fragmentation Exit Failed"
=> Dsa Global Avl Data; (Switch to original AVL Data)
<-- E; <-- D;
fail;
(=========================================================================)
(ERRORS)
"Dsa Fragmentation Dsa Not Existing"
[dsa error] = DSA ERR DSA NOT EXISTING;
-> Dsa Fragmentation Exit Failed;
(=========================================================================)
(*****************************************************************************)
"Dsa Cache Alignment" (Introduced in update 1.2)
(=========================================================================)
(MEMORY ANALYSE ROUTINE. For fast memory access some opimizations can be
done. As per Intel documentation, one such optimization is to make sure,
that allocated memory chunks fit into a dedicated cache line of the L1
cache and don't leap into a second line. The size of a cache line can vary
greatly in different CPU designs: between 8 and 512 Bytes. Most Pentium
CPUs have a cache line of 32 bytes = 8 units, Pentium 4 processors have a
cache line size of 64 bytes = 16 units. Feed this routine with your cache
line size to see the potential optimization. The cache line size can be
obtained from your CPU documentation or from various Internet sources;
you might want to check out wikipedia.)
(-------------------------------------------------------------------------)
(Input: A: Cache line size in UNITS
for Pentiums 8 or 16 units = 32 or 64 bytes
Output: C: Number of totally allocated memory chunks
D: Number of small memory chunks with a size <= A
E: Number of small memory chunks completely in a cache line
D is the total potential: to make full use of the cache, they all should be
in the same memory segment of A units, without crossing borders into the
next memory segment. The number E is the number of chunks which are aligned
properly. Consequently, D-E memory chunks are /not/ aligned properly, even
if they could be. If D-E is high in relation to D, you might want to
consider optimization of your data structure alignment. However, read the
following TECHNICAL NOTES on potential drawbacks.)
(-------------------------------------------------------------------------)
(TECHNICAL NOTES: An opimization attempt aligned data such, that a reusable
empty filler chunk was inserted between the current topmost chunk and the
effectively required chunk. The filler chunk had a variable length and could
encomprise more than 1 cache line. This guaranteed a 100% cache line align-
ment for all allocated chunks. However, due to the 4 units long DSA header,
with a cache line of 32 Bytes as in Pentiums, at most 4 additional units of
payload could be included into the same cache line, which is by far not
enough if the payload is a data structure itself, as is the case for example
with the dictionary library 'DIC' having a header of 4 units plus 1 to
typically max 2 units of key length. This was addressed by putting the DSA
header into the last 4 units of a cache line, allowing the user-record to
start at a cache line in its own right. When traversing the dictionary in
the DIC library's First/Next loops, i.e. lexicographically, no access to the
DSA header is needed, resulting in exactly one cache line per entry, namely
the DIC data structure plus its payload, the key. However, this proved to be
not as optimal as was without employing this mechanism: firstly, the filler
chunks, although reusable, occupy storage themselves, namely for the DSA
header plus the reusable units to align to the desired position; the AVL
data is neglicable. And secondly and more importantly: If storing dictionary
entries just sequentially without caring for cache line boundaries, some
units of the next entry are already present, although not complete. With an
average key length of approx. 1.1 units in the DIC library, a chunk for a
DIC entry including the DSA header is 4+4+1.1 = 9.1 units in size, and thus
is not fitting in a cache line completely. But immediately after them the
start of the next entry begins, avoiding the need to access another cache
line to get its start. This propagates onwards for all entries, if arranged
sequentially as is done with 'Dic Linearize'. When aligning at cache line
boundaries, though, this is not the case. There we need in /any/ case to load
another cache line to access a new entry's start. Overall, this optimization
attempt provided a performance /loss/ of 7% for DIC library's First/Next
loops, and 8% for accesses via the entry's key. For this reason, that parti-
cular optimization was discarded altogether. - CONCLUSION: Despite Intel
documentation pointing out to always align data structures at cache line
boundaries, it in fact is better to have data lined up linearily /without
any filler gaps/, even if this means to cross cache line boundaries. This
is true for all cases which require multiple accesses to a data structure,
like e.g. all binary tree types. However, Intel's recommendation should be
followed for direct-access-structures, like index tabl