Hash Function HSH 11/13 Source Code Test Program
|
|
URI:
|
http://herbert.gandraxa.com/herbert/hsh_test.asp
|
|
Link template:
|
<a href="http://herbert.gandraxa.com/herbert/hsh_test.asp">Hash Function HSH 11/13 Source Code Test Program</a>
|
|
|
Link symbols:
|
On current page |
On this site |
On external site |
Wikipedia article |
ZIP archive |
PDF |
E-Mail
|
Article
Organization
Home »
Linoleum Source Code »
Hash Function HSH 11/13 »
Source Code Test Program
Scope
This is the source code of the Hash Function HSH 11/13 Test Program, testing the Hash Function HSH 11/13 library, implemented in the
Linoleum programming language.
Author
Herbert Glarner
Usage
This test program tests long-term functionality of the
Hash Function HSH 11/13 library.
Make sure you have the library named hsh.txt in the same folder as this test program before assembling the latter.
If compiling this test as is, you will need a test file. I used a word list from Berkeley University, containing 173,528 English words, one per line. You can find it here:
Berkeley Word List. Rename the file to wordlist.txt. Make sure it resides in the same folder as the test program.
Source Code
(Test application, using the HSH Library Routines "HSH.txt".)
(Uses testfile 'wordlist.txt', containing 173,528 English words. The file is
available from Berkeley at http://aima.cs.berkeley.edu/data/wordlist. Rename
the file to worlist.txt. Make sure it resides in the same folder as the
test program.)
(*****************************************************************************)
"libraries"
(=========================================================================)
(Hash Generator. The library which is tested with this test program.)
(-------------------------------------------------------------------------)
HSH;
(=========================================================================)
(=========================================================================)
(Bitstream. Used to convert text data into a lexicographical bitstream.
NOTE: Use version 1.1 or better: Version 1.0 did not have the 'Resume'
functionality yet. You can find the actual version of the BST library at:
http://herbert.wikispaces.com/BST)
(-------------------------------------------------------------------------)
BST;
(=========================================================================)
(*****************************************************************************)
"directors"
program name = { HSH Test };
unit = 32;
(=========================================================================)
(Test screen reflecting what's allocated and what's free)
(-------------------------------------------------------------------------)
display width = HSH TEST SCREEN WIDTH;
display height = HSH TEST SCREEN HEIGHT;
(=========================================================================)
(*****************************************************************************)
"constants" (Convention: "HSH ...", UPPER CASE)
(=========================================================================)
(The testfile is a list of English plain ASCII words. It can be downloaded
from http://aima.cs.berkeley.edu/data/wordlist. It is 1,923,517 bytes large
including CR/LF characters at the end of each word. It contains 173,528
words.)
(-------------------------------------------------------------------------)
(Four bytes each occupy a unit. Of the last unit only 1 byte is used. We
need the space to read in the file into one go.)
HSH FILENAME SIZE = 1 923 517 plus 3 div 4;
HSH TEST KEYS = 173 528;
(=========================================================================)
(=========================================================================)
(Number of LSB within the calculated hash value. The more these are, the
more distinct is the distribition for the keys, i.e. the less collisions
do happen. HSH UNIQUE HASHES depends from HSH SIGNIFICANT BITS, it is
2^bits: Please calculate yourself, that's just a test program.)
(-------------------------------------------------------------------------)
HSH SIGNIFICANT BITS = 10;
HSH UNIQUE HASHES = 1024;
(HSH SIGNIFICANT BITS = 8;
HSH UNIQUE HASHES = 256;)
(=========================================================================)
(=========================================================================)
(The test screen will have as many columns as there are unique hashes.
Its height is fixed.)
HSH TEST SCREEN WIDTH = HSH UNIQUE HASHES;
HSH TEST SCREEN HEIGHT = 800;
(=========================================================================)
(*****************************************************************************)
"variables" (Convention: "hsh ...", lower case)
(=========================================================================)
(The file is 'wordlist.txt' as provided by The University of California,
Berkeley, at http://aima.cs.berkeley.edu/data/wordlist. Note, that my
file was renamed to have a proper extension.
No copyright notice was found on that site, but I do not intend to use the
list differently than as private data to feed this test program.)
(-------------------------------------------------------------------------)
hsh test filename = { wordlist.txt };
(=========================================================================)
(=========================================================================)
(Temporary storage used for representing the distribution.)
(-------------------------------------------------------------------------)
hsh test scaled exp collisions = 0;
(=========================================================================)
(=========================================================================)
(Counting minimum and maximum collisions)
(-------------------------------------------------------------------------)
hsh test min collisions = 7FFFFFFFh; (Max positive value)
hsh test exp collisions = 0;
hsh test max collisions = 0;
(=========================================================================)
(*****************************************************************************)
"workspace" (Convention: "hsh ...", lower case)
(=========================================================================)
(Space for reading the testfile)
(-------------------------------------------------------------------------)
(The 'plus 1' ensures, that there is a Null-byte, actually four of them,
after the last byte of the copies.)
hsh test inputfile = HSH FILENAME SIZE plus 1;
(=========================================================================)
(=========================================================================)
(Some space for creating a bitstream. Without checking, I assume that the
longest word within the test file does not exceed 100 units = 400
characters = 3200 bits.)
(-------------------------------------------------------------------------)
hsh test bitstream = 100;
(=========================================================================)
(=========================================================================)
(Count number of collisions per hash)
(-------------------------------------------------------------------------)
hsh test collisions = HSH UNIQUE HASHES;
(=========================================================================)
(=========================================================================)
(Display area)
(-------------------------------------------------------------------------)
hsh window = HSH TEST SCREEN WIDTH mtp HSH TEST SCREEN HEIGHT;
(=========================================================================)
(*****************************************************************************)
"programme" (Convention: "Hsh ...", Mixed case)
(=========================================================================)
(Screen init)
(-------------------------------------------------------------------------)
[Display Origin] = hsh window; (Defined in the work area)
(=========================================================================)
(=========================================================================)
(Reading in the testfile. It contains 8 bit ASCII characters. There are
1,923,517 bytes in the file.)
(-------------------------------------------------------------------------)
(Check if file exists)
[File Name] = hsh test filename;
[File Command] = TEST;
isocall;
? ok -> hsh test file found;
B = 100000;
-> Hsh Test Error;
"hsh test file found"
(-------------------------------------------------------------------------)
(Check if we may read the file)
? [File Status] + PERMIT TO READ -> hsh test file readable;
B = 100001;
-> Hsh Test Error;
"hsh test file readable"
(-------------------------------------------------------------------------)
(The testfile is read into RAM.)
[File Name] = hsh test filename;
[Block Pointer] = hsh test inputfile;
[File Command] = READ;
[File Position] = ZERO;
[Block Size] = HSH FILENAME SIZE mtp BYTESPERUNIT;
isocall;
B = [Block Size]; (Block size is in byte)
B > 2; (We want units)
(=========================================================================)
(=========================================================================)
(Initialising a bitstream area. The resulting bitstream /intentionally/ has
8 bits per character, because most applications won't fiddle around with
the bits. However, as the input file I used contains plain vanilla ASCII
characters, a 7-bit bitstream could be used: in that case set LSW to 7.)
(-------------------------------------------------------------------------)
A = 8; (PSW: Characters are in 8 bit units)
B = 8; (LSW: using 8 bit bitstreams)
C = 13; (End-of-symbols symbol is CR)
(c = 0;)
=> Bst Init;
? failed -> Hsh Test Error;
(-------------------------------------------------------------------------)
(When reading in on a 'Big Endian' system, set the following variable
/after/ the call to 'Bst Init'. When reading in on ordinary 'Little Endian'
machines, such as on Intel or AMD, do not set the flag.)
([bst status] | BST BIG ENDIAN;)
(=========================================================================)
(Yvonne)
([hsh test inputfile] = 6e 6f 76 59h;
[hsh test inputfile plus 1] = 00 00 65 6eh;)
(Herbert)
([hsh test inputfile] = 62 72 65 48h;
[hsh test inputfile plus 1] = 00 74 72 65h; )
(=========================================================================)
(Make a bitstream from the current word.)
(-------------------------------------------------------------------------)
E = hsh test inputfile; (Address to unit with start of input word)
D = hsh test bitstream; (Address of output bitstream)
=> Bst Stream; (All registers unchanged)
(=========================================================================)
(=========================================================================)
(If using the indicated word list 'wordlist.txt', we have 173,528 words
to process. The first word's bitstream was processed with 'Bst Stream',
thus we have 173,527 words left to process with 'Bst Resume'. We need to
establish a counter for them, because the end criteria was not the file's
NUL byte, but the words' CR byte. If we don't, we would continue to stream
in after the last word and happily stream bits out, which sooner or later
will lead to crash, since it may take quite a while until we accidentally
find a CR byte somewhere.)
(-------------------------------------------------------------------------)
B = HSH TEST KEYS; B -;
(=========================================================================)
(a = [hsh test bitstream]; b = [hsh test bitstream plus 1];)
(show registers;)
(=========================================================================)
(Create a hash for the actual bitstream.)
(-------------------------------------------------------------------------)
"Hsh Make Hash"
(Beginning with the MSB in unit [hsh test bitstream] we have a bitstream
of [bst bits] bits. If there are more than 32 bits, the bitstream is
continued in the MSB ! of the next unit, i.e., the complete bitstream is
readable from left ro right. This bitstream of [bst bits] is used now to
create a hash value.)
B -->;
D = hsh test bitstream; (Address bitstream to be hashed)
A = [bst bits]; (Number of bits the key consists of)
A + 31; A > 5; (Number of 32-bit units)
B = 8; (Width of symbol)
=> Hsh Create; (Hash value in E)
<-- B;
(Can not create errors, don't query 'ok' or 'leave'.)
(=========================================================================)
(a = [hsh test bitstream]; b = [hsh test bitstream plus 1];)
(show registers;)
(=========================================================================)
(Take the wanted number of bits. In this test, must conform to the mapped
values. With 1024 values, this is 10 bits.)
(-------------------------------------------------------------------------)
(Distribution as per number of relevant bits)
C = HSH SIGNIFICANT BITS;
A = FF FF FF FFh;
? C = 32 -> Hsh Mask Ready;
A = 1; A < HSH SIGNIFICANT BITS;
A -;
"Hsh Mask Ready"
(The algorithm in fact distributes well in /all/ 32 bits. To check the
distribution of the MSB rather than the LSB bits, uncomment the following
line)
(E > 32 minus HSH SIGNIFICANT BITS;)
(Mask the not needed MSBs out)
E & A;
(=========================================================================)
(=========================================================================)
(The LSB of E contain the HSH SIGNIFICANT BITS bits of the hash range.
Count the number of collisions.)
(-------------------------------------------------------------------------)
E + hsh test collisions;
[E] +;
(=========================================================================)
(=========================================================================)
(Make a bitstream from the next word.)
(-------------------------------------------------------------------------)
C = 1; (Symbols to skip: We don't want the LF)
D = hsh test bitstream; (Address of output bitstream)
=> Bst Resume;
(-------------------------------------------------------------------------)
(Create a new Hash for the bitstream, unless there are no more words, i.e.
the input file has ended. The input file ends, when the counter in B is 0.)
B ^ Hsh Make Hash;
(=========================================================================)
(=========================================================================)
(Display the results.)
(-------------------------------------------------------------------------)
(If a perfect hash was created, then 173,528 words would be distributed
equally onto the available hash range, which is defined by the number of
bits in the constant HSH SIGNIFICANT BITS. If HSH SIGNIFICANT BITS = 8,
then the range would be 0...255 and we could expect an average
distribution of 173,528 / 256 = 677.84 words per hash. Henceforth we call
this the expected collisions. The screen height was fixed to 800 pixels.
To show distributions, we scale the expected collisions down to be less
than a quarter of the screen height.)
A = HSH TEST KEYS; (Created hash values)
A / HSH UNIQUE HASHES; (Expected collisions per hash value)
[hsh test exp collisions] = A;
B = 1; (So many height pixels per hash value)
"Hsh Test Reduce"
? A <= HSH TEST SCREEN HEIGHT div 4 -> Hsh Test Reduced;
A > 1; (Half as many collisions)
B < 1; (Double number of entries)
"Hsh Test Reduced"
(B entries map on a single height pixel.)
[hsh test scaled exp collisions] = A;
A = 0; (Hash value = screen column)
"Hsh Draw Hash Column"
(Calculate base for column)
C = A; C + [Display Origin];
(Get the number of collisions for the current hash value A)
D = [A plus hsh test collisions];
(Counting minimum and maximum collisions)
? D >= [hsh test min collisions] -> Hsh Draw No New Min;
[hsh test min collisions] = D;
"Hsh Draw No New Min"
? D <= [hsh test max collisions] -> Hsh Draw No New Max;
[hsh test max collisions] = D;
"Hsh Draw No New Max"
D / B; (Scaled number of collisions)
? D = 0 -> Hst Test Next Hash;
E = 00 FF 00h; (Number of collisions in green)
(If exceeding the screen height, than that is a really frequently used hash
value.)
? D > HSH TEST SCREEN HEIGHT -> Hsh Used Too Frequent;
(If within +- 50% around the expected collisions, use green)
A -->;
A = [hsh test scaled exp collisions];
A > 1; (50%)
? D >= A -> Hsh Used Not Below Min;
E = FF FF FFh; (Number of collisions < 50% in white)
-> Hsh Limits Checked;
"Hsh Used Not Below Min"
A + [hsh test scaled exp collisions]; (150%)
? D <= A -> Hsh Limits Checked;
E = FF FF 00h; (Number of collisions > 150% in yellow)
"Hsh Limits Checked"
<-- A;
-> Hsh Draw Column;
"Hsh Used Too Frequent"
E = FF 00 00h; (Too many collisions in red color)
D = HSH TEST SCREEN HEIGHT; (We can't draw beyond the screen)
"Hsh Draw Column"
(Draw vertical line as per scaled number of collisions.)
[C] = E;
C + HSH TEST SCREEN WIDTH;
D ^ Hsh Draw Column;
(Next hash value = screen column)
"Hst Test Next Hash"
A +;
? A < HSH UNIQUE HASHES -> Hsh Draw Hash Column;
(-------------------------------------------------------------------------)
(Draw a horizontal blue line to tell where the expected collisions
would be. If equally distributed, all values should fluctuate around this
line.)
C = [hsh test scaled exp collisions]; (Expected collisions)
C * HSH TEST SCREEN WIDTH; (vertical screen position)
C + [Display Origin]; (Address of screen position)
(Draw in whole width)
A = HSH UNIQUE HASHES;
"Hst Draw Expectation Line"
[C] = 00 00 FFh; (Blue line)
C +; (next position)
A ^ Hst Draw Expectation Line;
(-------------------------------------------------------------------------)
(Draw a horizontal red line to indicate the values which are used more
than double of what would be expected.)
C = [hsh test scaled exp collisions]; (Expected collisions)
C < 1;
C * HSH TEST SCREEN WIDTH; (vertical screen position)
C + [Display Origin]; (Address of screen position)
(Draw in whole width)
A = HSH UNIQUE HASHES;
"Hst Draw Double Line"
[C] = FF 00 00h; (Red line)
C +; (next position)
A ^ Hst Draw Double Line;
(-------------------------------------------------------------------------)
(Update the screen.)
[Display Command] = RETRACE;
isocall;
(-------------------------------------------------------------------------)
(Allow user to exit the loop by hitting any key.)
"Hsh Test Display"
[Console Command] = GETCONSOLEINPUT;
isocall;
? failed -> Hsh Test Display; (Break on any key)
(=========================================================================)
(=========================================================================)
"Hsh Test Ok"
(Ending without errors)
B = 0;
-> Hsh Test Quit;
(-------------------------------------------------------------------------)
"Hsh Test Error"
(Error code is in B.)
(-> Hsh Test Quit;)
(-------------------------------------------------------------------------)
"Hsh Test Quit"
(C, D and E show the min, expected and max collisions for the individual
hash values.)
C = [hsh test min collisions];
D = [hsh test exp collisions];
E = [hsh test max collisions];
(-------------------------------------------------------------------------)
(Registers:
B: Error code, 0 if no errors occured.
C-E: Content of the first 3 outstream units, i.e. the first 96 bits.
)
show registers;
(=========================================================================)
(*****************************************************************************)
Linoleum Syntax Highlighting produced with LSH (© 2007 by Herbert Glarner)