Bitstream Generator
|
URI: |
http://herbert.gandraxa.com/herbert/bst.asp |
|
Link template: |
<a href="http://herbert.gandraxa.com/herbert/bst.asp">Bitstream Generator</a> |
|
Link symbols: |
|
Bitstream Generator
|
URI: |
http://herbert.gandraxa.com/herbert/bst.asp |
|
Link template: |
<a href="http://herbert.gandraxa.com/herbert/bst.asp">Bitstream Generator</a> |
|
Link symbols: |
|
Home »
Linoleum Source Code
» Bitstream Generator
Source Code Library: The implementation in the cross-platform Assembler Linoleum.
Source Code Test Program: Test program using the library.Download as a WinZip file see below.
This Bitstream Generator converts an arbitrarily long input into a sequential bitstream. This is an implementation in the
Linoleum Programming Language, a cross-platform Assembler.
Bitstreams are readable from left to right (MSB to LSB) in their entirety: they do not respect any bytes or words boundaries. This makes them ideal to be processed as keys for certain data structures like Patricia Tries or Suffix Trees. A usage example can be found in my
dictionary implementation based on a radix tree.
Every input format is accepted, as long as the symbols are of a uniform bit size in the range of 1 to 32 bits:
The input's terminator is freely chosable:
Can resume reading input streams at an offset from the last terminator:
Different platform origins are processed:
Streams out a bitstream in any desired symbol width smaller than or equal to the input's symbol width:
The following ZIP file contains both the library's source (BST.txt) as well as a test program (BST Test.txt).
On a Pentium 4 with 1,328 MHz, AMD, running Windows 2000 Professional, this implementation provided the following times for large-volume data:
| PSW [1] | LSW [2] | Copies [3] | Counts | Ct/msec | msec | ms/Copy | Streamed bits | Bits/second [4] |
| Argument | Argument | A | B | C | D = B / C | E = D / A | F (Result) | G=(F/A)/E*A |
| 8 | 7 | 100 | 2792134 | 5194 | 537.569118 | 5.37569118 | 145894000 | 271'395'798 |
| 8 | 7 | 100 | 2533470 | 5197 | 487.487012 | 4.87487012 | 145894000 | 299'277'717 |
| 8 | 7 | 100 | 2589747 | 5178 | 500.144264 | 5.00144264 | 145894000 | 291'703'835 |
| 8 | 7 | 1000 | 26414427 | 5177 | 5102.26521 | 5.10226521 | 1458940000 | 285'939'664 |
| 8 | 7 | 1000 | 26852588 | 5177 | 5186.90129 | 5.18690129 | 1458940000 | 281'273'908 |
| 8 | 7 | 1000 | 26813424 | 5177 | 5179.3363 | 5.1793363 | 1458940000 | 281'684'740 |
| 8 | 8 | 100 | 2620725 | 5316 | 492.988149 | 4.92988149 | 166736000 | 338'215'027 |
| 8 | 8 | 100 | 2559783 | 5177 | 494.452965 | 4.94452965 | 166736000 | 337'213'065 |
| 8 | 8 | 100 | 2515569 | 5177 | 485.912498 | 4.85912498 | 166736000 | 343'139'970 |
| 8 | 8 | 1000 | 25697447 | 5178 | 4962.81325 | 4.96281325 | 1667360000 | 335'970'732 |
| 8 | 8 | 1000 | 26071016 | 5178 | 5034.95867 | 5.03495867 | 1667360000 | 331'156'641 |
| 8 | 8 | 1000 | 25365152 | 5194 | 4883.54871 | 4.88354871 | 1667360000 | 341'423'850 |
Note: Since the tests were performed on an Intel compatible AMD, a Little Endian file system was used. This means, that the measured times include byte swapping. On a Big Endian system, performance would be slightly better, because Little Endians require 10 additional instructions to be executed for every 32 processed bits.
[1] Physical Symbol Width: The width of a source symbol ("character"). The test file stores characters in bytes = 8 bits.
[2] Logical Symbol Width: How the symbols are interpreted within The PSW. The test file uses pure ASCII code = 7 bits. When actually using 7 bits as out-bitstream, the symbols are not easily readable anymore by byte-oriented systems: the bitstream is just a concatenation of 7-bit symbols, crossing byte- and unit-boundaries whenever required, filling up the last partial unit (if any) with trailing zero bits. When streaming out 8 bits/symbol, this is not an issue anymore.
[3] A single copy of the test file was 208,420 bytes long. 100 copies thus process 20,842,000 input bytes; 1,000 copies 208,420,000 input bytes.
[4] The significantly faster results for LSW 8 in comparison to LSW 7 are due to the lesser extent of involved steps: LSW 7 crosses bytes and word boundaries, requiring more processing time, whereas LSW 8 neatly lines up within bytes and words.
The library was thoroughly tested with
Berkeley Word List containing 173,528 words in English language (each word CR/LF-separated), processing a bitstream for each individual word between CR/LF pairs.No errors occured during those tests. No data corruption was manifest.