Bitstream Generator


URI:

http://herbert.gandraxa.com/herbert/bst.asp

Link template:   

<a href="http://herbert.gandraxa.com/herbert/bst.asp">Bitstream Generator</a>


Link symbols:   

Local LinkOn current page | DocumentOn this site | External PageOn external site | WikipediaWikipedia article | Compressed ArchiveZIP archive | PDF documentPDF | E-MailE-Mail


Article

Organization

DocumentHome » DocumentLinoleum Source Code » Bitstream Generator

Download as a WinZip file see below.

Scope

This Bitstream Generator converts an arbitrarily long input into a sequential bitstream. This is an implementation in the WikipediaLinoleum Programming Language, a cross-platform Assembler.

Bitstreams are readable from left to right (MSB to LSB) in their entirety: they do not respect any bytes or words boundaries. This makes them ideal to be processed as keys for certain data structures like Patricia Tries or Suffix Trees. A usage example can be found in my Documentdictionary implementation based on a radix tree.

Author

DocumentHerbert Glarner


Outline

Every input format is accepted, as long as the symbols are of a uniform bit size in the range of 1 to 32 bits:

The input's terminator is freely chosable:

Can resume reading input streams at an offset from the last terminator:

Different platform origins are processed:

Streams out a bitstream in any desired symbol width smaller than or equal to the input's symbol width:


Download

The following ZIP file contains both the library's source (BST.txt) as well as a test program (BST Test.txt).

Zipped ArchiveBST.zip


Speed

On a Pentium 4 with 1,328 MHz, AMD, running Windows 2000 Professional, this implementation provided the following times for large-volume data:

PSW [1] LSW [2] Copies [3] Counts Ct/msec msec ms/Copy Streamed bits Bits/second [4]
Argument Argument A B C D = B / C E = D / A F (Result) G=(F/A)/E*A
8 7 100 2792134 5194 537.569118 5.37569118 145894000 271'395'798
8 7 100 2533470 5197 487.487012 4.87487012 145894000 299'277'717
8 7 100 2589747 5178 500.144264 5.00144264 145894000 291'703'835
8 7 1000 26414427 5177 5102.26521 5.10226521 1458940000 285'939'664
8 7 1000 26852588 5177 5186.90129 5.18690129 1458940000 281'273'908
8 7 1000 26813424 5177 5179.3363 5.1793363 1458940000 281'684'740
8 8 100 2620725 5316 492.988149 4.92988149 166736000 338'215'027
8 8 100 2559783 5177 494.452965 4.94452965 166736000 337'213'065
8 8 100 2515569 5177 485.912498 4.85912498 166736000 343'139'970
8 8 1000 25697447 5178 4962.81325 4.96281325 1667360000 335'970'732
8 8 1000 26071016 5178 5034.95867 5.03495867 1667360000 331'156'641
8 8 1000 25365152 5194 4883.54871 4.88354871 1667360000 341'423'850

Note: Since the tests were performed on an Intel compatible AMD, a Little Endian file system was used. This means, that the measured times include byte swapping. On a Big Endian system, performance would be slightly better, because Little Endians require 10 additional instructions to be executed for every 32 processed bits.

[1] Physical Symbol Width: The width of a source symbol ("character"). The test file stores characters in bytes = 8 bits.

[2] Logical Symbol Width: How the symbols are interpreted within The PSW. The test file uses pure ASCII code = 7 bits. When actually using 7 bits as out-bitstream, the symbols are not easily readable anymore by byte-oriented systems: the bitstream is just a concatenation of 7-bit symbols, crossing byte- and unit-boundaries whenever required, filling up the last partial unit (if any) with trailing zero bits. When streaming out 8 bits/symbol, this is not an issue anymore.

[3] A single copy of the test file was 208,420 bytes long. 100 copies thus process 20,842,000 input bytes; 1,000 copies 208,420,000 input bytes.

[4] The significantly faster results for LSW 8 in comparison to LSW 7 are due to the lesser extent of involved steps: LSW 7 crosses bytes and word boundaries, requiring more processing time, whereas LSW 8 neatly lines up within bytes and words.


Test

The library was thoroughly tested with

No errors occured during those tests. No data corruption was manifest.