Bitstream Generator Source Code Library


URI:

http://herbert.gandraxa.com/herbert/bst_lib.asp

Link template:   

<a href="http://herbert.gandraxa.com/herbert/bst_lib.asp">Bitstream Generator Source Code Library</a>


Link symbols:   

Local LinkOn current page | DocumentOn this site | External PageOn external site | WikipediaWikipedia article | Compressed ArchiveZIP archive | PDF documentPDF | E-MailE-Mail


Article

Organization

DocumentHome » DocumentLinoleum Source Code » DocumentBitstream Generator » Source Code Library

Scope

This is the source code of the Bitstream Generator library, implemented in the WikipediaLinoleum programming language.

Author

DocumentHerbert Glarner


Source Code

(Bit Streams "BST.txt" 1.1 - Copyright 2006 Herbert Glarner)

(*****************************************************************************)

    (=========================================================================)
    (   Copyright Herbert Glarner [2006]
        Mail: herbert.glarner@bluewin.ch

        Free usage granted for all purposes, including commercial, provided
        that above credits are given.

        Suggestions for performance improvement welcome.
        No liability for any damage usage may cause.   )
    (=========================================================================)


    (=========================================================================)
    (Converts input data into a bitstream. With respect to a bitstream, input
    data is characterized by one or several of the following problems:

    1.  The symbols come not in expected order, due to being packed into 32 bit
        words such, that the entirety of the packed symbols expose a right-to-
        left order, whereas each symbol in itself still maintains a left-to-
        right order.

    2.  The individual input symbols {letters, numbers, ...} are of an
        unsuitable size, mainly by exposing an unwanted number of prefixed
        Zero bits;

    3.  Symbols do not neccessarily need to respect 32-bit unit alignments nor
        8-bit byte alignments. For example do 24-bits encodings respect byte
        alignments, but not unit alignments. A 7 bit encoding respects neither
        of the two.

    4.  We are faced with different conventions of reading files into the RAM,
        i.e. we must distinguish between 'little endian' and 'big endian'.)
    (-------------------------------------------------------------------------)
    (INPUT DATA PROBLEMS:

    Files on Harddisk essentially are bitstreams: Seen from the file, we don't
    have any of above problems, with the exception of problem #2.

    But as soon as we read such a file into units, all of aforementioned
    problems may become manifest.

    This is an example of a very trivial ANSI file being read into a Linoleum
    Workarea starting at n. The file's ANSI characters require 8 bits each.
    They are packed into consecutive 32 bit units, filling them from right-to-
    left. Suppose, there exists an ANSI file with just the following content
    {where '\0' marks a NUL byte, i.e. eight consecutive zero bits}:

        'Yvonne\0'


    PROBLEM 1.

    Linoleum will read the bits into units as follows:

          B3 B2 B1 B0 = Bytes in Unit
        +-------------+                     Within each unit, the symbols are
        |  n  o  v  Y | Unit n              not suitably ordered to be used as
        +-------------+                     a consective bitstream: they are
        | \0 \0  e  n | Unit n+1            right-to-left. But the bits in the
        +-------------+                     symbols are left-to-right.

    Let's have a closer look at the first unit, this time at the bit level.

        Byte 3      Byte 2      Byte 1      Byte 0
        'n'         'o'         'v'         'Y'
        +-----------+-----------+-----------+-----------+
        | 0110 1110 | 0110 1111 | 0111 0110 | 0101 1001 | Unit n
        +-----------+-----------+-----------+-----------+
          ^           ^           ^           ^
          |           |           |           |
          +-----------+-----------+-----------+
          |
        The MSB is to the byte's left, i.e. each symbol's bitstream runs from
        left-to-right.


    PROBLEM 2.

    If we were to build a compact bitstream, and knew, that the ANSI file did
    not really contain ANSI characters but pure 7-bit ASCII characters, then we
    would waste 1 bit per character {the MSB}. This seems not be that severe,
    but in fact it still is 12.5% and sums up as the symbols become many,
    especially if those ASCII symbols were to be used as bitstream-keys of
    specialized data stuctures like 'Patricia Tries' or 'Suffix Tries'.

        Byte 3      Byte 2      Byte 1      Byte 0
        'n'         'o'         'v'         'Y'
        +-----------+-----------+-----------+-----------+
        | 0110 1110 | 0110 1111 | 0111 0110 | 0101 1001 | Unit n
        +-----------+-----------+-----------+-----------+
          ^           ^           ^           ^
          |           |           |           |
          +-----------+-----------+-----------+
          |
        The MSB is always 0 in 7-bit ASCII.

    The problem is exposed to a far larger extent, when we look at the way on
    how we save strings in Linoleum: Now we in fact use 7-bit ASCII characters
    and store each character into an individual 32-bit ! unit.

        Byte 3      Byte 2      Byte 1      Byte 0
        +-----------+-----------+-----------+-----------+
    'Y' | 0000 0000 | 0000 0000 | 0000 0000 | 0101 1001 | Unit n
        +-----------+-----------+-----------+-----------+
    'v' | 0000 0000 | 0000 0000 | 0000 0000 | 0111 0110 | Unit n+1
        +-----------+-----------+-----------+-----------+
    'o' | 0000 0000 | 0000 0000 | 0000 0000 | 0110 1111 | Unit n+2
        +-----------+-----------+-----------+-----------+
    'n' | 0000 0000 | 0000 0000 | 0000 0000 | 0110 1110 | Unit n+3
        +-----------+-----------+-----------+-----------+
          ^^^^ ^^^^   ^^^^ ^^^^   ^^^^ ^^^^   ^
          |||| ||||   |||| ||||   |||| ||||   |
          ++++-++++---++++-++++---++++-++++---+
          |
        Here we  waste 25 of 32 bits, or more than 78%.


    The two aforementioned problems 1 and 2 can be immanent in a variety of
    mixed cases. One would be the storage of each symbol within 16 bits, as is
    often the case for several Microsoft-produced files, which use 16-bit-
    'Unicode' {quoted, because real Unicode can not be represented with just
    16 bits, there's only room for a subset of Unicode}. Again, we have the
    right-to-left problem, and furthermore, if the content of such a file was
    known to be ANSI-coded in reality, this would be a waste of 50% with
    respect to a bitstream.


    PROBLEM 3.

    Another problem is the fact, that the input by no means is restricted to
    symbols being 8, 16 or 32 bits wide. In fact, it is very well thinkable to
    have a file which was output in 7-bit ASCII, mainly for the reason of
    saving on file size. Such a file usually would not be readable with
    standard commercial programs, but applications emitting them very well can
    and do process their own output. This library is able to process them as
    well.

    Let's assume our mini-file from above being in 7-bit ASCII now, not in
    8-bit ANSI anymore {this implies, that the final \0 represents a 7-bit NUL}.

        'Yvonne\0'

    The 7-bit symbol bitstream on file, from left to right, would be:

        1011001 1110110 1101111 1101110 1101110 1100101 0000000
        Y       v       o       n       n       e       \0

    The bits would be stored in a file as 8-bit bytes as follows - still a
    perfect bitstream:

        10110011 11011011 01111110 11101101 11011001 01000000 00000000
        Y      v       o       n       n       e       \0      Filler

    But when read into Linoleum units, the content would be exposed pretty
    cryptic:

    +-----<-------+   +---------<---------+
    |             |   |                   |
    |     +-------|------<----+   +-------|-----<-----+
    |     |       |   |       |   |       |           |
    |     +----n--+   +-----n-+   +------o+   Y-------v
    |   +-----------+-----------+-----------+-----------+
    |   | 1110 1101 | 0111 1110 | 1101 1011 | 1011 0011 | Unit n
    |   +-----------+-----------+-----------+-----------+
    |
    +-------------------->--------------------+
                                              |
                      +---------<---------+   |
                      |                   |   |
          +-------<-----------+   +-------|-------<---+
          |           |       |   |       |   |       |
          +-------End +Filler-+   +-\0----+   +--e----+
        +-----------+-----------+-----------+-----------+
        | 0000 0000 | 0000 0000 | 0100 0000 | 1101 1001 | Unit n+1
        +-----------+-----------+-----------+-----------+

    It is obvious, that this is quite cumbersome to deal with. Not only do we
    have a left-to-right bitstream in symbols and right-to-left bytes within
    units, but also shifting symbol starts on byte level and symbols being
    split between bytes and even units {the second 'n', for example}.

    Is this merely academical? Well, 7-bit symbols are not streamed out often
    these days with abundant external storage possibilities. However, 'odd'
    bitstreams do very well exist. A more common example includes 24 bits
    {3 byte coded bitmaps, one byte each for the color components Red, Green,
    and Blue.} It also is common to output mass data gathered by realtime
    systems as 4-bit BCD {Binary Coded Digits}. In any case, whether academical
    or not, this library handles all the possibilities for completeness reasons,
    including such awkward 7-bit encodings.


    PROBLEM 4.

    Different systems use different conventions when it comes to the order
    in which the individual bytes of a multi-byte-symbol are represented in
    RAM. There are systems which read the lowest byte before the higher bytes,
    and other system which treat this vice-versa. These different approaches
    are known as 'Endianness'. Note, that the Endianness relates to te
    representation in RAM, not to the representation on the file!

    The 32-bit symbols 'Y', 'v' in:

    Big Endian:

        In Linoleum Units:
        B3   B2   B1   B0
        +----+----+----+----+
        | \0 | \0 | \0 | Y  | Unit n
        +----+----+----+----+
        | \0 | \0 | \0 | v  | Unit n+1
        +----+----+----+----+
        This system is also used when encoding normal strings in Linoleum,
        as with 'String Variable = { Yvonne };'.

    Little Endian:

        In Linoleum Units:
        B3   B2   B1   B0
        +----+----+----+----+
        | Y  | \0 | \0 | \0 | Unit n
        +----+----+----+----+
        | v  | \0 | \0 | \0 | Unit n+1
        +----+----+----+----+
        This is how a file's bitstream '\0\0\0Y' would be read into a Linoleum
        unit on a Little Endian system like Intel or AMD.

    Assuming, that Linoleum is mostly used on Intel PCs and compatibles like
    AMD, Little Endian is the default.

    If a Big Endian file needs to be processed, or if the keys originate from
    pure Linoleum strings, then after Initialization with 'Bst Init' a special
    flag needs to be set:
        [bst status] | BST BIG ENDIAN
    The library's routines 'Bst Stream' and 'Bst Resume' then will care for
    the correct representation.)
    (-------------------------------------------------------------------------)
    (OUTPUT FORMAT:

    This library reconstructs the original left-to-right bitstream. Whether
    the input is in symbols of 32 or of 7 bits width, whether the machine uses
    little or big endian, it will convert the symbols in a user-defined output
    width and present the whole 'string' as a contiguous bitstream which can be
    read from left to right.

    Would the user choose an output width of 8 bit, then the resulting bit-
    stream in /any/ case would contain:

        Byte 3      Byte 2      Byte 1      Byte 0
        +-----------+-----------+-----------+-----------+
        | 0101 1001 | 0111 0110 | 0110 1111 | 0110 1110 | Unit n
        +-----------+-----------+-----------+-----------+
          Y           v           o           n

        Byte 3      Byte 2      Byte 1      Byte 0
        +-----------+-----------+-----------+-----------+
        | 0110 1110 | 0110 0101 | 0000 0000 | 0000 0000 | Unit n+1
        +-----------+-----------+-----------+-----------+
          n           e           \0          Filler

    Would the user choose a 7 bit width:

        Byte 3      Byte 2      Byte 1      Byte 0
        +-----------+-----------+-----------+-----------+
        | 1011 0011 | 1101 1011 | 0111 1110 | 1110 1101 | Unit n
        +-----------+-----------+-----------+-----------+
          Y       v          o          n          n

        Byte 3      Byte 2      Byte 1      Byte 0
        +-----------+-----------+-----------+-----------+
        | 1101 1001 | 0100 0000 | 0000 0000 | 0000 0000 | Unit n+1
        +-----------+-----------+-----------+-----------+
             e          \0         Filler

    Note, that it would have been possible to make the whole bitstream right-
    to-left, i.e. beginning at bit 0 of each unit. Since in the end the user
    just needs to /know/ how the bits within a unit were streamed {here: left
    to right, i.e. a unit's MSB is the first bit of the bitstream} the direc-
    tion seems to be a secondary decision only. However, this is not the case.
    There is quite a big disadvantage by creating a bitstream running from
    right to left: we would need to revert each individual symbol's bit posi-
    tions to maintain lexicographical integrity. Maintaining this integrity is
    crucial, if bitstreams are to become keys of data structure implementation
    relying on such. The fastest method I was able to think of to revert a
    unit's bits still needs 24 Linoleum instructions, and that per symbol.
    Making the stream run from left-to-right, though, there simply is no need
    for such.)
    (-------------------------------------------------------------------------)
    (What is this all good for?

    'Endless' bitstreams have their usage in many fields. Compression, pattern
    recognition, image and sound processing, cryptography are among them, just
    to name a few.

    The actual reason for this library is some research about efficiently
    storing alphanumeric information.)
    (-------------------------------------------------------------------------)
    (TERMINOLOGY:

    Successfull processing requires knowledge about the Data formats. Within
    this source, the following terms are used.


    For the INPUT:

    The 'PSW' {Physical Symbol Width}: It answers the question, how many bits
    are needed to represent an individual symbol. ANSI files usually have a
    PSW of 8. An example of 7 for an ASCII file was given above, though also
    pure ASCII file usually are encoded with a PSW of 8. Some manufacturers
    prefer symbol encoding in 2 bytes: use 16 for those. If you are processing
    32-bit Unicode, use 32.

    'Endianness': Assuming that Linoleum is mostly processed on PCs, the
    default is 'Little Endian'. However, the library can process 'Big Endian':
    to enforce such, set the flag BST BIG ENDIAN. Also set this flag if the
    input symbols are coded in pure Linoleum strings with the {...} delimiters.

    'Source': Address telling where the first input symbol is, left aligned in
    the symbol width given in PSW.

    'EOS' {End of Symbols}: This tells the library the symbol which, when
    read in, terminates the conversion process. This in most cases is NUL,
    i.e. a series of binary zeros, in the same width PSW as an ordinary symbol.
    However, you are free to code whatever symbol here: sometimes a comma or
    space suits the input better. Also frequently used are horizontal tab,
    carriage return, line feed and more. Whatever is used: keep in mind, that
    it in the end is an ordinary symbol, thus PSW bits wide.


    For the OUTPUT:

    The 'LSW' {Logical Symbol Width}: By no means are we bound to blindly
    accept the input data's symbol width: we can change it to a smaller width,
    thus enforcing a compression without any insignificant leading bits. This,
    however, depends largely on what shall be done with the resulting bit-
    stream. Should the bitstreams be input into your own string-processing
    routine, such as a key into an implementation of a trie, then I suggest
    that one uses the effective symbol size without wasting bits. If we know,
    for example, that we in fact are processing 7 bit ASCII, it is recommmended
    to set the LSW to 7, though: this will save a nice 12.5% on the resulting
    bitstream. If, on the other hand, the data is to be output later on into a
    file, then it in most cases would be unefficient to process them first as
    7 bits and later on convert them once again into 8 bits: in this case
    better use the expected standard widths of 8, 16 or 32 bits. As was said:
    it depends on the processing task.

    'Target': Address telling where the library may write the bitstream into.
    It is your responsibility to provide enough space for all symbols, the
    library just streams out bits into the space as long as the EOS symbol was
    read.
    NOTE: The library does /not/ write any EOS string after the last streamed
    out symbol. Insted, the /number of bits/ starting with the MSB of the
    specified target address is counted into [bst bits]. The effectively used
    number of /units/ then can be calculated with '{[bst bits] + 31} > 5'.
    It is guaranteed, that the LSB unused bits of the last used unit are 0.)
    (=========================================================================)


    (=========================================================================)
    (Routines summary:
        Name of routine                 Short description)
    (-------------------------------------------------------------------------)
    (   PUBLIC ROUTINES:
            Bst Init                    Telling about PSW, LSW and EOS
            Bst Stream                  Converting units into a bitstream
            Bst Resume                  Resuming after the last EOS symbol

        INTERNAL ROUTINES:
            Bst Reverse Bytes           Converts a unit's 'DCBA' into 'ABCD'

        Although 'Bst Reverse Bytes' is declared to be an internal routine, no
        harm is done if called from the outside world for other purposes: it
        just does its job in a register and leaves the other registers alone.
    )
    (=========================================================================)


    (=========================================================================)
    (Memory requirements:
        13 variables = 52 bytes. When streaming, an area large enough to hold
        the streamed out bits needs to be provided. If that area is too small,
        anything beyond it will be overwritten, until an EOS is found: this
        may crash your application.
    )
    (=========================================================================)


    (=========================================================================)
    (Version history:
        07 Jul 2006: 1.0 Released.
        11 Jul 2006: 1.1 Released. Outstream area is nulled now before being
                     used. New function 'Bst Resume" resumes streaming after
                     the last EOS symbol.
    )
    (=========================================================================)


(*****************************************************************************)
"libraries"


(*****************************************************************************)
"directors"
        program name = { BST };
        unit = 32;

(*****************************************************************************)
"constants"                             (Convention: "BST ...", UPPER CASE)

    (=========================================================================)
    (Error codes)
    (-------------------------------------------------------------------------)
        BST ERR NONE = 0;               (No error)
    (-------------------------------------------------------------------------)
    (From 'Bst Init')
        BST ERR INVALID PSW = 1;        (Must be 1...32)
        BST ERR INVALID LSW = 2;        (Must be 1...PSW)
    (=========================================================================)

    (=========================================================================)
    (Status flags)
    (-------------------------------------------------------------------------)
    (The variable [bst status] will hold some flags about the current status
    of the library.)
        BST STATUS INIT  = 0000 0001h;  (set when 'Bst Init' was successful)
        BST BIG ENDIAN   = 0000 0002h;  (default = clear = little endian)

        (Theoretically, the other bits within the [bst status] variable are
        free to be used. However, usage is discouraged.)
    (=========================================================================)

(*****************************************************************************)
"variables"                             (Convention: "bst ...", lower case)

    (=========================================================================)
    (Error condition as per BST ERR XXX constants.)
    (-------------------------------------------------------------------------)
        bst error = 0;
    (=========================================================================)

    (=========================================================================)
    (Some flags about the current status of the library. The bits are given in
    the BST STATUS XXX constants.)
    (-------------------------------------------------------------------------)
        bst status = 0;
    (=========================================================================)

    (=========================================================================)
    (Saving the input and output formats as provided to 'Bst Init'.)
    (-------------------------------------------------------------------------)
        bst psw = 0;                    (Input: Physical Symbol Width)
        bst lsw = 0;                    (Input/Output: Logical Symbol Width)
        bst eos = 0;                    (Input: End-of-symbols symbol)
    (=========================================================================)

    (=========================================================================)
    (The following takes a shiftable bitmask, masking PSW bits. At the start
    of a bitstream, this is left-aligned, i.e. the MSB is set.)
    (-------------------------------------------------------------------------)
        bst pswm = 0;
    (-------------------------------------------------------------------------)
    (Similarly, there exists a mask for the LSW bits, only that it is always
    right aligned.)
    (-------------------------------------------------------------------------)
        bst lswm = 0;
    (=========================================================================)

    (=========================================================================)
    (Depending on the little or big endian scheme, read units might need to be
    reversed. In any case does the following variable hold the actual unit such,
    that its bits can be read from left to right.)
    (-------------------------------------------------------------------------)
        bst unit stream = 0;
    (=========================================================================)

    (=========================================================================)
    (The lowest bit of the next symbol to be streamed out is held in this
    variable. 'OOS' means 'Offset Out-Stream'. It starts with 32 for a new
    outstream and is reduced by LSW prior to streaming a symbol out. After this
    it may contain any value 0...31, depending on the LSW.)
    (-------------------------------------------------------------------------)
        bst oos = 0;
    (=========================================================================)

    (=========================================================================)
    (The following variable counts the streamed out bits. Recall, that we
    /always/ stream from left to right. This means, that the value of this
    variable mod 32 is so many bits on the MSB side of the last unit, when 1
    or larger. If mod 32 is 0, then thelast unit was completely consumed.)
    (-------------------------------------------------------------------------)
        bst bits = 0;
    (=========================================================================)

    (=========================================================================)
    (In order to allow 'Bst Resume' to resume with the symbol following the
    last found EOS symbol after a 'Bst Stream' or 'Bst Resume' call, the unit
    in which it was found needs to be stored, and also the bit position of the
    LSB of the EOS symbol.)
    (-------------------------------------------------------------------------)
        bst last eos unit = 0;
        bst last eos bit = 0;
    (-------------------------------------------------------------------------)
    ('Bst Resume' also allows to skip the first few symbols. This is of
    interest when processing lists which are separated by two symbols, e.g.
    CRLF, for which the first symbol, CR, has been declared to be the EOS
    symbol: the second symbol LF then can be skipped rather than being inter-
    preted as the first new symbol of the next stream. The number of symbols
    to be skipped are passed to 'Bst Resume' via a register. It is saved into
    this variable and decrements in here, as long as larger than 0.)
    (-------------------------------------------------------------------------)
        bst skip symbols = 0;
    (=========================================================================)

(*****************************************************************************)
"workspace"                             (Convention: "bst ...", lower case)

    (None)

(*****************************************************************************)
"programme"                             (Convention: "Bst ...", Mixed case)

    (=========================================================================)
    (This is a library only, it can not be run stand-alone. See "BST Test.txt"
    for a test setup.)
    (-------------------------------------------------------------------------)
        end;
    (=========================================================================)

(*****************************************************************************)
"Bst Init"

    (=========================================================================)
    (Initializes a bitstream environment by providing the neccessary input and
    output format settings. This is used just once for a given input/output-
    system. After this, use 'Bst Stream' for the first or only bitstream to be
    created, then consecutive calls to 'Bst Stream' or 'Bst Resume' for new
    inputs.)
    (-------------------------------------------------------------------------)
    (Input:     A:  Physical Symbol Width PSW of Input Data, in bits
                B:  Logical Symbol Width LSW of Input and Output data, in bits
                C:  End-of-symbols symbol, usually 0, but often also CR etc.
    Output:     -

    NOTE: 1 <= A <= B.

    Example: For 7-bit ASCII symbols coded in whole 32-bit units, pass:
             A = 32; B = 7;     if you want a bitstream as compact as possible
             A = 32; B = 8;     if the bitstream shall consists of easily
                                readable bytes

             When reading ANSI or ASCII files, pass:
             A = 8;  B = 8;     if the file is ANSI or ASCII coded and you
                                want the bitstream consist of bytes
             A = 8;  B = 7;     if you know the file is pure ASCII and you want
                                the bitstream as compact as possible
             A = 7;  B = 7;     if the file was streamed out in 7-bit ASCII,
                                which is quite rare

    All registers unchanged.
    Can cause errors. If fails, the error code is in [bst error].
    )
    (-------------------------------------------------------------------------)
        C -->;
    (-------------------------------------------------------------------------)
        [bst psw] = A;
        [bst lsw] = B;
        [bst eos] = C;
    (-------------------------------------------------------------------------)
    (The PSW must be at least 1 bit and may not exceed 32 bits. Although, 1 bit
    most likely does not make much sense as 1 of the 2 symbols must be declared
    as the EOS symbol, leaving just the other as the only symbol. Anyhow, there
    might be an application in which this makes sense, so this library does not
    restrict it artificially just because I don't see it.)
        ? A <= 0 -> Bst Init Psw Invalid;
        ? A <= 32 -> Bst Init Psw Ok;
    "Bst Init Psw Invalid"
        [bst error] = BST ERR INVALID PSW;
        -> Bst Init Exit Failed;
    "Bst Init Psw Ok"
    (-------------------------------------------------------------------------)
    (The LSW must be at least 1 bit and may not be larger than the PSW.)
        ? B <= 0 -> Bst Init Lsw Invalid;
        ? B <= A -> Bst Init Lsw Ok;
    "Bst Init Lsw Invalid"
        [bst error] = BST ERR INVALID LSW;
        -> Bst Init Exit Failed;
    "Bst Init Lsw Ok"
    (-------------------------------------------------------------------------)
    (To extract a symbol from an input stream, we need a bitmask with so many
    set bits as the PSW is wide. We need this bitmask /left/-aligned, i.e., the
    masks MSBs are set, the LSBs cleared.)

    (Note, that C < 32 would not work, as per Intel manual, only the 5 LSBs are
    masked, and thus C < 32 equals C < 0. Because of this we need to test for a
    PSW of 32 and treat it separately.)
        C = 0;
        ? A = 32 -> Bst Init Psw is 32 bit;
                                (7bit       8bit        9bit         32bit    )
        C = 1;                  (       1b          1b           1b           )
        C < A;                  (10000000b  100000000b  1000000000b           )
    "Bst Init Psw is 32 bit"
        C -;                    (01111111b  011111111b  0111111111b  FFFFFFFFh)
    (Now we have a bitmask, but we need it left-aligned, i.e. the MSBs must be
    set. Thus we shift left 32-PSW bits. To the same effect we can rotate PSW
    bits to the right, though. In case of 32 bits, this rotates right 0 bits,
    but in this case that provides the same result, i.e. no changes at all.)
        C @> A;                 (FE000000h  FF000000h   FF800000h    FFFFFFFFh)
        [bst pswm] = C;
    (-------------------------------------------------------------------------)
    (To mask out a symbol, a right-aligned symbol mask of LSW bits is created.
    It can be 32 bit as well, so we need to test for 32 bits individually once
    again.)
        C = 0;
        ? B = 32 -> Bst Init Lsw is 32 bit;
                                (7bit       8bit        9bit         32bit    )
        C = 1;                  (       1b          1b           1b           )
        C < B;                  (10000000b  100000000b  1000000000b           )
    "Bst Init Lsw is 32 bit"
        C -;                    (01111111b  011111111b  0111111111b  FFFFFFFFh)
        [bst lswm] = C;         (always right-aligned)
    (-------------------------------------------------------------------------)
    (Everything was fine, streaming can begin now.)
        [bst status] | BST STATUS INIT;
    (-------------------------------------------------------------------------)
    "Bst Init Exit OK"
        <-- C;
        [bst error] = BST ERR NONE;
        end;
    "Bst Init Exit Failed"
        <-- C;
        fail;
    (=========================================================================)

(*****************************************************************************)
"Bst Stream"

    (=========================================================================)
    (Converts PSW wide symbols in units at E and following into a bitstream of
    LSW wide symbols at unit D and following. Note, that the resulting bit-
    stream /always/ is readable from left to right, i.e. MSB to LSB, over all
    target units.)
    (-------------------------------------------------------------------------)
    (Input:     E:  'Source', Address of input data
                    NOTE: The content /must/ be terminated by a PSW bits wide
                    EOS symbol. The routine continues to stream until such a
                    symbol was found.

                D:  'Target', Address of output bitstream
                    **********************************************************
                    Version 1.1:
                    NOTE: The output area's units are NULLED before the bits
                    are streamed out. However, only as many units are nulled
                    as required for the bitstream: all units after the last
                    used one are left as they were, i.e. may contain garbage.
                    Check [bst bits] for the number of streamed bits.
                    {[bst bits] + 31} > 5 provides the number of used 32 bit
                    units.
                    **********************************************************
                    NOTE: If the area is not large enough, the output still
                    will continue to create the bitstream and overwrite what-
                    ever follows your too small area, most likely resulting in
                    a crash.

    Output:     [bst bits]: Number of streamed out bits, beginning at [D] in
                            its most significant bit.

    All registers unchanged.
    Can not cause errors. Exits with 'leave': Do not query 'ok' or 'failed'!
    )
    (-------------------------------------------------------------------------)
        A -->; B -->; C -->; D -->; E -->;
    (-------------------------------------------------------------------------)
    (The offset of the out-stream is initialized with 32. This will be
    reduced by LSW prior to each bitstream being streamed out into an address
    at [D] and following.)
        [bst oos] = 32;                 (No symbol ready yet to stream out)
        [bst bits] = 0;                 (No bits streamed out yet)

    (*************************************************************************)
    (Version 1.1 clears the output area now and does not expect a previously
    cleaned area. The first bitstream unit is nulled. As soon as we need
    another units, it will be nulled as well. Don't expect the units beyond the
    last used one to be nulled: it may contain any value.)
        [D] = 0;                        (Clear the first bitstream unit)
    (*************************************************************************)

    (-------------------------------------------------------------------------)
    "Bst Stream New Unit"
        B = 32;                         (B will have lowest bit of symbol)
        A = [bst pswm];                 (Left-aligned PSW bitmask)
        C = [E];                        (Get the actual unit)
    (If the unit is little endian, such as Intel, then we need to revert the
    bytes within the unit so, that 'DCBA' becomes 'ABCD'. Little endian is the
    default: If reading Big Endian files, you need to set an according flag
    immediately /after/ the call to 'Bst Init', prior any call to 'Bst Stream'.
    This is true also when processing Linoleum string variables. The needed
    instruction is: '[bst status] | BST BIG ENDIAN'.)
        ? [bst status] + BST BIG ENDIAN -> Bst Stream New Unit Big Endian;
        => Bst Reverse Bytes;           (In C, bytes 'DCBA' reversed to 'ABCD')
    "Bst Stream New Unit Big Endian"
        [bst unit stream] = C;          (Units readable left to right)
    (-------------------------------------------------------------------------)
    "Bst Stream New Symbol"
        C = [bst unit stream];          (The complete unit stream)
        B - [bst psw];                  (Lowest bit of symbol in unit stream)
        C > B;                          (Right-align the symbol)
    (-------------------------------------------------------------------------)
    "Bst Stream Extract Symbol"
        C & [bst lswm];                 (Eliminate garbage in upper bits)
    (If the extracted symbol is the EOS symbol, then we are done and leave the
    stream-in loop.)
        ? C = [bst eos] -> Bst Stream Eos Read;
    (-------------------------------------------------------------------------)

    (*************************************************************************)
    (Version 1.1: The new routine 'Bst Resume' allows to indicate a number of
    prefix symbols which need not be streamed out, but skipped altogether. In
    'Bst Resume', the number of symbols was stored into [bst skip symbols]. As
    long as it is not 0, the symbol is not streamed out.)
        ? [bst skip symbols] = 0 -> Bst Stream Out;
    (The actual symbol is /not/ streamed out.)
        [bst skip symbols] -;
        -> Bst Stream Resume;
    "Bst Stream Out"
    (*************************************************************************)

    (In C's LSW least significant bits is the symbol. It can be streamed out
    now into the target area at [D].)
        ? [bst oos] >= [bst lsw] -> Bst Stream Same Out Unit;
    (Only part of the symbol in C, or even no part of it at all, fits into the
    actual target address. We need the bits which fit in.)
        ? [bst oos] = 0 -> Bst Stream Next Unit;
        C -->;                          (Still need the symbol)
        C < [bst oos]; C > [bst lsw];   (Fitting-in part is right-aligned)
        [D] | C;                        (Merge MSBs of symbol into target)
        <-- C;                          (Restore complete original symbol)
    "Bst Stream Next Unit"
        D +;                            (Next output bits into next unit)

    (*************************************************************************)
    (Version 1.1 explicitely clears the output area now and does not expect a
    previously nulled area anymore.)
        [D] = 0;                        (Clear the new bitstream unit)
    (*************************************************************************)

        [bst oos] + 32;                 (This will be reduced as we go on)
    "Bst Stream Same Out Unit"
        [bst oos] - [bst lsw];          (Lowest bit to merge into target)
        C < [bst oos];                  (Shift symbol's bits into position)
        [D] | C;                        (Merge LSBs of symbol into target)
        [bst bits] + [bst lsw];         (Count the streamed out bits)
    (-------------------------------------------------------------------------)

    (*************************************************************************)
    (Version 1.1: This label is jumped to from the routine 'Bst Resume' after
    doing the required synchronization tasks to allow a resumed processing with
    the symbol following the last EOS symbol, at an arbitrary position within
    the current unit. It is also jumped to from within this routine, when we
    skip one or more input symbols as per argument to 'Bst Resume'.)
    "Bst Stream Resume"
    (*************************************************************************)

    (A has the originally left-aligned PSW bitmask. It is shifted into position
    now for the next symbol of PSW bits. If the PSW is 32, then this would
    result in a shift by 0, because shifts work modulo wordsize. This would be
    wrong, but because then also B would be 0, we will fetch a new unit anyway,
    and along with it B will be re-initialized. Because of this, no special
    measures are needed to adjust A.)
        A > [bst psw];

    (If B, the lowest bit of the streamed-out symbol, is larger than or equal
    to PSW, then there are more complete symbols in the actual unit. Get the
    next one.)
        ? B >= [bst psw] -> Bst Stream New Symbol;

    (If B is larger than 0, but smaller than PSW, then we have a symbol which
    is distributed in two adjacent units. This happens with odd streams like
    24- or 7-bit bitstreams and requires a special treatment to join the two
    parts from the adjacent units into a single symbol.)
        ? B > 0 -> Bst Stream Split Symbol;

    (When we arrive here, B is 0, i.e. we just have used up all bits of the
    actual unit. We proceed one unit and process it as above.)
        E +;                            (Next input unit)
        -> Bst Stream New Unit;
    (-------------------------------------------------------------------------)
    (This handles a symbol which is distributed in two adjacent units. This
    happens with odd streams like 24- or 7-bit bitstreams, but not with symbol
    widths which divide 32 without any reminder, i.e. any symbol width which is
    a power of 2. The actual unit's LSBs then contain the next symbol's MSBs,
    and the next unit's MSBs the next symbol's LSBs.)

    "Bst Stream Split Symbol"
        B -->; D -->;                   (Need two additional registers)
        C = [bst unit stream];          (Contains symbol's MSBs in its LSBs)
        A = [bst psw]; A - B;           (So many bits of symbol in next unit)
        C < A;                          (Shift symbol's MSBs to final position)
    (Now the next unit is needed. As before, depending on the endianness, its
    bytes possibly must be reversed. We want that unit in another register than
    C, because we still need C, as it contains the symbol's MSB already. We use
    register B. However, since 'Bst Reverse Bytes' works on C, we need to save
    it temporarily.)
        C -->;
        E +;                            (Address of next unit)
        C = [E];                        (Get the unit with the symbol's LSBs)
        ? [bst status] + BST BIG ENDIAN -> Bst Stream Split Unit Big Endian;
        => Bst Reverse Bytes;           (In C, bytes 'DCBA' reversed to 'ABCD')
    "Bst Stream Split Unit Big Endian"
        [bst unit stream] = C;          (Units readable left to right)
        B = C;                          (Bring to B)
        <-- C;
    (In B's MSB are the symbol's LSB. The number of bits is in A. We right-
    align these bits.)
        B <@ A;                         (Symbol's LSBs in B's LSBs)
    (All bits except the right-aligned A bits are garbage and need to be
    masked out. The mask is created in D. A can not be 32, so no precautions
    are needed: if A would contain 32, we would have no split symbols and would
    not be here to process such.)
        D = 1; D < A; D -;              (Mask for the symbol's LSBs)
        B & D;                          (Symbol's LSBs extracted now)
        C | B;                          (C contains whole symbol in its LSBs)
    (The PSW shift mask must be put back in place to resume processing with the
    next symbol. It must be in A, and it must be shifted right as many bits as
    we processed already from the new unit.)
        B = A;                          (Processed bits of new unit)
        A = [bst pswm];                 (The left-aligned PSW bitmask)
        A > B;                          (Shift in position for next symbol)
        D <--; B <--;                   (Restore original registers)
    (Also B, the lowest bit of the symbol being processed must be adjusted.)
        B + 32;                         (The bit in the next unit)
        B - [bst psw];                  (LSB of symbol in next unit)
    (Now C contains the right-aligned symbol in its PSW least significant bits.
    The upper bits contain some garbage. Resuming at 'Bst Stream Extract Symbol'
    will apply the LSW mask on C and then stream out the symbol.)
        -> Bst Stream Extract Symbol;
    (-------------------------------------------------------------------------)
    "Bst Stream Eos Read"

    (*************************************************************************)
    (Version 1.1: In order to allow 'Bst Resume' to resume with the symbol
    following the last found EOS symbol after a 'Bst Stream' or 'Bst Resume'
    call, the unit in which it was found needs to be stored, and also the bit
    position of the LSB of the EOS symbol.)
        [bst last eos unit] = E;        (Unit in which EOS was found)
        [bst last eos bit] = B;         (LSB of EOS symbol)
    (*************************************************************************)

    (Note, that bitstreams don't end with any special symbol. Instead, the
    number of bits is counted into [bst bits]. However, if you for whatever
    reason /need/ such an EOS symbol even in bistreams, then this would be the
    right place to code a such. Careful: Depending on the LSW, your symbol may
    require a split into two units. For the technique, see above code starting
    below the label 'Bst Stream Out'.)
    (-------------------------------------------------------------------------)
    "Bst Stream Exit OK"
        <-- E; <-- D; <-- C; <-- B; <-- A;
        leave;
    (=========================================================================)

(*****************************************************************************)
"Bst Resume"

    (=========================================================================)
    (Added in version 1.1. The routine 'Bst Stream' assumes, that the first
    symbol can be found in the MSBs or LSBs of the provided unit, depending on
    the endianness. Sometimes this is inconvenient, though, for example when
    the input is ANSI or ASCII and the first symbol to be streamed in is at a
    byte boundary or, more cumbersome, at any arbitrary position within the
    unit. In such cases, use this routine. It continues to read the input
    stream from the position after the last encountered EOS symbol, and treats
    that symbol as the first symbol of the new bitstream. However, even this
    may be inconvenient at times, for example in the situation, when several
    input strings are separated by a CR+LF sequence, of which the first, CR,
    was defined to be the EOS symbol. In this situation you don't want to
    resume with the next symbol, which would be LF, but you want to skip it.
    The number of /symbols/ to be skipped can be given in C: in aforementioned
    example you would use 1. Usually, you would use 0 here, though.)
    (-------------------------------------------------------------------------)
    (Input:     C:  Symbols to skip after the last EOS symbol, before starting
                    to stream out. This skips /output symbols/, not input
                    symbols: Should the position from which we resume to stream
                    in start with an EOS symbol, then it would /not/ be skipped
                    and the procedure would immediately terminate with a total
                    of 0 streamed out bits. Usually, this is 0 = stream out
                    from first symbol after last EOS symbol. Further notes see
                    in this routine's header above.

                D:  'Target', Address of output bitstream
                    NOTE: The output area's units are NULLED before the bits
                    are streamed out. However, only as many units are nulled
                    as required for the bitstream: all units after the last
                    used one are left as they were, i.e. may contain garbage.
                    Check [bst bits] for the number of streamed bits.
                    {[bst bits] + 31} > 5 provides the number of used 32 bit
                    units.
                    NOTE: If the area is not large enough, the output still
                    will continue to create the bitstream and overwrite what-
                    ever follows your too small area, most likely resulting in
                    a crash.

    Output:     [bst bits]: Number of streamed out bits, beginning at [D] in
                            its most significant bit.

    All registers unchanged.
    Can not cause errors. This routine will jump into 'Bst Stream' after the
    neccessary symchronization tasks. There it exits with 'leave': Do not
    query 'ok' or 'failed'!
    )
    (-------------------------------------------------------------------------)
    (These are the same register saves as in 'Bst Stream'. The routine will
    jump to 'Bst Stream' and terminate there. This requires same stack
    arrangement.)
        A -->; B -->; C -->; D -->; E -->;
    (-------------------------------------------------------------------------)
    (Effectively skipping the symbols is done in 'Bst Stream'. Here we just
    initialize the counter variable.)
        [bst skip symbols] = C;
    (-------------------------------------------------------------------------)
    (The offset of the out-stream is initialized with 32. This will be
    reduced by LSW prior to each bitstream being streamed out into an address
    at [D] and following.)
        [bst oos] = 32;                 (No symbol ready yet to stream out)
        [bst bits] = 0;                 (No bits streamed out yet)
    (-------------------------------------------------------------------------)
    (Clear the first bitstream unit)
        [D] = 0;
    (-------------------------------------------------------------------------)
    (Any previous call to 'Bst Stream' or 'Bst Resume' stored the position,
    at which the last EOS symbol was found, into two variables. The address in
    which the EOS occured was stored in [bst last eos unit], and the LSB of
    the EOS symbol in [bst last eos bit]. We load the appropriate registers
    with those values now, as if we just processed the EOS symbol as a valid
    symbol and are about to process the next symbol.)
        E = [bst last eos unit];        (Unit in which EOS was found)
        B = [bst last eos bit];         (LSB of EOS symbol)
    (-------------------------------------------------------------------------)
    (The actual unit must be present when we jump into 'Bst Stream'. If needed,
    it must be brought to big endian. It is expected in register C.)
        C = [E];                        (Get the actual unit)
        ? [bst status] + BST BIG ENDIAN -> Bst Resume Big Endian;
        => Bst Reverse Bytes;           (In C, bytes 'DCBA' reversed to 'ABCD')
    "Bst Resume Big Endian"
    (-------------------------------------------------------------------------)
    (The PSW mask was left in a position suitable to extract the EOS last
    time. It is expected in A.)
        A = [bst pswm];
    (-------------------------------------------------------------------------)
    (These are all informations we need. We can continue to stream in now as
    if we never did something else.)
        -> Bst Stream Resume;
    (-------------------------------------------------------------------------)
    ("Bst Resume Exit OK"
        This routine jumps into 'Bst Stream' and finishes there. It has no
        own exit labels.
    )
    (=========================================================================)

(*****************************************************************************)
"Bst Reverse Bytes"

    (=========================================================================)
    (Internal routine. Reverses the bytes in C such, that 'DCBA' becomes 'ABCD'.
    This reversion is needed for Small Endian processors such as 'Intel' to get
    a bitstream readable from left to right.)
    (-------------------------------------------------------------------------)
    (Input:     C:  Bits 'DDDDDDDDCCCCCCCCBBBBBBBBAAAAAAAA'
    Output:     C:  Bits 'AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD'

    All registers except C unchanged.
    Can not cause errors. Exits with 'leave': Do not query 'ok' or 'failed'!
    )
    (-------------------------------------------------------------------------)
        A -->;
    (-------------------------------------------------------------------------)
                                        ('DCBA' are the four bytes in a unit)
        A = C;                          (A: 'DCBA')
        A @> 8;                         (A: 'ADCB')
        C <@ 8;                         (C: 'CBAD')
        A & FF 00 FF 00h;               (A: 'A-C-')
        C & 00 FF 00 FFh;               (C: '-B-D')
        C | A;                          (C: 'ABCD')
    (Although 6 instructions are not that bad, if you can think of a faster
    method to revert the bytes within a unit, drop me a note: I like efficient
    bit manipulation techniques. Thank you.)
    (-------------------------------------------------------------------------)
    "Bst Reverse Bytes Exit OK"
        <-- A;
        leave;
    (=========================================================================)

(*****************************************************************************)

Linoleum Syntax Highlighting produced with LSH (© 2007 by Herbert Glarner)