D-String, an answer to P-String and C-String limitations

C-Strings are delimited by a NULL byte. P-Strings are preceded by a length identifier. Both have their downsides, and I’ve developed a solution: the D-String (D for Data). The C-String’s downfall is that it cannot contain a NULL byte (otherwise the interpreting language, C, will terminate the data prematurely). The P-String’s downfall is that it cannot represent more than 255 bytes (unless, of course, you use a wider length identifier, in which case you’ve also increased the overhead of every string). The D-String overcomes both of these limitations with minimal overhead. Let’s have a look at the specifics (free of charge).
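To make the two limitations concrete, here is a small Python sketch. It is only an illustration: `ctypes` stands in for C's NULL-terminated view of a buffer, and `pstring` is a hypothetical helper that builds the classic one-byte-length P-String.

```python
import ctypes

payload = b"ab\x00cd"  # five bytes with an embedded NULL

# C-String view: reading .value stops at the first NULL byte,
# so the trailing b"cd" is silently lost.
c_view = ctypes.create_string_buffer(payload).value

def pstring(data: bytes) -> bytes:
    """Hypothetical helper: one length byte followed by the data."""
    if len(data) > 255:
        raise ValueError("a one-byte length cannot describe more than 255 bytes")
    return bytes([len(data)]) + data

p_view = pstring(payload)  # the embedded NULL survives, but 256+ bytes would not fit
```

The C-String loses everything after the embedded NULL, while the P-String keeps it but refuses any payload longer than 255 bytes.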

NOTE: These are my own internal notes. They will be translated into a full technical explanation in another blog posting. However… the fact is that I’ve sat on this technology for 10 years and want to finally make it public. This is the first step in doing so. Last, this will serve as a backup should my iPhone crash (currently the only machine in the world with a documented example of the methodology). This is not meant to be digested by mere mortals (but if you can, all the more power to you — you’ll have a leg-up on the rest of those waiting on the technical discussion).

shxd.rfc04 — dStr data object

Len => encoded Len, 0x00 delimiter, encode register, DATA
0x00 => 0x00
0x01 => 0x01 0x00 0x00 DATA
0xFF => 0xFF 0x00 0x00 DATA
0x0100 => 0x0101 0x00 0x01 DATA
0x0101 => 0x0101 0x00 0x00 DATA
0xFFFF => 0xFFFF 0x00 0x00 DATA
0x010000 => 0x010101 0x00 0x03 DATA
0x010001 => 0x010101 0x00 0x02 DATA
0x010100 => 0x010101 0x00 0x01 DATA
0xFFFFFF => 0xFFFFFF 0x00 0x00 DATA
0x01000000 => 0x01010101 0x00 0x07 DATA
0x01000001 => 0x01010101 0x00 0x06 DATA
0x01000100 => 0x01010101 0x00 0x05 DATA
0x01000101 => 0x01010101 0x00 0x04 DATA
0x01010000 => 0x01010101 0x00 0x03 DATA
0x01010001 => 0x01010101 0x00 0x02 DATA
0x01010100 => 0x01010101 0x00 0x01 DATA
0xFFFFFFFF => 0xFFFFFFFF 0x00 0x00 DATA
0x0100000000 => 0x0101010101 0x00 0x0F DATA
0x0100000001 => 0x0101010101 0x00 0x0E DATA
0x0100000100 => 0x0101010101 0x00 0x0D DATA
0x0100000101 => 0x0101010101 0x00 0x0C DATA
0x0100010000 => 0x0101010101 0x00 0x0B DATA
0x0100010001 => 0x0101010101 0x00 0x0A DATA
0x0100010100 => 0x0101010101 0x00 0x09 DATA
0x0100010101 => 0x0101010101 0x00 0x08 DATA
0x0101000000 => 0x0101010101 0x00 0x07 DATA
0x0101000001 => 0x0101010101 0x00 0x06 DATA
0x0101000100 => 0x0101010101 0x00 0x05 DATA
0x0101000101 => 0x0101010101 0x00 0x04 DATA
0x0101010000 => 0x0101010101 0x00 0x03 DATA
0x0101010001 => 0x0101010101 0x00 0x02 DATA
0x0101010100 => 0x0101010101 0x00 0x01 DATA
0xFFFFFFFFFF => 0xFFFFFFFFFF 0x00 0x00 DATA
0x010000000000 => 0x010101010101 0x00 0x1F DATA
0x010000000001 => 0x010101010101 0x00 0x1E DATA
0x010000000100 => 0x010101010101 0x00 0x1D DATA
0x010000000101 => 0x010101010101 0x00 0x1C DATA
0x010000010000 => 0x010101010101 0x00 0x1B DATA
0x010000010001 => 0x010101010101 0x00 0x1A DATA
0x010000010100 => 0x010101010101 0x00 0x19 DATA
0x010000010101 => 0x010101010101 0x00 0x18 DATA
0x010001000000 => 0x010101010101 0x00 0x17 DATA
0x010001000001 => 0x010101010101 0x00 0x16 DATA
0x010001000100 => 0x010101010101 0x00 0x15 DATA
0x010001000101 => 0x010101010101 0x00 0x14 DATA
0x010001010000 => 0x010101010101 0x00 0x13 DATA
0x010001010001 => 0x010101010101 0x00 0x12 DATA
0x010001010100 => 0x010101010101 0x00 0x11 DATA
0x010001010101 => 0x010101010101 0x00 0x10 DATA
0x010100000000 => 0x010101010101 0x00 0x0F DATA
0x010100000001 => 0x010101010101 0x00 0x0E DATA
0x010100000100 => 0x010101010101 0x00 0x0D DATA
0x010100000101 => 0x010101010101 0x00 0x0C DATA
0x010100010000 => 0x010101010101 0x00 0x0B DATA
0x010100010001 => 0x010101010101 0x00 0x0A DATA
0x010100010100 => 0x010101010101 0x00 0x09 DATA
0x010100010101 => 0x010101010101 0x00 0x08 DATA
0x010101000000 => 0x010101010101 0x00 0x07 DATA
0x010101000001 => 0x010101010101 0x00 0x06 DATA
0x010101000100 => 0x010101010101 0x00 0x05 DATA
0x010101000101 => 0x010101010101 0x00 0x04 DATA
0x010101010000 => 0x010101010101 0x00 0x03 DATA
0x010101010001 => 0x010101010101 0x00 0x02 DATA
0x010101010100 => 0x010101010101 0x00 0x01 DATA
0xFFFFFFFFFFFF => 0xFFFFFFFFFFFF 0x00 0x00 DATA
.
.
.
0x0100000000000000 => 0x0101010101010101 0x00 0x7F DATA
0xFFFFFFFFFFFFFFFF => 0xFFFFFFFFFFFFFFFF 0x00 0x00 DATA
.
.
.
0x01000000000000000000000000000000 => 0x01010101010101010101010101010101 0x00 0x7FFF DATA
0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF => 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF 0x00 0x0000 DATA
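
The pattern in the table can be sketched in Python. This is a reading of the rows above, not the reference implementation: the function name `encode_dstr_header` is made up for illustration, and the sketch assumes big-endian byte order for both the length and the register, with register bit 0 mapped to the least significant length byte (which is what the listed rows suggest).

```python
def encode_dstr_header(length: int) -> bytes:
    """Build a dStr header: encoded length, 0x00 delimiter, encode register."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return b"\x00"  # an empty dStr is the delimiter alone

    width = (length.bit_length() + 7) // 8  # minimal width of the length, in bytes
    raw = length.to_bytes(width, "big")

    encoded = bytearray(raw)
    register = 0
    for i, byte in enumerate(raw):
        if byte == 0x00:
            encoded[i] = 0x01                 # hide the zero byte...
            register |= 1 << (width - 1 - i)  # ...and flag its position in the register

    register_width = (width + 7) // 8  # assumed: one register byte per 8 length bytes
    return bytes(encoded) + b"\x00" + register.to_bytes(register_width, "big")


# A few rows from the table, reproduced:
row_0x0100 = encode_dstr_header(0x0100).hex()          # "01010001"
row_0x01000001 = encode_dstr_header(0x01000001).hex()  # "010101010006"
```

Because the top byte of a minimal-width length is never zero, the highest register bit is never set, which matches the 0x7F and 0x7FFF maxima in the table.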

That’s a length identifier 16 bytes (128 bits) wide, which can describe up to 2^(8*16) - 1 or 2^128 - 1 bytes: about 3.4028236692094e+38, or roughly 340 undecillion bytes. Even at that width, only 3 bytes of overhead (the 0x00 delimiter plus a 2-byte encode register) sit between the 16-byte length identifier and the actual DATA.

Scaling this higher (to 512-bit integers, 64 bytes wide), the overhead would be 9 bytes.

The overhead is always the width of the length identifier (in bytes) divided by 8 and rounded up, plus one byte for the 0x00 delimiter (so the minimum overhead is two bytes at the low end).
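
As a quick check of that rule, here is a hedged Python sketch. The name `dstr_overhead` is hypothetical, and rounding the division up is an inference from the header sizes listed in these notes (a 9-byte length gets a 2-byte register), not something stated outright.

```python
def dstr_overhead(length_width: int) -> int:
    """Bytes added on top of the length identifier: delimiter plus register."""
    if length_width < 1:
        raise ValueError("the length identifier is at least one byte wide")
    register_bytes = -(-length_width // 8)  # ceiling division
    return 1 + register_bytes               # 1 byte for the 0x00 delimiter


overhead_128_bit = dstr_overhead(16)  # 16-byte length identifier
overhead_512_bit = dstr_overhead(64)  # 64-byte length identifier
```

Under that assumption the 128-bit case comes out at 3 bytes and the 512-bit case at 9 bytes, in line with the figures quoted above.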

If a dStr contains 0 bytes of data, the dStr will be 0x00.

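If a dStr contains 1-255 bytes of data, the dStr will be 0xLL 0x00 0xNN DATA (header is 3 bytes). LL is the length of DATA. NN is the encode register: each set bit marks a length byte that was really 0x00 and has been written as 0x01, with bit 0 standing for the least significant length byte.

If a dStr contains 256-65535 bytes of data, the dStr will be 0xLLLL 0x00 0xNN DATA (header is 4 bytes).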

If a dStr contains 65536-16777215 bytes of data, the dStr will be 0xLLLLLL 0x00 0xNN DATA (header is 5 bytes).

If a dStr contains 16777216-4294967295 bytes of data, the dStr will be 0xLLLLLLLL 0x00 0xNN DATA (header of 6 bytes).

If a dStr contains 4294967296-1099511627775 bytes of data, the dStr will be 0xLLLLLLLLLL 0x00 0xNN DATA (header of 7 bytes).

If a dStr contains 1099511627776-281474976710655 bytes of data, the dStr will be 0xLLLLLLLLLLLL 0x00 0xNN DATA (header of 8 bytes).

If a dStr contains 281474976710656-72057594037927935 bytes of data, the dStr will be 0xLLLLLLLLLLLLLL 0x00 0xNN DATA (header of 9 bytes).

If a dStr contains 72057594037927936-18446744073709551615 bytes of data, the dStr will be 0xLLLLLLLLLLLLLLLL 0x00 0xNN DATA (header of 10 bytes).

If a dStr contains 18446744073709551616-4722366482869645213695 bytes of data, the dStr will be 0xLLLLLLLLLLLLLLLLLL 0x00 0xNNNN DATA (header of 12 bytes).

If a dStr contains 4722366482869645213696-1208925819614629174706175 bytes of data, the dStr will be 0xLLLLLLLLLLLLLLLLLLLL 0x00 0xNNNN DATA (header of 13 bytes).

Ad nauseam, ad infinitum.
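
Reading a dStr back follows from the layouts above: scan to the first 0x00 to learn how wide the length is, read the encode register, restore the hidden zero bytes, then take that many bytes of DATA. The Python sketch below is one possible reading of these notes, not a finished implementation. The name `decode_dstr` is made up, and it assumes big-endian order, a register of one byte per 8 length bytes (rounded up), and register bit 0 mapped to the least significant length byte.

```python
def decode_dstr(buf: bytes) -> bytes:
    """Return the DATA payload of the dStr at the start of buf."""
    # Encoded length bytes are never 0x00, so the first zero is the delimiter.
    width = buf.index(0)  # raises ValueError if there is no delimiter
    if width == 0:
        return b""        # a lone 0x00 is the empty dStr

    register_width = (width + 7) // 8  # assumed register sizing
    register_start = width + 1
    data_start = register_start + register_width
    if len(buf) < data_start:
        raise ValueError("truncated dStr header")

    register = int.from_bytes(buf[register_start:data_start], "big")

    # Restore every length byte that the register flags as a hidden zero.
    length_bytes = bytearray(buf[:width])
    for i in range(width):
        if (register >> (width - 1 - i)) & 1:
            length_bytes[i] = 0x00
    length = int.from_bytes(length_bytes, "big")

    data = buf[data_start:data_start + length]
    if len(data) != length:
        raise ValueError("truncated dStr data")
    return data


# A 3-byte payload with an embedded NULL: header 0x03 0x00 0x00, then DATA.
decoded = decode_dstr(bytes.fromhex("030000") + b"a\x00b")
```

Only the header is scanned for the delimiter, so DATA itself may contain any number of NULL bytes.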