Compressing a website into a URL

This post explains how portabl.ink works. Portablink is a tool that creates self-contained, compressed web pages in a single link. Check the Portablink project page for more info.

tl;dr: It uses data URLs containing compressed data which is bundled with its own decompression instructions.

Data URLs

In case you aren’t familiar with data URLs, they are URLs whose contents are in the URL itself. They all start with data:. For example, entering data:text/html,<h1>Hello</h1> in the address bar displays a page containing just that heading.

When you load a data URL, the browser shows the content embedded in the URL directly.
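
As a minimal sketch of the idea, a data URL can be built from an HTML string by prefixing a media type and percent-encoding the content. The toDataUrl helper is a hypothetical name used only for illustration; it is not part of Portablink.

```javascript
// Hypothetical helper, for illustration only: builds a text/html data URL.
// Percent-encoding keeps characters such as "#" and "%" from being
// misread as URL syntax.
function toDataUrl(html) {
  return 'data:text/html,' + encodeURIComponent(html);
}

const url = toDataUrl('<h1>Hello #1</h1>');
console.log(url);
```

Pasting the printed URL into the address bar should render the heading.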

So, that’s it? Website in a URL? — Use data URLs! Easy!

Wait, there’s more! The portablink tool does more than just put your HTML in a data URL. It also compresses your content so you don’t end up with humongous URLs.

Compression

Within a data URL, the tool bundles both the compressed data and the instructions needed to decompress and bootstrap that data. This produces a self-contained, compressed document in a single portable link that can be decompressed and rendered by any modern browser.

Here’s an example URL generated by the tool:

data:text/html,<body onload="fetch`data:;base64,fVTfb5swEH7PX+HxUIFiEpJu0pRgpK0P68OyX8pDpSibDJhwqbGZbdrQpP/7DI7WpluDZPv8fb7Pd8dBXJqKJ3HJaJ7EBgxnyRJEi5aQhYbaIVk8dvhgEGvTdgZCqczbvV0RykHXnLazgrPdvEe2jTZQtGEmhWHCzDI7MeU4ymEjQjCs0id4Yc+GBa2At7MGQiUbkbMce9eg6AaERAuqGvRJmhIy9E3JLx7+3kB2q6nI8ZWsCqkMpXhBxZYqwN4HBZSjH04GLZbevxD6KHnu4SsbUmpdtGxUxkIrqMNaSdwbmikouggf7RhtFOSnSXeIS6CzQpuWhQ2zqfOmEnqmWM2o8S/xpFCBO3gPuSln76O7CoRDSgab0jyDusvSxhgp9k/F0fDAZtN3T24cBAuPvpGDUqlypmZacshRhFJOs9tTwZkwZZiVwHP/UgT4/3g4CfbP9ELOChMe4653rwqK4dtTPyPrl27x+NhC1nI913VSEudwhzJOtSZeV0mv67HY3YAgJ14UeUk8dsBLbnKGm57hoskZzTPc9AwXTc9onuGmJ1w8tvXoagTVBmmVEc9DUjClpCLegDODAG/xLa6wIqsIT/B0jUty8buRZg7CHrxeLj67LTZKkAg/kFxmTWW/uFFX8FW5HuB7UpNEM7OEisnG+L9IsqecKePXQ+dsm1XoN84O5i8VyMMjnkRRMMCU+DaggCT2ja/ccTcPYbi1N2XEah+Jm2NcbvnqlvWoorVvw/HVSMuK+S1J1IjdMdX6O5JQf4fboLuS1MHhcMK0ePcKszvDTMO/XBBc0Kqe99O9b/cD1UcDXQidsbVhpSRJR1Jk3P5zumRS63w4+N1yrPvNMZWVLfhPMlnjzLfSPu0rY59B/4L7Zh/3/9w/`.then(a=>new Response(a.body.pipeThrough(new DecompressionStream(`deflate-raw`))).text().then(a=>document.documentElement.innerHTML=a))">

The URL above is 1,078 bytes. It was generated from a 1.37 KB source document, which is a size reduction of about 23%!
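
The percentage can be checked with a little arithmetic. This sketch assumes that the 1.37 KB figure uses 1,024-byte kilobytes, which the post does not state.

```javascript
// Assumption: "1.37 KB" means 1.37 * 1024 bytes.
const sourceBytes = 1.37 * 1024; // about 1,403 bytes
const urlBytes = 1078;           // size of the generated URL, from the post

// Fraction of the original size that was saved.
const saving = 1 - urlBytes / sourceBytes;
console.log(Math.round(saving * 100) + '%');
```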

Btw, the above URL loads a simple tic-tac-toe web game. See for yourself by copying it into your browser’s address bar.

So, how does it work?

The embedded document contains a single <body> tag with an onload handler. The handler itself contains the main script that will decompress and render the desired content.

A small point: Why onload? Well, it’s shorter than using a <script> tag.

A: <body onload="/* code */">
B: <script>/* code */</script>

The closing tag is optional for body but not for script. In the end, the body tag wins by 1 character! Every character in a URL is precious.

<img onerror="..."> would’ve worked as well, having equal length as the body option.
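
The character counts are easy to verify. The sketch below measures only the wrapper around the code, with the code itself left empty; the variable names are just for illustration.

```javascript
// Wrapper overhead: the characters spent around the code itself.
const bodyWrapper = '<body onload="">';    // closing </body> tag omitted
const scriptWrapper = '<script></script>'; // closing tag is required
const imgWrapper = '<img onerror="">';

console.log(bodyWrapper.length, scriptWrapper.length, imgWrapper.length);
// prints: 16 17 16
```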

Now, with that out of the way, let’s break the main script down. I’ll plop the prettified code here first, then explain the interesting bits.

fetch(
  `data:;base64,fVTfb5swEH7PX+HxUIFiEpJu0pRgpK0P68OyX8pDpSibDJhwqbGZbdrQpP/7DI7WpluDZPv8fb7Pd8dBXJqKJ3HJaJ7EBgxnyRJEi5aQhYbaIVk8dvhgEGvTdgZCqczbvV0RykHXnLazgrPdvEe2jTZQtGEmhWHCzDI7MeU4ymEjQjCs0id4Yc+GBa2At7MGQiUbkbMce9eg6AaERAuqGvRJmhIy9E3JLx7+3kB2q6nI8ZWsCqkMpXhBxZYqwN4HBZSjH04GLZbevxD6KHnu4SsbUmpdtGxUxkIrqMNaSdwbmikouggf7RhtFOSnSXeIS6CzQpuWhQ2zqfOmEnqmWM2o8S/xpFCBO3gPuSln76O7CoRDSgab0jyDusvSxhgp9k/F0fDAZtN3T24cBAuPvpGDUqlypmZacshRhFJOs9tTwZkwZZiVwHP/UgT4/3g4CfbP9ELOChMe4653rwqK4dtTPyPrl27x+NhC1nI913VSEudwhzJOtSZeV0mv67HY3YAgJ14UeUk8dsBLbnKGm57hoskZzTPc9AwXTc9onuGmJ1w8tvXoagTVBmmVEc9DUjClpCLegDODAG/xLa6wIqsIT/B0jUty8buRZg7CHrxeLj67LTZKkAg/kFxmTWW/uFFX8FW5HuB7UpNEM7OEisnG+L9IsqecKePXQ+dsm1XoN84O5i8VyMMjnkRRMMCU+DaggCT2ja/ccTcPYbi1N2XEah+Jm2NcbvnqlvWoorVvw/HVSMuK+S1J1IjdMdX6O5JQf4fboLuS1MHhcMK0ePcKszvDTMO/XBBc0Kqe99O9b/cD1UcDXQidsbVhpSRJR1Jk3P5zumRS63w4+N1yrPvNMZWVLfhPMlnjzLfSPu0rY59B/4L7Zh/3/9w/`
)
.then(compressedHtml =>
  new Response(
    compressedHtml.body
      .pipeThrough(new DecompressionStream(`deflate-raw`))
  )
  .text()
  .then(html =>
    document.documentElement.innerHTML = html
  )
)

1. fetch(`data:;base64,fVTfb5swEH...`)

The first thing you’ll notice is the huge chunk of base64-encoded data wrapped in a fetch() call.

The encoded data is the compressed HTML which has been prepared by a complementary compression script. Since compressed data is binary, it has been encoded in a text-friendly format for it to be a valid URL. Base64, an encoding that’s native to the web, was used for this purpose.

To decode the base64 data, instead of using the standard atob() function, fetch() was used. fetch sees the ;base64 flag in that data URL and decodes it natively.

While it accomplishes the same thing as atob, fetch is slightly better, because:

fetch() outputs a stream, the format needed for decompression later.
Code size. We’re optimising for the total URL length. The atob method returns a binary string, which requires extra massaging to turn into bytes.
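
To see that both routes yield the same bytes, here is a small sketch. The viaFetch and viaAtob names are hypothetical, and the sample payload aGVsbG8= (the ASCII text "hello") is only for illustration.

```javascript
// Decode a base64 payload by letting fetch() parse a data URL.
async function viaFetch(base64) {
  const response = await fetch('data:;base64,' + base64);
  return new Uint8Array(await response.arrayBuffer());
}

// Decode the same payload with atob(), which returns a "binary string"
// (one character per byte) that must be converted to bytes by hand.
function viaAtob(base64) {
  return Uint8Array.from(atob(base64), ch => ch.codePointAt(0));
}

// Usage: both print the bytes of "hello" (104, 101, 108, 108, 111).
viaFetch('aGVsbG8=').then(bytes => console.log(Array.from(bytes)));
console.log(Array.from(viaAtob('aGVsbG8=')));
```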

Here’s a quick length comparison. A is fetch. B is atob. Below is a character ruler with markings.

A: fetch`data:;base64,dVo=`.then(a=>a.body)
B: new Blob([Uint8Array.from(atob`dVo=`,a=>a.codePointAt(0))]).stream()
                                          ^                           ^
                                          A                           B
   0        10        20        30        40        50        60       
   12345678901234567890123456789012345678901234567890123456789012345678

The atob() method requires an additional 28 characters!

Tagged templates can be abused here to save a couple of characters. Instead of fetch("abc"), we can use fetch`abc`!
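
Why does this work? A function called as a template tag receives an array of the literal's string parts as its first argument, and fetch converts that argument to a string. A one-element array stringifies to its only element. The showArg function below is a hypothetical helper for inspecting what a tag receives.

```javascript
// A tag function receives an array of the literal's string parts as its
// first argument. `showArg` is a hypothetical helper for inspection.
function showArg(firstArg) {
  return { isArray: Array.isArray(firstArg), asString: String(firstArg) };
}

console.log(showArg`abc`);   // called as a tag: receives ['abc']
console.log(showArg('abc')); // called normally: receives 'abc'
```

Both calls report the same asString value, which is all that fetch needs.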

Here is the code described so far:

  /* fetch the base64-encoded compressed data as a data URL */
  fetch(`data:;base64,fVTfb5swEH7PX+HxUIFiEpJu0pRgpK0P68Oy...`)
  .then(compressedHtml =>
    /* a binary stream of the compressed data is given by body */
    compressedHtml.body
  )

2. body.pipeThrough(new DecompressionStream(...))

The next thing to note is the DecompressionStream class. This is from the new Compression Streams API, which allows browser-native compression and decompression. This saves a lot of decompression code from being bundled with the URL.

As of writing, this API can only consume streams. That’s why the code required streams.

Continuing. The body stream containing the compressed data is piped through the decompressor, which yields the original, uncompressed HTML, still in the form of a stream.

  fetch(`data:;base64,fVTfb5swEH7PX+HxUIFiEpJu0pRgpK0P68Oy...`)
  .then(compressedHtml =>
-   compressedHtml.body
+   /* this results in a stream of the original HTML */
+   compressedHtml.body
+     .pipeThrough(new DecompressionStream(`deflate-raw`))
  )

deflate-raw is the compression format: the DEFLATE algorithm without the zlib header and checksum. The same format must be specified for both compression and decompression.

To convert the decompressed stream to a usable string, we can use...

3. new Response(stream).text()

We can abuse the native Response class’s text() method to convert the stream into a string.

  fetch(`data:;base64,fVTfb5swEH7PX+HxUIFiEpJu0pRgpK0P68Oy...`)
  .then(compressedHtml =>
+   /* a Response wrapper will be used to decode into text */
+   new Response(
      compressedHtml.body
        .pipeThrough(new DecompressionStream(`deflate-raw`))
+   )
+   /* decode stream into text */
+   .text()
  )
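
The Response trick works on any byte stream, not just a decompressed one. Here is an isolated sketch; streamToText is a hypothetical name, and a Blob is used only as a convenient source of a stream.

```javascript
// Any ReadableStream of bytes can be wrapped in a Response and read as
// UTF-8 text; no network request is involved.
async function streamToText(stream) {
  return new Response(stream).text();
}

// Usage: a Blob provides a convenient byte stream for demonstration.
const sample = new Blob(['<h1>héllo</h1>']).stream();
streamToText(sample).then(text => console.log(text));
```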

4. document.documentElement.innerHTML = html

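Finally, the uncompressed HTML in string form is assigned to the whole document. document.write(html) could have been used here, but some browsers don’t like this function. innerHTML works nearly as well, with one caveat: <script> elements inserted through innerHTML are not executed, although inline event handlers such as onclick still run.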

This brings us to the final form:

  fetch(`data:;base64,fVTfb5swEH7PX+HxUIFiEpJu0pRgpK0P68Oy...`)
  .then(compressedHtml =>
    new Response(
      compressedHtml.body
        .pipeThrough(new DecompressionStream(`deflate-raw`))
    )
    .text()
+   .then(html =>
+     /* replace page with decoded html */
+     document.documentElement.innerHTML = html
+   )
  )

I’m pretty sure the above code could be minified further, but this is the smallest I could make it.

Preparing the compressed code

The above describes the process of decompressing the base64-encoded compressed HTML. Where does that compressed HTML string come from?

The following function creates the base64-encoded compressed string from an input HTML:

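async function formatPayload(html) {
  // Compress the UTF-8 bytes of the HTML with the browser's native compressor.
  const compressed = await new Response(
    new Blob([html])
      .stream()
      .pipeThrough(new CompressionStream('deflate-raw'))
  ).arrayBuffer();
  // Convert the bytes to a binary string, then base64-encode it.
  // Note: spreading a very large array into fromCharCode can exceed the
  // engine's argument limit; this is fine for URL-sized payloads.
  return btoa(String.fromCharCode(...new Uint8Array(compressed)));
}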

This function is called at authoring time.
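
For completeness, here is a sketch of how such a payload might be wrapped into the final link, using the bootstrap snippet from the example URL earlier in this post. The wrapUrl name is hypothetical, and the real tool may assemble its output differently.

```javascript
// Hypothetical helper: embeds a base64 payload (as returned by
// formatPayload) in the bootstrap script shown in the example URL.
function wrapUrl(payload) {
  return (
    'data:text/html,<body onload="fetch`data:;base64,' + payload + '`' +
    '.then(a=>new Response(a.body.pipeThrough(' +
    'new DecompressionStream(`deflate-raw`))).text()' +
    '.then(a=>document.documentElement.innerHTML=a))">'
  );
}

console.log(wrapUrl('dVo='));
```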

It’s important to compress the data before encoding it in base64. Doing it in the wrong order results in a larger size! This is because base64 messes with the byte (octet) alignment, but the compression algorithm works in terms of bytes. In addition, base64 encoding by itself inflates the data by about 33%.
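
The base64 overhead is simple to demonstrate: every 3 input bytes become 4 output characters. The 300-byte sample here is arbitrary.

```javascript
// Base64 maps every 3 input bytes to 4 output characters.
const raw = 'x'.repeat(300); // 300 single-byte characters
const encoded = btoa(raw);

console.log(raw.length, encoded.length); // prints: 300 400
```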

Demo

This demo was made using Portablink’s core library, https://kalabasa.github.io/portabl.ink/pack.js.

Note: If your input is not compressible enough, the overhead of bundling the decompression code might not be worth it. The tool may decide to use plain text, whichever is smaller.
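
That fallback could be as simple as comparing the two candidate URLs. The sketch below is an assumption about how such a choice might look; pickShorter is a hypothetical name and the real tool's logic may differ.

```javascript
// Hypothetical sketch of the fallback decision; the real tool's logic
// may differ. Both arguments are complete candidate URLs.
function pickShorter(plainUrl, packedUrl) {
  return packedUrl.length < plainUrl.length ? packedUrl : plainUrl;
}

// For a tiny page, the bootstrap code outweighs any compression gain.
const plain = 'data:text/html,hi';
const packed = 'data:text/html,<body onload="...">';
console.log(pickShorter(plain, packed)); // prints: data:text/html,hi
```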

Check out the full-fledged tool at portabl.ink! It’s got a WYSIWYG editor, pretty pages, and more!

Conclusion

Portablink was a fun weekend project (actually about a week). I learned about new Web APIs and some code golfing techniques. I launched a “product”!

Potential improvement(s):

Use Base122 for a denser encoding. We’re not restricted to ASCII anymore, so we can use a larger range of characters than base64. This adds a significant amount of decoder code though.
Better authoring experience.

Limitation(s):

Data URLs, while portable, are a poor way to share links. Browsers restrict navigation to data URLs. Apps don’t accept them. These URLs are presumed to be malicious nowadays.