Vite, TypeScript, bundle-size
In this post, I want to share some interesting conclusions from my attempt to write the baa-lexer. How can I reduce the bundle-size of my TypeScript project without sacrificing modularity? Which compiler settings create boilerplate code that can be omitted?
In the last post, I talked about the general structure of my baa-lexer. My goal was to rewrite moo with a focus on good structure and readability.
Ever since I read Robert C. Martin’s “Clean Code”, I have wondered how writing clean JavaScript code affects the bundle size of the compiled result. Now I was able to play around and tweak some bits.
All the benchmarks shown in this post are created with vitest’s `bench` feature. Bundle sizes are reported by vite when building the bundle. The basic code tested here is available in the micro benchmarks branch of the baa-lexer project, but I didn’t check in every single test that I made.
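To give an idea of what such a benchmark looks like, here is a minimal sketch of vitest’s bench feature. It is not one of the real benchmark files from the micro benchmarks branch; the file name and the test labels are made up for illustration.

```ts
// perf.bench.ts – a minimal sketch of vitest's bench feature, not one of
// the real benchmark files from the micro-benchmarks branch.
import { bench, describe } from "vitest";

describe("moo-baa test: abab", () => {
  bench("class", () => {
    // In the real benchmarks, the class-based TokenIterator lexes a
    // test string here.
  });

  bench("function", () => {
    // ...and the closure-based variant lexes the same string here.
  });
});
```

Running `vitest bench` then prints relative timings like the ones shown further down.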
Impacts of classes vs functions
The `lex()` function of my lexer returns a `TokenIterator`, which is defined as `IterableIterator<Token>`. There are two ways to implement this interface.
A class
We can write a class implementing the interface. The interface `IterableIterator` defines a `next()` function that needs to be implemented. In order to increase readability, I have extracted some private methods and one public method `nextToken()`.
export class TokenIterator<T extends LexerTypings>
implements IterableIterator<Token<T>>
{
readonly #string: string;
readonly #states: StateStack<T>;
readonly #tokenFactory: TokenFactory<T>;
#offset: number;
constructor(
states: CompiledStateDict<T>,
string: string,
tokenFactory: TokenFactory<T>,
) {
this.#string = string;
this.#offset = 0;
this.#states = createStateStack(states);
this.#tokenFactory = tokenFactory;
}
[Symbol.iterator](): IterableIterator<Token<T>> {
return this;
}
next(): IteratorResult<Token<T>> {
const token = this.nextToken();
return token == null ? DONE : { done: false, value: token };
}
nextToken(): Token<T> | null {
/* ... */
}
#nextMatchOrSyntaxError() {
/* ... */
}
#syntaxError(error: InternalSyntaxError) {
/* ... */
}
}
A function returning an object
The other way is to create a function that has `IterableIterator` as its return type and that returns an object implementing that interface. All private fields and methods are inner functions of this function.
export function createTokenIterator<T extends LexerTypings>(
states: CompiledStateDict<T>,
string: string,
tokenFactory: TokenFactory<T>,
): IterableIterator<Token<T>> {
let offset = 0;
const stateStack = createStateStack(states);
function nextToken(): Token<T> | null {
/* ... */
}
function nextMatchOrSyntaxError() {
/* ... */
}
function syntaxError(error: InternalSyntaxError) {
/* ... */
}
return {
[Symbol.iterator](): IterableIterator<Token<T>> {
return this;
},
next(): IteratorResult<Token<T>> {
const token = nextToken();
return token == null ? DONE : { done: false, value: token };
},
};
}
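To see the structural difference without the lexer internals, here is a reduced, self-contained sketch of the same two patterns. The counting iterator is made up for this post and is not baa-lexer code.

```ts
// A trivial counting iterator, implemented once as a class and once as a
// factory function returning an object – mirroring the two variants above.
class CounterIterator implements IterableIterator<number> {
  #current = 0;
  constructor(private readonly limit: number) {}
  [Symbol.iterator](): IterableIterator<number> {
    return this;
  }
  next(): IteratorResult<number> {
    return this.#current < this.limit
      ? { done: false, value: this.#current++ }
      : { done: true, value: undefined };
  }
}

function createCounterIterator(limit: number): IterableIterator<number> {
  let current = 0;
  return {
    [Symbol.iterator]() {
      return this;
    },
    next(): IteratorResult<number> {
      return current < limit
        ? { done: false, value: current++ }
        : { done: true, value: undefined };
    },
  };
}

// Both variants can be consumed in exactly the same way:
console.log([...new CounterIterator(3)]); // [0, 1, 2]
console.log([...createCounterIterator(3)]); // [0, 1, 2]
```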
Performance
I have created several test scenarios, i.e. combinations of lexer rules and parsed strings. The `class` test uses the first implementation, the `function` test uses the second.
class - performance/moo-baa.bench.ts > moo-baa test: './tests/abab.ts' (+0)
1.04x faster than function
function - performance/moo-baa.bench.ts > moo-baa test: './tests/fallback.ts' (+0)
1.06x faster than class
class - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (+0)
1.27x faster than function
class - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (1)
1.10x faster than function
class - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (2)
1.19x faster than function
class - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (3)
1.87x faster than function
class - performance/moo-baa.bench.ts > moo-baa test: './tests/json-regex.ts' (+0)
1.05x faster than function
class - performance/moo-baa.bench.ts > moo-baa test: './tests/json.ts' (+0)
1.15x faster than function
The `class` variant is slightly faster most of the time, but the most interesting outlier is the test `'./tests/handlears-ng.ts' (3)`, in which the `class` test is 1.87 times faster than the `function` variant. This test uses the Handlebars-lexer to parse an exceptionally short string. During the test, the `TokenIterator` is instantiated more often than in the other tests. My conclusion is that the `function` implementation is not slower per se, but the instantiation takes more time.
Since V8 tries to use C++ classes internally, my theory is that this is much easier when writing an actual JavaScript class. In the `function` variant, it probably needs to create new function objects for each inner function, while in the `class` variant, creating a single class instance is enough.
Bundle size
This is a pity, because just using the `function` variant instead of the `class` variant brings the bundle size down from 2.4 kB gzipped to 2.11 kB gzipped.
This is a huge difference, but where does it come from? Of course, in the `class` variant, there are initializer assignments, and there is a `constructor`. But on the other hand, we don’t need the `function` keyword as often. Is this really responsible for 290 bytes of difference in gzipped code?
Let’s have a look at the generated code.
Private fields, WeakMaps, define class fields
The generated, minified ES6 module code for the `class` variant looks like this.
const q = {
done: !0,
value: void 0,
};
var x, l, m, p, g, S, y, T;
class z {
constructor(t, n, r) {
h(this, g);
h(this, y);
h(this, x, void 0);
h(this, l, void 0);
h(this, m, void 0);
h(this, p, void 0);
f(this, x, n), f(this, p, 0), f(this, l, V(t)), f(this, m, r);
}
[Symbol.iterator]() {
return this;
}
next() {
const t = this.nextToken();
return t == null ? q : { done: !1, value: t };
}
nextToken() {
if (i(this, p) >= i(this, x).length) return null;
const t = w(this, g, S).call(this);
f(this, p, i(this, p) + t.text.length);
const n = i(this, m).createToken(t);
return (
t.rule.push && i(this, l).push(t.rule.push),
t.rule.pop && i(this, l).pop(),
t.rule.next && i(this, l).next(t.rule.next),
n
);
}
}
(x = new WeakMap()),
(l = new WeakMap()),
(m = new WeakMap()),
(p = new WeakMap()),
(g = new WeakSet()),
(S = function () {
try {
return i(this, l).current.nextMatch(i(this, x), i(this, p));
} catch (t) {
throw t instanceof R ? new Error(w(this, y, T).call(this, t)) : t;
}
}),
(y = new WeakSet()),
(T = function (t) {
const { line: n, column: r } = i(this, m).currentLocation,
o = t.expectedTokenTypes.map((c) => "`" + c + "`").join(", ");
return `Syntax error at ${n}:${r}, expected one of ${o} but got '${t.foundChar}'`;
});
There are some interesting parts of code here:
- In the constructor, all assignments are wrapped with a mysterious `h` function. This is actually defined at the very top of the file and reads

    var i = (e, t, n) => (
        k(e, t, "read from private field"), n ? n.call(e) : t.get(e)
      ),
      h = (e, t, n) => {
        if (t.has(e))
          throw TypeError("Cannot add the same private member more than once");
        t instanceof WeakSet ? t.add(e) : t.set(e, n);
      },
      f = (e, t, n, r) => (
        k(e, t, "write to private field"), r ? r.call(e, n) : t.set(e, n), n
      );

  Turns out, this code is inserted by Vite or TypeScript (I don’t know which), because I used `#` to create private fields and methods (e.g. `#nextMatchOrSyntaxError()`).
- Another irritating observation is the use of `WeakMap` and `WeakSet` in my class, although I never used `WeakMap` and `WeakSet` in my actual code.
I decided that I don’t need the “private field” checks, because I don’t expose the class itself externally, so I used my own convention and replaced all `#` with `_` (e.g. `_nextMatchOrSyntaxError()`).
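What this convention gives up, shown in a small self-contained sketch (my own example, not baa-lexer code): `#` fields are enforced by the engine, `_` fields are only a naming hint.

```ts
class WithRealPrivate {
  #secret = 42;
  reveal() {
    return this.#secret;
  }
}

class WithConvention {
  _secret = 42; // "private" by convention only
  reveal() {
    return this._secret;
  }
}

console.log(new WithRealPrivate().reveal()); // 42
console.log(new WithConvention()._secret); // 42 – nothing prevents outside access
// new WithRealPrivate().#secret            // would be a syntax error
```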
This reduces the bundle size from 2.4 kB to 2.15 kB gzipped and also gets rid of the `WeakMap` instances. However, the constructor calls a mysterious `a` function now:
class b {
constructor(t, n, r) {
a(this, "_string");
a(this, "_states");
a(this, "_tokenFactory");
a(this, "_offset");
(this._string = n),
(this._offset = 0),
(this._states = T(t)),
(this._tokenFactory = r);
}
}
which is defined at the top as
var x = Object.defineProperty;
var m = (e, t, n) =>
t in e
? x(e, t, { enumerable: !0, configurable: !0, writable: !0, value: n })
: (e[t] = n);
var a = (e, t, n) => (m(e, typeof t != "symbol" ? t + "" : t, n), n);
This is actually a TypeScript feature introduced in TypeScript 3.7 and activated by default in 4.2.4. It uses `Object.defineProperty` to define public class fields. The reason stated in the changelog is that “there is an extremely strong chance that public class fields will be standardized differently” than previously thought.
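As far as I understand it, the behavioral difference can be illustrated like this (my own example, unrelated to the lexer): a plain assignment calls setters that exist further up the prototype chain, while `Object.defineProperty` creates an own property and bypasses them.

```ts
const proto = {
  set value(v: number) {
    console.log("setter called with", v);
  },
};

// [[Set]] semantics: the inherited setter runs, no own property is created.
const viaAssignment: any = Object.create(proto);
viaAssignment.value = 1; // logs "setter called with 1"

// [[Define]] semantics: the setter is bypassed, an own data property shadows it.
const viaDefine: any = Object.create(proto);
Object.defineProperty(viaDefine, "value", {
  value: 1,
  enumerable: true,
  configurable: true,
  writable: true,
});

console.log(Object.keys(viaAssignment)); // []
console.log(Object.keys(viaDefine)); // ["value"]
```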
I have to admit, I don’t see the reasoning behind that, so I set `useDefineForClassFields` to `false` in the `tsconfig.json`. Since I have many tests and benchmarks, I think it is safe to do so in my case.
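For reference, this is the relevant part of the `tsconfig.json`; only the flag itself is the point here, everything else in the real config is omitted.

```jsonc
{
  "compilerOptions": {
    // Compile class fields to plain constructor assignments instead of
    // Object.defineProperty calls.
    "useDefineForClassFields": false
  }
}
```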
This makes the generated code look much more like I would have expected:
class O {
constructor(t, n, r) {
(this._string = n),
(this._offset = 0),
(this._states = S(t)),
(this._tokenFactory = r);
}
[Symbol.iterator]() {
return this;
}
next() {
const t = this.nextToken();
return t == null ? C : { done: !1, value: t };
}
nextToken() {
if (this._offset >= this._string.length) return null;
const t = this._nextMatchOrSyntaxError();
this._offset += t.text.length;
const n = this._tokenFactory.createToken(t);
return (
t.rule.push && this._states.push(t.rule.push),
t.rule.pop && this._states.pop(),
t.rule.next && this._states.next(t.rule.next),
n
);
}
_nextMatchOrSyntaxError() {
try {
return this._states.current.nextMatch(this._string, this._offset);
} catch (t) {
throw t instanceof f ? new Error(this._syntaxError(t)) : t;
}
}
_syntaxError(t) {
const { line: n, column: r } = this._tokenFactory.currentLocation,
o = t.expectedTokenTypes.map((c) => "`" + c + "`").join(", ");
return `Syntax error at ${n}:${r}, expected one of ${o} but got '${t.foundChar}'`;
}
}
Bundle size: 2.01 kB for the `class` version and 1.98 kB for the `function` variant. This is a difference of 30 bytes, which I find acceptable given that the `class` performance is better.
Balancing bundle size and performance
Of course there are other classes in my code, and I did performance and bundle-size measurements for most of them. For some, I decided to keep the `function` variant.
The `StateProcessor`, for example, is instantiated only once when the lexer is created, not for every parsed document, and is then reused multiple times. I didn’t even notice a difference, because my performance tests don’t take lexer creation into account.
If you want to optimize this hard, you have to play around a lot and choose which variant to use for each occasion. I would, however, vote for classes most of the time.
Conclusion
The conclusions that I draw from my experiments are:
- Functions that return objects produce slightly smaller code than classes.
- Functions that return objects take longer to create than instantiating a single class.
- As a rule of thumb, if an object has many methods, you should create a class.
- If a class is only instantiated once, or very rarely, you can use the `function` variant.
The largest reduction in bundle size, however, was due to setting `useDefineForClassFields` to `false` and using `_` instead of `#` to mark private fields.
My first version of the lexer was about 3 kB in size. Those changes brought it down to 2.01 kB (the final version is a bit larger than that, because I introduced some factory functions and considered that increase acceptable).
However, my feeling is that the gains are not proportional to the complete bundle size. So, if your goal is to write minimal libraries (X in less than Y kB), this may be helpful. But it won’t reduce the size of your app from 3 MB to 2 MB.
If you have a large app and you want to reduce its size, you should use tools like vite-bundle-visualizer to see where all those bytes are coming from. Chances are good that you simply have some large dependencies, and solving this is a completely different story.