Vite, TypeScript, bundle-size
In this post, I want to share some interesting conclusions from my attempt to write the baa-lexer. How can I reduce the bundle-size of my TypeScript project without sacrificing modularity? Which compiler settings create boilerplate code that can be omitted?
In the last post, I talked about the general structure of my baa-lexer. My goal was to rewrite moo with a focus on good structure and readability.
Ever since I read Robert C. Martin’s “Clean Code”, I have wondered how writing clean JavaScript code affects the bundle size of the compiled result. Now I was able to play around and tweak some bits.
All the benchmarks shown in this post are created with vitest’s `bench` feature. Bundle sizes are reported by vite when building the bundle. The basic code tested here is available in the micro benchmarks branch of the baa-lexer project, but I didn’t check in every single test that I made.
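To give an idea of what such a benchmark looks like, here is a minimal sketch of vitest’s bench feature. It is not one of the real benchmark files from the micro benchmarks branch; the file name and the test labels are made up for illustration.

```ts
// perf.bench.ts – a minimal sketch of vitest's bench feature, not one of
// the real benchmark files from the micro-benchmarks branch.
import { bench, describe } from "vitest";

describe("moo-baa test: abab", () => {
  bench("class", () => {
    // In the real benchmarks, the class-based TokenIterator lexes a
    // test string here.
  });

  bench("function", () => {
    // ...and the closure-based variant lexes the same string here.
  });
});
```

Running `vitest bench` then prints relative timings like the ones shown further down.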
Impacts of classes vs functions
The `lex()` function of my lexer returns a `TokenIterator`, which is defined as `IterableIterator<Token>`. There are two ways to implement this interface.
A class
We can write a class implementing the interface. The interface `IterableIterator` defines a `next()` function that needs to be implemented. In order to increase readability, I have extracted some private methods and one public method `nextToken()`.
export class TokenIterator<T extends LexerTypings>
implements IterableIterator<Token<T>>
{
readonly #string: string;
readonly #states: StateStack<T>;
readonly #tokenFactory: TokenFactory<T>;
#offset: number;
constructor(
states: CompiledStateDict<T>,
string: string,
tokenFactory: TokenFactory<T>,
) {
this.#string = string;
this.#offset = 0;
this.#states = createStateStack(states);
this.#tokenFactory = tokenFactory;
}
[Symbol.iterator](): IterableIterator<Token<T>> {
return this;
}
next(): IteratorResult<Token<T>> {
const token = this.nextToken();
return token == null ? DONE : { done: false, value: token };
}
nextToken(): Token<T> | null {
/* ... */
}
#nextMatchOrSyntaxError() {
/* ... */
}
#syntaxError(error: InternalSyntaxError) {
/* ... */
}
}
A function returning an object
The other way is to create a function that has `IterableIterator` as its return type and that returns an object implementing that interface. All private fields and methods are inner functions of this function.
export function createTokenIterator<T extends LexerTypings>(
states: CompiledStateDict<T>,
string: string,
tokenFactory: TokenFactory<T>,
): IterableIterator<Token<T>> {
let offset = 0;
const stateStack = createStateStack(states);
function nextToken(): Token<T> | null {
/* ... */
}
function nextMatchOrSyntaxError() {
/* ... */
}
function syntaxError(error: InternalSyntaxError) {
/* ... */
}
return {
[Symbol.iterator](): IterableIterator<Token<T>> {
return this;
},
next(): IteratorResult<Token<T>> {
const token = nextToken();
return token == null ? DONE : { done: false, value: token };
},
};
}
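To see the structural difference without the lexer internals, here is a reduced, self-contained sketch of the same two patterns. The counting iterator is made up for this post and is not baa-lexer code.

```ts
// A trivial counting iterator, implemented once as a class and once as a
// factory function returning an object – mirroring the two variants above.
class CounterIterator implements IterableIterator<number> {
  #current = 0;
  constructor(private readonly limit: number) {}
  [Symbol.iterator](): IterableIterator<number> {
    return this;
  }
  next(): IteratorResult<number> {
    return this.#current < this.limit
      ? { done: false, value: this.#current++ }
      : { done: true, value: undefined };
  }
}

function createCounterIterator(limit: number): IterableIterator<number> {
  let current = 0;
  return {
    [Symbol.iterator]() {
      return this;
    },
    next(): IteratorResult<number> {
      return current < limit
        ? { done: false, value: current++ }
        : { done: true, value: undefined };
    },
  };
}

// Both variants can be consumed in exactly the same way:
console.log([...new CounterIterator(3)]); // [0, 1, 2]
console.log([...createCounterIterator(3)]); // [0, 1, 2]
```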
Performance
I have created several test scenarios, i.e. combinations of lexer rules and parsed strings. The `class` test uses the first implementation, the `function` test uses the second.
class - performance/moo-baa.bench.ts > moo-baa test: './tests/abab.ts' (+0)
1.04x faster than function
function - performance/moo-baa.bench.ts > moo-baa test: './tests/fallback.ts' (+0)
1.06x faster than class
class - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (+0)
1.27x faster than function
class - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (1)
1.10x faster than function
class - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (2)
1.19x faster than function
class - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (3)
1.87x faster than function
class - performance/moo-baa.bench.ts > moo-baa test: './tests/json-regex.ts' (+0)
1.05x faster than function
class - performance/moo-baa.bench.ts > moo-baa test: './tests/json.ts' (+0)
1.15x faster than function
The `class` variant is slightly faster most of the time, but the most interesting outlier is the test `'./tests/handlears-ng.ts' (3)`, in which the `class` test is 1.87 times faster than the `function` variant. This test uses the Handlebars-lexer to parse an exceptionally short string. During the test, the `TokenIterator` is instantiated more often than in the other tests. My conclusion is that the `function` implementation is not slower per se, but the instantiation takes more time.
Since V8 tries to use C++ classes internally, my theory is that this is much easier when writing an actual JavaScript class. In the `function` variant, it probably needs to create new function objects for each inner function, while in the `class` variant, creating a single class instance is enough.
Bundle size
This is a pity, because just using the `function` variant instead of the `class` variant brings the bundle size down from 2.4 kB gzipped to 2.11 kB gzipped.
This is a huge difference, but where does it come from? Of course, in the `class` variant, there are initializer assignments, and there is a `constructor`. But on the other hand, we don’t need the `function` keyword as often. Is this really responsible for 290 bytes of difference in gzipped code?
Let’s have a look at the generated code.
Private fields, WeakMaps, define class fields
The generated, minified ES6 module code for the `class` variant looks like this.
const q = {
done: !0,
value: void 0,
};
var x, l, m, p, g, S, y, T;
class z {
constructor(t, n, r) {
h(this, g);
h(this, y);
h(this, x, void 0);
h(this, l, void 0);
h(this, m, void 0);
h(this, p, void 0);
f(this, x, n), f(this, p, 0), f(this, l, V(t)), f(this, m, r);
}
[Symbol.iterator]() {
return this;
}
next() {
const t = this.nextToken();
return t == null ? q : { done: !1, value: t };
}
nextToken() {
if (i(this, p) >= i(this, x).length) return null;
const t = w(this, g, S).call(this);
f(this, p, i(this, p) + t.text.length);
const n = i(this, m).createToken(t);
return (
t.rule.push && i(this, l).push(t.rule.push),
t.rule.pop && i(this, l).pop(),
t.rule.next && i(this, l).next(t.rule.next),
n
);
}
}
(x = new WeakMap()),
(l = new WeakMap()),
(m = new WeakMap()),
(p = new WeakMap()),
(g = new WeakSet()),
(S = function () {
try {
return i(this, l).current.nextMatch(i(this, x), i(this, p));
} catch (t) {
throw t instanceof R ? new Error(w(this, y, T).call(this, t)) : t;
}
}),
(y = new WeakSet()),
(T = function (t) {
const { line: n, column: r } = i(this, m).currentLocation,
o = t.expectedTokenTypes.map((c) => "`" + c + "`").join(", ");
return `Syntax error at ${n}:${r}, expected one of ${o} but got '${t.foundChar}'`;
});
There are some interesting parts of code here:
- In the constructor, all assignments are wrapped with a mysterious `h` function. This is actually defined at the very top of the file and reads

    var i = (e, t, n) => (
        k(e, t, "read from private field"), n ? n.call(e) : t.get(e)
      ),
      h = (e, t, n) => {
        if (t.has(e))
          throw TypeError("Cannot add the same private member more than once");
        t instanceof WeakSet ? t.add(e) : t.set(e, n);
      },
      f = (e, t, n, r) => (
        k(e, t, "write to private field"), r ? r.call(e, n) : t.set(e, n), n
      );

  Turns out, this code is inserted by Vite or TypeScript (I don’t know which), because I used `#` to create private fields and methods (e.g. `#nextMatchOrSyntaxError()`).
- Another irritating observation is the use of `WeakMap` and `WeakSet` in my class, although I never used `WeakMap` and `WeakSet` in my actual code.
I decided that I don’t need the “private field” checks, because I don’t expose the class itself externally, so I used my own convention and replaced all `#` with `_` (e.g. `_nextMatchOrSyntaxError()`).
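What this convention gives up, shown in a small self-contained sketch (my own example, not baa-lexer code): `#` fields are enforced by the engine, `_` fields are only a naming hint.

```ts
class WithRealPrivate {
  #secret = 42;
  reveal() {
    return this.#secret;
  }
}

class WithConvention {
  _secret = 42; // "private" by convention only
  reveal() {
    return this._secret;
  }
}

console.log(new WithRealPrivate().reveal()); // 42
console.log(new WithConvention()._secret); // 42 – nothing prevents outside access
// new WithRealPrivate().#secret            // would be a syntax error
```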
This reduces the bundle size from 2.4 kB to 2.15 kB gzipped and also gets rid of the `WeakMap` instances. However, the constructor calls a mysterious `a` function now:
class b {
constructor(t, n, r) {
a(this, "_string");
a(this, "_states");
a(this, "_tokenFactory");
a(this, "_offset");
(this._string = n),
(this._offset = 0),
(this._states = T(t)),
(this._tokenFactory = r);
}
}
which is defined at the top as
var x = Object.defineProperty;
var m = (e, t, n) =>
t in e
? x(e, t, { enumerable: !0, configurable: !0, writable: !0, value: n })
: (e[t] = n);
var a = (e, t, n) => (m(e, typeof t != "symbol" ? t + "" : t, n), n);
This is actually a TypeScript feature introduced in TypeScript 3.7 and activated by default in 4.2.4. It uses `Object.defineProperty` to define public class fields. The reason stated in the changelog is that “there is an extremely strong chance that public class fields will be standardized differently” than previously thought.
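As far as I understand it, the behavioral difference can be illustrated like this (my own example, unrelated to the lexer): a plain assignment calls setters that exist further up the prototype chain, while `Object.defineProperty` creates an own property and bypasses them.

```ts
const proto = {
  set value(v: number) {
    console.log("setter called with", v);
  },
};

// [[Set]] semantics: the inherited setter runs, no own property is created.
const viaAssignment: any = Object.create(proto);
viaAssignment.value = 1; // logs "setter called with 1"

// [[Define]] semantics: the setter is bypassed, an own data property shadows it.
const viaDefine: any = Object.create(proto);
Object.defineProperty(viaDefine, "value", {
  value: 1,
  enumerable: true,
  configurable: true,
  writable: true,
});

console.log(Object.keys(viaAssignment)); // []
console.log(Object.keys(viaDefine)); // ["value"]
```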
I have to admit, I don’t see the reasoning behind that, so I set `useDefineForClassFields` to `false` in the `tsconfig.json`. Since I have many tests and benchmarks, I think it is safe to do so in my case.
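For reference, this is the relevant part of the `tsconfig.json`; only the flag itself is the point here, everything else in the real config is omitted.

```jsonc
{
  "compilerOptions": {
    // Compile class fields to plain constructor assignments instead of
    // Object.defineProperty calls.
    "useDefineForClassFields": false
  }
}
```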
This makes the generated code look much more like I would have expected:
class O {
constructor(t, n, r) {
(this._string = n),
(this._offset = 0),
(this._states = S(t)),
(this._tokenFactory = r);
}
[Symbol.iterator]() {
return this;
}
next() {
const t = this.nextToken();
return t == null ? C : { done: !1, value: t };
}
nextToken() {
if (this._offset >= this._string.length) return null;
const t = this._nextMatchOrSyntaxError();
this._offset += t.text.length;
const n = this._tokenFactory.createToken(t);
return (
t.rule.push && this._states.push(t.rule.push),
t.rule.pop && this._states.pop(),
t.rule.next && this._states.next(t.rule.next),
n
);
}
_nextMatchOrSyntaxError() {
try {
return this._states.current.nextMatch(this._string, this._offset);
} catch (t) {
throw t instanceof f ? new Error(this._syntaxError(t)) : t;
}
}
_syntaxError(t) {
const { line: n, column: r } = this._tokenFactory.currentLocation,
o = t.expectedTokenTypes.map((c) => "`" + c + "`").join(", ");
return `Syntax error at ${n}:${r}, expected one of ${o} but got '${t.foundChar}'`;
}
}
Bundle size: 2.01 kB for the `class` version and 1.98 kB for the `function` variant. This is a difference of 30 bytes, which I find acceptable given that the `class` performance is better.
Balancing bundle size and performance
Of course there are other classes in my code, and I did performance and bundle-size measurements for most of them. For some, I decided to keep the `function` variant.
The `StateProcessor`, for example, is instantiated only once when the lexer is created, not for every parsed document, and is then reused multiple times. I didn’t even notice a difference, because my performance tests don’t take lexer creation into account.
If you want to optimize this hard, you have to play around a lot and choose which variant to use for each occasion. I would, however, vote for classes most of the time.
Conclusion
The conclusions that I draw from my experiments are:
- Functions that return objects produce slightly smaller code than classes.
- Functions that return objects take longer to create than instantiating a single class.
- As a rule of thumb, if an object has many methods, you should create a class.
- If a class is only instantiated once, or very rarely, you can use the `function` variant.
The largest reduction in bundle size, however, was due to setting `useDefineForClassFields` to `false` and using `_` instead of `#` to mark private fields.
My first version of the lexer was about 3 kB in size. Those changes brought it down to 2.01 kB (the final version is a bit larger than that, because I introduced some factory functions and considered that increase acceptable).
However, my feeling is that the gains are not proportional to the complete bundle size. So, if your goal is to write minimal libraries (X in less than Y kB), this may be helpful. But it won’t reduce the size of your app from 3 MB to 2 MB.
If you have a large app and you want to reduce its size, you should use tools like vite-bundle-visualizer to see where all those bytes are coming from. Chances are good that you simply have some large dependencies, and solving this is a completely different story.