Benchmarking Principles, Concepts, and Tips
This document explains how corsa-bind thinks about performance work.
It is intentionally more conceptual than performance.md, which is the place for commands and measured numbers.
For CI structure, local reproduction, and troubleshooting, see ci_guide.md.
Why This Exists
corsa-bind wraps the Corsa upstream.
That creates an important constraint:
- if
corsa-bindand the Corsa CLI do exactly the same work,corsa-bindshould usually aim for parity, not miracles - the realistic place to win is the end-to-end workflow, not the compiler engine itself
That is why this repository keeps benchmark layers separate. We want to answer different questions with different tools instead of forcing one number to mean everything.
Core Principles
1. No Forks, No Patches, No Fake Wins
corsa-bind follows a strict upstream policy:
- use upstream-supported Corsa entry points
- pin an exact upstream commit
- do not patch
ref/corsa-upstream
That matters for benchmarking. If we changed upstream locally, every performance claim would become harder to trust.
2. Separate Engine Speed from Wrapper Speed
There are two different questions:
- How fast is the underlying engine and transport?
- How fast is the actual user workflow we are building around it?
Those are not the same question.
Examples:
- Corsa CLI vs
tscis an engine and compiler CLI comparison msgpackvsjsonrpcis a transport comparisoncorsawarm workflow vs Corsa CLI--noEmitis an orchestration comparison
If those get mixed together, conclusions become misleading very quickly.
3. Cold and Warm Behavior Must Be Measured Separately
Cold runs include process startup, initialization, config loading, and project open cost. Warm runs ask a different question: what happens after we already paid those setup costs?
For corsa, warm behavior is especially important because session reuse is one of the main reasons to exist.
4. Apples-to-Apples First, Then Product Reality
We intentionally keep two layers:
- an apples-to-apples CLI comparison for the same project input
- a product-shaped workflow comparison for editor-like usage
The first keeps us honest. The second tells us whether orchestration is actually worth building.
5. Cleanup Is Part of Correctness
A benchmark runner that leaks child processes is not production-ready. It can distort later measurements, waste resources, and make failures harder to debug.
This repository therefore treats process cleanup as part of benchmark correctness, not as an optional nicety.
Benchmark Layers
Native Runner
The native runner is bench_real_corsa.
Its purpose:
- measure the Rust client directly against the real pinned Corsa binary
- compare transports such as
msgpackandjsonrpc - inspect hot paths like
updateSnapshot,getSourceFile, and type queries
This is the main source of truth for transport-level questions.
Tooling Runner
The tooling runner is bench_tooling_compare.
It has two workloads:
project_checkeditor_workflow
project_check compares:
tsc- Corsa CLI
typescript-eslint
on the same dataset and the same effective project configuration.
editor_workflow compares a realistic corsa session:
- open project once
- reuse a live session
- run a representative chain of symbol and type queries
This is the layer that answers whether orchestration actually changes the user-facing speed story.
Key Concepts
Why corsa Cannot Reliably Beat Corsa on Identical Work
If a wrapper talks to the same engine and asks it to do the same work, it usually inherits:
- the same parsing cost
- the same type-checking cost
- the same project graph cost
So the healthy target is:
- same work: roughly equal to the Corsa CLI, maybe with small overhead
- different workflow: potentially faster if orchestration avoids redundant work
That distinction is the heart of the benchmarking model.
Where corsa Can Win
The realistic win conditions are:
- keep the process alive
- initialize once and amortize setup cost
- prefer the faster transport
- avoid reopening the project for every query
- turn one big CLI-shaped operation into a sequence of smaller targeted queries
That is why editor_workflow exists.
It measures the class of work where corsa can reasonably outperform rerunning a whole CLI command.
Why typescript-eslint Is Still Useful in Comparisons
typescript-eslint is not the same workload as a compiler CLI check.
It performs typed linting, not just compilation.
Even so, it is still a useful reference point because it represents a real type-aware developer workflow on the same codebase. The comparison should be read as:
- "How expensive is type-aware linting on this project?"
not as:
- "This is the same thing as Corsa CLI
--noEmit."
Why Overlay tsconfig Files Exist
The tooling runner generates temporary overlay tsconfig files.
This is done for fairness and reproducibility.
The overlays are used to:
- add
customConditions: ["@typescript/source"] - keep the base project configuration intact
- avoid editing tracked upstream files
There is one subtle but important implementation detail:
- the overlays are created under
ref/corsa-upstream/.cache/..., not under the repository root.cache
This keeps TypeScript's node module resolution behavior aligned with the upstream workspace, especially for packages like @types/node.
Implementation Walkthrough
bench_real_corsa
Main files:
Flow:
- Parse CLI arguments and choose datasets.
- Load real project metadata from the pinned Corsa CLI.
- Run cold and warm scenarios.
- Collect samples into
Stats. - Emit human-readable tables and machine-readable JSON.
Important design choices:
- warm scenarios perform one untimed call before sampling
- datasets are measured against real
tsconfigfiles from the pinned upstream checkout - symbol/type benchmarks discover a real identifier from the dataset instead of relying on a fake fixture
--profileadds per-method phase samples forserialize_params,transport,deserialize_response, and binary decoding so transport-vs-wrapper costs can be separated quickly
bench_tooling_compare
Main files:
Flow:
- Load datasets through the real pinned Corsa.
- Build temporary overlay
tsconfigfiles. - Run
tsc, Corsa CLI,typescript-eslint,corsa-oxlint, bareoxlint, andtsgolintas child processes forproject_check. - Run a live
corsamsgpack session foreditor_workflow. - Emit timing tables and JSON.
Important design choices:
- the default tooling dataset is the pinned upstream
native-previewpackage, which currently completes a clean Corsa CLI project check typescript-eslint,corsa-oxlint, bareoxlint, andtsgolintare allowed to exit with code1because lint findings are expected and should not invalidate timingtypescript-eslint,corsa-oxlint, andtsgolintuse the overlapping type-aware rule set currently exported fromcorsa-oxlint/rules- bare
oxlintintentionally runs without the Corsa JS plugin or type-awaretsgolintbridge, so it is a raw Oxlint process baseline rather than the same rule workload - child processes run with timeouts
- child stdout and stderr are suppressed during timing so the measurement focuses on the actual workload
Process Cleanup and Safety
The core cleanup utilities live in process.rs.
Key helpers:
wait_for_child_exitterminate_child_processAsyncChildGuard
The cleanup policy is:
- try graceful shutdown first when the API supports it
- if the process does not exit in time, kill it
- always reap it with
wait
This avoids leaving zombie processes behind.
The msgpack worker also follows the same policy via msgpack_worker.rs.
Tips
Pick the Right Benchmark for the Question
Use:
bench_real_corsafor transport and API-path questionsbench_tooling_comparefor CLI parity and orchestration questions- Node benchmarks for JS binding overhead and consumer-facing Node workflows
Do not use one benchmark layer to answer a different layer's question.
Read Workflow Numbers Carefully
If corsa beats the Corsa CLI in editor_workflow, it does not mean the wrapper is faster than the engine.
It means the wrapper avoided redundant work by reusing state and narrowing the workload.
That is a good outcome, but it is a different claim.
Treat High Variance as a Signal
If p95, p99, or cv% are high:
- rerun on a quieter machine
- increase warmup or sample count
- check for background work
- check for process leaks or repeated setup costs
Variance often teaches more than the median.
Keep Setup Reproducible
For tooling benchmarks, install the exact comparison dependencies first:
vp run -w bench_tooling_setup
That keeps typescript, eslint, typescript-eslint, and oxlint-tsgolint
pinned for the comparison runner. Bare oxlint comes from the pinned
corsa-oxlint package dependency.
Never Forget Snapshot and Client Cleanup
If you extend the runners or build new workflows:
- release managed snapshots
- close clients explicitly
- do not rely only on
Drop
Drop is a safety net.
The primary path should still be explicit cleanup.
Prefer Real Datasets over Toy Fixtures
Toy fixtures are useful for focused regression tests. They are not enough for performance claims.
The current benchmarks intentionally run on real projects from the pinned upstream checkout because:
- module resolution matters
- project references matter
- file count and file size matter
- hot paths can look very different on real code
Keep Claims Narrow and Honest
Good claim:
- "
corsawarm editor workflow is faster than rerunning Corsa CLI--noEmiton the same project."
Bad claim:
- "
corsais faster than Corsa."
The first says what was actually measured. The second overstates what the data means.