Project Architecture, Strategy, and Implementation Tips
This document is the high-level map for corsa-bind.
It explains what the repository is trying to achieve, which constraints shape the design, how the crates fit together, and what patterns are worth following when extending the project.
If you want measured numbers, go to performance.md. If you want benchmark rationale, go to benchmarking_guide.md. If you want CI and local reproduction details, go to ci_guide.md.
Why This Project Exists
corsa-bind exists to make Corsa usable from Rust and Node.js in production-style workflows without maintaining a fork of upstream.
In practical terms, the repository is trying to provide:
- typed Rust bindings for the upstream Corsa API and LSP surfaces
- fast transport and orchestration layers for repeated editor-like queries
- Node bindings that preserve Rust-side performance while keeping JS and TS authoring ergonomic
- a strict and auditable upstream pin so regressions are reproducible
The key idea is simple:
- we do not want to reimplement the TypeScript checker
- we do want to build a strong systems layer around the real upstream checker
That systems layer is where most of the repository's value lives.
A Note on the Name
corsa-bind is the repository and distribution name for bindings around the
Corsa effort: the native TypeScript 7 implementation line tracked here in the
typescript-go codebase. The TypeScript roadmap describes this as the JS-based
TypeScript 6 line versus the native TypeScript 7 line, and it uses Strada for
the original TypeScript codename and Corsa for this effort
(TypeScript Native Port: Versioning Roadmap).
We keep corsa for crate, binary, and upstream-facing labels where that matches
the implementation surface, instead of the more generic tsc or tsgo labels
that are easy to misread in docs, code, and release notes. In practice that
means:
- use Corsa through the interfaces it already intends consumers to use
- track upstream by exact commit so behavior is reproducible and auditable
- never maintain a fork and never carry local patches against Corsa upstream
- implement hot paths in Rust, keep them zero-cost and high-performance, and
expose them to JS through napi-rs so end users can author custom plugins and
custom rules in JS/TS
Non-Negotiable Constraints
Several decisions in the codebase look unusual until you read them through the repository's constraints.
1. No Forks, No Patches
This repository intentionally does not patch Corsa upstream.
That means:
- behavior claims stay attributable to an exact upstream commit
- benchmark wins have to come from transport and orchestration, not private engine modifications
- upgrading upstream is work, but it is honest work
This policy is enforced through ref/corsa-upstream, corsa_ref.lock.toml, and corsa_ref.
2. Reproducibility Beats Convenience
The project prefers:
- exact upstream pins
- exact benchmark reports
- exact baseline tests
- strict verification of the managed ref
over a looser "mostly works on my machine" development style.
That strictness is deliberate. Without it, regressions in a fast-moving upstream project become very hard to reason about.
3. Workflow Speed Matters More Than Single-Call Glory
corsa sits on top of Corsa.
If both are asked to do exactly the same work exactly once, parity is the healthy target.
The realistic win conditions are:
- process reuse
- snapshot reuse
- lower-overhead transport
- narrower, targeted queries instead of rerunning a whole CLI command
- orchestration across multiple workers
This shapes almost every performance-related choice in the repository.
4. Process Cleanup Is Correctness
The repository treats subprocess cleanup as part of correctness, not as polish.
Why:
- leaked workers distort later benchmarks
- leaked workers waste resources in editor and CI scenarios
- unreaped children become operational debt
This is why process guards and explicit kill-plus-wait logic exist even in code that "only benchmarks" or "only runs tests".
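A minimal sketch of the kill-plus-wait idea, using only the Rust standard library. ProcessGuard and kill_and_reap are hypothetical names for illustration and are not taken from corsa_core; the real helpers may differ in shape and policy.

```rust
use std::io;
use std::process::{Child, ExitStatus};

/// Hypothetical guard: owns a child process and guarantees kill-plus-wait.
pub struct ProcessGuard {
    child: Option<Child>,
}

impl ProcessGuard {
    pub fn new(child: Child) -> Self {
        Self { child: Some(child) }
    }

    /// Explicit shutdown path: terminate the child, then reap it.
    pub fn shutdown(mut self) -> io::Result<ExitStatus> {
        let mut child = self.child.take().expect("child is present until shutdown");
        kill_and_reap(&mut child)
    }
}

impl Drop for ProcessGuard {
    fn drop(&mut self) {
        // Fallback path: also runs on early returns and panics.
        if let Some(mut child) = self.child.take() {
            let _ = kill_and_reap(&mut child);
        }
    }
}

fn kill_and_reap(child: &mut Child) -> io::Result<ExitStatus> {
    // Ignore the kill error: the child may already have exited on its own.
    let _ = child.kill();
    // wait() reaps the child so it does not linger as a zombie.
    child.wait()
}
```

The explicit shutdown method reports the exit status to callers that care, while the Drop implementation covers every path that forgets to call it.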
Architectural Overview
At a high level, the repository looks like this:
flowchart LR
A["Rust caller"] --> B["corsa facade"]
B --> C["corsa_client"]
B --> D["corsa_lsp"]
C --> E["corsa_jsonrpc"]
C --> F["msgpack worker"]
D --> E
C --> G["corsa_core"]
D --> G
E --> G
C --> H["Corsa process"]
D --> H
I["corsa_orchestrator"] --> C
I --> D
J["corsa_node"] --> B
K["corsa_oxlint"] --> J
The mental model is:
- core owns shared primitives and process safety
- jsonrpc owns framing and generic request/response machinery
- client owns typed API bindings
- lsp owns LSP transport and virtual-document overlays
- orchestrator owns pooling, caching, and replicated state
- runtime keeps async execution lightweight and local to the repository
- ref owns upstream pinning and verification
- corsa_node and corsa_oxlint expose the Rust engine to JS and TS consumers
Workspace Walkthrough
corsa_core
Role:
- shared error type
- process lifecycle helpers
- compact fast-path aliases
Why it exists:
- the higher crates need one consistent error surface
- process cleanup policy should be implemented once
- hot-path data structure choices should be shared, not repeated ad hoc
Touch this crate when:
- adding a new cross-cutting error case
- adjusting process shutdown behavior
- changing low-level shared performance primitives
corsa_jsonrpc
Role:
- Content-Length frame parsing and writing
- request ID and message modeling
- thread-backed JSON-RPC connection management
Why it exists:
- both the stdio API client and the LSP client need the same protocol machinery
- keeping transport generic avoids baking Corsa-specific concepts into the protocol layer
Touch this crate when:
- a protocol bug appears in request/response handling
- a callback or event routing issue shows up
- a transport-level benchmark regression points at JSON-RPC framing or synchronization
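For orientation, Content-Length framing as used by LSP-style transports can be sketched with the standard library alone. The function names write_frame and read_frame are illustrative assumptions, not the crate's actual API, and the sketch ignores any header other than Content-Length.

```rust
use std::io::{self, BufRead, Write};

/// Write one frame: a Content-Length header, a blank line, then the body.
pub fn write_frame<W: Write>(writer: &mut W, body: &[u8]) -> io::Result<()> {
    write!(writer, "Content-Length: {}\r\n\r\n", body.len())?;
    writer.write_all(body)?;
    writer.flush()
}

/// Read one frame. Returns Ok(None) on a clean end of stream.
pub fn read_frame<R: BufRead>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut content_length: Option<usize> = None;
    let mut saw_header_line = false;
    loop {
        let mut line = String::new();
        if reader.read_line(&mut line)? == 0 {
            if saw_header_line {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "stream ended inside the header block",
                ));
            }
            return Ok(None);
        }
        saw_header_line = true;
        let line = line.trim_end_matches(['\r', '\n']);
        if line.is_empty() {
            break; // a blank line ends the header block
        }
        if let Some((name, value)) = line.split_once(':') {
            if name.trim().eq_ignore_ascii_case("Content-Length") {
                let parsed = value
                    .trim()
                    .parse::<usize>()
                    .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
                content_length = Some(parsed);
            }
        }
    }
    let len = content_length.ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, "missing Content-Length header")
    })?;
    // Read exactly the announced number of body bytes.
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body)?;
    Ok(Some(body))
}
```

Keeping this layer unaware of payload contents is what lets the stdio API client and the LSP client share it.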
corsa_client
Role:
- typed bindings for the upstream Corsa API surface
- support for both async JSON-RPC and sync msgpack stdio transports
- snapshot lifecycle management
- symbol, type, and relation query methods
Why it exists:
- the wire protocol should be close to upstream
- consumers should not have to hand-author JSON payloads for every call
Touch this crate when:
- adding a new upstream API endpoint
- refining Rust-side response or handle modeling
- changing transport defaults or behavior
corsa_lsp
Role:
- Corsa LSP stdio client
- virtual document and overlay handling
- custom LSP request definitions such as initializeAPISession
Why it exists:
- editor-like workflows need in-memory document state, not just on-disk files
- LSP and stdio API are related, but not the same integration surface
Touch this crate when:
- adding editor-oriented features
- debugging overlay or UTF-16 position handling
- replicating virtual state through higher-level orchestration
corsa_orchestrator
Role:
- worker prewarming
- round-robin leasing
- snapshot caching
- result memoization
- distributed replicated state for experiments
Why it exists:
- this is the layer where workflow-level wins can happen
- the repository wants to measure and exploit reuse explicitly
Touch this crate when:
- building higher-level services on top of Corsa
- adding new caching strategies
- experimenting with replicated editor state
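Round-robin leasing can be illustrated with a small self-contained sketch. RoundRobinPool is a hypothetical type invented for this example; the real orchestrator presumably also handles prewarming, worker health, and async leasing, none of which is modeled here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical pool: hands out workers in round-robin order.
pub struct RoundRobinPool<W> {
    workers: Vec<W>,
    next: AtomicUsize,
}

impl<W> RoundRobinPool<W> {
    /// Returns None for an empty worker list so lease() never divides by zero.
    pub fn new(workers: Vec<W>) -> Option<Self> {
        if workers.is_empty() {
            return None;
        }
        Some(Self {
            workers,
            next: AtomicUsize::new(0),
        })
    }

    /// Lease the next worker. The counter is atomic, so shared references
    /// from several threads can lease without an external lock.
    pub fn lease(&self) -> &W {
        let ticket = self.next.fetch_add(1, Ordering::Relaxed);
        &self.workers[ticket % self.workers.len()]
    }
}
```

With three workers, five consecutive leases visit the workers in the order 0, 1, 2, 0, 1.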
corsa_runtime
Role:
- tiny local block_on
- thread-backed spawn
- lightweight broadcast channel
Why it exists:
- the project wants runtime control without making tokio a transitive architectural commitment
- the async needs are focused and local
Touch this crate when:
- runtime behavior itself is the problem
- a higher crate genuinely needs a new primitive
Do not touch it casually:
- every new primitive added here becomes an architectural choice for the whole workspace
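To give a sense of how small a local block_on can be, here is a sketch built on std::task::Wake and thread parking. It is a generic standard-library pattern, not the code in corsa_runtime, whose implementation may differ.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked inside block_on.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drive a single future to completion on the current thread.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Spurious wakeups are harmless: the loop simply polls again.
            Poll::Pending => thread::park(),
        }
    }
}
```

A primitive of this size is easy to audit, which is part of the argument for keeping the runtime local.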
corsa_ref
Role:
- exact upstream lockfile modeling
- managed checkout sync and verification
Why it exists:
- reproducibility needs a first-class tool, not a convention
Touch this crate when:
- the upstream pinning policy changes
- CI or local reproduction around ref/corsa-upstream needs stronger guarantees
corsa
Role:
- facade crate
- mock server
- benchmark binaries
- integration tests against the full workspace
Why it exists:
- consumers often want one import surface
- repo-level tests and benchmark runners need a home
src/bindings/nodejs/corsa_node
Role:
- napi-rs binding between Rust and JS
Why it exists:
- JS users should be able to use the Rust implementation without losing the performance work done in Rust
Touch this package when:
- a Rust capability needs to be surfaced to JS
- the JS wrapper API shape should change
src/bindings/nodejs/corsa_oxlint
Role:
- type-aware JS and TS authoring model similar to typescript-eslint
- compatibility layer over the Rust and Node bindings
Why it exists:
- end users want to write custom rules in JS and TS, not in Rust
- the repository wants the heavy work in Rust while keeping authoring ergonomics high
Touch this package when:
- parser services, checker shims, or rule ergonomics need to evolve
- parity with upstream or typescript-eslint-style workflows matters
End-to-End Flows
API Flow
Typical path:
- Build an ApiSpawnConfig.
- Spawn an ApiClient.
- Initialize once.
- Create or refresh a snapshot.
- Resolve projects, symbols, and types against that snapshot.
- Close the client or let an orchestrator keep it warm.
Important properties:
- ApiClient caches initialize
- ManagedSnapshot releases its handle automatically on drop
- msgpack keeps the hottest stdio path leaner than JSON-RPC
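The "initialize once, then reuse" property can be pictured with a small sketch built on std::sync::OnceLock. CachingClient, InitializeResponse, and the server_version field are hypothetical stand-ins; the real ApiClient and its response type are not shown in this guide and may look quite different.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical response shape, used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct InitializeResponse {
    pub server_version: String,
}

/// Hypothetical client that performs the initialize round trip at most once.
#[derive(Default)]
pub struct CachingClient {
    initialized: OnceLock<InitializeResponse>,
    round_trips: AtomicUsize,
}

impl CachingClient {
    /// The first call does the (simulated) round trip; later calls reuse the result.
    pub fn initialize(&self) -> &InitializeResponse {
        self.initialized.get_or_init(|| {
            self.round_trips.fetch_add(1, Ordering::Relaxed);
            InitializeResponse {
                server_version: "placeholder".to_string(),
            }
        })
    }

    /// Number of simulated round trips performed so far.
    pub fn round_trips(&self) -> usize {
        self.round_trips.load(Ordering::Relaxed)
    }
}
```

Callers can then call initialize freely on hot paths without paying for a second round trip.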
LSP Flow
Typical path:
- Spawn LspClient.
- Create an LspOverlay.
- Open or synthesize VirtualDocuments.
- Apply VirtualChanges in UTF-16 coordinates.
- Let the overlay emit didOpen, didChange, and didClose.
Important properties:
- overlay state is authoritative for open in-memory documents
- UTF-16 handling is centralized in VirtualDocument
- virtual-document behavior is intentionally close to editor semantics
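The UTF-16 concern is easiest to see in code. The helper below converts a UTF-16 code-unit offset within one line into a UTF-8 byte offset, which is the kind of translation a Rust-side document model needs before slicing a string. The function name is an assumption for illustration and is not taken from VirtualDocument.

```rust
/// Convert a UTF-16 code-unit offset within `line` into a UTF-8 byte offset.
/// Returns None if the offset is past the end of the line or falls inside
/// a surrogate pair.
pub fn utf16_to_byte_offset(line: &str, utf16_offset: usize) -> Option<usize> {
    let mut units = 0usize;
    for (byte_index, ch) in line.char_indices() {
        if units == utf16_offset {
            return Some(byte_index);
        }
        if units > utf16_offset {
            // The requested offset landed in the middle of the previous character.
            return None;
        }
        units += ch.len_utf16();
    }
    // An offset equal to the total length addresses the end of the line.
    (units == utf16_offset).then_some(line.len())
}
```

For a line such as "a", an emoji outside the Basic Multilingual Plane, then "b", UTF-16 offset 3 maps to byte offset 5, while offset 2 is rejected because it splits the emoji's surrogate pair.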
Orchestration Flow
Typical path:
- Name a worker configuration with ApiProfile.
- Prewarm workers for that profile.
- Cache snapshots by stable application keys.
- Memoize expensive derived results by key and TTL.
- Fan work out across multiple workers when parallelism helps.
Important properties:
- caching and pooling are explicit, not accidental
- workflow speedup comes from reuse and narrower queries
- distributed replication mirrors metadata and virtual state, not every live process detail
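Memoization by key and TTL can be sketched as follows. TtlMemo is a hypothetical type for illustration, not the orchestrator's cache; it takes the current time as a parameter so that expiry is deterministic in tests.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical memo table: values expire once they are `ttl` old.
pub struct TtlMemo<K, V> {
    ttl: Duration,
    entries: HashMap<K, (Instant, V)>,
}

impl<K: Eq + Hash, V: Clone> TtlMemo<K, V> {
    pub fn new(ttl: Duration) -> Self {
        Self {
            ttl,
            entries: HashMap::new(),
        }
    }

    /// Return the cached value if it is still fresh; otherwise recompute and store it.
    pub fn get_or_compute(&mut self, key: K, now: Instant, compute: impl FnOnce() -> V) -> V {
        if let Some((stored_at, value)) = self.entries.get(&key) {
            if now.duration_since(*stored_at) < self.ttl {
                return value.clone();
            }
        }
        let value = compute();
        self.entries.insert(key, (now, value.clone()));
        value
    }
}
```

Making the key and the TTL explicit arguments is one way to keep caching deliberate: every cached result has a visible identity and a visible expiry.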
Node Flow
Typical path:
- Build Rust code and napi-rs bindings.
- Import @corsa-bind/napi or the higher-level compatibility layer.
- Use Rust-backed checker or session behavior from JS and TS.
Important properties:
- Rust stays the performance-critical implementation
- JS and TS stay the customization surface
Strategy Guide
Why Msgpack Is the Default
ApiSpawnConfig::new() defaults to SyncMsgpackStdio.
That is not because JSON-RPC is wrong. It is because the measured hot paths consistently show that sync msgpack is the better default for:
- repeated requests
- binary-heavy paths
- latency-sensitive editor-like use
JSON-RPC remains valuable because:
- it is easier to inspect by eye
- it works well with callback-driven flows
- it maps naturally onto LSP-style transport thinking
The repository treats JSON-RPC as an important compatibility and debugging path, not as the fastest default.
Why the Runtime Is Custom
The repository currently avoids tokio and similar full runtimes.
Reasons:
- the async needs are narrow
- stdio and worker-thread usage dominates
- owning the runtime primitives keeps the dependency surface and execution model simpler
This should not become dogma. If the project grows into a shape where a larger runtime genuinely simplifies correctness, the repository can revisit the choice. For now, the custom runtime is a deliberate fit, not novelty.
Why Some Payloads Stay Opaque
Some endpoints still return EncodedPayload instead of a fully decoded Rust AST.
That is intentional.
Reasons:
- the upstream payload is already structured for Corsa
- decoding everything eagerly would add cost and maintenance surface
- many consumers only need to round-trip or print the payload
The repository should decode more only when a real consumer and a stable value case exist.
Why the Ref Verification Is Strict
verify_ref insists on:
- exact pinned commit
- detached
HEAD - clean tracked worktree
That strictness keeps several higher-level guarantees valid:
- regression baselines stay attributable
- benchmark results stay attributable
- bug reports can cite a real upstream revision
Weakening this policy would make the repository easier to use casually and harder to trust technically.
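The three checks can be expressed as a pure function over an already-gathered view of the checkout. CheckoutState, RefViolation, and check_ref are hypothetical names used only for this sketch; the actual verify_ref signature and error types are not documented here.

```rust
/// Hypothetical view of the managed checkout, gathered elsewhere (for example via git).
#[derive(Debug, Clone)]
pub struct CheckoutState {
    pub head_commit: String,
    pub head_is_detached: bool,
    pub modified_tracked_files: usize,
}

/// Hypothetical violation list mirroring the three checks described above.
#[derive(Debug, PartialEq, Eq)]
pub enum RefViolation {
    CommitMismatch { pinned: String, actual: String },
    HeadNotDetached,
    DirtyWorktree { modified_tracked_files: usize },
}

/// Report every violated rule instead of stopping at the first one.
pub fn check_ref(pinned_commit: &str, state: &CheckoutState) -> Result<(), Vec<RefViolation>> {
    let mut violations = Vec::new();
    if state.head_commit != pinned_commit {
        violations.push(RefViolation::CommitMismatch {
            pinned: pinned_commit.to_string(),
            actual: state.head_commit.clone(),
        });
    }
    if !state.head_is_detached {
        violations.push(RefViolation::HeadNotDetached);
    }
    if state.modified_tracked_files > 0 {
        violations.push(RefViolation::DirtyWorktree {
            modified_tracked_files: state.modified_tracked_files,
        });
    }
    if violations.is_empty() {
        Ok(())
    } else {
        Err(violations)
    }
}
```

Returning all violations at once makes a failed verification easier to act on than a single first error.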
Implementation Tips
When Adding a New API Endpoint
Recommended pattern:
- Find the upstream endpoint name and wire shape.
- Add or reuse a small request struct in requests_*.
- Add a typed response struct if the shape deserves one.
- Add a method on ApiClient in the appropriate methods_* module.
- Prefer names that mirror upstream closely.
- Add a mock-server integration test.
- Add a real-Corsa regression test if the endpoint matters to compatibility.
Good instincts:
- keep wire modeling boring
- keep serde names aligned with upstream field names
- normalize optional/empty response shapes intentionally, not accidentally
When Adding LSP Features
Recommended pattern:
- Decide whether the feature belongs in transport, overlay, or custom request types.
- Keep UTF-16 position logic centralized in VirtualDocument.
- Preserve real LSP sequencing rules around open/change/close.
- Add tests that cover invalid ranges and duplicate lifecycle events.
Good instincts:
- treat editor semantics as the truth model
- avoid inventing a second overlay abstraction when VirtualDocument can stay the source of truth
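As a rough illustration of the sequencing rules and of what a duplicate-lifecycle test would exercise, the sketch below tracks which documents are open and rejects out-of-order events. OpenDocuments and LifecycleError are hypothetical names; the real overlay tracks document contents as well and may report errors differently.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum LifecycleError {
    AlreadyOpen,
    NotOpen,
}

/// Hypothetical tracker for the open/change/close sequencing rules.
#[derive(Default)]
pub struct OpenDocuments {
    open: HashSet<String>,
}

impl OpenDocuments {
    /// didOpen is only valid for a document that is not already open.
    pub fn did_open(&mut self, uri: &str) -> Result<(), LifecycleError> {
        if self.open.insert(uri.to_string()) {
            Ok(())
        } else {
            Err(LifecycleError::AlreadyOpen)
        }
    }

    /// didChange is only valid between didOpen and didClose.
    pub fn did_change(&self, uri: &str) -> Result<(), LifecycleError> {
        if self.open.contains(uri) {
            Ok(())
        } else {
            Err(LifecycleError::NotOpen)
        }
    }

    /// didClose is only valid for a document that is currently open.
    pub fn did_close(&mut self, uri: &str) -> Result<(), LifecycleError> {
        if self.open.remove(uri) {
            Ok(())
        } else {
            Err(LifecycleError::NotOpen)
        }
    }
}
```

Tests for duplicate lifecycle events then reduce to asserting that the second didOpen or didClose for the same URI returns an error.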
When Changing Process Behavior
Checklist:
- does the child process get terminated on all paths?
- is the child always reaped?
- will this affect benchmark correctness?
- will this affect long-lived editor or test sessions?
Good instincts:
- prefer explicit shutdown paths
- treat zombie prevention as part of the feature, not cleanup trivia
When Extending the Orchestrator
Checklist:
- what exactly is being reused?
- what is keyed by profile?
- what is keyed by logical workspace or request?
- what is safe to replicate?
- what remains process-local?
Good instincts:
- cache only what has a clear invalidation story
- keep replicated state deterministic
- keep live process handles and replicated metadata conceptually separate
When Touching corsa_oxlint
Checklist:
- does the source-level TS surface still line up with the Rust and Node binding shape?
- does vp check still work before build artifacts exist?
- are you depending on generated output where source should be enough?
Good instincts:
- point source checks at source
- treat compatibility code as a real consumer, not as temporary glue
Common Pitfalls
Mistaking Wrapper Wins for Engine Wins
If corsa wins in an editor-style benchmark, that does not mean it out-compiled Corsa.
It usually means it:
- reused state
- avoided startup cost
- narrowed the workload
That is still a real win. It is just a different kind of win.
Treating JSON-RPC and Msgpack as Interchangeable Internally
They are equivalent at the conceptual API level. They are not equivalent in:
- framing cost
- callback handling shape
- debugging ergonomics
- hot-path performance
Transport choices are part of the design, not just configuration trivia.
Relaxing Upstream Pin Hygiene to Unblock Local Work
It is tempting to "just let verify pass" when the managed ref is dirty. Do not do that.
The better pattern is:
- restore accidental drift
- pin intentionally when upgrading
- keep the lockfile and checkout relationship exact
Over-Decoding Binary Payloads
If a consumer only needs to print or forward a binary node payload, fully decoding it into Rust structures may add complexity without producing value.
Prefer:
- opaque payloads first
- richer decoding only with a concrete use case
Recommended Development Workflow
For most work:
- vp check
- cargo test --workspace
- if the change touches the real upstream path, run the real-Corsa regression tests
- if the change touches performance-sensitive code, run the relevant benchmark layer
- if the change touches docs.rs-facing Rust API, run RUSTDOCFLAGS='-D warnings' cargo doc --workspace --no-deps
For upstream pin updates:
- sync the managed ref
- move it intentionally to the new upstream commit
- pin current metadata
- rebuild the real Corsa binary
- rerun regression tests and benchmarks
How to Read the Repository Efficiently
If you are new to the codebase, this reading order works well:
- README.md
- this guide
- crate roots under src/core/*/src/lib.rs plus src/bindings/rust/corsa/src/lib.rs
- corsa_client methods and response types
- corsa_lsp overlay and virtual document logic
- corsa_orchestrator pooling, state, and Raft code
- benchmark runners under src/bindings/rust/corsa/src/bin
If you are debugging performance:
- benchmarking_guide.md
- performance.md
- bench_real_corsa
- transport code in client, jsonrpc, and runtime
If you are debugging CI or environment issues:
- ci_guide.md
- vite.config.ts
- corsa_ref
- the managed upstream checkout under ref/corsa-upstream
Final Mental Model
The easiest way to reason about corsa is:
- Corsa upstream is the compiler engine
- this repository is the systems layer around that engine
That systems layer is responsible for:
- safe process control
- transport quality
- typed API ergonomics
- editor-style virtual state
- worker reuse
- benchmark discipline
- upstream reproducibility
- JS and TS integration ergonomics
If a proposed change improves one of those without violating the repository's core constraints, it is probably moving in the right direction.