Project Architecture, Strategy, and Implementation Tips

This document is the high-level map for corsa-bind. It explains what the repository is trying to achieve, which constraints shape the design, how the crates fit together, and what patterns are worth following when extending the project.

If you want measured numbers, go to performance.md. If you want benchmark rationale, go to benchmarking_guide.md. If you want CI and local reproduction details, go to ci_guide.md.

Why This Project Exists

corsa-bind exists to make Corsa usable from Rust and Node.js in production-style workflows without maintaining a fork of upstream.

In practical terms, the repository is trying to provide:

typed Rust bindings for the upstream Corsa API and LSP surfaces
fast transport and orchestration layers for repeated editor-like queries
Node bindings that preserve Rust-side performance while keeping JS and TS authoring ergonomic
a strict and auditable upstream pin so regressions are reproducible

The key idea is simple:

we do not want to reimplement the TypeScript checker
we do want to build a strong systems layer around the real upstream checker

That systems layer is where most of the repository's value lives.

A Note on the Name

corsa-bind is the repository and distribution name for bindings around the Corsa effort: the native TypeScript 7 implementation line, tracked here in the typescript-go codebase. The TypeScript roadmap distinguishes the JS-based TypeScript 6 line from the native TypeScript 7 line, and it uses Strada as the codename for the original TypeScript and Corsa for the native effort (TypeScript Native Port: Versioning Roadmap).

We keep corsa for crate, binary, and upstream-facing labels where that matches the implementation surface, instead of the more generic tsc or tsgo labels, which are easy to misread in docs, code, and release notes. The name also reflects how the project treats upstream. In practice that means:

use Corsa through the interfaces it already intends consumers to use
track upstream by exact commit so behavior is reproducible and auditable
never maintain a fork and never carry local patches against Corsa upstream
implement hot paths in Rust, keep the binding overhead as low as possible, and expose them to JS through napi-rs so end users can author custom plugins and custom rules in JS/TS

Non-Negotiable Constraints

Several decisions in the codebase look unusual until you read them through the repository's constraints.

1. No Forks, No Patches

This repository intentionally does not patch Corsa upstream.

That means:

behavior claims stay attributable to an exact upstream commit
benchmark wins have to come from transport and orchestration, not private engine modifications
upgrading upstream is work, but it is honest work

This policy is enforced through ref/corsa-upstream, corsa_ref.lock.toml, and corsa_ref.

2. Reproducibility Beats Convenience

The project prefers:

exact upstream pins
exact benchmark reports
exact baseline tests
strict verification of the managed ref

over a looser "mostly works on my machine" development style.

That strictness is deliberate. Without it, regressions in a fast-moving upstream project become very hard to reason about.

3. Workflow Speed Matters More Than Single-Call Glory

corsa-bind sits on top of Corsa. If both are asked to do exactly the same work exactly once, parity is the healthy target.

The realistic win conditions are:

process reuse
snapshot reuse
lower-overhead transport
narrower, targeted queries instead of rerunning a whole CLI command
orchestration across multiple workers

This shapes almost every performance-related choice in the repository.

4. Process Cleanup Is Correctness

The repository treats subprocess cleanup as part of correctness, not as polish.

Why:

leaked workers distort later benchmarks
leaked workers waste resources in editor and CI scenarios
unreaped children become operational debt

This is why process guards and explicit kill-plus-wait logic exist even in code that "only benchmarks" or "only runs tests".
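
A minimal sketch of the kill-plus-wait idea, using only the standard library. `ProcessGuard`, `Reap`, and `kill_then_wait` are hypothetical names chosen for illustration; they are not the types in corsa_core. The trait exists only so the guard can be exercised without spawning a real process.

```rust
use std::io;
use std::process::Child;

/// Hypothetical abstraction: anything that can be killed and then reaped.
pub trait Reap {
    fn kill(&mut self) -> io::Result<()>;
    fn wait(&mut self) -> io::Result<()>;
}

impl Reap for Child {
    fn kill(&mut self) -> io::Result<()> {
        Child::kill(self)
    }

    fn wait(&mut self) -> io::Result<()> {
        // Discard the exit status; the point here is that the child is reaped.
        Child::wait(self).map(|_status| ())
    }
}

/// Hypothetical guard that kills and reaps its process when dropped.
pub struct ProcessGuard<P: Reap> {
    process: Option<P>,
}

impl<P: Reap> ProcessGuard<P> {
    pub fn new(process: P) -> Self {
        Self { process: Some(process) }
    }

    /// Explicit shutdown path that surfaces the wait error to the caller.
    pub fn shutdown(mut self) -> io::Result<()> {
        match self.process.take() {
            Some(mut process) => kill_then_wait(&mut process),
            None => Ok(()),
        }
    }
}

impl<P: Reap> Drop for ProcessGuard<P> {
    fn drop(&mut self) {
        // Fallback path: drop cannot return errors, so they are ignored here.
        if let Some(mut process) = self.process.take() {
            let _ = kill_then_wait(&mut process);
        }
    }
}

fn kill_then_wait<P: Reap>(process: &mut P) -> io::Result<()> {
    // The kill result is ignored because the process may already have exited;
    // wait() runs either way so the child never lingers unreaped.
    let _ = process.kill();
    process.wait()
}
```

The explicit `shutdown` path is the one to prefer in normal code because it can report failures; the `Drop` impl is the safety net for early returns and panics that unwind.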

Architectural Overview

At a high level, the repository looks like this:

```mermaid
flowchart LR
    A["Rust caller"] --> B["corsa facade"]
    B --> C["corsa_client"]
    B --> D["corsa_lsp"]
    C --> E["corsa_jsonrpc"]
    C --> F["msgpack worker"]
    D --> E
    C --> G["corsa_core"]
    D --> G
    E --> G
    C --> H["Corsa process"]
    D --> H
    I["corsa_orchestrator"] --> C
    I --> D
    J["corsa_node"] --> B
    K["corsa_oxlint"] --> J
```

The mental model is:

core owns shared primitives and process safety
jsonrpc owns framing and generic request/response machinery
client owns typed API bindings
lsp owns LSP transport and virtual-document overlays
orchestrator owns pooling, caching, and replicated state
runtime keeps async execution lightweight and local to the repository
ref owns upstream pinning and verification
corsa_node and corsa_oxlint expose the Rust engine to JS and TS consumers

Workspace Walkthrough

`corsa_core`

Role:

shared error type
process lifecycle helpers
compact fast-path aliases

Why it exists:

the higher crates need one consistent error surface
process cleanup policy should be implemented once
hot-path data structure choices should be shared, not repeated ad hoc

Touch this crate when:

adding a new cross-cutting error case
adjusting process shutdown behavior
changing low-level shared performance primitives

`corsa_jsonrpc`

Role:

Content-Length frame parsing and writing
request ID and message modeling
thread-backed JSON-RPC connection management

Why it exists:

both the stdio API client and the LSP client need the same protocol machinery
keeping transport generic avoids baking Corsa-specific concepts into the protocol layer

Touch this crate when:

a protocol bug appears in request/response handling
a callback or event routing issue shows up
a transport-level benchmark regression points at JSON-RPC framing or synchronization
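
To make the Content-Length framing in the role list concrete, here is a minimal sketch of the header-plus-body frame format used by LSP-style stdio transports. The function names `write_frame` and `read_frame` are assumptions for illustration and are not taken from corsa_jsonrpc.

```rust
use std::io::{self, BufRead, Write};

/// Writes one message as `Content-Length: N\r\n\r\n<body>`.
pub fn write_frame<W: Write>(writer: &mut W, body: &[u8]) -> io::Result<()> {
    write!(writer, "Content-Length: {}\r\n\r\n", body.len())?;
    writer.write_all(body)?;
    writer.flush()
}

/// Reads one framed message, or returns `Ok(None)` on a clean end of stream.
pub fn read_frame<R: BufRead>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut content_length: Option<usize> = None;
    let mut saw_header_bytes = false;
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            // EOF before any header byte is a clean shutdown; EOF mid-header is not.
            return if saw_header_bytes {
                Err(io::Error::new(io::ErrorKind::UnexpectedEof, "stream ended inside headers"))
            } else {
                Ok(None)
            };
        }
        saw_header_bytes = true;
        let header = line.trim_end_matches(&['\r', '\n'][..]);
        if header.is_empty() {
            break; // a blank line terminates the header block
        }
        if let Some((name, value)) = header.split_once(':') {
            if name.trim().eq_ignore_ascii_case("content-length") {
                let parsed = value
                    .trim()
                    .parse::<usize>()
                    .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
                content_length = Some(parsed);
            }
        }
    }
    let length = content_length
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "missing Content-Length"))?;
    let mut body = vec![0u8; length];
    reader.read_exact(&mut body)?;
    Ok(Some(body))
}
```

Nothing in this sketch knows about Corsa or JSON, which is the point of keeping the protocol layer generic: the body is just bytes until a higher crate interprets it.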

`corsa_client`

Role:

typed bindings for the upstream Corsa API surface
support for both async JSON-RPC and sync msgpack stdio transports
snapshot lifecycle management
symbol, type, and relation query methods

Why it exists:

the wire protocol should be close to upstream
consumers should not have to hand-author JSON payloads for every call

Touch this crate when:

adding a new upstream API endpoint
refining Rust-side response or handle modeling
changing transport defaults or behavior

`corsa_lsp`

Role:

Corsa LSP stdio client
virtual document and overlay handling
custom LSP request definitions such as initializeAPISession

Why it exists:

editor-like workflows need in-memory document state, not just on-disk files
LSP and stdio API are related, but not the same integration surface

Touch this crate when:

adding editor-oriented features
debugging overlay or UTF-16 position handling
replicating virtual state through higher-level orchestration

`corsa_orchestrator`

Role:

worker prewarming
round-robin leasing
snapshot caching
result memoization
distributed replicated state for experiments

Why it exists:

this is the layer where workflow-level wins can happen
the repository wants to measure and exploit reuse explicitly

Touch this crate when:

building higher-level services on top of Corsa
adding new caching strategies
experimenting with replicated editor state
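
Round-robin leasing from the role list can be sketched in a few lines. This is an illustration under assumptions, not the orchestrator's actual pool: `RoundRobin` is a hypothetical type, and real workers would be process handles rather than plain values.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical pool that hands out workers in rotation.
pub struct RoundRobin<W> {
    workers: Vec<W>,
    next: AtomicUsize,
}

impl<W> RoundRobin<W> {
    /// Returns `None` for an empty pool so `lease` never divides by zero.
    pub fn new(workers: Vec<W>) -> Option<Self> {
        if workers.is_empty() {
            None
        } else {
            Some(Self { workers, next: AtomicUsize::new(0) })
        }
    }

    /// Picks the next worker; the atomic counter lets callers share `&self`.
    pub fn lease(&self) -> &W {
        let index = self.next.fetch_add(1, Ordering::Relaxed) % self.workers.len();
        &self.workers[index]
    }
}
```

Because the workers are created up front and only borrowed afterwards, this shape pairs naturally with prewarming: startup cost is paid once, before the first lease.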

`corsa_runtime`

Role:

tiny local block_on
thread-backed spawn
lightweight broadcast channel

Why it exists:

the project wants runtime control without making tokio a transitive architectural commitment
the async needs are focused and local

Touch this crate when:

runtime behavior itself is the problem
a higher crate genuinely needs a new primitive

Do not touch it casually:

every new primitive added here becomes an architectural choice for the whole workspace
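
For a sense of scale, a thread-parking block_on fits in about twenty lines of standard-library code. The sketch below follows the pattern shown in the `std::task::Wake` documentation; it is not the corsa_runtime implementation, and `ThreadWaker` is a name chosen for illustration.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread which is blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives a single future to completion on the current thread.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until the waker fires; a spurious wakeup only costs one extra poll.
            Poll::Pending => thread::park(),
        }
    }
}
```

A primitive this small is easy to own, which is the trade the crate is making. It also has no timers, I/O reactor, or task scheduler, so anything that needs those would be a new architectural decision.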

`corsa_ref`

Role:

exact upstream lockfile modeling
managed checkout sync and verification

Why it exists:

reproducibility needs a first-class tool, not a convention

Touch this crate when:

the upstream pinning policy changes
CI or local reproduction around ref/corsa-upstream needs stronger guarantees

`corsa`

Role:

facade crate
mock server
benchmark binaries
integration tests against the full workspace

Why it exists:

consumers often want one import surface
repo-level tests and benchmark runners need a home

`src/bindings/nodejs/corsa_node`

Role:

napi-rs binding between Rust and JS

Why it exists:

JS users should be able to use the Rust implementation without losing the performance work done in Rust

Touch this package when:

a Rust capability needs to be surfaced to JS
the JS wrapper API shape should change

`src/bindings/nodejs/corsa_oxlint`

Role:

type-aware JS and TS authoring model similar to typescript-eslint
compatibility layer over the Rust and Node bindings

Why it exists:

end users want to write custom rules in JS and TS, not in Rust
the repository wants the heavy work in Rust while keeping authoring ergonomics high

Touch this package when:

parser services, checker shims, or rule ergonomics need to evolve
parity with upstream or typescript-eslint-style workflows matters

End-to-End Flows

API Flow

Typical path:

Build an ApiSpawnConfig.
Spawn an ApiClient.
Initialize once.
Create or refresh a snapshot.
Resolve projects, symbols, and types against that snapshot.
Close the client or let an orchestrator keep it warm.

Important properties:

ApiClient caches initialize
ManagedSnapshot releases its handle automatically on drop
msgpack keeps the hottest stdio path leaner than JSON-RPC
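
The first two properties can be sketched together. Everything here is hypothetical: `SketchClient` and `SnapshotGuard` stand in for ApiClient and ManagedSnapshot, the initialize response is a placeholder string, and the public counters exist only to make the behavior observable.

```rust
use std::cell::{Cell, OnceCell, RefCell};

/// Hypothetical stand-in for a client that caches its initialize response.
#[derive(Default)]
pub struct SketchClient {
    initialize_response: OnceCell<String>,
    /// Counts pretend round trips so the caching is observable.
    pub initialize_round_trips: Cell<u32>,
    /// Records released handles so the drop behavior is observable.
    pub released_handles: RefCell<Vec<u64>>,
}

impl SketchClient {
    /// Performs the pretend initialize round trip once, then serves the cache.
    pub fn initialize(&self) -> &str {
        self.initialize_response.get_or_init(|| {
            self.initialize_round_trips.set(self.initialize_round_trips.get() + 1);
            String::from("initialized")
        })
    }

    /// Wraps a snapshot handle in a guard that releases it on drop.
    pub fn snapshot(&self, handle: u64) -> SnapshotGuard<'_> {
        SnapshotGuard { client: self, handle }
    }
}

/// Hypothetical stand-in for a managed snapshot handle.
pub struct SnapshotGuard<'a> {
    client: &'a SketchClient,
    handle: u64,
}

impl SnapshotGuard<'_> {
    pub fn handle(&self) -> u64 {
        self.handle
    }
}

impl Drop for SnapshotGuard<'_> {
    fn drop(&mut self) {
        // A real client would send a release request here.
        self.client.released_handles.borrow_mut().push(self.handle);
    }
}
```

Tying the release to `Drop` means a snapshot cannot outlive its scope by accident, which matters for long-lived workers that would otherwise accumulate handles.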

LSP Flow

Typical path:

Spawn LspClient.
Create an LspOverlay.
Open or synthesize VirtualDocuments.
Apply VirtualChanges in UTF-16 coordinates.
Let the overlay emit didOpen, didChange, and didClose.

Important properties:

overlay state is authoritative for open in-memory documents
UTF-16 handling is centralized in VirtualDocument
virtual-document behavior is intentionally close to editor semantics
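
The UTF-16 requirement is easy to get wrong because Rust strings are UTF-8. A minimal sketch of the conversion involved, assuming a single line of text and a hypothetical helper name (`utf16_to_byte_offset` is not taken from VirtualDocument):

```rust
/// Converts a UTF-16 code-unit offset within `line` into a UTF-8 byte offset.
/// Returns `None` if the offset is past the end or falls inside a surrogate pair.
pub fn utf16_to_byte_offset(line: &str, utf16_offset: usize) -> Option<usize> {
    let mut units = 0usize;
    for (byte_index, ch) in line.char_indices() {
        if units == utf16_offset {
            return Some(byte_index);
        }
        if units > utf16_offset {
            return None; // the offset landed between two halves of a surrogate pair
        }
        units += ch.len_utf16();
    }
    // The offset may legitimately point at the end of the line.
    if units == utf16_offset {
        Some(line.len())
    } else {
        None
    }
}
```

For the line `"a\u{1F600}b"`, the emoji occupies two UTF-16 code units and four UTF-8 bytes, so UTF-16 offset 3 maps to byte offset 5 and offset 2 is rejected. Keeping this logic in one place avoids each caller re-deriving those rules.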

Orchestration Flow

Typical path:

Name a worker configuration with ApiProfile.
Prewarm workers for that profile.
Cache snapshots by stable application keys.
Memoize expensive derived results by key and TTL.
Fan work out across multiple workers when parallelism helps.

Important properties:

caching and pooling are explicit, not accidental
workflow speedup comes from reuse and narrower queries
distributed replication mirrors metadata and virtual state, not every live process detail
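
Memoizing by key and TTL, as in the typical path above, might look like the following. `TtlMemo` is a hypothetical type, not the orchestrator's cache; the current time is passed in explicitly so that expiry stays deterministic in tests.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical memo table whose entries expire after a fixed time-to-live.
pub struct TtlMemo<K, V> {
    ttl: Duration,
    entries: HashMap<K, (Instant, V)>,
}

impl<K: Eq + Hash, V: Clone> TtlMemo<K, V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the cached value if it is younger than the TTL,
    /// otherwise runs `compute` and stores the fresh result.
    pub fn get_or_compute(&mut self, key: K, now: Instant, compute: impl FnOnce() -> V) -> V {
        if let Some((stored_at, value)) = self.entries.get(&key) {
            if now.duration_since(*stored_at) < self.ttl {
                return value.clone();
            }
        }
        let value = compute();
        self.entries.insert(key, (now, value.clone()));
        value
    }
}
```

The TTL is the whole invalidation story in this sketch. A cache keyed on snapshot identity would need a different rule, which is why the explicit-not-accidental property matters.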

Node Flow

Typical path:

Build Rust code and napi-rs bindings.
Import @corsa-bind/napi or the higher-level compatibility layer.
Use Rust-backed checker or session behavior from JS and TS.

Important properties:

Rust stays the performance-critical implementation
JS and TS stay the customization surface

Strategy Guide

Why Msgpack Is the Default

ApiSpawnConfig::new() defaults to SyncMsgpackStdio.

That is not because JSON-RPC is wrong. It is because the measured hot paths (the numbers live in performance.md) consistently show that sync msgpack is the better default for:

repeated requests
binary-heavy paths
latency-sensitive editor-like use

JSON-RPC remains valuable because:

it is easier to inspect by eye
it works well with callback-driven flows
it maps naturally onto LSP-style transport thinking

The repository treats JSON-RPC as an important compatibility and debugging path, not as the fastest default.
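
A sketch of what "fast default, explicit opt-in to the alternative" can look like in a config type. The names below are assumptions: only ApiSpawnConfig and SyncMsgpackStdio appear in this guide, so the `Sketch` types, the `AsyncJsonRpcStdio` variant name, the `executable` field, and `with_transport` are all hypothetical.

```rust
use std::path::PathBuf;

/// Transport choices named after the two paths this guide describes.
/// `AsyncJsonRpcStdio` is an assumed variant name.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum TransportSketch {
    #[default]
    SyncMsgpackStdio,
    AsyncJsonRpcStdio,
}

/// Hypothetical spawn configuration; the real type may carry more fields.
#[derive(Debug, Clone)]
pub struct SpawnConfigSketch {
    pub executable: PathBuf,
    pub transport: TransportSketch,
}

impl SpawnConfigSketch {
    /// Starts from the msgpack default, mirroring the behavior described above.
    pub fn new(executable: impl Into<PathBuf>) -> Self {
        Self {
            executable: executable.into(),
            transport: TransportSketch::default(),
        }
    }

    /// Opts into a different transport, for example when debugging by eye.
    pub fn with_transport(mut self, transport: TransportSketch) -> Self {
        self.transport = transport;
        self
    }
}
```

Making the default a property of the type rather than of each call site keeps the transport decision visible in one place.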

Why the Runtime Is Custom

The repository currently avoids tokio and similar full runtimes.

Reasons:

the async needs are narrow
stdio and worker-thread usage dominates
owning the runtime primitives keeps the dependency surface and execution model simpler

This should not become dogma. If the project grows into a shape where a larger runtime genuinely simplifies correctness, the repository can revisit the choice. For now, the custom runtime is a deliberate fit, not novelty.

Why Some Payloads Stay Opaque

Some endpoints still return EncodedPayload instead of a fully decoded Rust AST.

That is intentional.

Reasons:

the upstream payload is already structured for Corsa
decoding everything eagerly would add cost and maintenance surface
many consumers only need to round-trip or print the payload

The repository should decode more only when a real consumer and a stable use case exist.
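
One way to picture an opaque payload is a thin wrapper around the wire bytes that decodes only on request. This is a sketch under assumptions: `OpaquePayloadSketch` and its methods are hypothetical and do not describe the actual EncodedPayload type.

```rust
/// Hypothetical opaque payload: owns the wire bytes and never decodes eagerly.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OpaquePayloadSketch {
    bytes: Vec<u8>,
}

impl OpaquePayloadSketch {
    pub fn from_wire(bytes: Vec<u8>) -> Self {
        Self { bytes }
    }

    /// Borrowed view for consumers that only forward or print the payload.
    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }

    /// Gives the bytes back unchanged, for round-tripping.
    pub fn into_bytes(self) -> Vec<u8> {
        self.bytes
    }

    /// Decodes on demand with a caller-supplied decoder, so the cost is paid
    /// only by consumers that need structured data.
    pub fn decode_with<T, E>(
        &self,
        decoder: impl FnOnce(&[u8]) -> Result<T, E>,
    ) -> Result<T, E> {
        decoder(&self.bytes)
    }
}
```

With this shape, a forwarding consumer never touches a decoder, and a new decoder can be added later without changing the payload type.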

Why the Ref Verification Is Strict

verify_ref insists on:

exact pinned commit
detached HEAD
clean tracked worktree

That strictness keeps several higher-level guarantees valid:

regression baselines stay attributable
benchmark results stay attributable
bug reports can cite a real upstream revision

Weakening this policy would make the repository easier to use casually and harder to trust technically.
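
The three checks can be expressed as a pure function over an already-gathered checkout state. This is an illustrative sketch, not the corsa_ref implementation: `CheckoutState`, `RefViolation`, and `verify_checkout` are hypothetical names, and how the state is collected from git is left out.

```rust
/// Hypothetical description of one way the managed ref can drift.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RefViolation {
    WrongCommit { expected: String, actual: String },
    HeadNotDetached,
    DirtyTrackedWorktree { changed_files: usize },
}

/// Hypothetical snapshot of the managed checkout, gathered elsewhere.
#[derive(Debug, Clone)]
pub struct CheckoutState {
    pub head_commit: String,
    pub head_is_detached: bool,
    pub changed_tracked_files: usize,
}

/// Reports every violated rule at once instead of stopping at the first.
pub fn verify_checkout(
    pinned_commit: &str,
    state: &CheckoutState,
) -> Result<(), Vec<RefViolation>> {
    let mut violations = Vec::new();
    if state.head_commit != pinned_commit {
        violations.push(RefViolation::WrongCommit {
            expected: pinned_commit.to_string(),
            actual: state.head_commit.clone(),
        });
    }
    if !state.head_is_detached {
        violations.push(RefViolation::HeadNotDetached);
    }
    if state.changed_tracked_files > 0 {
        violations.push(RefViolation::DirtyTrackedWorktree {
            changed_files: state.changed_tracked_files,
        });
    }
    if violations.is_empty() {
        Ok(())
    } else {
        Err(violations)
    }
}
```

Note that there is no "warn and continue" branch. Any drift is an error, which is what keeps baselines and benchmark reports attributable.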

Implementation Tips

When Adding a New API Endpoint

Recommended pattern:

Find the upstream endpoint name and wire shape.
Add or reuse a small request struct in requests_*.
Add a typed response struct if the shape deserves one.
Add a method on ApiClient in the appropriate methods_* module.
Prefer names that mirror upstream closely.
Add a mock-server integration test.
Add a real-Corsa regression test if the endpoint matters to compatibility.

Good instincts:

keep wire modeling boring
keep serde names aligned with upstream field names
normalize optional/empty response shapes intentionally, not accidentally

When Adding LSP Features

Recommended pattern:

Decide whether the feature belongs in transport, overlay, or custom request types.
Keep UTF-16 position logic centralized in VirtualDocument.
Preserve real LSP sequencing rules around open/change/close.
Add tests that cover invalid ranges and duplicate lifecycle events.

Good instincts:

treat editor semantics as the truth model
avoid inventing a second overlay abstraction when VirtualDocument can stay the source of truth
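
The sequencing rules and the duplicate-lifecycle tests mentioned above can be pictured with a small state machine. This sketch is hypothetical throughout: `OverlaySketch`, `Outgoing`, and `LifecycleError` are illustration names, whole-text replacement stands in for range edits, and starting versions at 1 is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum LifecycleError {
    AlreadyOpen,
    NotOpen,
}

/// Notifications the overlay would emit toward the server.
#[derive(Debug, PartialEq, Eq)]
pub enum Outgoing {
    DidOpen { uri: String, version: i32 },
    DidChange { uri: String, version: i32 },
    DidClose { uri: String },
}

#[derive(Default)]
pub struct OverlaySketch {
    // uri -> (version, full text) for every open in-memory document
    open: HashMap<String, (i32, String)>,
}

impl OverlaySketch {
    pub fn open(&mut self, uri: &str, text: &str) -> Result<Outgoing, LifecycleError> {
        if self.open.contains_key(uri) {
            return Err(LifecycleError::AlreadyOpen);
        }
        self.open.insert(uri.to_string(), (1, text.to_string()));
        Ok(Outgoing::DidOpen { uri: uri.to_string(), version: 1 })
    }

    pub fn replace_text(&mut self, uri: &str, text: &str) -> Result<Outgoing, LifecycleError> {
        let entry = self.open.get_mut(uri).ok_or(LifecycleError::NotOpen)?;
        entry.0 += 1; // every change bumps the document version
        entry.1 = text.to_string();
        Ok(Outgoing::DidChange { uri: uri.to_string(), version: entry.0 })
    }

    pub fn close(&mut self, uri: &str) -> Result<Outgoing, LifecycleError> {
        self.open
            .remove(uri)
            .map(|_| Outgoing::DidClose { uri: uri.to_string() })
            .ok_or(LifecycleError::NotOpen)
    }

    pub fn text(&self, uri: &str) -> Option<&str> {
        self.open.get(uri).map(|(_, text)| text.as_str())
    }
}
```

Returning an error for a duplicate open or a change to a closed document, rather than silently accepting it, is what makes the lifecycle tests worth writing.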

When Changing Process Behavior

Checklist:

does the child process get terminated on all paths?
is the child always reaped?
will this affect benchmark correctness?
will this affect long-lived editor or test sessions?

Good instincts:

prefer explicit shutdown paths
treat zombie prevention as part of the feature, not cleanup trivia

When Extending the Orchestrator

Checklist:

what exactly is being reused?
what is keyed by profile?
what is keyed by logical workspace or request?
what is safe to replicate?
what remains process-local?

Good instincts:

cache only what has a clear invalidation story
keep replicated state deterministic
keep live process handles and replicated metadata conceptually separate

When Touching `corsa_oxlint`

Checklist:

does the source-level TS surface still line up with the Rust and Node binding shape?
does vp check still work before build artifacts exist?
are you depending on generated output where source should be enough?

Good instincts:

point source checks at source
treat compatibility code as a real consumer, not as temporary glue

Common Pitfalls

Mistaking Wrapper Wins for Engine Wins

If corsa-bind wins in an editor-style benchmark, that does not mean it out-compiled Corsa. It usually means it:

reused state
avoided startup cost
narrowed the workload

That is still a real win. It is just a different kind of win.

Treating JSON-RPC and Msgpack as Interchangeable Internally

They are equivalent at the conceptual API level. They are not equivalent in:

framing cost
callback handling shape
debugging ergonomics
hot-path performance

Transport choices are part of the design, not just configuration trivia.

Relaxing Upstream Pin Hygiene to Unblock Local Work

It is tempting to "just let verify pass" when the managed ref is dirty. Do not do that.

The better pattern is:

restore accidental drift
pin intentionally when upgrading
keep the lockfile and checkout relationship exact

Over-Decoding Binary Payloads

If a consumer only needs to print or forward a binary node payload, fully decoding it into Rust structures may add complexity without producing value.

Prefer:

opaque payloads first
richer decoding only with a concrete use case

Recommended Development Workflow

For most work:

vp check
cargo test --workspace
if the change touches the real upstream path, run the real-Corsa regression tests
if the change touches performance-sensitive code, run the relevant benchmark layer
if the change touches docs.rs-facing Rust API, run RUSTDOCFLAGS='-D warnings' cargo doc --workspace --no-deps

For upstream pin updates:

sync the managed ref
move it intentionally to the new upstream commit
pin current metadata
rebuild the real Corsa binary
rerun regression tests and benchmarks

How to Read the Repository Efficiently

If you are new to the codebase, this reading order works well:

README.md
this guide
crate roots under src/core/*/src/lib.rs plus src/bindings/rust/corsa/src/lib.rs
corsa_client methods and response types
corsa_lsp overlay and virtual document logic
corsa_orchestrator pooling, state, and Raft code
benchmark runners under src/bindings/rust/corsa/src/bin

If you are debugging performance:

benchmarking_guide.md
performance.md
bench_real_corsa
transport code in client, jsonrpc, and runtime

If you are debugging CI or environment issues:

ci_guide.md
vite.config.ts
corsa_ref
the managed upstream checkout under ref/corsa-upstream

Final Mental Model

The easiest way to reason about corsa-bind is:

Corsa upstream is the compiler engine
this repository is the systems layer around that engine

That systems layer is responsible for:

safe process control
transport quality
typed API ergonomics
editor-style virtual state
worker reuse
benchmark discipline
upstream reproducibility
JS and TS integration ergonomics

If a proposed change improves one of those without violating the repository's core constraints, it is probably moving in the right direction.
