C++ machine learning, compiled into your binary.
On-device inference for edge and WebAssembly — no inference server, no GPU, no Python.
Three claims, in order — each one gets less hand-wavy the further you scroll.
uchen::Model kClassifier = Input<28*28> | Linear<128> | Relu | Linear<10> | Logits;
Example: an MNIST classifier. The Dots game above is this same framework, compiled to WebAssembly and running in your tab.
01 — Define model at C++ compile time
A library you link with, not an alien ecosystem.
Defined in C++ source
The model is C++, alongside the rest of your project. Change a constant, the architecture updates.
Native dependency
Link the library, include the headers. It's already part of your toolchain.
No heap on the inference path
Memory requirements are known at compile time. Predictable, no surprises at runtime.
SIMD via Google Highway
Portable vectorization in the math core — linear, matrix, and convolution kernels. AVX, AVX2, ARM NEON, and WebAssembly SIMD.
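To make the "change a constant, the architecture updates" and "no heap on the inference path" points concrete, here is a minimal standalone sketch. It does not use uchen's API: `DenseLayer`, `PolicyHead`, and `kPolicyBytes` are hypothetical names invented for illustration, and only `kBoardSize` echoes the page. The idea is that when layer shapes are template parameters, every buffer can be a `std::array` whose size the compiler already knows.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a framework layer; NOT uchen's actual API.
// Shapes are template parameters, so all storage is sized at compile time.
template <std::size_t In, std::size_t Out>
struct DenseLayer {
  std::array<float, In * Out> weights{};  // row-major: Out rows of In values
  std::array<float, Out> bias{};

  // Writes into caller-provided storage, so the forward pass never allocates.
  void Forward(const std::array<float, In>& x,
               std::array<float, Out>& y) const {
    for (std::size_t o = 0; o < Out; ++o) {
      float sum = bias[o];
      for (std::size_t i = 0; i < In; ++i) {
        sum += weights[o * In + i] * x[i];
      }
      y[o] = sum > 0.0f ? sum : 0.0f;  // ReLU
    }
  }
};

// Changing this one constant resizes the input, the output, and the
// parameter storage on the next compile.
constexpr std::size_t kBoardSize = 8;
constexpr std::size_t kCells = kBoardSize * kBoardSize;
using PolicyHead = DenseLayer<kCells, kCells>;

// The parameter footprint is a compile-time constant.
constexpr std::size_t kPolicyBytes = sizeof(PolicyHead);
static_assert(kPolicyBytes >= (kCells * kCells + kCells) * sizeof(float),
              "storage must hold every weight and bias");
```

A caller would declare a `PolicyHead` plus input and output arrays once (statically or on the stack) and call `Forward` per inference. Whether uchen lays out parameters this way is not stated on this page.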
// The dots game's board size is also the model's input shape.
// Change kBoardSize once; the conv input, the policy head, and
// the parameter layout all resize at compile time.
constexpr size_t kBoardSize = 8;  // 8×8 board
inline constexpr uchen::Model kPolicy =
    uchen::layers::Input<
        uchen::convolution::ConvolutionInput<
            kBoardChannels, kBoardSize, kBoardSize>> |
    uchen::layers::Conv2dWithFilter<32, 3, 3, 1, 1>(
        uchen::convolution::Flatten<
            uchen::convolution::ReluFilter>()) |
    uchen::layers::Linear<kBoardSize * kBoardSize> |  // one logit per cell
    uchen::layers::Logits;
02 — Training is a build step
Small models with the math you actually need. Don't pay for what you don't use, taken to heart.
Layers out of the box
Linear, Conv2d, RNN, activations (ReLU / sigmoid / tanh), Logits, Softmax, Embeddings.
Training core
Backpropagation, SGD, Adam, Kaiming He init, squared loss, cross entropy, Deep Q loss.
Multi-head models
Shared trunk with split heads — policy/value workflows are a first-class composition, not glue code.
MCTS in the box
uchen/learning/mcts. Game-agnostic Oracle interface; deterministic given (oracle, root, config).
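For readers new to the training terms listed above, here is a small standalone sketch of cross entropy combined with one SGD step. It is not uchen's training API: `Softmax`, `CrossEntropy`, and `SgdStep` are hypothetical helpers, and the example treats the logits themselves as the trainable parameters purely to keep the arithmetic visible. It relies on the standard result that the gradient of softmax cross entropy with respect to logit i is the predicted probability minus 1 for the target class, and just the predicted probability otherwise.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Numerically stable softmax over a fixed-size array of logits.
template <std::size_t N>
std::array<float, N> Softmax(const std::array<float, N>& logits) {
  float max_logit = logits[0];
  for (float v : logits) max_logit = v > max_logit ? v : max_logit;
  std::array<float, N> probs{};
  float sum = 0.0f;
  for (std::size_t i = 0; i < N; ++i) {
    probs[i] = std::exp(logits[i] - max_logit);
    sum += probs[i];
  }
  for (float& p : probs) p /= sum;
  return probs;
}

// Cross-entropy loss for a single example with an integer class label.
template <std::size_t N>
float CrossEntropy(const std::array<float, N>& logits, std::size_t target) {
  return -std::log(Softmax(logits)[target]);
}

// One SGD step. Toy assumption: the logits are the parameters being trained.
template <std::size_t N>
void SgdStep(std::array<float, N>& logits, std::size_t target,
             float learning_rate) {
  const std::array<float, N> probs = Softmax(logits);
  for (std::size_t i = 0; i < N; ++i) {
    const float grad = probs[i] - (i == target ? 1.0f : 0.0f);
    logits[i] -= learning_rate * grad;
  }
}
```

Starting from all-zero logits, each step raises the target logit and lowers the others, so the loss for that example decreases. In a real model the same gradient would be propagated back through the layers rather than applied to the logits directly.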
// layers::Rnn wraps any pipeline in a recurrent loop,
// threading a fixed-size hidden state across each step
// of an iterable input.
constexpr uchen::Model kNameRnn =
    uchen::layers::Rnn<internal::Input, 50>(
        uchen::layers::Linear<10> | uchen::layers::Relu |
        uchen::layers::Linear<10> | uchen::layers::Relu) |
    uchen::layers::Linear<'z' - 'a' + 2> | uchen::layers::Logits;
03 — Inference ships with the product
One artifact. No model server, no Python on the target.
No model server
The model is part of the application binary. Nothing to host, route, or get paged about at 3 a.m.
No GPU pool, no Python
CPU-first execution with SIMD. Pure C++ runtime — nothing to pip-install on the target machine.
One artifact
Native binary or WebAssembly module — same C++ framework, same model definition, same training.
Privacy by deployment shape
Inference stays on-device. No data leaves the user's machine, because there's nothing to send it to.
One model, where you need it
Train your C++ model once, then run it wherever you need it: WebAssembly in the browser, native on iOS/Android. No rewrite, no conversion, no second model to keep in sync. The same source recompiles for each target.
- WASM (browsers and edge): compiled to WebAssembly SIMD; runs in the browser or in edge workers.
- Native apps (C++): linked into your binary; in-process, no IPC.
- Game engines: per-frame inference inside the engine loop.
- Embedded targets: fixed memory footprint, no runtime dependency.
- Cloud instances: deploy to any environment; only a CPU is required.
- Privacy-local features: on-device inference; no data leaves the machine.
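As a rough illustration of one source file serving both native and WebAssembly targets, the sketch below exports a single C-linkage entry point. It is an assumption-laden example, not uchen code: `Score`, `kWeights`, and the `EXPORT_FN` macro are hypothetical, and the three-weight dot product merely stands in for a trained model. The only real toolchain pieces used are the `__EMSCRIPTEN__` predefined macro and `EMSCRIPTEN_KEEPALIVE` from Emscripten's header.

```cpp
#include <array>
#include <cstddef>

// Under Emscripten, keep the symbol alive so JavaScript can call it.
// In a native build, the same function is a plain C-linkage symbol.
#ifdef __EMSCRIPTEN__
#include <emscripten/emscripten.h>
#define EXPORT_FN extern "C" EMSCRIPTEN_KEEPALIVE
#else
#define EXPORT_FN extern "C"
#endif

// Hypothetical stand-in for trained parameters baked into the binary.
constexpr std::array<float, 3> kWeights = {0.5f, -1.0f, 2.0f};

// Dot product of the weights with up to kWeights.size() input features.
EXPORT_FN float Score(const float* features, std::size_t count) {
  const std::size_t n = count < kWeights.size() ? count : kWeights.size();
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    sum += kWeights[i] * features[i];
  }
  return sum;
}
```

Compiled with a native compiler, `Score` links in-process like any other function; compiled with Emscripten, the same file yields a module whose exported function can be called from the page. How uchen itself exposes models to JavaScript is not described here.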
Project Status
Plain accounting — what's stable, what's in progress, and what's explicitly out. The framework is scoped on purpose.
In the box, stable
- C++20 runtime + training core (Bazel build)
- Layers: Linear, Conv2d, RNN, activations, Logits, Softmax, Embeddings
- Optimizers: SGD, Adam · losses: squared, cross entropy, Deep Q
- Multi-head models + MCTS
- Optimized for AVX, AVX2, ARM NEON, and WebAssembly SIMD via Google Highway — covers x86, Apple Silicon, ARM Linux, and browsers
In progress
- More demos, which drive the requirements
- Generalization of the training pipeline beyond current demos
- More documentation, articles and examples
Not in scope
- GPU backend — focus is on edge and embedded
- Import from ONNX and other model formats
- Open-sourcing the code: not possible yet, for personal reasons. A September 2025 snapshot is on GitHub.
My notes from building UchenML
C++ machine learning compiled into real products — plus the C++ and performance rabbit holes I fall into around it. Technical, irregular by design.
Double opt-in — I'll send a confirmation email first. Unsubscribe in one click.