
Chess2FEN

Production computer vision system converting chessboard images to FEN notation using lightweight CNN classifiers deployed on Google Cloud Run.

December 2025 – January 2026 · Solo Developer & ML Engineer · Active
Tags: python · computer-vision · deep-learning · onnx · fastapi · google-cloud · production-ml

Technology

  • Python
  • PyTorch
  • ONNX Runtime
  • FastAPI
  • React
  • TypeScript
  • Docker
  • Google Cloud Run
  • GitHub Actions

What it is

A production-ready computer vision API that converts top-down chessboard images into FEN (Forsyth-Edwards Notation) strings using per-square CNN classification. The system achieves 100% exact-match accuracy on clean images with 7-15ms inference latency on CPU, deployed as a serverless REST API with a React frontend.

Why it matters

  • Demonstrates full ML lifecycle from data generation to production deployment at personal scale.
  • Solves real accuracy vs. efficiency tradeoffs: 9 model architectures ranging from 5.8K to 63K parameters.
  • Achieves production-grade robustness (>95% accuracy) under realistic distortions: blur, JPEG artifacts, perspective warps, lighting variations.
  • Implements proper ML systems engineering: model registry, versioned artifacts with SHA256 checksums, ONNX export for cross-platform inference, comprehensive CI/CD with 126 automated tests.

How it works

  • Preprocessing: Tiles input image into 8×8 grid with configurable margin (default: 2% crop per square to eliminate borders), resizes each to 64×64.
  • Batched inference: Processes all 64 squares simultaneously via ONNX Runtime (10x faster than per-square loops), outputs 13-class logits per square (empty, 6 white pieces, 6 black pieces).
  • Sanity checks: Validates board state (exactly one king per side, no pawns on ranks 1/8), repairs low-confidence predictions only when invariants violated.
  • Model registry: JSON-based model index tracks 5 FP32 models with metrics (accuracy, latency, size), supports precision variants (INT8 for edge deployment).
  • Deployment: Docker container (~600MB) on Cloud Run with auto-scaling (0-10 instances), rate limiting (60 req/min), comprehensive monitoring.
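As a concrete illustration of the sanity-check stage, a minimal invariant checker might look like the sketch below. The `grid` representation and function name are illustrative assumptions, not the project's actual code.

```python
# Illustrative sketch of the board-state invariants (not the project's code).
# `grid` is an 8x8 list of piece symbols with rank 8 first; "" = empty square.

def violates_invariants(grid):
    """Return a list of violated invariants for a predicted board."""
    flat = [p for rank in grid for p in rank]
    problems = []
    if flat.count("K") != 1:
        problems.append("white must have exactly one king")
    if flat.count("k") != 1:
        problems.append("black must have exactly one king")
    # Ranks 8 and 1 are grid rows 0 and 7 when rank 8 is listed first.
    for row in (0, 7):
        if any(p in ("P", "p") for p in grid[row]):
            problems.append("no pawns allowed on ranks 1/8")
            break
    return problems
```

Only when this list is non-empty does the repair step consider alternative classes for low-confidence squares.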

Tech

  • ML: PyTorch for training, ONNX Runtime for inference (CPU-optimized, no CUDA dependency at runtime)
  • API: FastAPI with Gunicorn/Uvicorn workers, CORS middleware, rate limiting via in-memory IP tracking
  • Frontend: React 18 + TypeScript + Vite, Tailwind CSS, Framer Motion animations, Lighthouse performance score of 93/100
  • Infrastructure: Google Cloud Run (us-west2), Artifact Registry, Cloud Monitoring with error rate and latency alerts
  • CI/CD: GitHub Actions for tests (pytest, ruff, black), Docker build/push, automated deployment with smoke tests
  • Testing: pytest (126 tests), Playwright (15 UI tests), 100% endpoint coverage

Highlights

  • Designed and implemented 9 CNN architectures (depthwise separable, cascade, multitask) with squeeze-excite attention.
  • Built complete training infrastructure: synthetic dataset generation (10K images), augmentation pipeline (mixup, cutout, JPEG compression), early stopping on full-board FEN exact match.
  • Engineered inference pipeline with batched ONNX execution, conservative sanity repair (max 4 squares, confidence threshold 0.60), deterministic preprocessing.
  • Deployed production API on Cloud Run with monitoring, rollback procedures, cost optimization.
  • Authored 18+ technical docs covering architecture, model registry, deployment, cost reduction, rollback procedures.

Links

  • Live Demo: chess2fen-api-2qkqblvvma-wl.a.run.app
  • Source: github.com/kklike32/chess2fen

Overview

A personal ML engineering project demonstrating end-to-end ownership of a production computer vision system. Started as a proof-of-concept in December 2024 (v1.0), evolved through production deployment in December 2025 (v2.0), and currently active in UI/UX enhancement phase (v3.0). The system processes chessboard images through a per-square classification pipeline, converting visual board state to machine-readable FEN strings used by chess engines and analysis tools.

The project solves a real problem in chess digitization: accurately recognizing piece positions from photographs or screenshots without requiring specialized hardware or manual annotation. Unlike board detection approaches that require complex perspective correction, this system assumes top-down orthogonal views (common in online chess diagrams and screenshot tools) and focuses on high-accuracy piece classification with minimal inference latency.

Technical Details

Architecture Pattern

Per-square classification pipeline with four decoupled stages:

  1. Preprocessing (12ms): Load image, tile into 8×8 grid using exact square boundaries, apply 2% margin crop to exclude borders, resize each tile to 64×64 RGB. Uses PIL for image ops, NumPy for batching.

  2. Inference (7ms): Batch all 64 crops into [64,3,64,64] tensor, run through ONNX session (CPUExecutionProvider default), output [64,13] logits. Apply softmax to get per-square class probabilities. Class mapping: 0=empty, 1-6=PNBRQK (white), 7-12=pnbrqk (black).

  3. Sanity validation (0.1ms): Check invariants (exactly one K and one k, no pawns on ranks 1/8). If violated and confidence below 0.60 threshold, try top-2 class alternatives for low-confidence squares (max 4 repairs per board to prevent wild rewrites). Prefer fail-loudly over aggressive guessing.

  4. FEN generation (0.1ms): Convert 8×8 grid to FEN piece-placement string via rank-by-rank serialization, compress consecutive empty squares (e.g., “3” for three blanks).
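Stage 4 can be sketched in a few lines of pure Python; the function name and grid representation here are illustrative, not the project's actual code.

```python
# Sketch of stage 4: serialize an 8x8 grid (rank 8 first, "" = empty)
# into a FEN piece-placement string with empty-square run compression.

def grid_to_fen(grid):
    ranks = []
    for row in grid:
        fen_rank, empties = "", 0
        for piece in row:
            if piece == "":
                empties += 1                  # accumulate a run of blanks
            else:
                if empties:
                    fen_rank += str(empties)  # compress the run, e.g. "3"
                    empties = 0
                fen_rank += piece
        if empties:
            fen_rank += str(empties)
        ranks.append(fen_rank)
    return "/".join(ranks)
```

For the starting position this yields `rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR`.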

Model Zoo

9 Trained Architectures (v2.0); the 5 models in the production registry are shown below:

Architecture              Parameters   Accuracy   Robustness†   Latency   Size
dwsep_se_a075 (default)   14.7K        100.00%    99.99%        7.31ms    162KB
multitask_dw_a075         17.4K        100.00%    99.92%        7.08ms    146KB
cascade_tiny              39.2K        100.00%    99.78%        7.81ms    211KB
dwsep_se_a050             7.3K         100.00%    97.56%        6.69ms    126KB
nanoconv_d02              22K          92.69%     99.43%        6.45ms    96KB

†Robustness: Mean accuracy across 7 distortion types (GaussianBlur, JPEGCompression, GaussianNoise, BrightnessContrast, Perspective, Rotate, Clean)

Design Choices:

  • Depthwise separable convolutions: Reduces parameters vs. standard conv by 8-10x (e.g., 3×3 depthwise + 1×1 pointwise replaces 3×3 full conv).
  • Squeeze-Excite blocks: Channel attention mechanism improves accuracy +2-3% with minimal overhead (2 conv layers, <5% parameter increase).
  • Width multiplier: Scales channel counts (α=0.50, 0.75, 1.00) to explore accuracy/speed frontier. α=0.75 selected as default for best robustness-to-size ratio.
  • No residual connections: Input resolution (64×64) too small to benefit from skip connections, direct paths simpler and faster.
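The parameter savings from depthwise separable convolutions are easy to verify with back-of-the-envelope arithmetic. The channel counts below are illustrative, not the project's actual layer sizes.

```python
# Parameter-count arithmetic behind the depthwise-separable design choice.
# Channel counts are illustrative examples, not the project's layer sizes.

def standard_conv_params(c_in, c_out, k=3):
    return c_in * c_out * k * k             # one k x k filter per (in, out) pair

def dw_separable_params(c_in, c_out, k=3):
    depthwise = c_in * k * k                # one k x k filter per input channel
    pointwise = c_in * c_out                # 1x1 conv mixes channels
    return depthwise + pointwise

c_in, c_out = 32, 64
ratio = standard_conv_params(c_in, c_out) / dw_separable_params(c_in, c_out)
# standard: 18432, separable: 2336 -> roughly 8x fewer parameters
```

The ratio grows with channel width, which is consistent with the 8-10x reduction cited above.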

Training Pipeline

Synthetic Dataset Generation:

  • Base positions sourced from chess game databases (PGN files), converted to FEN.
  • Render each position via python-chess SVG export, rasterize to 512×512 PNG.
  • Generate 10,000 boards (9,000 train, 1,000 val) with diverse piece configurations.
  • Split recorded in JSON manifests (splits/train.json, splits/val.json) mapping image paths to FEN strings.
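The split-manifest step can be sketched as below; the helper name and seed are illustrative assumptions, and in the real project the resulting dicts are written to splits/train.json and splits/val.json.

```python
import random

# Sketch of building the 9,000/1,000 train/val split manifests described
# above. Helper name and seed are illustrative, not the project's code.

def make_splits(samples, val_fraction=0.1, seed=0):
    """samples: dict mapping image path -> FEN string."""
    paths = sorted(samples)
    random.Random(seed).shuffle(paths)      # deterministic, reproducible split
    n_val = int(len(paths) * val_fraction)
    val = {p: samples[p] for p in paths[:n_val]}
    train = {p: samples[p] for p in paths[n_val:]}
    return train, val

# Each manifest would then be serialized with json.dump(...) to
# splits/train.json and splits/val.json.
```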

Augmentation Strategy:

  • Geometric: RandomResizedCrop (scale 0.8-1.0), small rotation (±5°).
  • Color: ColorJitter (brightness ±0.1, contrast ±0.1, saturation ±0.1, hue ±0.05).
  • Corruption: GaussianBlur (σ=0.5-2.0, p=0.3), JPEGCompression (quality 70-95, p=0.3).
  • Regularization: Mixup (α=0.2 for smooth label mixing), label smoothing (ε=0.1).
  • Normalization: ImageNet mean/std (transfer learning priors, though models trained from scratch).
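Of these, mixup is the least self-explanatory; a minimal NumPy sketch (with α=0.2 as above, batch shapes illustrative) looks like this:

```python
import numpy as np

# Minimal mixup sketch (alpha=0.2 as in the training config). Labels are
# assumed one-hot so they can be mixed linearly; shapes are illustrative.

def mixup(x, y, alpha=0.2, rng=None):
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))          # pair each sample with another
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y + (1 - lam) * y[perm]
    return x_mixed, y_mixed
```

Each training example becomes a convex blend of two crops and their labels, which smooths decision boundaries between visually similar pieces.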

Training Configuration:

  • Optimizer: AdamW (lr=1e-3 → 1e-5 cosine decay, weight decay=1e-4).
  • Batch size: 128 (all 64 squares from 2 boards).
  • Loss: CrossEntropyLoss with label smoothing.
  • Early stopping: Patience=5 epochs on FEN exact match (not per-square accuracy; the product requires full-board correctness).
  • Device: MPS (Apple Silicon) with autocast FP16, fallback to CPU.
  • Typical runtime: 8-12 hours per model (50 epochs with early stop).
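The cosine decay schedule above maps cleanly to a closed form; this sketch assumes per-epoch updates, which is an illustrative simplification.

```python
import math

# Cosine decay from lr_max=1e-3 to lr_min=1e-5, matching the schedule
# described above. Per-epoch indexing is an illustrative assumption.

def cosine_lr(epoch, total_epochs, lr_max=1e-3, lr_min=1e-5):
    t = epoch / max(total_epochs - 1, 1)    # training progress in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```

At epoch 0 this returns lr_max, and it glides monotonically down to lr_min at the final epoch.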

Evaluation Metrics:

  • square_acc_clean: Per-square accuracy on clean validation set (no distortions).
  • fen_exact_clean: Full-board FEN exact match (primary metric for early stopping).
  • acc_mean_dist: Mean accuracy across 7 distortion types (robustness indicator).
  • latency_cpu_ms: Batched 64-crop inference time on M4 Mac CPU (1000 runs, warmup excluded).
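The latency protocol (many timed runs, warmup excluded) can be sketched as follows; `run_inference` is a hypothetical stand-in for the batched ONNX call.

```python
import time

# Sketch of the latency measurement protocol: warm up first, then time
# many runs and report mean milliseconds. `run_inference` is a stand-in.

def bench(run_inference, runs=1000, warmup=50):
    for _ in range(warmup):
        run_inference()                     # warmup runs are not timed
    start = time.perf_counter()
    for _ in range(runs):
        run_inference()
    return (time.perf_counter() - start) / runs * 1e3   # ms per run
```

Excluding warmup matters here because the first ONNX Runtime calls pay one-time graph-optimization and allocation costs.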

ONNX Export & Optimization

Export Pipeline:

  • Train in PyTorch, export via torch.onnx.export(opset_version=17).
  • Models output either [batch,13] or [batch,13,1,1] (architecture-dependent), inference code normalizes to [batch,13].
  • SHA256 checksums computed for all artifacts, stored in model cards for reproducibility.
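The checksum step is standard-library territory; a chunked version like this sketch (function name illustrative) keeps memory flat even for large artifacts.

```python
import hashlib

# Sketch of the artifact checksum step: hash a model file in chunks so
# large files never need to fit in memory. Function name is illustrative.

def sha256_of(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex digest is what gets recorded in the model card and re-verified before serving.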

Quantization (INT8):

  • Static quantization via ONNX Runtime: collect calibration data (512 samples from train set), quantize weights and activations to INT8.
  • Current status: INT8 models disabled (loading issues in production, 12/2025). Planned fix in v3.0 with updated ONNX Runtime.
  • Expected gains: 50-75% size reduction (162KB → 50KB for dwsep_se_a075), minimal accuracy loss (<0.5%).

Production API

FastAPI Application (api/app.py):

Endpoints:

  • GET /health: Returns status, loaded model count, runtime version, git commit.
  • GET /models: Lists all models from registry with metadata.
  • POST /infer: Accepts multipart image file + optional model parameter, returns FEN + confidence + timing.

Middleware:

  • CORS: Configurable origins (env: CHESS2FEN_ALLOWED_ORIGINS), allows credentials, all methods/headers.
  • Rate limiting: In-memory IP-based tracker (60 req/min per IP, 60s window). Returns 429 on exceed. Works with Cloud Run’s X-Forwarded-For header.
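An in-memory, per-IP sliding-window limiter like the one described can be sketched as below; this is an illustrative implementation, not the project's actual middleware.

```python
import time
from collections import defaultdict, deque

# Illustrative sketch of a per-IP sliding-window limiter (60 req/min as
# described above). Not the project's actual middleware code.

class RateLimiter:
    def __init__(self, limit=60, window=60.0):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)      # ip -> timestamps of recent hits

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] >= self.window:
            q.popleft()                     # evict hits outside the window
        if len(q) >= self.limit:
            return False                    # caller responds with HTTP 429
        q.append(now)
        return True
```

On Cloud Run the `ip` key would come from the X-Forwarded-For header, since the container sits behind a proxy.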

Input Validation:

  • Max upload size: 10MB (env: CHESS2FEN_MAX_UPLOAD_MB).
  • Max image dimensions: 8192×8192 pixels (env: CHESS2FEN_MAX_IMAGE_PIXELS).
  • Allowed formats: JPEG, PNG (validated via PIL).
  • EXIF stripping: Automatically removes metadata to prevent exploits.

Error Handling:

  • 400: Invalid file format, oversized image, corrupted file.
  • 404: Model not found in registry.
  • 422: Missing required parameters (file).
  • 429: Rate limit exceeded.
  • 500: Internal server error (model loading failure, ONNX runtime crash).
  • 503: Service unavailable (models not loaded at startup).

Lifespan Management:

  • Startup: Load model registry, validate index.json, cache default model session, log config.
  • Shutdown: Clear session cache, log graceful exit.

Cloud Run Deployment

Container Optimization:

  • Base image: python:3.11-slim (minimal Debian).
  • Production dependencies: onnxruntime, fastapi, uvicorn, gunicorn, pillow, numpy, python-chess (total ~200MB).
  • Final image size: ~600MB.

Container Configuration:

  • Non-root user: chess2fen (security best practice).
  • Health check: HTTP GET /health every 30s, 10s timeout.
  • Entry point: gunicorn api.app:app --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000 --workers 2.

Cloud Run Service:

  • Region: us-west2 (Los Angeles, low latency for US West Coast users).
  • CPU: 2 vCPU.
  • Memory: 2 GiB.
  • Concurrency: 80 requests/container.
  • Auto-scaling: 0-10 instances (scale-to-zero enabled, cold start ~200ms).
  • Timeout: 30s per request.
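The service settings above map onto gcloud flags roughly as follows; this is an illustrative command, not the project's exact deployment script.

```shell
# Illustrative Cloud Run deployment matching the settings listed above.
gcloud run deploy chess2fen-api \
  --image us-west2-docker.pkg.dev/chess2fen/chess2fen/api:latest \
  --region us-west2 \
  --cpu 2 --memory 2Gi \
  --concurrency 80 \
  --min-instances 0 --max-instances 10 \
  --timeout 30
```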

Monitoring & Operations

Cloud Monitoring Dashboards:

  • Request count by status code (200, 4xx, 5xx).
  • Latency distribution (P50, P95, P99).
  • Container CPU and memory utilization.
  • Model inference timing (preprocess, inference, postprocess breakdowns).

Alerts:

  • Error rate >5% sustained for 5 minutes → email notification.
  • P95 latency >500ms sustained for 5 minutes → email notification.

Rollback Procedures:

  • Automated: GitHub Actions tracks previous image tag, script scripts/rollback_deployment.sh reverts to last known good.
  • Manual: gcloud run services update-traffic --to-revisions=<previous_revision>=100.
  • Validation: Smoke tests after every deployment (health check, inference on test image, FEN validation).

React Frontend (v3.0)

Stack:

  • React 18 + TypeScript (strict mode) for type safety.
  • Vite 7.3 for fast HMR and optimized production builds.
  • Tailwind CSS with custom design system (8px grid, blue/purple gradients, dark/light themes).
  • Framer Motion for 60fps animations (drag-drop feedback, confetti on success).
  • react-chessboard for interactive board visualization (FEN string → rendered board).

Features:

  • Drag-and-drop file upload with preview thumbnail and size display.
  • Model selector with stats cards (accuracy, speed, size badges).
  • Progress steps indicator (Upload → Detect → Process → Results).
  • FEN output with copy-to-clipboard (animated feedback).
  • Confidence heatmap (8×8 grid, color-coded by softmax probability).
  • Performance metrics display (preprocessing, inference, postprocessing timing).
  • Responsive design (mobile-first, tablet and desktop breakpoints).

Performance:

  • Lighthouse score: 93/100 performance, 100/100 accessibility.
  • Bundle size: 125KB (gzipped), code splitting via React.lazy.
  • First Contentful Paint: <1s.
  • Playwright tests: 15/15 passing (theme toggle, file upload, responsive layout).

CI/CD Pipeline

GitHub Actions Workflows:

Tests (ci.yml):

  • Trigger: Every push, every PR.
  • Steps: Install deps (uv pip), run pytest (126 tests), ruff linting, black formatting check.
  • Fail conditions: Any test failure, linting errors, formatting violations.

Deployment (deploy-cloudrun.yml):

  • Trigger: Push to main branch.
  • Steps:
    1. Checkout code.
    2. Build Docker image: gcloud builds submit --tag us-west2-docker.pkg.dev/chess2fen/chess2fen/api:latest.
    3. Deploy to Cloud Run: gcloud run deploy chess2fen-api --image <tag> --region us-west2.
    4. Run smoke tests: health check, inference on test fixture, FEN validation.
    5. Cleanup old images: Keep only latest image to minimize storage costs.
    6. Send Slack notification (success/failure).

Pre-commit Hooks:

  • black (format code).
  • ruff (lint and auto-fix).
  • pytest (run tests locally before push).

Testing Strategy

Unit Tests (pytest, 126 tests):

  • test_infer_api.py (18 tests): Core inference, preprocessing, ONNX session loading.
  • test_model_registry.py (8 tests): Registry loading, validation, model lookup.
  • test_fen_utils.py (15 tests): FEN parsing, grid conversion, validation.
  • test_sanity.py (12 tests): Invariant checks, repair logic, confidence thresholding.
  • test_calibration.py (12 tests): Confidence calibration, ECE (Expected Calibration Error).

Integration Tests (pytest, 32 tests):

  • test_api_integration.py (32 tests): All endpoints, error handling, CORS headers, response schemas.
  • test_random_inference.py (13 tests): End-to-end with real images from test fixtures (8 boards, 256KB dataset).

UI Tests (Playwright, 15 tests):

  • Theme toggle (dark/light mode).
  • File upload (drag-drop, preview, remove).
  • Responsive design (mobile, tablet, desktop).
  • Navigation (GitHub link opens in new tab).
  • Error states (API unavailable, invalid file).

Test Fixtures:

  • Small dataset committed to git: data/test_fixtures/ (8 JPEG images, 256KB).
  • Used by CI when full data/train/ unavailable (prevents 10GB dataset download).
  • Representative board positions (starting position, endgame, complex middlegame).

Code Quality

Tooling:

  • black: Opinionated code formatter (88 char line length, PEP 8 compliance).
  • ruff: Fast linter combining flake8, isort, pyupgrade (selects: E, W, F, I, B, C4).
  • mypy: Static type checker (optional, not enforced in CI due to untyped third-party deps).

Configuration (pyproject.toml):

  • [tool.black]: line-length=88, target-version=py311.
  • [tool.ruff]: line-length=88, select=[E,W,F,I,B,C4], ignore=[E501] (black handles line length).

Pre-commit Checklist (Enforced):

  1. Format code: black src/ tests/ scripts/ api/.
  2. Lint and fix: ruff check --fix src/ tests/ scripts/ api/.
  3. Run tests: pytest tests/ -v.
  4. Verify imports: No unused imports, correct ordering.

Security Hardening

Input Validation:

  • File size limit: 10MB (prevents DoS via large uploads).
  • Pixel limit: 8192×8192 (prevents memory exhaustion).
  • Content-type validation: Reject non-image MIME types.
  • EXIF stripping: Remove metadata to prevent exploits (e.g., embedded scripts).

Container Security:

  • Non-root user: chess2fen (UID/GID 999).
  • No shell access in container (CMD runs Python directly).
  • Minimal attack surface (no SSH, no unnecessary packages).

API Security:

  • Rate limiting: 60 req/min per IP (prevents brute force, scraping).
  • CORS restrictions: Whitelist only trusted origins (env: CHESS2FEN_ALLOWED_ORIGINS).
  • No authentication: Public API, but rate-limited to prevent abuse.
  • Inference timeout: 30s per request (prevents hanging on malicious inputs).

Learning Outcomes

This project demonstrates practical experience with:

  • End-to-end ML systems: Data generation, model training, hyperparameter tuning, evaluation, deployment, monitoring. Owned entire lifecycle.

  • Computer vision architecture design: Implemented 9 CNN variants (depthwise separable, cascade, multitask), explored squeeze-excite attention, width multipliers, and per-square classification strategies.

  • ONNX optimization: Cross-platform inference via ONNX Runtime, static quantization (INT8), session caching, batched execution for 10x speedup vs per-square loops.

  • Production ML deployment: Dockerized FastAPI on Google Cloud Run with auto-scaling, rate limiting, comprehensive monitoring, rollback procedures, cost optimization.

  • Model registry systems: JSON-based registry with versioned artifacts, SHA256 checksums, model cards tracking metrics and provenance, API discovery endpoint.

  • Testing rigor: 126 pytest unit/integration tests, 15 Playwright UI tests, 100% endpoint coverage, deterministic fixtures, CI/CD with pre-commit hooks.

  • Cost engineering: Reduced the container image from 2.5GB → 600MB by removing PyTorch (a training-only dependency) and implemented automated Docker image cleanup to cut storage costs.

  • Frontend development: React 18 + TypeScript SPA with Tailwind CSS, Framer Motion animations, responsive design, Lighthouse 93/100 performance, Playwright-tested.

  • Documentation discipline: 18+ technical docs covering architecture, API, model registry, training, deployment, cost reduction, rollback procedures, version history. Maintained through 3 major versions (v1.0 → v2.0 → v3.0).

  • Production operational patterns: Lifespan management (startup/shutdown hooks), structured logging, error handling with appropriate HTTP status codes, CORS configuration, health checks, graceful degradation.