<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Ariel Rajmaliuk's Substack]]></title><description><![CDATA[I write stuff here]]></description><link>https://blog.rajmaliuk.com</link><image><url>https://substackcdn.com/image/fetch/$s_!uU2-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8d315-8f9a-4631-8a4b-812dee194710_512x512.png</url><title>Ariel Rajmaliuk&apos;s Substack</title><link>https://blog.rajmaliuk.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 20 Apr 2026 20:28:28 GMT</lastBuildDate><atom:link href="https://blog.rajmaliuk.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Ariel Rajmaliuk]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[rajmaliuk@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[rajmaliuk@substack.com]]></itunes:email><itunes:name><![CDATA[Ariel Rajmaliuk]]></itunes:name></itunes:owner><itunes:author><![CDATA[Ariel Rajmaliuk]]></itunes:author><googleplay:owner><![CDATA[rajmaliuk@substack.com]]></googleplay:owner><googleplay:email><![CDATA[rajmaliuk@substack.com]]></googleplay:email><googleplay:author><![CDATA[Ariel Rajmaliuk]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Algorithmic Battlefield: A Technical Breakdown of AI Systems Reshaping Modern Warfare]]></title><description><![CDATA[From edge inference on 15W chips to multi-agent reinforcement learning for swarm coordination &#8212; what&#8217;s actually being deployed, what&#8217;s still unsolved, and where the engineering opportunities 
are.]]></description><link>https://blog.rajmaliuk.com/p/the-algorithmic-battlefield-a-technical</link><guid isPermaLink="false">https://blog.rajmaliuk.com/p/the-algorithmic-battlefield-a-technical</guid><dc:creator><![CDATA[Ariel Rajmaliuk]]></dc:creator><pubDate>Mon, 16 Mar 2026 15:52:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uU2-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8d315-8f9a-4631-8a4b-812dee194710_512x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The modern battlefield is undergoing a systems-level architecture change. Not a UI refresh &#8212; a full rewrite. The monolithic, cloud-dependent, human-in-every-loop model of military operations is being replaced by distributed, edge-native, increasingly autonomous systems that process sensor data locally, make decisions under uncertainty, and coordinate without centralized control.</p><p>This post breaks down the core technical systems driving that shift, the hard engineering problems that remain unsolved, and the specific layers of the stack where startups can build defensible products.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.rajmaliuk.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Ariel Rajmaliuk's Substack is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>-----</p><p>## 1. Edge Inference: The Fundamental Constraint</p><p>The single most important technical challenge in military AI isn&#8217;t model quality &#8212; it&#8217;s *where the model runs*. Commercial AI assumes persistent cloud connectivity, low latency, and unlimited power. Battlefield environments offer none of those things.</p><p>Modern tactical AI systems must operate in DDIL environments: Denied, Disrupted, Intermittent, and Limited connectivity. GPS is jammed. Satellite links are targeted. Electronic warfare blankets entire frequency ranges. A drone that loses its comms link to the cloud under a traditional architecture becomes an inert projectile.</p><p>**The hardware stack that&#8217;s emerging:**</p><p>Edge inference is converging on a specific class of hardware &#8212; compact AI accelerators that combine CPUs, GPUs, and dedicated neural processing units (NPUs) into tightly integrated, low-power modules optimized for inference. The NVIDIA Jetson Orin Nano has become the de facto platform for drone-mounted AI, delivering 40 TOPS of compute at just 15W. That&#8217;s enough to run real-time YOLOv11 object detection at ~5 FPS while leaving headroom for path planning and sensor fusion. Thermal management is essentially free &#8212; propeller airflow handles cooling.</p><p>But SWaP (Size, Weight, and Power) constraints are ruthless. Military-grade edge compute must fit inside airframes measured in centimeters, run on batteries with finite capacity, and survive vibration, temperature extremes, and electromagnetic interference. 
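</p><p>To make the quantization lever concrete, here is a minimal sketch of symmetric INT8 post-training quantization (pure Python and purely illustrative; real toolchains calibrate per-channel against representative data):</p>

```python
def quantize_int8(weights):
    """Symmetric post-training quantization: map floats onto int8 via one scale.
    Illustrative only; production pipelines calibrate per-channel."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03]
q, scale = quantize_int8(w)   # q == [50, -127, 3], scale ~ 0.01
```

The round-trip error per weight is bounded by half the scale, which is the tradeoff being tuned when a team picks INT8 vs. INT4 vs. binary under a fixed power envelope.
<p>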
This creates a hard optimization problem: model architecture selection, quantization strategy (INT8, INT4, binary), pruning depth, and hardware-model co-design all become first-order engineering decisions.</p><p>**The software architecture pattern:**</p><p>The pattern that&#8217;s winning is hybrid edge-cloud with graceful degradation:</p><p>- **Core autonomy stack runs entirely on-device:** Perception (object detection, tracking, terrain classification), navigation (SLAM, visual-inertial odometry, obstacle avoidance), and pre-authorized decision logic.</p><p>- **Cloud/HQ offload for non-critical tasks:** Mission reporting, natural language summarization via LLMs, fleet-wide learning updates. These are nice-to-have, not mission-critical.</p><p>- **Graceful degradation on link loss:** When comms drop, the drone doesn&#8217;t stop &#8212; it falls back to pre-loaded mission parameters, threat libraries, and locally cached maps. Think of it like a submarine: computationally self-sufficient, capable of completing objectives without any external input.</p><p>This is a meaningful departure from how most AI systems are architected in the commercial world, and it&#8217;s a greenfield opportunity for startups that understand offline-first, edge-native design.</p><p>**Startup opportunity:** Purpose-built inference runtimes optimized for military SWaP constraints. The TensorRT / ONNX Runtime / TFLite stack wasn&#8217;t designed for contested environments. There&#8217;s room for runtimes that handle model hot-swapping in the field, encrypted model weights with hardware-backed attestation, and deterministic latency guarantees under thermal throttling.</p><p>-----</p><p>## 2. Computer Vision Pipelines: From Pixels to Kill Chains</p><p>The perception layer is where most of the deployed AI lives today. 
The core pipeline running on Ukrainian drones and their Western-supplied counterparts looks something like this:</p><p>**Detection &#8594; Tracking &#8594; Classification &#8594; Geolocation &#8594; Targeting**</p><p>Each stage has distinct engineering challenges:</p><p>**Detection** uses variants of YOLO (currently v11 in deployed systems) running on edge hardware. The key constraint isn&#8217;t accuracy on benchmarks &#8212; it&#8217;s robustness to real-world degradation: smoke, dust, rain, IR countermeasures, camouflage, and adversarial conditions. Models trained on clean datasets catastrophically underperform in combat.</p><p>**Tracking** is where things get interesting. Single-object trackers (KCF, MOSSE) are lightweight but fragile. Multi-object tracking (MOT) approaches like ByteTrack or OC-SORT provide better persistence across occlusions but cost more compute. On a 15W edge device processing live video, every extra millisecond of tracking latency is a tradeoff against detection refresh rate.</p><p>**Last-mile autonomous guidance** is the critical capability that&#8217;s changing kill rates. Ukrainian forces report that AI-enabled last-mile navigation &#8212; where the drone locks onto a target via onboard computer vision and guides itself through the final ~800 meters without any operator input or data link &#8212; raises hit rates from 10-20% to 70-80%. This single capability neutralizes electronic warfare jamming, which is the primary drone countermeasure on both sides.</p><p>The technical implementation: the drone&#8217;s CV model captures and tracks the target using onboard inference, then a PID or model-predictive controller adjusts flight path to maintain lock through terminal approach. No comms link needed. No GPS needed. 
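</p><p>A minimal sketch of that terminal-guidance loop: a PID controller steering to keep the tracked target centered in the frame. The gains and the pixel-offset error signal are illustrative, not drawn from any fielded system:</p>

```python
class PID:
    """Minimal PID loop for keeping a tracked target centered in frame.
    Gains (kp, ki, kd) are illustrative placeholders."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=0.8, ki=0.05, kd=0.2)
# error: horizontal pixel offset of the tracked target from frame center,
# as reported by the onboard detector; dt: one frame at 30 FPS
correction = pid.update(error=24.0, dt=1 / 30)
```

Run once per frame against the detector output, the correction feeds the flight controller; the whole loop needs no external input.
<p>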
Just a camera, an accelerometer, and ~5W of compute.</p><p>**Where the training data problem lives:**</p><p>Ukraine just opened access to millions of annotated frames from active combat &#8212; arguably the richest military computer vision dataset ever assembled. But the data engineering challenge is enormous: heterogeneous formats from hundreds of drone types, inconsistent labeling quality across thousands of operators, domain shift between seasons/terrain/weather, and adversarial adaptation by the enemy (camouflage, decoys, civilian-vehicle attacks).</p><p>**Startup opportunity:** Military-grade annotation and data pipeline infrastructure. Think: automated labeling with active learning loops, domain adaptation tooling for sim-to-real transfer, and secure federated learning systems that let allies train on shared data without exposing raw imagery. The &#8220;Snowflake for defense CV data&#8221; doesn&#8217;t exist yet.</p><p>-----</p><p>## 3. Swarm Coordination: The Multi-Agent Problem</p><p>Individual autonomous drones are a solved-enough problem. The next technical frontier is *N* drones acting as a coherent system &#8212; and this is where the hardest open problems in AI intersect with real-world deployment constraints.</p><p>**The architecture: Centralized Training, Decentralized Execution (CTDE)**</p><p>The dominant paradigm in multi-agent reinforcement learning for swarms is CTDE: train a global policy using centralized information (full state, all agent observations), then deploy a decentralized version where each agent acts only on local observations. Key algorithms in production and research:</p><p>- **MAPPO (Multi-Agent Proximal Policy Optimization):** The workhorse. Stable training, good sample efficiency, handles cooperative and competitive settings. 
Used for task allocation, formation control, and adversarial engagement.</p><p>- **MADDPG (Multi-Agent Deep Deterministic Policy Gradient):** Better for continuous action spaces (flight control), but less stable at scale.</p><p>- **Hierarchical RL (HRL):** Army Research Lab work on decomposing swarm control into group-level micro control and swarm-level macro control. Reduces learning time by 80% vs. centralized approaches with only 5% optimality loss. This is the pattern that will scale to hundreds of agents.</p><p>**The unsolved engineering challenges:**</p><p>*Partial observability.* In reality, each drone sees a fraction of the battlefield. Communication between agents is intermittent and bandwidth-constrained. You can&#8217;t share full state. You&#8217;re operating in a POMDP (Partially Observable Markov Decision Process), and the observation space is noisy &#8212; sensor drift, occlusion, adversarial spoofing.</p><p>*Sim-to-real transfer.* Policies trained in simulation break in the real world. Physics engines don&#8217;t capture turbulence, sensor noise, or electromagnetic interference accurately. Zero-shot sim-to-real transfer has been demonstrated for small formations (Batra et al., 2022 &#8212; quadrotor pursuit-evasion), but scaling to 50+ heterogeneous agents in contested airspace remains an open problem.</p><p>*Communication protocol design.* Swarm agents need to share enough information to coordinate without saturating limited bandwidth. This intersects with mesh networking, dynamic topology management, and anti-jamming frequency hopping. A swarm that can&#8217;t communicate degrades to N independent agents &#8212; better than nothing, but far from optimal.</p><p>*Heterogeneous agent coordination.* Real swarms aren&#8217;t homogeneous. You might have recon drones, strike drones, EW drones, and relay drones in the same formation. Each has different dynamics, sensors, and objectives. 
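</p><p>The decentralized-execution idea is easiest to see in toy form. Below, each agent updates using only its local neighbors, with no global state (a plain consensus step, not an actual MAPPO policy; names and gains are illustrative):</p>

```python
def consensus_step(positions, neighbors, alpha=0.2):
    """One decentralized update: each agent moves toward the average of its
    *local* neighbors only. No agent ever sees the full swarm state."""
    new = {}
    for agent, pos in positions.items():
        nbrs = neighbors[agent]
        if not nbrs:
            new[agent] = pos  # isolated agent: degrade to independent behavior
            continue
        avg = sum(positions[n] for n in nbrs) / len(nbrs)
        new[agent] = pos + alpha * (avg - pos)
    return new

# 1-D positions and a sparse, bandwidth-limited communication graph
pos = {"a": 0.0, "b": 10.0, "c": 4.0}
nbr = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
pos = consensus_step(pos, nbr)
```

A learned CTDE policy replaces the fixed averaging rule with a trained network, but the deployment shape is the same: local observations in, local action out.
<p>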
Multi-agent RL for heterogeneous systems (like the HMDRL-UC approach using separate MAPPO for cluster heads and IPPO for cluster members) is an active research area with minimal production deployment.</p><p>**Startup opportunity:** Swarm simulation environments with high-fidelity EW modeling, turnkey CTDE training pipelines that handle heterogeneous agent types, and mesh networking stacks purpose-built for adversarial RF environments. Also: formal verification tools for swarm policies &#8212; how do you prove a swarm won&#8217;t exhibit emergent behavior that violates rules of engagement?</p><p>-----</p><p>## 4. Sensor Fusion and the Data Integration Problem</p><p>A modern autonomous system doesn&#8217;t rely on a single sensor. The full stack includes:</p><p>- **EO/IR cameras** (visible + thermal imaging)</p><p>- **LiDAR** (terrain mapping, obstacle detection)</p><p>- **Radar** (all-weather detection, velocity measurement)</p><p>- **RF sensors** (electronic warfare detection, signal intelligence)</p><p>- **IMU + barometric altimeters** (inertial navigation when GPS is denied)</p><p>- **Acoustic sensors** (drone detection, gunfire localization)</p><p>Fusing these into a coherent world model is a hard engineering problem. The standard approach is Bayesian sensor fusion &#8212; typically extended Kalman filters (EKF) or particle filters for state estimation &#8212; but deep learning-based fusion architectures are gaining ground, particularly for combining 2D image data with 3D point clouds.</p><p>**The key technical challenge is temporal alignment and conflicting modalities.** An IR sensor might detect a heat signature where the EO camera sees nothing (camouflage). A radar return might indicate a vehicle where LiDAR shows empty terrain (corner reflectors / decoys). 
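</p><p>In its simplest one-dimensional form, that Bayesian weighting is just inverse-variance fusion, i.e. a single scalar Kalman update (the numbers here are illustrative):</p>

```python
def fuse(mean_a, var_a, mean_b, var_b):
    """Inverse-variance (scalar Kalman) fusion of two noisy estimates."""
    k = var_a / (var_a + var_b)        # gain: trust the lower-variance sensor more
    mean = mean_a + k * (mean_b - mean_a)
    var = (1 - k) * var_a              # fused estimate is tighter than either input
    return mean, var

# radar range estimate is noisy (var 4.0), LiDAR is tight (var 1.0)
m, v = fuse(100.0, 4.0, 98.0, 1.0)    # fused estimate leans toward the LiDAR
```

The hard part in practice is that the variances themselves are not fixed: they depend on weather, range, and possible adversarial manipulation, which is exactly what the naive version above does not model.
<p>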
The fusion system needs to reason about sensor reliability, environmental conditions, and potential adversarial manipulation &#8212; not just average the inputs.</p><p>**At the platform level**, the bigger challenge is JADC2 (Joint All-Domain Command and Control): connecting sensors and shooters across *all* military services into a single data mesh. This is essentially a distributed systems problem at continental scale &#8212; event-driven architectures, pub/sub messaging, data serialization standards (the military equivalent of choosing between Protobuf and Avro), and latency-aware routing through heterogeneous networks.</p><p>Anduril&#8217;s Lattice OS is the most mature attempt at this &#8212; a middleware layer that ingests data from arbitrary sensor types, runs AI-powered threat classification, and routes actionable intelligence to the right effector. Think of it as Kafka + a real-time inference engine + a targeting system, deployed across air, land, sea, and space.</p><p>**Startup opportunity:** Modular sensor fusion SDKs that handle heterogeneous input types with plug-and-play drivers. Middleware for cross-platform data interoperability (the F-22 and F-35 literally can&#8217;t talk to each other natively &#8212; different datalink standards). And real-time anomaly detection in sensor streams to flag adversarial manipulation or hardware degradation.</p><p>-----</p><p>## 5. Electronic Warfare: The Adversarial ML Battlefield</p><p>Electronic warfare (EW) is the invisible layer that shapes everything above it. Every AI capability on the battlefield has an EW countermeasure, and vice versa.</p><p>**GPS jamming** is ubiquitous. Both sides in Ukraine blanket the front lines with GPS denial. 
The counter: visual-inertial odometry (VIO), terrain-contour matching, celestial navigation via star trackers, and increasingly, AI-based signal-of-opportunity navigation that uses ambient RF signatures (cell towers, broadcast signals) as position references.</p><p>**Communications jamming** targets the data links between drones and operators. The counter: autonomous operation (no link needed), frequency-hopping spread spectrum (FHSS), and adaptive waveforms that detect and avoid jammed frequencies in real-time.</p><p>**Spoofing and adversarial attacks** are the next frontier. If a drone uses computer vision to identify targets, an adversary can deploy adversarial patches &#8212; physical objects designed to fool neural networks (think: a printed pattern on a vehicle roof that makes a tank classify as a civilian car). Defending against this requires adversarial training, input preprocessing (spatial smoothing, JPEG compression), and multi-modal verification (if the CV says &#8220;civilian car&#8221; but the radar says &#8220;60-ton metallic object moving at 40 kph,&#8221; trust the radar).</p><p>**Startup opportunity:** Adversarial robustness testing platforms for defense CV models. Adaptive electronic counter-countermeasure (ECCM) systems that use RL to learn optimal frequency-hopping strategies in real-time. And encrypted, tamper-evident AI model distribution systems &#8212; when you push a model update to 10,000 drones in the field, how do you guarantee integrity?</p><p>-----</p><p>## 6. The LLM Layer: Where Foundation Models Actually Fit</p><p>There&#8217;s a common misconception that large language models are the core of military AI. They&#8217;re not. The core autonomy stack &#8212; perception, navigation, control &#8212; runs on specialized, lightweight models. 
LLMs sit on top as an *interface and analysis layer*.</p><p>Where LLMs actually add value in defense:</p><p>- **Mission planning acceleration:** Converting natural language objectives into structured mission templates. Pytho AI compresses a 48-step mission analysis process from days to minutes using agent systems.</p><p>- **Intelligence summarization:** Processing large volumes of SIGINT, HUMINT, and OSINT reports into actionable briefings. This is a RAG problem &#8212; retrieval-augmented generation over classified document stores.</p><p>- **Human-machine teaming:** Natural language interfaces for operators to query and task autonomous systems. &#8220;Show me all thermal signatures within 2km of grid reference XY that appeared in the last 30 minutes&#8221; is easier to say than to program.</p><p>- **After-action analysis:** Generating structured summaries from thousands of hours of drone footage and sensor logs.</p><p>The Pentagon awarded $200M contracts to Google, xAI, Anthropic, and OpenAI specifically for &#8220;agentic AI workflows&#8221; &#8212; orchestrating multi-step processes that combine tool use, reasoning, and human-in-the-loop checkpoints.</p><p>**The constraint:** LLMs are too large and too power-hungry for tactical edge deployment on current hardware. A 7B parameter model quantized to INT4 still needs ~4GB of RAM and draws significant power for inference. The current pattern is LLMs at the command post / base level, with small specialized models at the edge. As model distillation and speculative decoding improve, this boundary will shift.</p><p>**Startup opportunity:** Domain-specific fine-tuned models for military planning and intelligence (trained on doctrine, tactics, and operational data &#8212; not internet text). Secure RAG architectures for classified environments with air-gapped vector stores. And agentic frameworks that orchestrate multi-step military workflows with formal audit trails and human-in-the-loop gates.</p><p>-----</p><p>## 7. 
Manufacturing and Deployment at Scale</p><p>The final engineering bottleneck isn&#8217;t algorithmic &#8212; it&#8217;s physical. Ukraine needs 4.5 million drones per year. The EU projects a need for 3 million annually just for one small country&#8217;s defense. Current production systems can&#8217;t scale to these numbers.</p><p>**The technical challenges:**</p><p>- **Rapid hardware iteration:** Drone designs are evolving on weekly cycles in Ukraine. The production system needs to handle constant BOM changes, firmware updates, and component substitution (when supply chains break).</p><p>- **AI model deployment at fleet scale:** Pushing OTA model updates to thousands of fielded drones, each potentially running different hardware variants with different accelerator architectures. This is a harder version of the mobile app deployment problem.</p><p>- **Quality assurance for autonomous weapons:** How do you test that a CV model won&#8217;t misclassify targets across the full distribution of real-world conditions? Traditional software testing doesn&#8217;t cover it. You need systematic adversarial testing, formal verification where possible, and continuous monitoring of deployed model performance.</p><p>**Startup opportunity:** CI/CD pipelines for edge AI models that handle hardware-aware compilation, A/B testing in simulation before deployment, and rollback mechanisms. Fleet management platforms for heterogeneous autonomous systems. And automated production lines that combine robotics, additive manufacturing, and machine vision QA for attritable drone manufacturing.</p><p>-----</p><p>## Where This Is Heading</p><p>The trajectory is clear: warfare is becoming a software problem. The platforms are increasingly commoditized (a basic FPV drone costs $400). 
The differentiation is in the AI stack &#8212; perception, decision-making, coordination, and the infrastructure that trains, deploys, and maintains these systems at scale.</p><p>For technical founders, the key insight is that defense AI isn&#8217;t one market &#8212; it&#8217;s dozens of hard engineering problems, each with its own constraint set, each representing a potentially massive category. The builders who will win aren&#8217;t generalists building &#8220;AI for defense.&#8221; They&#8217;re specialists solving specific, deeply technical problems: edge inference under SWaP constraints, multi-agent coordination in adversarial RF environments, sensor fusion across incompatible platforms, or fleet-scale model deployment for attritable systems.</p><p>The stack is being built right now. Most layers are still open.</p>]]></content:encoded></item><item><title><![CDATA[When AI Coding Agents Fight Over Your CPU (And What I Did About It)]]></title><description><![CDATA[The missing layer in monorepos: coordinating AI agents so validation is fast, serialized, and reliable.]]></description><link>https://blog.rajmaliuk.com/p/when-ai-coding-agents-fight-over</link><guid isPermaLink="false">https://blog.rajmaliuk.com/p/when-ai-coding-agents-fight-over</guid><dc:creator><![CDATA[Ariel Rajmaliuk]]></dc:creator><pubDate>Sun, 15 Feb 2026 19:24:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uU2-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8d315-8f9a-4631-8a4b-812dee194710_512x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;m the CTO of <strong><a href="https://www.diiirect.com">diiirect.com</a></strong> &#8212; a remote workforce platform and talent recruiting engine. Our product lives and dies by iteration speed: shipping the web app, the API, internal tooling, and shared packages fast, without turning the dev machine into a space heater.</p><p>We also lean hard into AI coding agents. I regularly run multiple Claude Code sessions against the same repo: one fixing a UI regression, another refactoring a shared package, another touching backend workflows. 
It&#8217;s insanely productive&#8230; until all the agents decide to &#8220;verify&#8221; at the same time.</p><p>If you use Claude Code / Cursor / Copilot / Codex on anything bigger than a toy repo, you&#8217;ll run into the same wall.</p><div><hr></div><h2>Why this problem is becoming the new normal</h2><p>Modern JS/TS teams are converging on a familiar setup:</p><ul><li><p><strong>Monorepo</strong></p></li><li><p><strong>pnpm</strong></p></li><li><p><strong>Turborepo</strong> (or Nx)</p></li><li><p>multiple apps + shared packages</p></li></ul><p>This is becoming the default because it&#8217;s the cleanest way to:</p><ul><li><p>share code safely (types, UI, utilities)</p></li><li><p>ship multiple surfaces (web, API, workers, desktop/mobile)</p></li><li><p>keep CI sane with caching and deterministic builds</p></li></ul><p>But it also creates a new failure mode: the repo is shared, while your dev tooling isn&#8217;t coordinated.</p><p>Humans coordinate implicitly:</p><ul><li><p>&#8220;You run type-check, I&#8217;ll wait.&#8221;</p></li><li><p>&#8220;Don&#8217;t start a heavy build while I&#8217;m compiling.&#8221;</p></li></ul><p>AI agents don&#8217;t have 
that social layer. They just do the reasonable thing locally:</p><blockquote><p>make changes &#8594; verify &#8594; repeat</p></blockquote><p>Multiply that by 2&#8211;4 agents and &#8220;reasonable&#8221; turns into resource warfare.</p><div><hr></div><h2>The problem: duplicated heavy work</h2><p>My stack is a Turborepo + pnpm TypeScript monorepo.</p><p>After an agent edits code, it naturally verifies with:</p><ul><li><p><code>pnpm type-check</code></p></li><li><p>which runs <code>tsc --noEmit</code></p></li></ul><p>That&#8217;s correct behavior. The issue is cost: each <code>tsc --noEmit</code> spins up a full TypeScript compiler pass and can easily consume <strong>~800MB+ RAM</strong> plus a chunk of CPU.</p><p>Now imagine 3 AI sessions finishing around the same time:</p><ul><li><p><strong>Session 1</strong>: <code>pnpm type-check</code> &#8594; <code>tsc --noEmit</code> &#8594; ~800MB RAM, CPU pinned</p></li><li><p><strong>Session 2</strong>: <code>pnpm type-check</code> &#8594; <code>tsc --noEmit</code> &#8594; ~800MB RAM, CPU pinned</p></li><li><p><strong>Session 3</strong>: <code>pnpm type-check</code> &#8594; <code>tsc --noEmit</code> &#8594; ~800MB RAM, CPU pinned</p></li></ul><p><strong>Peak:</strong> ~2.4GB RAM spike + CPU thrashing<br><strong>Result:</strong> everything slows down, sometimes OOM kills, failed builds, wasted time</p><p>And here&#8217;s the key:</p><p>Running the same type-check 3 times concurrently is completely pointless. They&#8217;re checking the same repo state. Runs #2 and #3 produce the same output as run #1.</p><p>What you actually want is:</p><ul><li><p>one real run</p></li><li><p>everyone else waits</p></li><li><p>then they get a <strong>Turborepo cache hit</strong> and return instantly</p></li></ul><p>In other words: <strong>serialization + caching</strong>. 
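</p><p>The core of that pattern fits in a few lines. Here is a Python sketch of the same idea the bash guard described below implements: an atomic <code>mkdir</code> lock plus a PID-based staleness check (paths, names, and the liveness logic are illustrative, not taken from the actual script):</p>

```python
import os
import tempfile
import time

# Illustrative lock path; the real script derives one lock per task name.
LOCK_DIR = os.path.join(tempfile.gettempdir(), "typecheck.lock")

def owner_alive(lock_dir):
    """Staleness check: is the PID recorded in the lock still running?
    (A real guard would also verify it is a node/tsc/turbo/pnpm process.)"""
    try:
        with open(os.path.join(lock_dir, "pid")) as f:
            pid = int(f.read())
        os.kill(pid, 0)  # signal 0 probes existence without killing
        return True
    except (OSError, ValueError):
        return False

def run_serialized(task):
    """First caller wins the lock and runs the task; later identical callers
    wait, then (on the real system) return instantly from the warm cache."""
    while True:
        try:
            os.mkdir(LOCK_DIR)  # atomic on POSIX: exactly one winner
            with open(os.path.join(LOCK_DIR, "pid"), "w") as f:
                f.write(str(os.getpid()))
            break
        except FileExistsError:
            if owner_alive(LOCK_DIR):
                time.sleep(0.1)  # a live owner is working; wait our turn
            else:
                # stale lock left by a killed process: clean up and retry
                try:
                    os.remove(os.path.join(LOCK_DIR, "pid"))
                except OSError:
                    pass
                try:
                    os.rmdir(LOCK_DIR)
                except OSError:
                    pass
    try:
        return task()
    finally:
        os.remove(os.path.join(LOCK_DIR, "pid"))
        os.rmdir(LOCK_DIR)
```

On the real system the waiting callers then hit the warm Turborepo cache, so their "run" is effectively free.
<p>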
Make the second run free.</p><div><hr></div><h2>Solution Part 1: a concurrency guard in front of Turborepo</h2><p>I wrote a bash script, <code>turbo-guard.sh</code> (~270 lines), that sits between all my pnpm scripts and Turborepo.</p><p>It has two layers of defense:</p><ol><li><p><strong>Process-level detection</strong> (catches &#8220;bypass&#8221; scenarios)</p></li><li><p><strong>Atomic file locking</strong> (serializes identical tasks)</p></li></ol><h3>Layer 1 &#8212; Process-level detection (the hard safety net)</h3><p>AI agents are creative. Even if you tell them &#8220;always run <code>pnpm type-check</code>&#8221;, they&#8217;ll sometimes run:</p><ul><li><p><code>npx tsc --noEmit</code></p></li><li><p><code>pnpm turbo build</code></p></li><li><p><code>pnpm --filter app lint</code></p></li></ul><p>So the guard doesn&#8217;t just trust who called it &#8212; it looks at what&#8217;s actually running.</p><p>It scans processes scoped to this repo and checks for heavy commands:</p><ul><li><p><code>tsc --noEmit</code></p></li><li><p><code>next build</code></p></li></ul><p>If it finds any, it waits for them to finish.</p><p>This makes the system robust even when agents bypass the wrapper.</p><h3>Layer 2 &#8212; Atomic file lock (serialize identical work)</h3><p>For concurrent invocations of the guard itself, it uses a POSIX-portable atomic lock via <code>mkdir</code>.</p><p>The lock name is derived from the command arguments:</p><ul><li><p><code>turbo-guard.sh lint</code> and <code>turbo-guard.sh build</code> get different locks &#8594; they can run in parallel</p></li><li><p>two <code>turbo-guard.sh type-check</code> calls get the same lock &#8594; serialized</p></li></ul><p>Same task = same lock = no redundant work.</p><h3>Stale lock recovery (the part that matters in practice)</h3><p>This script exists because processes get killed unexpectedly:</p><ul><li><p>OOM</p></li><li><p>SIGKILL</p></li><li><p>crashes</p></li><li><p>laptop 
sleep/wake</p></li><li><p>etc.</p></li></ul><p>When that happens, you can be left with a stale lock (lock directory exists, but no real owner).</p><p>So the guard:</p><ul><li><p>reads the PID from the lock</p></li><li><p>checks it&#8217;s alive <strong>and</strong> still a relevant process (node/tsc/turbo/pnpm)</p></li><li><p>cleans up stale locks and safely reacquires (including race handling)</p></li></ul><p>Without PID validation, recycled PIDs can deadlock you waiting on an unrelated process.</p><div><hr></div><h2>Integration: make it invisible to the agents</h2><p>The trick is to make agents do the right thing without knowing anything.</p><p>In <code>package.json</code>, route heavy tasks through the guard:</p><pre><code><code>{
  "build": "./scripts/turbo-guard.sh build",
  "lint": "./scripts/turbo-guard.sh lint",
  "type-check": "./scripts/turbo-guard.sh type-check",
  "validate": "./scripts/turbo-guard.sh lint type-check"
}
</code></code></pre><p>Then in your agent instructions (e.g. <code>CLAUDE.md</code>), be explicit:</p><ul><li><p><strong>Always</strong> use root scripts: <code>pnpm validate</code>, <code>pnpm type-check</code>, <code>pnpm lint</code>, <code>pnpm build</code></p></li><li><p><strong>Never</strong> run Turbo directly: <code>pnpm turbo ...</code>, <code>npx tsc ...</code>, etc.</p></li></ul><p>That&#8217;s the &#8220;soft&#8221; coordination layer (instructions).<br>The process scan + locks are the &#8220;hard&#8221; layer (enforcement).</p><div><hr></div><h2>What it looks like in practice</h2><h3>Without the guard (3 sessions type-checking concurrently)</h3><ul><li><p>Session 1: ~18s, ~820MB RAM</p></li><li><p>Session 2: ~22s, ~810MB RAM (slower due to contention)</p></li><li><p>Session 3: ~25s, ~830MB RAM (even slower, CPU thrashing)</p></li></ul><p><strong>Total wall time:</strong> ~25s<br><strong>Peak RAM:</strong> ~2.4GB<br><strong>CPU:</strong> pegged and thrashing</p><h3>With the guard</h3><ul><li><p>Session 1 acquires lock &#8594; ~18s, ~820MB RAM</p></li><li><p>Session 2 waits &#8594; then cache hit &#8594; &lt;1s</p></li><li><p>Session 3 waits &#8594; then cache hit &#8594; &lt;1s</p></li></ul><p><strong>Total wall time:</strong> ~19s<br><strong>Peak RAM:</strong> ~820MB<br><strong>CPU:</strong> normal</p><p>Same correctness, one-third the resource hit. The second and third checks still happen &#8212; they&#8217;re just effectively free because the cache is warm.</p><div><hr></div><h2>Solution Part 2: keep TypeScript warm with a watch agent (and make it agent-friendly)</h2><p>The concurrency guard fixes the resource problem. But there&#8217;s a second problem: speed.</p><p>A cold <code>tsc --noEmit</code> often takes <strong>15&#8211;25 seconds</strong>. In an edit &#8594; check &#8594; fix &#8594; check loop, that&#8217;s brutal. Over 10 iterations you can burn minutes just waiting.</p><p>TypeScript&#8217;s <code>--watch</code> solves this. 
After the initial compile, incremental rechecks often take <strong>~2&#8211;3 seconds</strong> because tsc keeps program state in memory.</p><p>The catch: <code>tsc --watch</code> streams human-readable text to stdout. It&#8217;s made for a developer staring at a terminal, not for an AI agent that needs structured results.</p><p>So I built <code>tsc-watch-agent.sh</code> (~450 lines):</p><ul><li><p>runs <code>tsc --watch</code> in the background</p></li><li><p>parses output</p></li><li><p>writes structured JSON to a status file</p></li><li><p>exposes commands agents can call (<code>start</code>, <code>wait</code>, <code>errors</code>, <code>status</code>, <code>stop</code>)</p></li></ul><h3>The agent workflow</h3><pre><code><code>pnpm tsc:watch:start
pnpm tsc:watch:wait     # blocks until current check completes, returns JSON
pnpm tsc:watch:errors   # returns just the errors array
pnpm tsc:watch:stop
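
# `wait` prints the full status JSON; shape (values illustrative):
#   {"status": "error", "app": "app1", "errorCount": 1,
#    "errors": [{"file": "...", "line": 12, "col": 5,
#                "code": "TS2322", "message": "..."}],
#    "lastCheckAt": "...", "lastCheckDuration": "2s", "pid": 12345}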
</code></code></pre><p><code>wait</code> is the key interface:</p><ul><li><p>it blocks until tsc finishes the current cycle</p></li><li><p>then returns structured JSON</p></li><li><p>no parsing terminal output, no guessing when compilation is done</p></li></ul><h3>Architecture (3 background processes)</h3><p>The script spawns:</p><ol><li><p><strong>tsc process</strong> &#8212; <code>tsc --watch --noEmit --preserveWatchOutput</code><br>Uses <code>exec</code> so the tracked PID is the real node/tsc process (important on macOS).</p></li><li><p><strong>parser</strong> &#8212; reads from a FIFO line-by-line, extracts errors with regex, writes <code>status.json</code> after each cycle.</p></li><li><p><strong>cleanup watcher</strong> &#8212; if tsc dies unexpectedly, it unblocks the parser so it doesn&#8217;t hang forever on a dead pipe.</p></li></ol><p>This is the difference between &#8220;cool demo&#8221; and &#8220;works for months without wedging.&#8221;</p><div><hr></div><h2>How the two scripts coexist cleanly</h2><p>One easy mistake:</p><p>If the guard&#8217;s process scan detects the watch process, it will wait forever because watch never exits.</p><p>So the guard excludes watch mode explicitly:</p><ul><li><p>it looks for <code>tsc --noEmit</code></p></li><li><p>but ignores anything containing <code>--watch</code></p></li></ul><p>The watch agent also uses a separate lock namespace, so it doesn&#8217;t collide with guard locks.</p><div><hr></div><h2>The speed difference is the point</h2><h3>Without the watcher (cold checks)</h3><p>Five iterations:</p><ul><li><p>18s</p></li><li><p>17s</p></li><li><p>19s</p></li><li><p>18s</p></li><li><p>17s</p></li></ul><p>Total: <strong>~89 seconds</strong> spent type-checking</p><h3>With the watcher</h3><ul><li><p>initial compile: ~12s (one-time)</p></li><li><p>then incremental checks: ~2&#8211;3s each</p></li></ul><p>Total (including initial): <strong>~24 seconds</strong></p><p>That&#8217;s a ~70% reduction in time spent waiting. 
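</p><p>Stripped of the PID checks and timestamp comparison, the <code>wait</code> command is just a poll loop over the status file until a terminal state appears. A minimal sketch (hypothetical demo path, pre-seeded so it returns immediately):</p>

```shell
STATUS_FILE="/tmp/demo-tsc-status.json"
printf '{"status": "ready", "errorCount": 0}\n' > "$STATUS_FILE"  # pretend a check just finished

# Poll until the status field reaches a terminal state (ready or error).
attempts=0
status=""
while [ "$attempts" -lt 400 ]; do          # 400 polls x 0.3s = 120s timeout
  status=$(grep -o '"status": "[^"]*"' "$STATUS_FILE" | sed 's/"status": "//;s/"$//')
  if [ "$status" = "ready" ] || [ "$status" = "error" ]; then
    echo "done: $status"
    break
  fi
  sleep 0.3
  attempts=$((attempts + 1))
done
```

<p>The real script additionally fails fast if the tsc PID disappears, and only accepts a result whose <code>lastCheckAt</code> differs from the one recorded when <code>wait</code> started.</p><p>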
Over longer sessions, it changes what&#8217;s possible.</p><div><hr></div><h2>The full picture</h2><p>You end up with two &#8220;lanes&#8221;:</p><h3>Lane A: one-off validation (serialized + cached + safe)</h3><p>Use:</p><ul><li><p><code>pnpm validate</code></p></li><li><p><code>pnpm type-check</code></p></li><li><p><code>pnpm build</code></p></li></ul><p>These go through <code>turbo-guard.sh</code>.</p><h3>Lane B: iterative debugging (fast incremental feedback)</h3><p>Use:</p><ul><li><p><code>pnpm tsc:watch:*</code></p></li></ul><p>These go through the watch agent and return structured JSON quickly.</p><p>Agents don&#8217;t need to understand the machinery. They just use the commands they&#8217;re told to use, and the infrastructure coordinates the rest.</p><div><hr></div><h2>What I&#8217;d do differently</h2><p>Honestly: not much. A few things that ended up being essential:</p><ul><li><p><code>mkdir</code><strong> locks are underrated.</strong> Atomic, portable, zero dependencies.</p></li><li><p><strong>PID validation matters.</strong> &#8220;PID exists&#8221; isn&#8217;t enough because PIDs get recycled.</p></li><li><p><strong>The FIFO + cleanup watcher is necessary</strong> if you want the watch agent to survive real-world failures.</p></li><li><p><strong>Instructions matter.</strong> Soft coordination (<code>CLAUDE.md</code>) plus hard enforcement (process scan + locks) is what makes this reliable with AI agents.</p></li></ul><div><hr></div><h2>If you&#8217;re hitting this too</h2><p>The core insight is simple:</p><ul><li><p><strong>Serialize identical expensive work</strong></p></li><li><p><strong>Cache results so the second run is free</strong></p></li><li><p><strong>Keep the compiler warm for iterative loops</strong></p></li><li><p><strong>Expose results as structured output for agents</strong></p></li></ul><p>It doesn&#8217;t have to be Turborepo. 
Nx, Bazel, plain caching &#8212; same idea.</p><p>Also: if you publish tooling like this, avoid embedding secrets or internal endpoints in scripts. The approach here is intentionally local-only and dependency-free.</p><div><hr></div><h2>The scripts (sanitized, production-ready)</h2><p>Below are both scripts in full. They&#8217;re dependency-free beyond standard POSIX tooling and are designed to work on macOS and Linux.</p><blockquote><p>Note: Paths under <code>/tmp</code> are intentionally generic to avoid tying tooling artifacts to a specific company name.</p></blockquote><div><hr></div><h3><code>scripts/turbo-guard.sh</code></h3><pre><code><code>#!/usr/bin/env bash
#
# turbo-guard.sh &#8212; Serializes concurrent Turborepo tasks across processes.
#
# PROBLEM:
#   Multiple AI agent sessions (or terminals) may independently run
#   build/lint/type-check at the same time. Heavy tasks like `tsc --noEmit`
#   can consume ~800MB+ RAM each &#8212; running 3-4 concurrently causes OOM kills
#   and wastes CPU time thrashing. Worse, instances may bypass the guard by
#   running `npx tsc`, `pnpm turbo build`, or `pnpm --filter app lint` directly.
#
# SOLUTION (two layers):
#   Layer 1 &#8212; Process-level: Before starting, check for ANY running
#     tsc/next-build processes (regardless of how they were started).
#     If found, wait for them to finish. This catches bypass scenarios.
#
#   Layer 2 &#8212; File lock: Uses mkdir(2) as an atomic POSIX lock so that
#     concurrent invocations of THIS script are serialized. The second
#     caller waits for the first to finish, then runs the same command.
#     Turborepo caches results, so the second run is instant (cache hit).
#
# USAGE:
#   ./scripts/turbo-guard.sh &lt;turbo-args...&gt;
#   ./scripts/turbo-guard.sh lint type-check --filter=app
#   ./scripts/turbo-guard.sh build --filter=app
#   ./scripts/turbo-guard.sh --force lint    # Break stale lock and run
#
# LOCK BEHAVIOR:
#   - Lock name derived from args, so different tasks can run in parallel
#   - Same task from multiple processes is serialized
#   - Stale locks (from killed processes) are auto-detected and cleaned
#   - Locks stored in /tmp, cleared on reboot
#
# EXIT CODES:
#   Passes through the exit code from `pnpm turbo`.
#
# --------------------------------------------------------------

set -euo pipefail

# -- Parse --force flag (must be first arg) --------------------

FORCE=false
if [ "${1:-}" = "--force" ]; then
  FORCE=true
  shift
fi

if [ $# -eq 0 ]; then
  echo "Usage: turbo-guard.sh [--force] &lt;turbo-args...&gt;"
  echo ""
  echo "Examples:"
  echo "  turbo-guard.sh lint type-check --filter=app"
  echo "  turbo-guard.sh build --filter=app"
  echo "  turbo-guard.sh --force type-check   # Break stale lock"
  exit 1
fi

# -- Configuration ---------------------------------------------

MAX_WAIT=600          # Max seconds to wait for another process (10 min)
POLL_INTERVAL=3       # Seconds between liveness checks
LOCK_BASE="/tmp/turbo-guard"
PROJECT_DIR="$(cd "$(dirname "$0")/.." &amp;&amp; pwd)"

# -- Derive lock name from turbo args --------------------------
# Normalize args into a filesystem-safe string. Different arg combos get
# different locks, so `lint` and `build` can run in parallel.

LOCK_NAME=$(printf '%s' "$*" | tr ' =/' '-' | tr -cd 'a-zA-Z0-9-')
LOCKDIR="${LOCK_BASE}-${LOCK_NAME}.lock"
PIDFILE="${LOCKDIR}/pid"

# -- Helper functions ------------------------------------------

cleanup() {
  rm -f "$PIDFILE" 2&gt;/dev/null
  rmdir "$LOCKDIR" 2&gt;/dev/null || true
}

# Check if a PID is alive AND is a node/turbo/pnpm process (not a recycled PID).
is_turbo_alive() {
  local pid="$1"
  if ! kill -0 "$pid" 2&gt;/dev/null; then
    return 1  # Process doesn't exist
  fi
  local comm
  comm=$(ps -p "$pid" -o comm= 2&gt;/dev/null || echo "")
  case "$comm" in
    *node*|*turbo*|*pnpm*|*npm*|*tsc*) return 0 ;;
    *) return 1 ;;
  esac
}

# Wait for a specific PID to finish, with timeout.
wait_for_pid() {
  local other_pid="$1"
  local label="${2:-process}"
  local waited=0

  while kill -0 "$other_pid" 2&gt;/dev/null; do
    sleep "$POLL_INTERVAL"
    waited=$((waited + POLL_INTERVAL))
    if [ $waited -ge $MAX_WAIT ]; then
      echo "turbo-guard: timed out after ${MAX_WAIT}s waiting for $label (PID $other_pid)."
      echo "turbo-guard: run with --force to skip waiting, or kill PID $other_pid."
      exit 1
    fi
  done

  return "$waited"
}

# =============================================================
# LAYER 1: Process-level guard
# =============================================================

wait_for_heavy_processes() {
  if [ "$FORCE" = true ]; then
    return 0
  fi

  local found_any=false
  local waited_total=0

  while true; do
    local heavy_pids=""
    heavy_pids=$(
      ps -eo pid,args 2&gt;/dev/null \
        | grep "$PROJECT_DIR" \
        | grep -E '(tsc --noEmit|next build)' \
        | grep -v -- '--watch' \
        | grep -v "grep" \
        | grep -v "turbo-guard" \
        | awk '{print $1}' \
        | tr '\n' ' '
    ) || true

    heavy_pids=$(echo "$heavy_pids" | xargs 2&gt;/dev/null || echo "")

    if [ -z "$heavy_pids" ]; then
      if [ "$found_any" = true ]; then
        echo "turbo-guard: previous heavy process(es) finished (waited ${waited_total}s)."
      fi
      return 0
    fi

    if [ "$found_any" = false ]; then
      found_any=true
      echo "turbo-guard: heavy process(es) already running: $heavy_pids"
      echo "turbo-guard: waiting for them to finish before starting..."
    fi

    sleep "$POLL_INTERVAL"
    waited_total=$((waited_total + POLL_INTERVAL))

    if [ $waited_total -ge $MAX_WAIT ]; then
      echo "turbo-guard: timed out after ${MAX_WAIT}s waiting for heavy processes."
      echo "turbo-guard: still running: $heavy_pids"
      echo "turbo-guard: run with --force to skip waiting."
      exit 1
    fi
  done
}

wait_for_heavy_processes

# =============================================================
# LAYER 2: File-lock guard
# =============================================================

if [ "$FORCE" = true ] &amp;&amp; [ -d "$LOCKDIR" ]; then
  echo "turbo-guard: --force used, removing existing lock."
  rm -rf "$LOCKDIR"
fi

if mkdir "$LOCKDIR" 2&gt;/dev/null; then
  trap cleanup EXIT INT TERM HUP
  echo $$ &gt; "$PIDFILE"

  set +e
  pnpm turbo "$@"
  TURBO_EXIT=$?
  set -e

  exit $TURBO_EXIT
fi

OTHER_PID=""
if [ -f "$PIDFILE" ]; then
  OTHER_PID=$(cat "$PIDFILE" 2&gt;/dev/null || echo "")
fi

if [ -n "$OTHER_PID" ] &amp;&amp; is_turbo_alive "$OTHER_PID"; then
  echo "turbo-guard: task already running (PID $OTHER_PID). Waiting..."
  wait_for_pid "$OTHER_PID" "lock holder"
  echo "turbo-guard: previous run finished. Running with turbo cache..."
  pnpm turbo "$@"
  exit $?
fi

rm -rf "$LOCKDIR"

if mkdir "$LOCKDIR" 2&gt;/dev/null; then
  trap cleanup EXIT INT TERM HUP
  echo $$ &gt; "$PIDFILE"

  set +e
  pnpm turbo "$@"
  TURBO_EXIT=$?
  set -e

  exit $TURBO_EXIT
fi

echo "turbo-guard: another process acquired the lock first. Waiting..."
sleep 1

OTHER_PID=""
if [ -f "$PIDFILE" ]; then
  OTHER_PID=$(cat "$PIDFILE" 2&gt;/dev/null || echo "")
fi

if [ -n "$OTHER_PID" ] &amp;&amp; is_turbo_alive "$OTHER_PID"; then
  wait_for_pid "$OTHER_PID" "lock holder"
  echo "turbo-guard: previous run finished. Running with turbo cache..."
fi

pnpm turbo "$@"
exit $?
</code></code></pre><div><hr></div><h3><code>scripts/tsc-watch-agent.sh</code></h3><pre><code><code>#!/usr/bin/env bash
#
# tsc-watch-agent.sh &#8212; Structured tsc --watch wrapper for AI agents.
#
# Runs tsc --watch in background and writes structured JSON that agents
# can read instantly after each recompile (~2-3s incremental vs ~15-25s cold).
#
# COMMANDS:
#   start [app]  - Start watcher (default: app1). Runs in background.
#   stop         - Stop watcher and clean up.
#   status       - Print JSON status to stdout.
#   errors       - Print errors array to stdout.
#   wait         - Block until next check completes (polls every 0.3s, 120s timeout).
#
# OUTPUT: /tmp/tsc-watch-agent/status.json
#
# AGENT WORKFLOW:
#   pnpm tsc:watch:start
#   pnpm tsc:watch:wait
#   pnpm tsc:watch:errors
#   pnpm tsc:watch:stop
#
# PROCESS ARCHITECTURE:
#   cmd_start spawns 3 background processes:
#     1. tsc process   &#8212; `exec` replaces subshell so PID IS the real node/tsc
#     2. parser        &#8212; reads FIFO line-by-line, writes status.json
#     3. cleanup       &#8212; waits for tsc to die, then unblocks parser
#
# --------------------------------------------------------------

set -euo pipefail

# -- Configuration ---------------------------------------------

PROJECT_DIR="$(cd "$(dirname "$0")/.." &amp;&amp; pwd)"
OUT_DIR="/tmp/tsc-watch-agent"
STATUS_FILE="${OUT_DIR}/status.json"
RAW_LOG="${OUT_DIR}/raw.log"
TSC_PID_FILE="${OUT_DIR}/tsc.pid"
LOCKDIR="${OUT_DIR}/lock"
FIFO="${OUT_DIR}/tsc.fifo"
WAIT_POLL=0.3
WAIT_TIMEOUT=120

# -- App definitions -------------------------------------------
# Replace these app mappings with your actual monorepo apps if desired.
# The defaults are intentionally generic.

get_app_config() {
  local app="${1:-app1}"
  case "$app" in
    app1)
      APP_DIR="${PROJECT_DIR}/apps/app1"
      TSC_ARGS="--noEmit --watch --preserveWatchOutput"
      NODE_OPTS="--max-old-space-size=8192"
      ;;
    app2)
      APP_DIR="${PROJECT_DIR}/apps/app2"
      TSC_ARGS="--noEmit --watch --preserveWatchOutput"
      NODE_OPTS=""
      ;;
    *)
      echo "{\"error\": \"Unknown app: ${app}. Valid: app1, app2\"}" &gt;&amp;2
      exit 1
      ;;
  esac
}

# -- JSON helpers ----------------------------------------------

write_status() {
  local status="$1" app="$2" error_count="$3" errors_json="$4"
  local check_at="$5" duration="$6" pid="$7"

  local tmp="${STATUS_FILE}.tmp"
  cat &gt; "$tmp" &lt;&lt;ENDJSON
{
  "status": "${status}",
  "app": "${app}",
  "errorCount": ${error_count},
  "errors": ${errors_json},
  "lastCheckAt": "${check_at}",
  "lastCheckDuration": "${duration}",
  "pid": ${pid}
}
ENDJSON
  mv -f "$tmp" "$STATUS_FILE"
}

# -- Lock management -------------------------------------------

acquire_lock() {
  if mkdir "$LOCKDIR" 2&gt;/dev/null; then
    echo $$ &gt; "${LOCKDIR}/pid"
    return 0
  fi

  local other_pid=""
  if [ -f "$TSC_PID_FILE" ]; then
    other_pid=$(cat "$TSC_PID_FILE" 2&gt;/dev/null || echo "")
  fi
  if [ -z "$other_pid" ] &amp;&amp; [ -f "${LOCKDIR}/pid" ]; then
    other_pid=$(cat "${LOCKDIR}/pid" 2&gt;/dev/null || echo "")
  fi

  if [ -n "$other_pid" ] &amp;&amp; kill -0 "$other_pid" 2&gt;/dev/null; then
    echo "{\"error\": \"Watcher already running (PID ${other_pid}). Use 'stop' first.\"}"
    exit 1
  fi

  rm -rf "$LOCKDIR"
  if mkdir "$LOCKDIR" 2&gt;/dev/null; then
    echo $$ &gt; "${LOCKDIR}/pid"
    return 0
  fi

  echo "{\"error\": \"Failed to acquire lock.\"}"
  exit 1
}

release_lock() {
  rm -f "${LOCKDIR}/pid" 2&gt;/dev/null
  rmdir "$LOCKDIR" 2&gt;/dev/null || true
}

# -- Process management ----------------------------------------

kill_tree() {
  local pid="$1"
  if [ -z "$pid" ]; then return; fi
  local children
  children=$(pgrep -P "$pid" 2&gt;/dev/null || echo "")
  kill "$pid" 2&gt;/dev/null || true
  local child
  for child in $children; do
    kill_tree "$child"
  done
}

kill_tree_wait() {
  local pid="$1"
  kill_tree "$pid"
  local waited=0
  while kill -0 "$pid" 2&gt;/dev/null &amp;&amp; [ $waited -lt 5 ]; do
    sleep 0.5
    waited=$((waited + 1))
  done
  if kill -0 "$pid" 2&gt;/dev/null; then
    kill -9 "$pid" 2&gt;/dev/null || true
  fi
}

# -- Commands ---------------------------------------------------

cmd_start() {
  local app="${1:-app1}"
  get_app_config "$app"

  if [ ! -d "$APP_DIR" ]; then
    echo "{\"error\": \"App directory not found: ${APP_DIR}\"}"
    exit 1
  fi

  mkdir -p "$OUT_DIR"
  acquire_lock

  local tsc_bin="${APP_DIR}/node_modules/.bin/tsc"
  if [ ! -f "$tsc_bin" ]; then
    tsc_bin="$(command -v tsc 2&gt;/dev/null || echo "")"
    if [ -z "$tsc_bin" ]; then
      release_lock
      echo "{\"error\": \"tsc not found. Run pnpm install first.\"}"
      exit 1
    fi
  fi

  &gt; "$RAW_LOG"
  rm -f "$FIFO"
  mkfifo "$FIFO"
  write_status "starting" "$app" 0 "[]" "" "" "0"

  # Parser: reads tsc output from FIFO, writes status.json
  (
    local errors_buf="" error_count=0 tsc_pid="0"
    local check_start
    check_start=$(date +%s)

    while IFS= read -r line; do
      if [ "$tsc_pid" = "0" ] &amp;&amp; [ -f "$TSC_PID_FILE" ]; then
        tsc_pid=$(cat "$TSC_PID_FILE" 2&gt;/dev/null || echo "0")
      fi

      echo "$line" &gt;&gt; "$RAW_LOG"

      if echo "$line" | grep -qE '(Starting compilation|Starting incremental compilation)'; then
        check_start=$(date +%s)
        errors_buf=""
        error_count=0
        write_status "checking" "$app" 0 "[]" "" "" "$tsc_pid"
        continue
      fi

      if echo "$line" | grep -qE '^.+\([0-9]+,[0-9]+\): error TS[0-9]+:'; then
        local file msg_line msg_col code message
        file=$(echo "$line" | sed -E 's/^(.+)\([0-9]+,[0-9]+\): error TS[0-9]+: .+$/\1/')
        msg_line=$(echo "$line" | sed -E 's/^.+\(([0-9]+),[0-9]+\): error TS[0-9]+: .+$/\1/')
        msg_col=$(echo "$line" | sed -E 's/^.+\([0-9]+,([0-9]+)\): error TS[0-9]+: .+$/\1/')
        code=$(echo "$line" | sed -E 's/^.+\([0-9]+,[0-9]+\): error (TS[0-9]+): .+$/\1/')
        message=$(echo "$line" | sed -E 's/^.+\([0-9]+,[0-9]+\): error TS[0-9]+: (.+)$/\1/')
        file="${file#"${APP_DIR}/"}"
        message=$(echo "$message" | sed 's/\\/\\\\/g; s/"/\\"/g; s/\t/\\t/g')

        local entry="{\"file\": \"${file}\", \"line\": ${msg_line}, \"col\": ${msg_col}, \"code\": \"${code}\", \"message\": \"${message}\"}"
        if [ -z "$errors_buf" ]; then
          errors_buf="$entry"
        else
          errors_buf="${errors_buf}, ${entry}"
        fi
        error_count=$((error_count + 1))
        continue
      fi

      if echo "$line" | grep -qE 'Found [0-9]+ errors?\.'; then
        local now duration_s final_status check_at
        now=$(date +%s)
        duration_s=$((now - check_start))
        check_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)

        if [ "$error_count" -eq 0 ]; then
          final_status="ready"
        else
          final_status="error"
        fi

        write_status "$final_status" "$app" "$error_count" "[${errors_buf}]" "$check_at" "${duration_s}s" "$tsc_pid"
        continue
      fi
    done &lt; "$FIFO"

    write_status "stopped" "$app" 0 "[]" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "" "0"
    release_lock
  ) &amp;
  local parser_pid=$!

  # tsc process: exec replaces subshell so PID IS the real node/tsc process
  (
    cd "$APP_DIR"
    if [ -n "$NODE_OPTS" ]; then
      NODE_OPTIONS="$NODE_OPTS" exec "$tsc_bin" $TSC_ARGS
    else
      exec "$tsc_bin" $TSC_ARGS
    fi
  ) &gt; "$FIFO" 2&gt;&amp;1 &amp;
  local tsc_pid=$!
  echo "$tsc_pid" &gt; "$TSC_PID_FILE"
  echo "$tsc_pid" &gt; "${LOCKDIR}/pid"

  write_status "checking" "$app" 0 "[]" "" "" "$tsc_pid"

  # Cleanup watcher: when tsc dies, unblock parser
  (
    wait "$tsc_pid" 2&gt;/dev/null || true
    sleep 1
    if kill -0 "$parser_pid" 2&gt;/dev/null; then
      echo "" &gt; "$FIFO" 2&gt;/dev/null || true
    fi
  ) &amp;

  echo "{\"started\": true, \"app\": \"${app}\", \"pid\": ${tsc_pid}, \"statusFile\": \"${STATUS_FILE}\"}"
}

cmd_stop() {
  if [ ! -d "$OUT_DIR" ]; then
    echo "{\"stopped\": true, \"wasRunning\": false}"
    return 0
  fi

  local tsc_pid=""
  if [ -f "$TSC_PID_FILE" ]; then
    tsc_pid=$(cat "$TSC_PID_FILE" 2&gt;/dev/null || echo "")
  fi

  local was_running=false
  if [ -n "$tsc_pid" ] &amp;&amp; kill -0 "$tsc_pid" 2&gt;/dev/null; then
    was_running=true
    kill_tree_wait "$tsc_pid"
  fi

  # Kill any processes stuck on the FIFO (best-effort)
  if [ -p "$FIFO" ]; then
    local fifo_pids=""
    fifo_pids=$(lsof -t "$FIFO" 2&gt;/dev/null || fuser "$FIFO" 2&gt;/dev/null || echo "")
    local p
    for p in $fifo_pids; do
      kill "$p" 2&gt;/dev/null || true
    done
  fi

  local app="unknown"
  if [ -f "$STATUS_FILE" ]; then
    app=$(grep -o '"app": "[^"]*"' "$STATUS_FILE" 2&gt;/dev/null | head -1 | sed 's/"app": "//;s/"//' || echo "unknown")
  fi

  rm -f "$TSC_PID_FILE" "$FIFO" 2&gt;/dev/null
  write_status "stopped" "$app" 0 "[]" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "" "0"
  release_lock

  echo "{\"stopped\": true, \"wasRunning\": ${was_running}}"
}

cmd_status() {
  if [ ! -f "$STATUS_FILE" ]; then
    echo "{\"status\": \"stopped\", \"app\": \"\", \"errorCount\": 0, \"errors\": [], \"lastCheckAt\": \"\", \"lastCheckDuration\": \"\", \"pid\": 0}"
    return 0
  fi

  local tsc_pid=""
  if [ -f "$TSC_PID_FILE" ]; then
    tsc_pid=$(cat "$TSC_PID_FILE" 2&gt;/dev/null || echo "")
  fi

  if [ -n "$tsc_pid" ] &amp;&amp; kill -0 "$tsc_pid" 2&gt;/dev/null; then
    cat "$STATUS_FILE"
  else
    echo "{\"status\": \"stopped\", \"app\": \"\", \"errorCount\": 0, \"errors\": [], \"lastCheckAt\": \"\", \"lastCheckDuration\": \"\", \"pid\": 0}"
  fi
}

cmd_errors() {
  if [ ! -f "$STATUS_FILE" ]; then
    echo "[]"
    return 0
  fi

  local content
  content=$(cat "$STATUS_FILE")

  if command -v python3 &amp;&gt;/dev/null; then
    echo "$content" | python3 -c "import sys,json; d=json.load(sys.stdin); print(json.dumps(d.get('errors',[]),indent=2))"
  else
    echo "$content" | sed -n '/"errors":/,/\]/p' | sed '1s/.*"errors": //'
  fi
}

cmd_wait() {
  if [ ! -f "$STATUS_FILE" ] &amp;&amp; [ ! -f "$TSC_PID_FILE" ]; then
    echo "{\"error\": \"No watcher running. Start one with: pnpm tsc:watch:start\"}"
    exit 1
  fi

  local start_check_at=""
  if [ -f "$STATUS_FILE" ]; then
    start_check_at=$(grep -o '"lastCheckAt": "[^"]*"' "$STATUS_FILE" 2&gt;/dev/null | sed 's/"lastCheckAt": "//;s/"//' || echo "")
  fi

  local poll_count=0
  local max_polls=400  # 120s / 0.3s

  while true; do
    if [ ! -f "$TSC_PID_FILE" ]; then
      echo "{\"error\": \"Watcher process exited unexpectedly.\"}"
      exit 1
    fi

    local tsc_pid
    tsc_pid=$(cat "$TSC_PID_FILE" 2&gt;/dev/null || echo "")
    if [ -n "$tsc_pid" ] &amp;&amp; ! kill -0 "$tsc_pid" 2&gt;/dev/null; then
      echo "{\"error\": \"Watcher process (PID ${tsc_pid}) is no longer running.\"}"
      exit 1
    fi

    if [ -f "$STATUS_FILE" ]; then
      local current_status current_check_at
      current_status=$(grep -o '"status": "[^"]*"' "$STATUS_FILE" 2&gt;/dev/null | head -1 | sed 's/"status": "//;s/"//' || echo "")
      current_check_at=$(grep -o '"lastCheckAt": "[^"]*"' "$STATUS_FILE" 2&gt;/dev/null | sed 's/"lastCheckAt": "//;s/"//' || echo "")

      if [ "$current_status" = "ready" ] || [ "$current_status" = "error" ]; then
        if [ "$current_check_at" != "$start_check_at" ] || [ -z "$start_check_at" ]; then
          cat "$STATUS_FILE"
          return 0
        fi
      fi

      if [ "$current_status" = "stopped" ]; then
        echo "{\"error\": \"Watcher stopped while waiting.\"}"
        exit 1
      fi
    fi

    sleep "$WAIT_POLL"
    poll_count=$((poll_count + 1))

    if [ "$poll_count" -ge "$max_polls" ]; then
      echo "{\"error\": \"Timed out after ${WAIT_TIMEOUT}s waiting for type-check to complete.\"}"
      exit 1
    fi
  done
}

# -- Main dispatch ---------------------------------------------

CMD="${1:-}"
shift || true

case "$CMD" in
  start)  cmd_start "$@" ;;
  stop)   cmd_stop ;;
  status) cmd_status ;;
  errors) cmd_errors ;;
  wait)   cmd_wait ;;
  *)
    echo "Usage: tsc-watch-agent.sh &lt;command&gt; [args]"
    echo ""
    echo "Commands:"
    echo "  start [app]  Start tsc --watch (default: app1)"
    echo "               Apps: app1, app2"
    echo "  stop         Stop the watcher"
    echo "  status       Print JSON status"
    echo "  errors       Print errors array"
    echo "  wait         Block until next check completes"
    exit 1
    ;;
esac
</code></code></pre><div><hr></div><h2>Final note</h2><p>This isn&#8217;t really about TypeScript. It&#8217;s about a new reality:</p><p>When multiple automated actors work on the same repo, you need coordination primitives. Task runners with caching (Turborepo/Nx) solve the &#8220;do less work&#8221; problem, but you still need a thin layer to solve the &#8220;don&#8217;t do the same work at the same time&#8221; problem.</p><p>Serialize identical work. Cache it. Keep hot compilers alive for loops. And give agents structured outputs so they don&#8217;t waste cycles parsing noise.</p><p>That&#8217;s how you stop your AI assistants from fighting over your CPU&#8212;and turn them into an actual multiplier.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.rajmaliuk.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Ariel Rajmaliuk's Substack is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item></channel></rss>