<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Ariel Rajmaliuk's Substack]]></title><description><![CDATA[I write stuff here]]></description><link>https://blog.rajmaliuk.com</link><image><url>https://substackcdn.com/image/fetch/$s_!uU2-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8d315-8f9a-4631-8a4b-812dee194710_512x512.png</url><title>Ariel Rajmaliuk&apos;s Substack</title><link>https://blog.rajmaliuk.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 20 Apr 2026 20:28:28 GMT</lastBuildDate><atom:link href="https://blog.rajmaliuk.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Ariel Rajmaliuk]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[rajmaliuk@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[rajmaliuk@substack.com]]></itunes:email><itunes:name><![CDATA[Ariel Rajmaliuk]]></itunes:name></itunes:owner><itunes:author><![CDATA[Ariel Rajmaliuk]]></itunes:author><googleplay:owner><![CDATA[rajmaliuk@substack.com]]></googleplay:owner><googleplay:email><![CDATA[rajmaliuk@substack.com]]></googleplay:email><googleplay:author><![CDATA[Ariel Rajmaliuk]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Algorithmic Battlefield: A Technical Breakdown of AI Systems Reshaping Modern Warfare]]></title><description><![CDATA[From edge inference on 15W chips to multi-agent reinforcement learning for swarm coordination &#8212; what&#8217;s actually being deployed, what&#8217;s still unsolved, and where the engineering opportunities 
are.]]></description><link>https://blog.rajmaliuk.com/p/the-algorithmic-battlefield-a-technical</link><guid isPermaLink="false">https://blog.rajmaliuk.com/p/the-algorithmic-battlefield-a-technical</guid><dc:creator><![CDATA[Ariel Rajmaliuk]]></dc:creator><pubDate>Mon, 16 Mar 2026 15:52:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uU2-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8d315-8f9a-4631-8a4b-812dee194710_512x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The modern battlefield is undergoing a systems-level architecture change. Not a UI refresh &#8212; a full rewrite. The monolithic, cloud-dependent, human-in-every-loop model of military operations is being replaced by distributed, edge-native, increasingly autonomous systems that process sensor data locally, make decisions under uncertainty, and coordinate without centralized control.</p><p>This post breaks down the core technical systems driving that shift, the hard engineering problems that remain unsolved, and the specific layers of the stack where startups can build defensible products.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.rajmaliuk.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Ariel Rajmaliuk's Substack is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>-----</p><p>## 1. Edge Inference: The Fundamental Constraint</p><p>The single most important technical challenge in military AI isn&#8217;t model quality &#8212; it&#8217;s *where the model runs*. Commercial AI assumes persistent cloud connectivity, low latency, and unlimited power. Battlefield environments offer none of those things.</p><p>Modern tactical AI systems must operate in DDIL environments: Denied, Disrupted, Intermittent, and Limited connectivity. GPS is jammed. Satellite links are targeted. Electronic warfare blankets entire frequency ranges. A drone that loses its comms link to the cloud under a traditional architecture becomes an inert projectile.</p><p>**The hardware stack that&#8217;s emerging:**</p><p>Edge inference is converging on a specific class of hardware &#8212; compact AI accelerators that combine CPUs, GPUs, and dedicated neural processing units (NPUs) into tightly integrated, low-power modules optimized for inference. The NVIDIA Jetson Orin Nano has become the de facto platform for drone-mounted AI, delivering 40 TOPS of compute at just 15W. That&#8217;s enough to run real-time YOLOv11 object detection at ~5 FPS while leaving headroom for path planning and sensor fusion. Thermal management is essentially free &#8212; propeller airflow handles cooling.</p><p>But SWaP (Size, Weight, and Power) constraints are ruthless. Military-grade edge compute must fit inside airframes measured in centimeters, run on batteries with finite capacity, and survive vibration, temperature extremes, and electromagnetic interference. 
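</p><p>To make the quantization lever concrete, here is a minimal sketch of symmetric INT8 post-training quantization (pure Python and purely illustrative; real toolchains calibrate per-channel against representative data):</p>

```python
def quantize_int8(weights):
    """Symmetric post-training quantization: map floats onto int8 via one scale.
    Illustrative only; production pipelines calibrate per-channel."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03]
q, scale = quantize_int8(w)   # q == [50, -127, 3], scale ~ 0.01
```

The round-trip error per weight is bounded by half the scale, which is the tradeoff being tuned when a team picks INT8 vs. INT4 vs. binary under a fixed power envelope.
<p>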
This creates a hard optimization problem: model architecture selection, quantization strategy (INT8, INT4, binary), pruning depth, and hardware-model co-design all become first-order engineering decisions.</p><p>**The software architecture pattern:**</p><p>The pattern that&#8217;s winning is hybrid edge-cloud with graceful degradation:</p><p>- **Core autonomy stack runs entirely on-device:** Perception (object detection, tracking, terrain classification), navigation (SLAM, visual-inertial odometry, obstacle avoidance), and pre-authorized decision logic.</p><p>- **Cloud/HQ offload for non-critical tasks:** Mission reporting, natural language summarization via LLMs, fleet-wide learning updates. These are nice-to-have, not mission-critical.</p><p>- **Graceful degradation on link loss:** When comms drop, the drone doesn&#8217;t stop &#8212; it falls back to pre-loaded mission parameters, threat libraries, and locally cached maps. Think of it like a submarine: computationally self-sufficient, capable of completing objectives without any external input.</p><p>This is a meaningful departure from how most AI systems are architected in the commercial world, and it&#8217;s a greenfield opportunity for startups that understand offline-first, edge-native design.</p><p>**Startup opportunity:** Purpose-built inference runtimes optimized for military SWaP constraints. The TensorRT / ONNX Runtime / TFLite stack wasn&#8217;t designed for contested environments. There&#8217;s room for runtimes that handle model hot-swapping in the field, encrypted model weights with hardware-backed attestation, and deterministic latency guarantees under thermal throttling.</p><p>-----</p><p>## 2. Computer Vision Pipelines: From Pixels to Kill Chains</p><p>The perception layer is where most of the deployed AI lives today. 
The core pipeline running on Ukrainian drones and their Western-supplied counterparts looks something like this:</p><p>**Detection &#8594; Tracking &#8594; Classification &#8594; Geolocation &#8594; Targeting**</p><p>Each stage has distinct engineering challenges:</p><p>**Detection** uses variants of YOLO (currently v11 in deployed systems) running on edge hardware. The key constraint isn&#8217;t accuracy on benchmarks &#8212; it&#8217;s robustness to real-world degradation: smoke, dust, rain, IR countermeasures, camouflage, and adversarial conditions. Models trained on clean datasets catastrophically underperform in combat.</p><p>**Tracking** is where things get interesting. Single-object trackers (KCF, MOSSE) are lightweight but fragile. Multi-object tracking (MOT) approaches like ByteTrack or OC-SORT provide better persistence across occlusions but cost more compute. On a 15W edge device processing live video, every extra millisecond of tracking latency is a tradeoff against detection refresh rate.</p><p>**Last-mile autonomous guidance** is the critical capability that&#8217;s changing kill rates. Ukrainian forces report that AI-enabled last-mile navigation &#8212; where the drone locks onto a target via onboard computer vision and guides itself through the final ~800 meters without any operator input or data link &#8212; raises hit rates from 10-20% to 70-80%. This single capability neutralizes electronic warfare jamming, which is the primary drone countermeasure on both sides.</p><p>The technical implementation: the drone&#8217;s CV model captures and tracks the target using onboard inference, then a PID or model-predictive controller adjusts flight path to maintain lock through terminal approach. No comms link needed. No GPS needed. 
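</p><p>A minimal sketch of that terminal-guidance loop: a PID controller steering to keep the tracked target centered in the frame. The gains and the pixel-offset error signal are illustrative, not drawn from any fielded system:</p>

```python
class PID:
    """Minimal PID loop for keeping a tracked target centered in frame.
    Gains (kp, ki, kd) are illustrative placeholders."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=0.8, ki=0.05, kd=0.2)
# error: horizontal pixel offset of the tracked target from frame center,
# as reported by the onboard detector; dt: one frame at 30 FPS
correction = pid.update(error=24.0, dt=1 / 30)
```

Run once per frame against the detector output, the correction feeds the flight controller; the whole loop needs no external input.
<p>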
Just a camera, an accelerometer, and ~5W of compute.</p><p>**Where the training data problem lives:**</p><p>Ukraine just opened access to millions of annotated frames from active combat &#8212; arguably the richest military computer vision dataset ever assembled. But the data engineering challenge is enormous: heterogeneous formats from hundreds of drone types, inconsistent labeling quality across thousands of operators, domain shift between seasons/terrain/weather, and adversarial adaptation by the enemy (camouflage, decoys, civilian-vehicle attacks).</p><p>**Startup opportunity:** Military-grade annotation and data pipeline infrastructure. Think: automated labeling with active learning loops, domain adaptation tooling for sim-to-real transfer, and secure federated learning systems that let allies train on shared data without exposing raw imagery. The &#8220;Snowflake for defense CV data&#8221; doesn&#8217;t exist yet.</p><p>-----</p><p>## 3. Swarm Coordination: The Multi-Agent Problem</p><p>Individual autonomous drones are a solved-enough problem. The next technical frontier is *N* drones acting as a coherent system &#8212; and this is where the hardest open problems in AI intersect with real-world deployment constraints.</p><p>**The architecture: Centralized Training, Decentralized Execution (CTDE)**</p><p>The dominant paradigm in multi-agent reinforcement learning for swarms is CTDE: train a global policy using centralized information (full state, all agent observations), then deploy a decentralized version where each agent acts only on local observations. Key algorithms in production and research:</p><p>- **MAPPO (Multi-Agent Proximal Policy Optimization):** The workhorse. Stable training, good sample efficiency, handles cooperative and competitive settings. 
Used for task allocation, formation control, and adversarial engagement.</p><p>- **MADDPG (Multi-Agent Deep Deterministic Policy Gradient):** Better for continuous action spaces (flight control), but less stable at scale.</p><p>- **Hierarchical RL (HRL):** Army Research Lab work on decomposing swarm control into group-level micro control and swarm-level macro control. Reduces learning time by 80% vs. centralized approaches with only 5% optimality loss. This is the pattern that will scale to hundreds of agents.</p><p>**The unsolved engineering challenges:**</p><p>*Partial observability.* In reality, each drone sees a fraction of the battlefield. Communication between agents is intermittent and bandwidth-constrained. You can&#8217;t share full state. You&#8217;re operating in a POMDP (Partially Observable Markov Decision Process), and the observation space is noisy &#8212; sensor drift, occlusion, adversarial spoofing.</p><p>*Sim-to-real transfer.* Policies trained in simulation break in the real world. Physics engines don&#8217;t capture turbulence, sensor noise, or electromagnetic interference accurately. Zero-shot sim-to-real transfer has been demonstrated for small formations (Batra et al., 2022 &#8212; quadrotor pursuit-evasion), but scaling to 50+ heterogeneous agents in contested airspace remains an open problem.</p><p>*Communication protocol design.* Swarm agents need to share enough information to coordinate without saturating limited bandwidth. This intersects with mesh networking, dynamic topology management, and anti-jamming frequency hopping. A swarm that can&#8217;t communicate degrades to N independent agents &#8212; better than nothing, but far from optimal.</p><p>*Heterogeneous agent coordination.* Real swarms aren&#8217;t homogeneous. You might have recon drones, strike drones, EW drones, and relay drones in the same formation. Each has different dynamics, sensors, and objectives. 
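</p><p>The decentralized-execution idea is easiest to see in toy form. Below, each agent updates using only its local neighbors, with no global state (a plain consensus step, not an actual MAPPO policy; names and gains are illustrative):</p>

```python
def consensus_step(positions, neighbors, alpha=0.2):
    """One decentralized update: each agent moves toward the average of its
    *local* neighbors only. No agent ever sees the full swarm state."""
    new = {}
    for agent, pos in positions.items():
        nbrs = neighbors[agent]
        if not nbrs:
            new[agent] = pos  # isolated agent: degrade to independent behavior
            continue
        avg = sum(positions[n] for n in nbrs) / len(nbrs)
        new[agent] = pos + alpha * (avg - pos)
    return new

# 1-D positions and a sparse, bandwidth-limited communication graph
pos = {"a": 0.0, "b": 10.0, "c": 4.0}
nbr = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
pos = consensus_step(pos, nbr)
```

A learned CTDE policy replaces the fixed averaging rule with a trained network, but the deployment shape is the same: local observations in, local action out.
<p>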
Multi-agent RL for heterogeneous systems (like the HMDRL-UC approach using separate MAPPO for cluster heads and IPPO for cluster members) is an active research area with minimal production deployment.</p><p>**Startup opportunity:** Swarm simulation environments with high-fidelity EW modeling, turnkey CTDE training pipelines that handle heterogeneous agent types, and mesh networking stacks purpose-built for adversarial RF environments. Also: formal verification tools for swarm policies &#8212; how do you prove a swarm won&#8217;t exhibit emergent behavior that violates rules of engagement?</p><p>-----</p><p>## 4. Sensor Fusion and the Data Integration Problem</p><p>A modern autonomous system doesn&#8217;t rely on a single sensor. The full stack includes:</p><p>- **EO/IR cameras** (visible + thermal imaging)</p><p>- **LiDAR** (terrain mapping, obstacle detection)</p><p>- **Radar** (all-weather detection, velocity measurement)</p><p>- **RF sensors** (electronic warfare detection, signal intelligence)</p><p>- **IMU + barometric altimeters** (inertial navigation when GPS is denied)</p><p>- **Acoustic sensors** (drone detection, gunfire localization)</p><p>Fusing these into a coherent world model is a hard engineering problem. The standard approach is Bayesian sensor fusion &#8212; typically extended Kalman filters (EKF) or particle filters for state estimation &#8212; but deep learning-based fusion architectures are gaining ground, particularly for combining 2D image data with 3D point clouds.</p><p>**The key technical challenge is temporal alignment and conflicting modalities.** An IR sensor might detect a heat signature where the EO camera sees nothing (camouflage). A radar return might indicate a vehicle where LiDAR shows empty terrain (corner reflectors / decoys). 
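</p><p>In its simplest one-dimensional form, that Bayesian weighting is just inverse-variance fusion, i.e. a single scalar Kalman update (the numbers here are illustrative):</p>

```python
def fuse(mean_a, var_a, mean_b, var_b):
    """Inverse-variance (scalar Kalman) fusion of two noisy estimates."""
    k = var_a / (var_a + var_b)        # gain: trust the lower-variance sensor more
    mean = mean_a + k * (mean_b - mean_a)
    var = (1 - k) * var_a              # fused estimate is tighter than either input
    return mean, var

# radar range estimate is noisy (var 4.0), LiDAR is tight (var 1.0)
m, v = fuse(100.0, 4.0, 98.0, 1.0)    # fused estimate leans toward the LiDAR
```

The hard part in practice is that the variances themselves are not fixed: they depend on weather, range, and possible adversarial manipulation, which is exactly what the naive version above does not model.
<p>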
The fusion system needs to reason about sensor reliability, environmental conditions, and potential adversarial manipulation &#8212; not just average the inputs.</p><p>**At the platform level**, the bigger challenge is JADC2 (Joint All-Domain Command and Control): connecting sensors and shooters across *all* military services into a single data mesh. This is essentially a distributed systems problem at continental scale &#8212; event-driven architectures, pub/sub messaging, data serialization standards (the military equivalent of choosing between Protobuf and Avro), and latency-aware routing through heterogeneous networks.</p><p>Anduril&#8217;s Lattice OS is the most mature attempt at this &#8212; a middleware layer that ingests data from arbitrary sensor types, runs AI-powered threat classification, and routes actionable intelligence to the right effector. Think of it as Kafka + a real-time inference engine + a targeting system, deployed across air, land, sea, and space.</p><p>**Startup opportunity:** Modular sensor fusion SDKs that handle heterogeneous input types with plug-and-play drivers. Middleware for cross-platform data interoperability (the F-22 and F-35 literally can&#8217;t talk to each other natively &#8212; different datalink standards). And real-time anomaly detection in sensor streams to flag adversarial manipulation or hardware degradation.</p><p>-----</p><p>## 5. Electronic Warfare: The Adversarial ML Battlefield</p><p>Electronic warfare (EW) is the invisible layer that shapes everything above it. Every AI capability on the battlefield has an EW countermeasure, and vice versa.</p><p>**GPS jamming** is ubiquitous. Both sides in Ukraine blanket the front lines with GPS denial. 
The counter: visual-inertial odometry (VIO), terrain-contour matching, celestial navigation via star trackers, and increasingly, AI-based signal-of-opportunity navigation that uses ambient RF signatures (cell towers, broadcast signals) as position references.</p><p>**Communications jamming** targets the data links between drones and operators. The counter: autonomous operation (no link needed), frequency-hopping spread spectrum (FHSS), and adaptive waveforms that detect and avoid jammed frequencies in real-time.</p><p>**Spoofing and adversarial attacks** are the next frontier. If a drone uses computer vision to identify targets, an adversary can deploy adversarial patches &#8212; physical objects designed to fool neural networks (think: a printed pattern on a vehicle roof that makes a tank classify as a civilian car). Defending against this requires adversarial training, input preprocessing (spatial smoothing, JPEG compression), and multi-modal verification (if the CV says &#8220;civilian car&#8221; but the radar says &#8220;60-ton metallic object moving at 40 kph,&#8221; trust the radar).</p><p>**Startup opportunity:** Adversarial robustness testing platforms for defense CV models. Adaptive electronic counter-countermeasure (ECCM) systems that use RL to learn optimal frequency-hopping strategies in real-time. And encrypted, tamper-evident AI model distribution systems &#8212; when you push a model update to 10,000 drones in the field, how do you guarantee integrity?</p><p>-----</p><p>## 6. The LLM Layer: Where Foundation Models Actually Fit</p><p>There&#8217;s a common misconception that large language models are the core of military AI. They&#8217;re not. The core autonomy stack &#8212; perception, navigation, control &#8212; runs on specialized, lightweight models. 
LLMs sit on top as an *interface and analysis layer*.</p><p>Where LLMs actually add value in defense:</p><p>- **Mission planning acceleration:** Converting natural language objectives into structured mission templates. Pytho AI compresses a 48-step mission analysis process from days to minutes using agent systems.</p><p>- **Intelligence summarization:** Processing large volumes of SIGINT, HUMINT, and OSINT reports into actionable briefings. This is a RAG problem &#8212; retrieval-augmented generation over classified document stores.</p><p>- **Human-machine teaming:** Natural language interfaces for operators to query and task autonomous systems. &#8220;Show me all thermal signatures within 2km of grid reference XY that appeared in the last 30 minutes&#8221; is easier to say than to program.</p><p>- **After-action analysis:** Generating structured summaries from thousands of hours of drone footage and sensor logs.</p><p>The Pentagon awarded $200M contracts to Google, xAI, Anthropic, and OpenAI specifically for &#8220;agentic AI workflows&#8221; &#8212; orchestrating multi-step processes that combine tool use, reasoning, and human-in-the-loop checkpoints.</p><p>**The constraint:** LLMs are too large and too power-hungry for tactical edge deployment on current hardware. A 7B parameter model quantized to INT4 still needs ~4GB of RAM and draws significant power for inference. The current pattern is LLMs at the command post / base level, with small specialized models at the edge. As model distillation and speculative decoding improve, this boundary will shift.</p><p>**Startup opportunity:** Domain-specific fine-tuned models for military planning and intelligence (trained on doctrine, tactics, and operational data &#8212; not internet text). Secure RAG architectures for classified environments with air-gapped vector stores. And agentic frameworks that orchestrate multi-step military workflows with formal audit trails and human-in-the-loop gates.</p><p>-----</p><p>## 7. 
Manufacturing and Deployment at Scale</p><p>The final engineering bottleneck isn&#8217;t algorithmic &#8212; it&#8217;s physical. Ukraine needs 4.5 million drones per year. The EU projects a need for 3 million annually just for one small country&#8217;s defense. Current production systems can&#8217;t scale to these numbers.</p><p>**The technical challenges:**</p><p>- **Rapid hardware iteration:** Drone designs are evolving on weekly cycles in Ukraine. The production system needs to handle constant BOM changes, firmware updates, and component substitution (when supply chains break).</p><p>- **AI model deployment at fleet scale:** Pushing OTA model updates to thousands of fielded drones, each potentially running different hardware variants with different accelerator architectures. This is a harder version of the mobile app deployment problem.</p><p>- **Quality assurance for autonomous weapons:** How do you test that a CV model won&#8217;t misclassify targets across the full distribution of real-world conditions? Traditional software testing doesn&#8217;t cover it. You need systematic adversarial testing, formal verification where possible, and continuous monitoring of deployed model performance.</p><p>**Startup opportunity:** CI/CD pipelines for edge AI models that handle hardware-aware compilation, A/B testing in simulation before deployment, and rollback mechanisms. Fleet management platforms for heterogeneous autonomous systems. And automated production lines that combine robotics, additive manufacturing, and machine vision QA for attritable drone manufacturing.</p><p>-----</p><p>## Where This Is Heading</p><p>The trajectory is clear: warfare is becoming a software problem. The platforms are increasingly commoditized (a basic FPV drone costs $400). 
The differentiation is in the AI stack &#8212; perception, decision-making, coordination, and the infrastructure that trains, deploys, and maintains these systems at scale.</p><p>For technical founders, the key insight is that defense AI isn&#8217;t one market &#8212; it&#8217;s dozens of hard engineering problems, each with its own constraint set, each representing a potentially massive category. The builders who will win aren&#8217;t generalists building &#8220;AI for defense.&#8221; They&#8217;re specialists solving specific, deeply technical problems: edge inference under SWaP constraints, multi-agent coordination in adversarial RF environments, sensor fusion across incompatible platforms, or fleet-scale model deployment for attritable systems.</p><p>The stack is being built right now. Most layers are still open.</p>]]></content:encoded></item><item><title><![CDATA[When AI Coding Agents Fight Over Your CPU (And What I Did About It)]]></title><description><![CDATA[The missing layer in monorepos: coordinating AI agents so validation is fast, serialized, and reliable.]]></description><link>https://blog.rajmaliuk.com/p/when-ai-coding-agents-fight-over</link><guid isPermaLink="false">https://blog.rajmaliuk.com/p/when-ai-coding-agents-fight-over</guid><dc:creator><![CDATA[Ariel Rajmaliuk]]></dc:creator><pubDate>Sun, 15 Feb 2026 19:24:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uU2-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8d315-8f9a-4631-8a4b-812dee194710_512x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;m the CTO of <strong><a href="https://www.diiirect.com">diiirect.com</a></strong> &#8212; a remote workforce platform and talent recruiting engine. Our product lives and dies by iteration speed: shipping the web app, the API, internal tooling, and shared packages fast, without turning the dev machine into a space heater.</p><p>We also lean hard into AI coding agents. I regularly run multiple Claude Code sessions against the same repo: one fixing a UI regression, another refactoring a shared package, another touching backend workflows. 
It&#8217;s insanely productive&#8230; until all the agents decide to &#8220;verify&#8221; at the same time.</p><p>If you use Claude Code / Cursor / Copilot / Codex on anything bigger than a toy repo, you&#8217;ll run into the same wall.</p><div><hr></div><h2>Why this problem is becoming the new normal</h2><p>Modern JS/TS teams are converging on a familiar setup:</p><ul><li><p><strong>Monorepo</strong></p></li><li><p><strong>pnpm</strong></p></li><li><p><strong>Turborepo</strong> (or Nx)</p></li><li><p>multiple apps + shared packages</p></li></ul><p>This is becoming the default because it&#8217;s the cleanest way to:</p><ul><li><p>share code safely (types, UI, utilities)</p></li><li><p>ship multiple surfaces (web, API, workers, desktop/mobile)</p></li><li><p>keep CI sane with caching and deterministic builds</p></li></ul><p>But it also creates a new failure mode: the repo is shared, while your dev tooling isn&#8217;t coordinated.</p><p>Humans coordinate implicitly:</p><ul><li><p>&#8220;You run type-check, I&#8217;ll wait.&#8221;</p></li><li><p>&#8220;Don&#8217;t start a heavy build while I&#8217;m compiling.&#8221;</p></li></ul><p>AI agents don&#8217;t have 
that social layer. They just do the reasonable thing locally:</p><blockquote><p>make changes &#8594; verify &#8594; repeat</p></blockquote><p>Multiply that by 2&#8211;4 agents and &#8220;reasonable&#8221; turns into resource warfare.</p><div><hr></div><h2>The problem: duplicated heavy work</h2><p>My stack is a Turborepo + pnpm TypeScript monorepo.</p><p>After an agent edits code, it naturally verifies with:</p><ul><li><p><code>pnpm type-check</code></p></li><li><p>which runs <code>tsc --noEmit</code></p></li></ul><p>That&#8217;s correct behavior. The issue is cost: each <code>tsc --noEmit</code> spins up a full TypeScript compiler pass and can easily consume <strong>~800MB+ RAM</strong> plus a chunk of CPU.</p><p>Now imagine 3 AI sessions finishing around the same time:</p><ul><li><p><strong>Session 1</strong>: <code>pnpm type-check</code> &#8594; <code>tsc --noEmit</code> &#8594; ~800MB RAM, CPU pinned</p></li><li><p><strong>Session 2</strong>: <code>pnpm type-check</code> &#8594; <code>tsc --noEmit</code> &#8594; ~800MB RAM, CPU pinned</p></li><li><p><strong>Session 3</strong>: <code>pnpm type-check</code> &#8594; <code>tsc --noEmit</code> &#8594; ~800MB RAM, CPU pinned</p></li></ul><p><strong>Peak:</strong> ~2.4GB RAM spike + CPU thrashing<br><strong>Result:</strong> everything slows down, sometimes OOM kills, failed builds, wasted time</p><p>And here&#8217;s the key:</p><p>Running the same type-check 3 times concurrently is completely pointless. They&#8217;re checking the same repo state. Runs #2 and #3 produce the same output as run #1.</p><p>What you actually want is:</p><ul><li><p>one real run</p></li><li><p>everyone else waits</p></li><li><p>then they get a <strong>Turborepo cache hit</strong> and return instantly</p></li></ul><p>In other words: <strong>serialization + caching</strong>. 
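</p><p>The core of that pattern fits in a few lines. Here is a Python sketch of the same idea the bash guard described below implements: an atomic <code>mkdir</code> lock plus a PID-based staleness check (paths, names, and the liveness logic are illustrative, not taken from the actual script):</p>

```python
import os
import tempfile
import time

# Illustrative lock path; the real script derives one lock per task name.
LOCK_DIR = os.path.join(tempfile.gettempdir(), "typecheck.lock")

def owner_alive(lock_dir):
    """Staleness check: is the PID recorded in the lock still running?
    (A real guard would also verify it is a node/tsc/turbo/pnpm process.)"""
    try:
        with open(os.path.join(lock_dir, "pid")) as f:
            pid = int(f.read())
        os.kill(pid, 0)  # signal 0 probes existence without killing
        return True
    except (OSError, ValueError):
        return False

def run_serialized(task):
    """First caller wins the lock and runs the task; later identical callers
    wait, then (on the real system) return instantly from the warm cache."""
    while True:
        try:
            os.mkdir(LOCK_DIR)  # atomic on POSIX: exactly one winner
            with open(os.path.join(LOCK_DIR, "pid"), "w") as f:
                f.write(str(os.getpid()))
            break
        except FileExistsError:
            if owner_alive(LOCK_DIR):
                time.sleep(0.1)  # a live owner is working; wait our turn
            else:
                # stale lock left by a killed process: clean up and retry
                try:
                    os.remove(os.path.join(LOCK_DIR, "pid"))
                except OSError:
                    pass
                try:
                    os.rmdir(LOCK_DIR)
                except OSError:
                    pass
    try:
        return task()
    finally:
        os.remove(os.path.join(LOCK_DIR, "pid"))
        os.rmdir(LOCK_DIR)
```

On the real system the waiting callers then hit the warm Turborepo cache, so their "run" is effectively free.
<p>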
Make the second run free.</p><div><hr></div><h2>Solution Part 1: a concurrency guard in front of Turborepo</h2><p>I wrote a bash script, <code>turbo-guard.sh</code> (~270 lines), that sits between all my pnpm scripts and Turborepo.</p><p>It has two layers of defense:</p><ol><li><p><strong>Process-level detection</strong> (catches &#8220;bypass&#8221; scenarios)</p></li><li><p><strong>Atomic file locking</strong> (serializes identical tasks)</p></li></ol><h3>Layer 1 &#8212; Process-level detection (the hard safety net)</h3><p>AI agents are creative. Even if you tell them &#8220;always run <code>pnpm type-check</code>&#8221;, they&#8217;ll sometimes run:</p><ul><li><p><code>npx tsc --noEmit</code></p></li><li><p><code>pnpm turbo build</code></p></li><li><p><code>pnpm --filter app lint</code></p></li></ul><p>So the guard doesn&#8217;t just trust who called it &#8212; it looks at what&#8217;s actually running.</p><p>It scans processes scoped to this repo and checks for heavy commands:</p><ul><li><p><code>tsc --noEmit</code></p></li><li><p><code>next build</code></p></li></ul><p>If it finds any, it waits for them to finish.</p><p>This makes the system robust even when agents bypass the wrapper.</p><h3>Layer 2 &#8212; Atomic file lock (serialize identical work)</h3><p>For concurrent invocations of the guard itself, it uses a POSIX-portable atomic lock via <code>mkdir</code>.</p><p>The lock name is derived from the command arguments:</p><ul><li><p><code>turbo-guard.sh lint</code> and <code>turbo-guard.sh build</code> get different locks &#8594; they can run in parallel</p></li><li><p>two <code>turbo-guard.sh type-check</code> calls get the same lock &#8594; serialized</p></li></ul><p>Same task = same lock = no redundant work.</p><h3>Stale lock recovery (the part that matters in practice)</h3><p>This script exists because processes get killed unexpectedly:</p><ul><li><p>OOM</p></li><li><p>SIGKILL</p></li><li><p>crashes</p></li><li><p>laptop 
sleep/wake</p></li><li><p>etc.</p></li></ul><p>When that happens, you can be left with a stale lock (lock directory exists, but no real owner).</p><p>So the guard:</p><ul><li><p>reads the PID from the lock</p></li><li><p>checks it&#8217;s alive <strong>and</strong> still a relevant process (node/tsc/turbo/pnpm)</p></li><li><p>cleans up stale locks and safely reacquires (including race handling)</p></li></ul><p>Without PID validation, recycled PIDs can deadlock you waiting on an unrelated process.</p><div><hr></div><h2>Integration: make it invisible to the agents</h2><p>The trick is to make agents do the right thing without knowing anything.</p><p>In <code>package.json</code>, route heavy tasks through the guard:</p><pre><code><code>{
  "build": "./scripts/turbo-guard.sh build",
  "lint": "./scripts/turbo-guard.sh lint",
  "type-check": "./scripts/turbo-guard.sh type-check",
  "validate": "./scripts/turbo-guard.sh lint type-check"
}
</code></code></pre><p>Then in your agent instructions (e.g. <code>CLAUDE.md</code>), be explicit:</p><ul><li><p><strong>Always</strong> use root scripts: <code>pnpm validate</code>, <code>pnpm type-check</code>, <code>pnpm lint</code>, <code>pnpm build</code></p></li><li><p><strong>Never</strong> run Turbo directly: <code>pnpm turbo ...</code>, <code>npx tsc ...</code>, etc.</p></li></ul><p>That&#8217;s the &#8220;soft&#8221; coordination layer (instructions).<br>The process scan + locks are the &#8220;hard&#8221; layer (enforcement).</p><div><hr></div><h2>What it looks like in practice</h2><h3>Without the guard (3 sessions type-checking concurrently)</h3><ul><li><p>Session 1: ~18s, ~820MB RAM</p></li><li><p>Session 2: ~22s, ~810MB RAM (slower due to contention)</p></li><li><p>Session 3: ~25s, ~830MB RAM (even slower, CPU thrashing)</p></li></ul><p><strong>Total wall time:</strong> ~25s<br><strong>Peak RAM:</strong> ~2.4GB<br><strong>CPU:</strong> pegged and thrashing</p><h3>With the guard</h3><ul><li><p>Session 1 acquires lock &#8594; ~18s, ~820MB RAM</p></li><li><p>Session 2 waits &#8594; then cache hit &#8594; &lt;1s</p></li><li><p>Session 3 waits &#8594; then cache hit &#8594; &lt;1s</p></li></ul><p><strong>Total wall time:</strong> ~19s<br><strong>Peak RAM:</strong> ~820MB<br><strong>CPU:</strong> normal</p><p>Same correctness, one-third the resource hit. The second and third checks still happen &#8212; they&#8217;re just effectively free because the cache is warm.</p><div><hr></div><h2>Solution Part 2: keep TypeScript warm with a watch agent (and make it agent-friendly)</h2><p>The concurrency guard fixes the resource problem. But there&#8217;s a second problem: speed.</p><p>A cold <code>tsc --noEmit</code> often takes <strong>15&#8211;25 seconds</strong>. In an edit &#8594; check &#8594; fix &#8594; check loop, that&#8217;s brutal. Over 10 iterations you can burn minutes just waiting.</p><p>TypeScript&#8217;s <code>--watch</code> solves this. 
After the initial compile, incremental rechecks often take <strong>~2&#8211;3 seconds</strong> because tsc keeps program state in memory.</p><p>The catch: <code>tsc --watch</code> streams human-readable text to stdout. It&#8217;s made for a developer staring at a terminal, not for an AI agent that needs structured results.</p><p>So I built <code>tsc-watch-agent.sh</code> (~450 lines):</p><ul><li><p>runs <code>tsc --watch</code> in the background</p></li><li><p>parses output</p></li><li><p>writes structured JSON to a status file</p></li><li><p>exposes commands agents can call (<code>start</code>, <code>wait</code>, <code>errors</code>, <code>status</code>, <code>stop</code>)</p></li></ul><h3>The agent workflow</h3><pre><code><code>pnpm tsc:watch:start
pnpm tsc:watch:wait     # blocks until current check completes, returns JSON
pnpm tsc:watch:errors   # returns just the errors array
pnpm tsc:watch:stop
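
# `wait` prints the full status JSON; shape (values illustrative):
#   {"status": "error", "app": "app1", "errorCount": 1,
#    "errors": [{"file": "...", "line": 12, "col": 5,
#                "code": "TS2322", "message": "..."}],
#    "lastCheckAt": "...", "lastCheckDuration": "2s", "pid": 12345}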
</code></code></pre><p><code>wait</code> is the key interface:</p><ul><li><p>it blocks until tsc finishes the current cycle</p></li><li><p>then returns structured JSON</p></li><li><p>no parsing terminal output, no guessing when compilation is done</p></li></ul><h3>Architecture (3 background processes)</h3><p>The script spawns:</p><ol><li><p><strong>tsc process</strong> &#8212; <code>tsc --watch --noEmit --preserveWatchOutput</code><br>Uses <code>exec</code> so the tracked PID is the real node/tsc process (important on macOS).</p></li><li><p><strong>parser</strong> &#8212; reads from a FIFO line-by-line, extracts errors with regex, writes <code>status.json</code> after each cycle.</p></li><li><p><strong>cleanup watcher</strong> &#8212; if tsc dies unexpectedly, it unblocks the parser so it doesn&#8217;t hang forever on a dead pipe.</p></li></ol><p>This is the difference between &#8220;cool demo&#8221; and &#8220;works for months without wedging.&#8221;</p><div><hr></div><h2>How the two scripts coexist cleanly</h2><p>One easy mistake:</p><p>If the guard&#8217;s process scan detects the watch process, it will wait forever because watch never exits.</p><p>So the guard excludes watch mode explicitly:</p><ul><li><p>it looks for <code>tsc --noEmit</code></p></li><li><p>but ignores anything containing <code>--watch</code></p></li></ul><p>The watch agent also uses a separate lock namespace, so it doesn&#8217;t collide with guard locks.</p><div><hr></div><h2>The speed difference is the point</h2><h3>Without the watcher (cold checks)</h3><p>Five iterations:</p><ul><li><p>18s</p></li><li><p>17s</p></li><li><p>19s</p></li><li><p>18s</p></li><li><p>17s</p></li></ul><p>Total: <strong>~89 seconds</strong> spent type-checking</p><h3>With the watcher</h3><ul><li><p>initial compile: ~12s (one-time)</p></li><li><p>then incremental checks: ~2&#8211;3s each</p></li></ul><p>Total (including initial): <strong>~24 seconds</strong></p><p>That&#8217;s a ~70% reduction in time spent waiting. 
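</p><p>Stripped of the PID checks and timestamp comparison, the <code>wait</code> command is just a poll loop over the status file until a terminal state appears. A minimal sketch (hypothetical demo path, pre-seeded so it returns immediately):</p>

```shell
STATUS_FILE="/tmp/demo-tsc-status.json"
printf '{"status": "ready", "errorCount": 0}\n' > "$STATUS_FILE"  # pretend a check just finished

# Poll until the status field reaches a terminal state (ready or error).
attempts=0
status=""
while [ "$attempts" -lt 400 ]; do          # 400 polls x 0.3s = 120s timeout
  status=$(grep -o '"status": "[^"]*"' "$STATUS_FILE" | sed 's/"status": "//;s/"$//')
  if [ "$status" = "ready" ] || [ "$status" = "error" ]; then
    echo "done: $status"
    break
  fi
  sleep 0.3
  attempts=$((attempts + 1))
done
```

<p>The real script additionally fails fast if the tsc PID disappears, and only accepts a result whose <code>lastCheckAt</code> differs from the one recorded when <code>wait</code> started.</p><p>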
Over longer sessions, it changes what&#8217;s possible.</p><div><hr></div><h2>The full picture</h2><p>You end up with two &#8220;lanes&#8221;:</p><h3>Lane A: one-off validation (serialized + cached + safe)</h3><p>Use:</p><ul><li><p><code>pnpm validate</code></p></li><li><p><code>pnpm type-check</code></p></li><li><p><code>pnpm build</code></p></li></ul><p>These go through <code>turbo-guard.sh</code>.</p><h3>Lane B: iterative debugging (fast incremental feedback)</h3><p>Use:</p><ul><li><p><code>pnpm tsc:watch:*</code></p></li></ul><p>These go through the watch agent and return structured JSON quickly.</p><p>Agents don&#8217;t need to understand the machinery. They just use the commands they&#8217;re told to use, and the infrastructure coordinates the rest.</p><div><hr></div><h2>What I&#8217;d do differently</h2><p>Honestly: not much. A few things that ended up being essential:</p><ul><li><p><code>mkdir</code><strong> locks are underrated.</strong> Atomic, portable, zero dependencies.</p></li><li><p><strong>PID validation matters.</strong> &#8220;PID exists&#8221; isn&#8217;t enough because PIDs get recycled.</p></li><li><p><strong>The FIFO + cleanup watcher is necessary</strong> if you want the watch agent to survive real-world failures.</p></li><li><p><strong>Instructions matter.</strong> Soft coordination (<code>CLAUDE.md</code>) plus hard enforcement (process scan + locks) is what makes this reliable with AI agents.</p></li></ul><div><hr></div><h2>If you&#8217;re hitting this too</h2><p>The core insight is simple:</p><ul><li><p><strong>Serialize identical expensive work</strong></p></li><li><p><strong>Cache results so the second run is free</strong></p></li><li><p><strong>Keep the compiler warm for iterative loops</strong></p></li><li><p><strong>Expose results as structured output for agents</strong></p></li></ul><p>It doesn&#8217;t have to be Turborepo. 
Nx, Bazel, plain caching &#8212; same idea.</p><p>Also: if you publish tooling like this, avoid embedding secrets or internal endpoints in scripts. The approach here is intentionally local-only and dependency-free.</p><div><hr></div><h2>The scripts (sanitized, production-ready)</h2><p>Below are both scripts in full. They&#8217;re dependency-free beyond standard POSIX tooling and are designed to work on macOS and Linux.</p><blockquote><p>Note: Paths under <code>/tmp</code> are intentionally generic to avoid tying tooling artifacts to a specific company name.</p></blockquote><div><hr></div><h3><code>scripts/turbo-guard.sh</code></h3><pre><code><code>#!/usr/bin/env bash
#
# turbo-guard.sh &#8212; Serializes concurrent Turborepo tasks across processes.
#
# PROBLEM:
#   Multiple AI agent sessions (or terminals) may independently run
#   build/lint/type-check at the same time. Heavy tasks like `tsc --noEmit`
#   can consume ~800MB+ RAM each &#8212; running 3-4 concurrently causes OOM kills
#   and wastes CPU time thrashing. Worse, instances may bypass the guard by
#   running `npx tsc`, `pnpm turbo build`, or `pnpm --filter app lint` directly.
#
# SOLUTION (two layers):
#   Layer 1 &#8212; Process-level: Before starting, check for ANY running
#     tsc/next-build processes (regardless of how they were started).
#     If found, wait for them to finish. This catches bypass scenarios.
#
#   Layer 2 &#8212; File lock: Uses mkdir(2) as an atomic POSIX lock so that
#     concurrent invocations of THIS script are serialized. The second
#     caller waits for the first to finish, then runs the same command.
#     Turborepo caches results, so the second run is instant (cache hit).
#
# USAGE:
#   ./scripts/turbo-guard.sh &lt;turbo-args...&gt;
#   ./scripts/turbo-guard.sh lint type-check --filter=app
#   ./scripts/turbo-guard.sh build --filter=app
#   ./scripts/turbo-guard.sh --force lint    # Break stale lock and run
#
# LOCK BEHAVIOR:
#   - Lock name derived from args, so different tasks can run in parallel
#   - Same task from multiple processes is serialized
#   - Stale locks (from killed processes) are auto-detected and cleaned
#   - Locks stored in /tmp, cleared on reboot
#
# EXIT CODES:
#   Passes through the exit code from `pnpm turbo`.
#
# --------------------------------------------------------------

set -euo pipefail

# -- Parse --force flag (must be first arg) --------------------

FORCE=false
if [ "${1:-}" = "--force" ]; then
  FORCE=true
  shift
fi

if [ $# -eq 0 ]; then
  echo "Usage: turbo-guard.sh [--force] &lt;turbo-args...&gt;"
  echo ""
  echo "Examples:"
  echo "  turbo-guard.sh lint type-check --filter=app"
  echo "  turbo-guard.sh build --filter=app"
  echo "  turbo-guard.sh --force type-check   # Break stale lock"
  exit 1
fi

# -- Configuration ---------------------------------------------

MAX_WAIT=600          # Max seconds to wait for another process (10 min)
POLL_INTERVAL=3       # Seconds between liveness checks
LOCK_BASE="/tmp/turbo-guard"
PROJECT_DIR="$(cd "$(dirname "$0")/.." &amp;&amp; pwd)"

# -- Derive lock name from turbo args --------------------------
# Normalize args into a filesystem-safe string. Different arg combos get
# different locks, so `lint` and `build` can run in parallel.

LOCK_NAME=$(printf '%s' "$*" | tr ' =/' '-' | tr -cd 'a-zA-Z0-9-')
LOCKDIR="${LOCK_BASE}-${LOCK_NAME}.lock"
PIDFILE="${LOCKDIR}/pid"

# -- Helper functions ------------------------------------------

cleanup() {
  rm -f "$PIDFILE" 2&gt;/dev/null
  rmdir "$LOCKDIR" 2&gt;/dev/null || true
}

# Check if a PID is alive AND is a node/turbo/pnpm process (not a recycled PID).
is_turbo_alive() {
  local pid="$1"
  if ! kill -0 "$pid" 2&gt;/dev/null; then
    return 1  # Process doesn't exist
  fi
  local comm
  comm=$(ps -p "$pid" -o comm= 2&gt;/dev/null || echo "")
  case "$comm" in
    *node*|*turbo*|*pnpm*|*npm*|*tsc*) return 0 ;;
    *) return 1 ;;
  esac
}

# Wait for a specific PID to finish, with timeout.
wait_for_pid() {
  local other_pid="$1"
  local label="${2:-process}"
  local waited=0

  while kill -0 "$other_pid" 2&gt;/dev/null; do
    sleep "$POLL_INTERVAL"
    waited=$((waited + POLL_INTERVAL))
    if [ $waited -ge $MAX_WAIT ]; then
      echo "turbo-guard: timed out after ${MAX_WAIT}s waiting for $label (PID $other_pid)."
      echo "turbo-guard: run with --force to skip waiting, or kill PID $other_pid."
      exit 1
    fi
  done

  return "$waited"
}

# =============================================================
# LAYER 1: Process-level guard
# =============================================================

wait_for_heavy_processes() {
  if [ "$FORCE" = true ]; then
    return 0
  fi

  local found_any=false
  local waited_total=0

  while true; do
    local heavy_pids=""
    heavy_pids=$(
      ps -eo pid,args 2&gt;/dev/null \
        | grep "$PROJECT_DIR" \
        | grep -E '(tsc --noEmit|next build)' \
        | grep -v -- '--watch' \
        | grep -v "grep" \
        | grep -v "turbo-guard" \
        | awk '{print $1}' \
        | tr '\n' ' '
    ) || true

    heavy_pids=$(echo "$heavy_pids" | xargs 2&gt;/dev/null || echo "")

    if [ -z "$heavy_pids" ]; then
      if [ "$found_any" = true ]; then
        echo "turbo-guard: previous heavy process(es) finished (waited ${waited_total}s)."
      fi
      return 0
    fi

    if [ "$found_any" = false ]; then
      found_any=true
      echo "turbo-guard: heavy process(es) already running: $heavy_pids"
      echo "turbo-guard: waiting for them to finish before starting..."
    fi

    sleep "$POLL_INTERVAL"
    waited_total=$((waited_total + POLL_INTERVAL))

    if [ $waited_total -ge $MAX_WAIT ]; then
      echo "turbo-guard: timed out after ${MAX_WAIT}s waiting for heavy processes."
      echo "turbo-guard: still running: $heavy_pids"
      echo "turbo-guard: run with --force to skip waiting."
      exit 1
    fi
  done
}

wait_for_heavy_processes

# =============================================================
# LAYER 2: File-lock guard
# =============================================================

if [ "$FORCE" = true ] &amp;&amp; [ -d "$LOCKDIR" ]; then
  echo "turbo-guard: --force used, removing existing lock."
  rm -rf "$LOCKDIR"
fi

if mkdir "$LOCKDIR" 2&gt;/dev/null; then
  trap cleanup EXIT INT TERM HUP
  echo $$ &gt; "$PIDFILE"

  set +e
  pnpm turbo "$@"
  TURBO_EXIT=$?
  set -e

  exit $TURBO_EXIT
fi

OTHER_PID=""
if [ -f "$PIDFILE" ]; then
  OTHER_PID=$(cat "$PIDFILE" 2&gt;/dev/null || echo "")
fi

if [ -n "$OTHER_PID" ] &amp;&amp; is_turbo_alive "$OTHER_PID"; then
  echo "turbo-guard: task already running (PID $OTHER_PID). Waiting..."
  wait_for_pid "$OTHER_PID" "lock holder"
  echo "turbo-guard: previous run finished. Running with turbo cache..."
  pnpm turbo "$@"
  exit $?
fi

rm -rf "$LOCKDIR"

if mkdir "$LOCKDIR" 2&gt;/dev/null; then
  trap cleanup EXIT INT TERM HUP
  echo $$ &gt; "$PIDFILE"

  set +e
  pnpm turbo "$@"
  TURBO_EXIT=$?
  set -e

  exit $TURBO_EXIT
fi

echo "turbo-guard: another process acquired the lock first. Waiting..."
sleep 1

OTHER_PID=""
if [ -f "$PIDFILE" ]; then
  OTHER_PID=$(cat "$PIDFILE" 2&gt;/dev/null || echo "")
fi

if [ -n "$OTHER_PID" ] &amp;&amp; is_turbo_alive "$OTHER_PID"; then
  wait_for_pid "$OTHER_PID" "lock holder"
  echo "turbo-guard: previous run finished. Running with turbo cache..."
fi

pnpm turbo "$@"
exit $?
</code></code></pre><div><hr></div><h3><code>scripts/tsc-watch-agent.sh</code></h3><pre><code><code>#!/usr/bin/env bash
#
# tsc-watch-agent.sh &#8212; Structured tsc --watch wrapper for AI agents.
#
# Runs tsc --watch in background and writes structured JSON that agents
# can read instantly after each recompile (~2-3s incremental vs ~15-25s cold).
#
# COMMANDS:
#   start [app]  - Start watcher (default: app1). Runs in background.
#   stop         - Stop watcher and clean up.
#   status       - Print JSON status to stdout.
#   errors       - Print errors array to stdout.
#   wait         - Block until next check completes (polls every 0.3s, 120s timeout).
#
# OUTPUT: /tmp/tsc-watch-agent/status.json
#
# AGENT WORKFLOW:
#   pnpm tsc:watch:start
#   pnpm tsc:watch:wait
#   pnpm tsc:watch:errors
#   pnpm tsc:watch:stop
#
# PROCESS ARCHITECTURE:
#   cmd_start spawns 3 background processes:
#     1. tsc process   &#8212; `exec` replaces subshell so PID IS the real node/tsc
#     2. parser        &#8212; reads FIFO line-by-line, writes status.json
#     3. cleanup       &#8212; waits for tsc to die, then unblocks parser
#
# --------------------------------------------------------------

set -euo pipefail

# -- Configuration ---------------------------------------------

PROJECT_DIR="$(cd "$(dirname "$0")/.." &amp;&amp; pwd)"
OUT_DIR="/tmp/tsc-watch-agent"
STATUS_FILE="${OUT_DIR}/status.json"
RAW_LOG="${OUT_DIR}/raw.log"
TSC_PID_FILE="${OUT_DIR}/tsc.pid"
LOCKDIR="${OUT_DIR}/lock"
FIFO="${OUT_DIR}/tsc.fifo"
WAIT_POLL=0.3
WAIT_TIMEOUT=120

# -- App definitions -------------------------------------------
# Replace these app mappings with your actual monorepo apps if desired.
# The defaults are intentionally generic.

get_app_config() {
  local app="${1:-app1}"
  case "$app" in
    app1)
      APP_DIR="${PROJECT_DIR}/apps/app1"
      TSC_ARGS="--noEmit --watch --preserveWatchOutput"
      NODE_OPTS="--max-old-space-size=8192"
      ;;
    app2)
      APP_DIR="${PROJECT_DIR}/apps/app2"
      TSC_ARGS="--noEmit --watch --preserveWatchOutput"
      NODE_OPTS=""
      ;;
    *)
      echo "{\"error\": \"Unknown app: ${app}. Valid: app1, app2\"}" &gt;&amp;2
      exit 1
      ;;
  esac
}

# -- JSON helpers ----------------------------------------------

write_status() {
  local status="$1" app="$2" error_count="$3" errors_json="$4"
  local check_at="$5" duration="$6" pid="$7"

  local tmp="${STATUS_FILE}.tmp"
  cat &gt; "$tmp" &lt;&lt;ENDJSON
{
  "status": "${status}",
  "app": "${app}",
  "errorCount": ${error_count},
  "errors": ${errors_json},
  "lastCheckAt": "${check_at}",
  "lastCheckDuration": "${duration}",
  "pid": ${pid}
}
ENDJSON
  mv -f "$tmp" "$STATUS_FILE"
}

# -- Lock management -------------------------------------------

acquire_lock() {
  if mkdir "$LOCKDIR" 2&gt;/dev/null; then
    echo $$ &gt; "${LOCKDIR}/pid"
    return 0
  fi

  local other_pid=""
  if [ -f "$TSC_PID_FILE" ]; then
    other_pid=$(cat "$TSC_PID_FILE" 2&gt;/dev/null || echo "")
  fi
  if [ -z "$other_pid" ] &amp;&amp; [ -f "${LOCKDIR}/pid" ]; then
    other_pid=$(cat "${LOCKDIR}/pid" 2&gt;/dev/null || echo "")
  fi

  if [ -n "$other_pid" ] &amp;&amp; kill -0 "$other_pid" 2&gt;/dev/null; then
    echo "{\"error\": \"Watcher already running (PID ${other_pid}). Use 'stop' first.\"}"
    exit 1
  fi

  rm -rf "$LOCKDIR"
  if mkdir "$LOCKDIR" 2&gt;/dev/null; then
    echo $$ &gt; "${LOCKDIR}/pid"
    return 0
  fi

  echo "{\"error\": \"Failed to acquire lock.\"}"
  exit 1
}

release_lock() {
  rm -f "${LOCKDIR}/pid" 2&gt;/dev/null
  rmdir "$LOCKDIR" 2&gt;/dev/null || true
}

# -- Process management ----------------------------------------

kill_tree() {
  local pid="$1"
  if [ -z "$pid" ]; then return; fi
  local children
  children=$(pgrep -P "$pid" 2&gt;/dev/null || echo "")
  kill "$pid" 2&gt;/dev/null || true
  local child
  for child in $children; do
    kill_tree "$child"
  done
}

kill_tree_wait() {
  local pid="$1"
  kill_tree "$pid"
  local waited=0
  while kill -0 "$pid" 2&gt;/dev/null &amp;&amp; [ $waited -lt 5 ]; do
    sleep 0.5
    waited=$((waited + 1))
  done
  if kill -0 "$pid" 2&gt;/dev/null; then
    kill -9 "$pid" 2&gt;/dev/null || true
  fi
}

# -- Commands ---------------------------------------------------

cmd_start() {
  local app="${1:-app1}"
  get_app_config "$app"

  if [ ! -d "$APP_DIR" ]; then
    echo "{\"error\": \"App directory not found: ${APP_DIR}\"}"
    exit 1
  fi

  mkdir -p "$OUT_DIR"
  acquire_lock

  local tsc_bin="${APP_DIR}/node_modules/.bin/tsc"
  if [ ! -f "$tsc_bin" ]; then
    tsc_bin="$(command -v tsc 2&gt;/dev/null || echo "")"
    if [ -z "$tsc_bin" ]; then
      release_lock
      echo "{\"error\": \"tsc not found. Run pnpm install first.\"}"
      exit 1
    fi
  fi

  &gt; "$RAW_LOG"
  rm -f "$FIFO"
  mkfifo "$FIFO"
  write_status "starting" "$app" 0 "[]" "" "" "0"

  # Parser: reads tsc output from FIFO, writes status.json
  (
    local errors_buf="" error_count=0 tsc_pid="0"
    local check_start
    check_start=$(date +%s)

    while IFS= read -r line; do
      if [ "$tsc_pid" = "0" ] &amp;&amp; [ -f "$TSC_PID_FILE" ]; then
        tsc_pid=$(cat "$TSC_PID_FILE" 2&gt;/dev/null || echo "0")
      fi

      echo "$line" &gt;&gt; "$RAW_LOG"

      if echo "$line" | grep -qE '(Starting compilation|Starting incremental compilation)'; then
        check_start=$(date +%s)
        errors_buf=""
        error_count=0
        write_status "checking" "$app" 0 "[]" "" "" "$tsc_pid"
        continue
      fi

      if echo "$line" | grep -qE '^.+\([0-9]+,[0-9]+\): error TS[0-9]+:'; then
        local file msg_line msg_col code message
        file=$(echo "$line" | sed -E 's/^(.+)\([0-9]+,[0-9]+\): error TS[0-9]+: .+$/\1/')
        msg_line=$(echo "$line" | sed -E 's/^.+\(([0-9]+),[0-9]+\): error TS[0-9]+: .+$/\1/')
        msg_col=$(echo "$line" | sed -E 's/^.+\([0-9]+,([0-9]+)\): error TS[0-9]+: .+$/\1/')
        code=$(echo "$line" | sed -E 's/^.+\([0-9]+,[0-9]+\): error (TS[0-9]+): .+$/\1/')
        message=$(echo "$line" | sed -E 's/^.+\([0-9]+,[0-9]+\): error TS[0-9]+: (.+)$/\1/')
        file="${file#"${APP_DIR}/"}"
        message=$(echo "$message" | sed 's/\\/\\\\/g; s/"/\\"/g; s/\t/\\t/g')

        local entry="{\"file\": \"${file}\", \"line\": ${msg_line}, \"col\": ${msg_col}, \"code\": \"${code}\", \"message\": \"${message}\"}"
        if [ -z "$errors_buf" ]; then
          errors_buf="$entry"
        else
          errors_buf="${errors_buf}, ${entry}"
        fi
        error_count=$((error_count + 1))
        continue
      fi

      if echo "$line" | grep -qE 'Found [0-9]+ errors?\.'; then
        local now duration_s final_status check_at
        now=$(date +%s)
        duration_s=$((now - check_start))
        check_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)

        if [ "$error_count" -eq 0 ]; then
          final_status="ready"
        else
          final_status="error"
        fi

        write_status "$final_status" "$app" "$error_count" "[${errors_buf}]" "$check_at" "${duration_s}s" "$tsc_pid"
        continue
      fi
    done &lt; "$FIFO"

    write_status "stopped" "$app" 0 "[]" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "" "0"
    release_lock
  ) &amp;
  local parser_pid=$!

  # tsc process: exec replaces subshell so PID IS the real node/tsc process
  (
    cd "$APP_DIR"
    if [ -n "$NODE_OPTS" ]; then
      NODE_OPTIONS="$NODE_OPTS" exec "$tsc_bin" $TSC_ARGS
    else
      exec "$tsc_bin" $TSC_ARGS
    fi
  ) &gt; "$FIFO" 2&gt;&amp;1 &amp;
  local tsc_pid=$!
  echo "$tsc_pid" &gt; "$TSC_PID_FILE"
  echo "$tsc_pid" &gt; "${LOCKDIR}/pid"

  write_status "checking" "$app" 0 "[]" "" "" "$tsc_pid"

  # Cleanup watcher: when tsc dies, unblock parser
  (
    wait "$tsc_pid" 2&gt;/dev/null || true
    sleep 1
    if kill -0 "$parser_pid" 2&gt;/dev/null; then
      echo "" &gt; "$FIFO" 2&gt;/dev/null || true
    fi
  ) &amp;

  echo "{\"started\": true, \"app\": \"${app}\", \"pid\": ${tsc_pid}, \"statusFile\": \"${STATUS_FILE}\"}"
}

cmd_stop() {
  if [ ! -d "$OUT_DIR" ]; then
    echo "{\"stopped\": true, \"wasRunning\": false}"
    return 0
  fi

  local tsc_pid=""
  if [ -f "$TSC_PID_FILE" ]; then
    tsc_pid=$(cat "$TSC_PID_FILE" 2&gt;/dev/null || echo "")
  fi

  local was_running=false
  if [ -n "$tsc_pid" ] &amp;&amp; kill -0 "$tsc_pid" 2&gt;/dev/null; then
    was_running=true
    kill_tree_wait "$tsc_pid"
  fi

  # Kill any processes stuck on the FIFO (best-effort)
  if [ -p "$FIFO" ]; then
    local fifo_pids=""
    fifo_pids=$(lsof -t "$FIFO" 2&gt;/dev/null || fuser "$FIFO" 2&gt;/dev/null || echo "")
    local p
    for p in $fifo_pids; do
      kill "$p" 2&gt;/dev/null || true
    done
  fi

  local app="unknown"
  if [ -f "$STATUS_FILE" ]; then
    app=$(grep -o '"app": "[^"]*"' "$STATUS_FILE" 2&gt;/dev/null | head -1 | sed 's/"app": "//;s/"//' || echo "unknown")
  fi

  rm -f "$TSC_PID_FILE" "$FIFO" 2&gt;/dev/null
  write_status "stopped" "$app" 0 "[]" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "" "0"
  release_lock

  echo "{\"stopped\": true, \"wasRunning\": ${was_running}}"
}

cmd_status() {
  if [ ! -f "$STATUS_FILE" ]; then
    echo "{\"status\": \"stopped\", \"app\": \"\", \"errorCount\": 0, \"errors\": [], \"lastCheckAt\": \"\", \"lastCheckDuration\": \"\", \"pid\": 0}"
    return 0
  fi

  local tsc_pid=""
  if [ -f "$TSC_PID_FILE" ]; then
    tsc_pid=$(cat "$TSC_PID_FILE" 2&gt;/dev/null || echo "")
  fi

  if [ -n "$tsc_pid" ] &amp;&amp; kill -0 "$tsc_pid" 2&gt;/dev/null; then
    cat "$STATUS_FILE"
  else
    echo "{\"status\": \"stopped\", \"app\": \"\", \"errorCount\": 0, \"errors\": [], \"lastCheckAt\": \"\", \"lastCheckDuration\": \"\", \"pid\": 0}"
  fi
}

cmd_errors() {
  if [ ! -f "$STATUS_FILE" ]; then
    echo "[]"
    return 0
  fi

  local content
  content=$(cat "$STATUS_FILE")

  if command -v python3 &amp;&gt;/dev/null; then
    echo "$content" | python3 -c "import sys,json; d=json.load(sys.stdin); print(json.dumps(d.get('errors',[]),indent=2))"
  else
    echo "$content" | sed -n '/"errors":/,/\]/p' | sed '1s/.*"errors": //'
  fi
}

cmd_wait() {
  if [ ! -f "$STATUS_FILE" ] &amp;&amp; [ ! -f "$TSC_PID_FILE" ]; then
    echo "{\"error\": \"No watcher running. Start one with: pnpm tsc:watch:start\"}"
    exit 1
  fi

  local start_check_at=""
  if [ -f "$STATUS_FILE" ]; then
    start_check_at=$(grep -o '"lastCheckAt": "[^"]*"' "$STATUS_FILE" 2&gt;/dev/null | sed 's/"lastCheckAt": "//;s/"//' || echo "")
  fi

  local poll_count=0
  local max_polls=400  # 120s / 0.3s

  while true; do
    if [ ! -f "$TSC_PID_FILE" ]; then
      echo "{\"error\": \"Watcher process exited unexpectedly.\"}"
      exit 1
    fi

    local tsc_pid
    tsc_pid=$(cat "$TSC_PID_FILE" 2&gt;/dev/null || echo "")
    if [ -n "$tsc_pid" ] &amp;&amp; ! kill -0 "$tsc_pid" 2&gt;/dev/null; then
      echo "{\"error\": \"Watcher process (PID ${tsc_pid}) is no longer running.\"}"
      exit 1
    fi

    if [ -f "$STATUS_FILE" ]; then
      local current_status current_check_at
      current_status=$(grep -o '"status": "[^"]*"' "$STATUS_FILE" 2&gt;/dev/null | head -1 | sed 's/"status": "//;s/"//' || echo "")
      current_check_at=$(grep -o '"lastCheckAt": "[^"]*"' "$STATUS_FILE" 2&gt;/dev/null | sed 's/"lastCheckAt": "//;s/"//' || echo "")

      if [ "$current_status" = "ready" ] || [ "$current_status" = "error" ]; then
        if [ "$current_check_at" != "$start_check_at" ] || [ -z "$start_check_at" ]; then
          cat "$STATUS_FILE"
          return 0
        fi
      fi

      if [ "$current_status" = "stopped" ]; then
        echo "{\"error\": \"Watcher stopped while waiting.\"}"
        exit 1
      fi
    fi

    sleep "$WAIT_POLL"
    poll_count=$((poll_count + 1))

    if [ "$poll_count" -ge "$max_polls" ]; then
      echo "{\"error\": \"Timed out after ${WAIT_TIMEOUT}s waiting for type-check to complete.\"}"
      exit 1
    fi
  done
}

# -- Main dispatch ---------------------------------------------

CMD="${1:-}"
shift || true

case "$CMD" in
  start)  cmd_start "$@" ;;
  stop)   cmd_stop ;;
  status) cmd_status ;;
  errors) cmd_errors ;;
  wait)   cmd_wait ;;
  *)
    echo "Usage: tsc-watch-agent.sh &lt;command&gt; [args]"
    echo ""
    echo "Commands:"
    echo "  start [app]  Start tsc --watch (default: app1)"
    echo "               Apps: app1, app2"
    echo "  stop         Stop the watcher"
    echo "  status       Print JSON status"
    echo "  errors       Print errors array"
    echo "  wait         Block until next check completes"
    exit 1
    ;;
esac
</code></code></pre><div><hr></div><h2>Final note</h2><p>This isn&#8217;t really about TypeScript. It&#8217;s about a new reality:</p><p>When multiple automated actors work on the same repo, you need coordination primitives. Task runners with caching (Turborepo/Nx) solve the &#8220;do less work&#8221; problem, but you still need a thin layer to solve the &#8220;don&#8217;t do the same work at the same time&#8221; problem.</p><p>Serialize identical work. Cache it. Keep hot compilers alive for loops. And give agents structured outputs so they don&#8217;t waste cycles parsing noise.</p><p>That&#8217;s how you stop your AI assistants from fighting over your CPU&#8212;and turn them into an actual multiplier.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.rajmaliuk.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Ariel Rajmaliuk's Substack is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item></channel></rss>