Vannevar Engineering Case Study: Simulating Adversary Behavior With AI

Erik Rozi
Forward Deployed Engineer
For years, we’ve been working at the center of threat actor problems, building a deep understanding of how adversaries think, act, deceive, and adapt. That experience gives us a unique vantage point: we know the threat environment intimately, making us especially suited to model it.

That’s why we’ve been building AI/ML systems to simulate adversary behavior. AI systems can rapidly ingest massive, diverse datasets and generate plausible, evidence-backed possibilities that expose planners to options they might not otherwise consider. These systems aren’t replacements for human decision makers. Our mission is to augment them and give them the ability to respond with speed, precision, and creativity.

For engineers, this space is full of unsolved problems at the frontier of ML, distributed systems, and data engineering. Solving them requires top-tier technical talent who can invent creative approaches rooted in expert analysis, push models further, and build systems that stand up in the real world.

Mastering AI simulations of adversary behavior strengthens deterrence today, prevents war tomorrow, and ensures preparedness to prevail if conflict occurs.

Defense problems are never the same twice

- Thomas Crosley in Defense is the Vertical for Agentic AI

A Proven Use Case: Talisman Sabre 25

Earlier this year, we deployed these systems during Talisman Sabre 25, the largest bilateral military exercise between Australia and the United States. Our AI systems were tasked with simulating adversary behavior across both gray zone and conventional scenarios. Vannevar’s agents accurately predicted four scenarios that materialized during the exercise, a promising degree of accuracy for early prototypes.

This was a critical milestone. For engineers, it’s proof that we’re not just doing experiments in a lab, but building AI-enabled systems that provide tangible value in real-life scenarios. Being the first to deploy agents in these real-world settings creates a flywheel where every exercise accelerates our ability to design, adapt, and field more capable next-generation models.

Turning Data into Impact

At Vannevar, our advantage starts with data. We work with over a petabyte of the best privately held national security datasets, curated over six years with direct access to adversarial environments. This collection is unmatched, spanning long-form documents, geospatial information, publicly available feeds, and non-textual sources. It’s also high fidelity; we collect sources that actually matter.

But scale alone doesn’t solve the problem. Simply throwing this data at a model fails completely. To unlock its value, engineers must design pipelines that:

  • Handle multiple data types, from imagery to policy documents, without breaking downstream systems
  • Combine agents and traditional ML systems, leveraging each where they’re strongest
  • Filter massive global collections while preserving authoritative signal and ground-truth accuracy
  • Merge human-verified data with noisy global collection to support realistic modeling
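
As an illustration of the last two requirements, here is a minimal Python sketch of filtering a noisy collection and merging it with human-verified records. It is not Vannevar's pipeline: the `Record` fields, the `confidence` score, and the `min_confidence` threshold are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str              # hypothetical identifier for the entity or event described
    text: str
    modality: str         # e.g. "document", "imagery", "feed"
    human_verified: bool
    confidence: float     # hypothetical reliability score in [0, 1]


def merge_records(verified, noisy, min_confidence=0.6):
    """Merge two collections into one dict keyed by Record.key.

    Noisy records below min_confidence are dropped; among the rest, the
    highest-confidence record per key is kept. Human-verified records
    are applied last, so they always override noisy ones.
    """
    merged = {}
    for rec in noisy:
        if rec.confidence < min_confidence:
            continue  # filter low-signal collection
        current = merged.get(rec.key)
        if current is None or rec.confidence > current.confidence:
            merged[rec.key] = rec
    for rec in verified:
        merged[rec.key] = rec  # authoritative data wins on conflict
    return merged
```

Letting verified records overwrite noisy ones is only one possible policy. A real system might instead keep both and weight them downstream.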

This is mission critical. Our collection and processing pipelines are designed to protect employees and clients operating in dangerous environments, while ensuring the system’s predictions remain grounded in reality and the adversary’s thought process.

These problems require creative technical innovation across data engineering and cutting-edge ML research. For engineers, this data is a unique sandbox, both a differentiator and a proving ground, where immense insight can be drawn and next-generation AI agents can be built.

System Design and Reasoning

The agentic architecture integrates three pillars:

  1. Ingestion and weighting: custom data pipelines built for scale, safety, and adaptability
  2. Reasoning and generation: hybrid use of LLMs where they excel (summarization, search, creative exploration) and deterministic systems where precision is required
  3. Integration and feedback: outputs connected to workflows like wargaming and operator tools, continuously refined by user feedback
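
The second pillar can be pictured as a simple router. The Python sketch below is an assumption-laden illustration, not the deployed architecture: the task names come from the list above, while `make_router`, the handler callables, and the returned dictionary shape are hypothetical.

```python
from typing import Any, Callable, Dict

# Task types the text says LLMs are suited to; everything else is
# treated here as requiring a deterministic path (an assumption).
LLM_TASKS = {"summarization", "search", "creative_exploration"}


def make_router(
    llm_handler: Callable[[Any], Any],
    deterministic_handler: Callable[[Any], Any],
) -> Callable[[str, Any], Dict[str, Any]]:
    """Build a function that sends each task to the appropriate engine."""

    def route(task_type: str, payload: Any) -> Dict[str, Any]:
        if task_type in LLM_TASKS:
            return {"engine": "llm", "output": llm_handler(payload)}
        return {"engine": "deterministic", "output": deterministic_handler(payload)}

    return route


# Usage with stub handlers standing in for real components.
route = make_router(
    llm_handler=lambda text: f"summary of: {text}",
    deterministic_handler=lambda values: sum(values),
)
```

Recording which engine produced each output keeps the hybrid system easier to inspect when results are fed back from operator tools.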

The engineering challenge lies in orchestrating these components. It involves knowing where LLMs are strong, where they are brittle, and how to combine them into full agentic systems that are robust, interpretable, and performant. We expect agents such as these to mark an inflection point for defense.

What it Took to Ship

Getting from prototypes to deployment wasn’t a straight line. For Talisman Sabre alone, data, ML, platform, and UX engineers worked in tandem: scaling ingestion to the petabyte level, pushing model boundaries, building reliable environments, and designing intuitive operator tools.

Within our cross-functional engineering team, we’ve hit dead ends and rebuilt components from scratch. The breakthroughs came when we stopped treating this as a single-model problem and engineered it as a system, with customers at the center. That’s how we moved from demos to tools operators actually trust.

Key Lessons Learned

  • Traceability is critical. Our users can’t rely solely on black boxes. They need to be able to verify outputs themselves.
  • Human-centered design is as important as modeling. Complex adversary scenarios require expressive inputs, intuitive visualizations, and continuous expert feedback.
  • Partnerships drive impact. Unique data access and expert pipelines closed gaps models couldn’t solve alone.
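
One way to picture the traceability lesson is to require every generated assessment to cite sources a user can look up. The Python sketch below is purely illustrative: `Assessment`, `source_ids`, and `untraceable` are hypothetical names, not part of any described system.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Assessment:
    statement: str
    source_ids: List[str] = field(default_factory=list)  # hypothetical source references


def untraceable(assessments: List[Assessment], known_sources: Set[str]) -> List[Assessment]:
    """Return assessments that cite no source or cite an unknown one."""
    return [
        a
        for a in assessments
        if not a.source_ids or any(s not in known_sources for s in a.source_ids)
    ]
```

A check like this could run before outputs reach an operator, so that unsupported statements are flagged rather than silently displayed.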

Looking Ahead

Our efforts do not stop here. The next frontier requires creative technical innovation, including richer relational modeling of adversary behaviors, new reasoning frameworks across novel data modalities, distributed infrastructure that can handle orders of magnitude more data without losing speed in deployed environments, and entirely new UX paradigms that make complex models intuitive to use.

Talisman Sabre 25 showed that AI can already deliver operational value. But to go from promising prototypes to systems that reshape decision-making at scale, we need the very best engineers who care about solving impactful problems. Because this isn’t just about better software. It’s about building the tools that help prevent tomorrow’s wars.

If you’re looking for pre-existing playbooks, this isn’t the place. Ordinary problems won’t be found here. But if you want to work at the frontier, we’re building it. The stakes are real, the challenges are unsolved, and the impact is immense.

We need top technical talent to push our models further, invent new approaches, and solve problems nobody else has cracked yet. Come build the future of adversary modeling with us.
