Talisman Sabre 2025 (TS-25) marked a quiet milestone in U.S. and coalition defense innovation. For the first time in the biennial exercise’s history, agentic artificial intelligence—AI systems capable of reasoning, proposing, and executing actions in real time—was deployed and tested alongside planners and operators.
As the DoD aims to bring transformative capabilities to the national security mission in compressed timelines, Talisman Sabre offers a unique opportunity to evaluate AI performance under the constraints of operational settings.
At Vannevar, we believe the future of deterrence will be shaped by software that thinks, not just dashboards. That belief was tested in Northern Australia where agentic models worked alongside humans to model threats, simulate responses, and recommend a course of action (COA).
Over the course of the exercise, Vannevar’s mission and engineering teams supported missions across two groups:
- PMW 120 - the U.S. Naval Warfare PEO focused on Battlespace Awareness and Information Warfare capabilities. Since 2024, the Defense Innovation Unit (DIU) has been working with INDOPACOM to operationalize AI across the theater with Vannevar’s “ARCHER” platform, supporting the information warfighting that is key to Admiral Paparo’s “Prevail” concept.
- The Multinational Information Operations Centre (MIOC), a new effort to proactively coordinate information operations (IO) among partner nations specifically for Exercise Talisman Sabre.
Across both groups, more than a dozen AI-powered products supported daily operations, were evaluated in daily briefings, and accelerated the use of agentic systems for ongoing deterrence in the Indo-Pacific.
Here are a few new workflows and capabilities tested at TS-25:
Deployment on the High Side
During the exercise, Vannevar successfully deployed a JWICS-accessible application, a critical step that lets analysts interact with our models and outputs on a secure, classified network. Vannevar worked with the U.S. Navy’s Battlespace Awareness & Information Operations Program Office (PMW 120) and the U.S. Navy’s Project Overmatch to leverage a cybersecurity process called Rapid Assess and Incorporate Software Engineering (RAISE), designed to fast-track secure software to classified environments.
The process involved a PMW 120 Baseline Configuration Review to validate Vannevar’s security posture. Following a documentation review of our architecture, controls, and dependencies, the application was approved to deploy on the DCGS-N Central Ashore System, with a pre-approved ATO boundary. In less than four weeks, test and evaluation users were able to access the application on JWICS.
High-side deployment was a major milestone and a first for Vannevar. It comes with the technical challenges of running powerful AI applications on secure on-prem servers.
Simulating Threats, Calculating Responses
In the lead-up to Talisman Sabre 25, Vannevar deployed agentic AI to simulate red force behavior under a range of gray zone and conventional scenarios. These simulations analyzed a combination of open-source signals, operational patterns, and evolving geopolitical conditions to generate probabilistic assessments of likely adversary actions.
Prior to the exercise, Vannevar’s AI-driven workflows modeled a set of scenarios near sensitive training areas. The agents conducted follow-on tasks to generate event probabilities, risk assessments, and recommendations. During TS-25, users were able to validate the system’s predictions against ground-truth assessments, and participants in the exercise ultimately responded faster and more precisely to competitor actions.
Vannevar’s COA agents accurately predicted four scenarios that materialized during the exercise, a promising degree of accuracy for early prototypes. Ground-truth data and user feedback will be used to refine the models and improve performance in future operations.
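One common way to score probabilistic predictions against ground truth is the Brier score, the mean squared gap between each forecast probability and what actually happened (lower is better). The following is a minimal sketch under stated assumptions: the scenario labels and probabilities are invented for illustration, they are not exercise data, and this is not necessarily the metric used at TS-25.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forecast:
    scenario: str       # hypothetical scenario label, for illustration only
    probability: float  # forecast probability that the scenario occurs, in [0, 1]
    occurred: bool      # ground truth recorded after the fact


def brier_score(forecasts):
    """Mean squared error between forecast probabilities and outcomes (0 is perfect)."""
    if not forecasts:
        raise ValueError("at least one forecast is required")
    total = sum((f.probability - float(f.occurred)) ** 2 for f in forecasts)
    return total / len(forecasts)


def hit_rate(forecasts, threshold=0.5):
    """Share of forecasts whose thresholded call matched the observed outcome."""
    if not forecasts:
        raise ValueError("at least one forecast is required")
    hits = sum((f.probability >= threshold) == f.occurred for f in forecasts)
    return hits / len(forecasts)


# Made-up values, not results from the exercise.
FORECASTS = [
    Forecast("scenario_a", 0.8, True),
    Forecast("scenario_b", 0.6, True),
    Forecast("scenario_c", 0.3, False),
    Forecast("scenario_d", 0.7, False),
]

if __name__ == "__main__":
    print(f"Brier score: {brier_score(FORECASTS):.3f}")
    print(f"Hit rate:    {hit_rate(FORECASTS):.2f}")
```

Tracking a score like this across iterations gives engineers and operators a shared, simple signal of whether model refinements are actually improving calibration.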
Detecting Dual-Use Threats at Sea
Agentic workflows were also used to analyze adversary maritime traffic patterns in the lead-up to the exercise. By reasoning over multiple sources of raw data, agents surfaced early warning indicators of suspicious traffic patterns.
Dual-use exploitation involves vessels that operate under commercial flags but are likely collecting Intelligence, Surveillance, and Reconnaissance (ISR) under the guise of trade. These detections set in motion a series of follow-on actions to verify vessel identities and determine their ownership through entity resolution, revealing adversary financial relationships.
Human-Machine Teaming in the Information Environment
In one workflow, Vannevar deployed agentic systems that reasoned over foreign media, language-specific narratives, and propaganda ecosystems to analyze key misinformation strategies leveraged during the exercise. Adversary narratives included framing TS-25 as a technique for coercing ASEAN countries into participation.
AI-powered workflows then characterized the media landscape of Southeast Asia to understand whether competitor narratives were taking root and to recommend strategic messages to counter misinformation. In one instance, the reasoning model was used to co-author strategic messaging highlighting the Humanitarian Assistance and Disaster Relief (HADR) mission undertaken by one ASEAN coalition partner.
Exposing Dark Web Surveillance of U.S. Movements
Finally, an AI workflow was used for force protection, after early signals from illicit forums and dark web activity revealed that U.S. and allied aircraft and carrier movements were being tracked.
The agents surfaced mentions of specific units across OSINT sources, cueing a more targeted analysis of illicit forums and hard-to-access sources to highlight threats posed by non-state actors.
Iterate Like It Matters
The most important outcome wasn’t any single deployment or recommendation. It was the ability to rapidly prototype, test, and improve agentic systems in a high-consequence operational setting. Engineers and operators worked in tight loops, adjusting models based on mission feedback, validating recommendations against ground truth, and building increasing trust with human mission partners.
The faster we can get software into real-world hands on secure networks, the faster we can learn what works and what doesn’t. The overall feedback so far has been positive, as one officer summarized:
“The insights we were able to gather utilizing different AI-driven tools were essential in informing our understanding of narrative threats and opportunities in the information environment. This analysis directly informed our recommendations on strategic communications surrounding the exercise.”
In our experience, AI systems work for defense only when they are built with a singular focus on driving mission outcomes. That means developing the entire infrastructure holistically: mission-relevant datasets for foundation models, built-in verification and military tradecraft to guide LLM outputs, and humans who remain in the loop for adaptive, forward-deployed, airtight integration.
From Exercise to Everyday Use
Talisman Sabre 25 may have ended, but the work continues.
Deployment on JWICS was a critical technical milestone that can unlock AI for thousands of new users. However, more apps need to deploy on the high side in a production capacity.
Many of the agentic workflows tested during the exercise are now undergoing iteration based on user feedback, incorporating advanced features and improved performance.
Our mission team is hard at work showing that operational AI can improve our daily deterrence posture in the Pacific theater this year instead of this decade.
Closing Thoughts
We are witnessing greater complexity in the Indo-Pacific as traditional analytical techniques struggle to quantify and command deterrence. The next fight may not start with missiles or tanks, but with misattributed gray zone activity or the slow bleed of influence operations.
Agentic AI will not and should not replace humans, but it can help us outthink and outmaneuver even the most capable adversaries, with the goal of preserving human life and open societies.