IARPA wants new options to spot when large language models exhibit potentially harmful behavior

The intelligence community’s primary research arm is exploring new ways to detect and combat vulnerabilities, biases and threats associated with emerging generative AI and large language model technologies that are increasingly informing U.S. intel analyses. 
This photograph taken in Toulouse, southwestern France, on July 18, 2023 shows a screen displaying the logo of Bard AI, a conversational artificial intelligence software application developed by Google, and ChatGPT. (Photo by LIONEL BONAVENTURE/AFP via Getty Images)

The intelligence community’s primary research arm is exploring new ways to detect and combat vulnerabilities, biases and threats associated with emerging generative AI and large language model (LLM) technologies that are increasingly informing U.S. intel analyses. 

Officials from the Intelligence Advanced Research Projects Activity (IARPA) Office of Analysis detailed their intent to “elicit frameworks to categorize and characterize” such security risks, via a request for information that invites organizations to respond with input by Aug. 21. 

“Recent generative AI/LLM models are complex and powerful, and researchers are only beginning to understand the ramifications of their adoption,” IARPA program manager Tim McKinnon told DefenseScoop in an interview over email on Friday. 

“Through this RFI, IARPA hopes to gain a broad view of the landscape of threats and vulnerabilities posed by this technology, with the ultimate goal of better understanding which aspects are most critical to the Intelligence Community safely adopting this technology,” he explained. 


Since they started being unleashed for broad use by the public late last year, large language models and generative AI-enabled products — like OpenAI’s ChatGPT, Microsoft’s Bing chatbot, or Google’s BardAI — have attracted a great deal of attention around the world. This is “due, among other things, to their human-like interaction with users,” IARPA officials note in their RFI.

Broadly, LLMs refer to deep learning algorithms that are trained with massive, evolving datasets to recognize, summarize, translate, predict and generate convincing, conversational text and other forms of media.

According to McKinnon, “IARPA’s interest in LLMs long predates the public release of ChatGPT.”

“While the colossal scientific achievements in human language technology only entered the public eye over the past few months, this field has been a critical focus area for IARPA and the IC for the past decade. IARPA has been a major driver of LLM technology, with over 600 publications on human language technology in recent years,” McKinnon said. 

Performers on past and present IARPA-led projects (such as REASON, MATERIAL, BETTER and HIATUS) have researched and engineered large language models to address what he called some of the intelligence community’s biggest challenges.


“Research addresses machine translation and summarizing texts from low-resource languages — languages with very little model training to date, like Somali and Pashto — identifying and retrieving personalized, mission-relevant event data from large multilingual news streams; and generating linguistic fingerprints to both attribute authorship of a document and protect an author’s privacy,” McKinnon noted. 

As suggested, these technologies hold a great deal of promise to substantially transform how intelligence analysts work in the forthcoming years, but IC and other U.S. government leaders are also concerned about their potential for harm.

Through additional research, spotlighted in the recently released request for information, McKinnon’s team aims to advance agencies’ capacity to pinpoint and mitigate any threats to their users posed by model-based vulnerabilities.

In IARPA’s new request, respondents are asked to share frameworks their organizations have developed for making sense of large language model threats and vulnerabilities — and approaches for targeting and reducing the trackable risks. 

“LLMs have been shown to exhibit erroneous and potentially harmful behavior, posing threats to the end-users,” officials wrote in the RFI.


Prompt injections, data leakage and unauthorized code execution mark some of the threats and vulnerabilities characterized in existing taxonomies. IARPA is interested in those, as well as others that are more novel and less identifiable. 

Notably, the agency is interested in the characterizations and methods for both white box and black box models. 

“In some cases, the IC will have full access to LLMs — such as by downloading open source LLM models — while in other cases the IC will only be able to interact with a given model through a user interface, like ChatGPT, Bing, and others. White box methods would be applied in the former scenario, since they assume access to model-internal information, while black box methods are used in the latter scenario and assume limited access, including model inputs and outputs,” McKinnon told DefenseScoop. 

He also emphasized that IARPA depends on RFIs like this one to support and drive the development of future innovation-pushing initiatives.

“Understanding state-of-the-art technologies and the potential impacts of disruptive technologies on intelligence analysis is critical when making decisions to pursue high-risk, high-payoff research programs,” McKinnon said.  

Brandi Vincent

Written by Brandi Vincent

Brandi Vincent is DefenseScoop's Pentagon correspondent. She reports on emerging and disruptive technologies, and associated policies, impacting the Defense Department and its personnel. Prior to joining Scoop News Group, Brandi produced a long-form documentary and worked as a journalist at Nextgov, Snapchat and NBC Network. She was named a 2021 Paul Miller Washington Fellow by the National Press Foundation and was awarded SIIA’s 2020 Jesse H. Neal Award for Best News Coverage. Brandi grew up in Louisiana and received a master’s degree in journalism from the University of Maryland.

Latest Podcasts