IARPA wants new options to spot when large language models exhibit potentially harmful behavior

The intelligence community’s primary research arm is exploring new ways to detect and combat vulnerabilities, biases and threats associated with emerging generative AI and large language model technologies that are increasingly informing U.S. intel analyses.

By Brandi Vincent

August 7, 2023

This photograph taken in Toulouse, southwestern France, on July 18, 2023 shows a screen displaying the logo of Bard AI, a conversational artificial intelligence software application developed by Google, and ChatGPT. (Photo by LIONEL BONAVENTURE/AFP via Getty Images)

The intelligence community’s primary research arm is exploring new ways to detect and combat vulnerabilities, biases and threats associated with emerging generative AI and large language model (LLM) technologies that are increasingly informing U.S. intel analyses.

Officials from the Intelligence Advanced Research Projects Activity (IARPA) Office of Analysis detailed their intent to “elicit frameworks to categorize and characterize” such security risks, via a request for information that invites organizations to respond with input by Aug. 21.

“Recent generative AI/LLM models are complex and powerful, and researchers are only beginning to understand the ramifications of their adoption,” IARPA program manager Tim McKinnon told DefenseScoop in an interview over email on Friday.

“Through this RFI, IARPA hopes to gain a broad view of the landscape of threats and vulnerabilities posed by this technology, with the ultimate goal of better understanding which aspects are most critical to the Intelligence Community safely adopting this technology,” he explained.

Since they started being unleashed for broad use by the public late last year, large language models and generative AI-enabled products — like OpenAI’s ChatGPT, Microsoft’s Bing chatbot, or Google’s BardAI — have attracted a great deal of attention around the world. This is “due, among other things, to their human-like interaction with users,” IARPA officials note in their RFI.

Broadly, LLMs refer to deep learning algorithms that are trained with massive, evolving datasets to recognize, summarize, translate, predict and generate convincing, conversational text and other forms of media.

According to McKinnon, “IARPA’s interest in LLMs long predates the public release of ChatGPT.”

“While the colossal scientific achievements in human language technology only entered the public eye over the past few months, this field has been a critical focus area for IARPA and the IC for the past decade. IARPA has been a major driver of LLM technology, with over 600 publications on human language technology in recent years,” McKinnon said.

Performers on past and present IARPA-led projects (such as REASON, MATERIAL, BETTER and HIATUS) have researched and engineered large language models to address what he called some of the intelligence community’s biggest challenges.

“Research addresses machine translation and summarizing texts from low-resource languages — languages with very little model training to date, like Somali and Pashto — identifying and retrieving personalized, mission-relevant event data from large multilingual news streams; and generating linguistic fingerprints to both attribute authorship of a document and protect an author’s privacy,” McKinnon noted.

As suggested, these technologies hold a great deal of promise to substantially transform how intelligence analysts work in the forthcoming years, but IC and other U.S. government leaders are also concerned about their potential for harm.

Through additional research, spotlighted in the recently released request for information, McKinnon’s team aims to advance agencies’ capacity to pinpoint and mitigate any threats to their users posed by model-based vulnerabilities.

In IARPA’s new request, respondents are asked to share frameworks their organizations have developed for making sense of large language model threats and vulnerabilities — and approaches for targeting and reducing the trackable risks.

“LLMs have been shown to exhibit erroneous and potentially harmful behavior, posing threats to the end-users,” officials wrote in the RFI.

Prompt injections, data leakage and unauthorized code execution mark some of the threats and vulnerabilities characterized in existing taxonomies. IARPA is interested in those, as well as others that are more novel and less identifiable.

Notably, the agency is interested in the characterizations and methods for both white box and black box models.

“In some cases, the IC will have full access to LLMs — such as by downloading open source LLM models — while in other cases the IC will only be able to interact with a given model through a user interface, like ChatGPT, Bing, and others. White box methods would be applied in the former scenario, since they assume access to model-internal information, while black box methods are used in the latter scenario and assume limited access, including model inputs and outputs,” McKinnon told DefenseScoop.

He also emphasized that IARPA depends on RFIs like this one to support and drive the development of future innovation-pushing initiatives.

“Understanding state-of-the-art technologies and the potential impacts of disruptive technologies on intelligence analysis is critical when making decisions to pursue high-risk, high-payoff research programs,” McKinnon said.

IARPA wants new options to spot when large language models exhibit potentially harmful behavior

More Like This

Pentagon poised to launch inaugural ‘challenge’ for Global Information Dominance Experiments

Air Force issues presolicitation for next-gen target tracking

DARPA moves to mitigate possible unintended consequences of AI

Top Stories

NGA buys maritime data to help Indo-Pacific Command, via first-ever CSO

Dashboard aims to give commanders increased ability to assess cyber team readiness

Army’s Cyber Quest sought to standardize data from vendors

Air Force plans ‘sprint week’ to experiment with ABMS solutions from industry

Joint force, international partners, contractors test command and control capabilities in Pacific exercise

Navy to reset and reinvigorate Operation Cattle Drive

DOD’s new Arctic strategy calls for better tech to ‘monitor and respond’

More Scoops

Key US Navy shipyard in Japan eyeing large language models, other AI tools

Latest Podcasts

How the Navy is reducing workforce friction to improve mission outcomes

How DARPA is looking to AI to fend off cyber vulnerabilities through a challenge program

How the DOD protects national security interests by monitoring climate change

Splunk’s Paul Kurtz on the power of automation within DOD

Weapons

Cyber

AI

IT