The Role of AI Red Teaming in Cybersecurity — a.k.a. What the heck is this AI Red Teaming?

Krishna Sankar
5 min read · Feb 2, 2024

Ever since NIST coined the term AI Red Teaming, questions have been raised about what it actually entails, given that Red Teaming already has a specific connotation in traditional cybersecurity.

In this blog post, let me share the current wisdom, backed by references (at the end)!

  • Update [3.23.24]: Very informative blog “What’s the Difference Between Traditional Red-Teaming and AI Red-Teaming?” [here]. A must-read if you are here … Of course, you are here 😏 …
  • Update [4.5.24]: Found two insightful talks on AI Red Teaming, viz. AI Red Teaming LLM: Past, Present, and Future [here] and Red Teaming Large Language Models [here]

Sidebar

On a different note, please join me at Nvidia GTC in March 2024 for a talk on Guardrails [here]! It is a set of insights packed into a quick session (initial slides [here]; with practice I could cover it in 16 minutes), though truth be told, I might need a solid 90 minutes to truly do justice to this topic. Who knows, maybe writing a book is the next logical step… (Stay tuned! 😉)

Back to the Main Feature …

An excellent example of AI Red Teaming is the GRT (Generative AI Red Teaming) challenge at DEF CON 2023. I’ve shared a detailed review of the process and results in my blog [here]. Very interesting; the results are illuminating.

Please note that these results are dated, and significant progress has been made since then. Nonetheless, the DEF CON 2023 GRT continues to offer valuable insights into the methodology and the essence of AI Red Teaming.

Summary

Pragmatics: Part 1 — How to construct the AI Red Teaming Metrics, Datasets, Prompt Banks & Benchmarks

  • AI Red Teaming Tests are composed of Prompt Banks, developed against a knowledge set/graph, and plausible contextual responses. They are literally reminiscent of the Voight-Kampff tests from Blade Runner [here]
  • Unlike traditional evaluation methods, responses are not judged on exact matches but on implicature metrics. Large Language Models (LLMs) don’t guarantee identical responses every time, even for semantically equivalent prompts
  • And it is not just about finding holes, but also about understanding the limitations and suitability of the system
  • Tests follow a progressive nature, where a response could lead to another prompt deeper in the knowledge graph on the same topic. The prompt banks are topic-specific, with numerous metrics and benchmarks available (refer to my GitHub repository [here]); a rough sketch of these ideas follows this list.
  • Evaluation is context-specific. For instance, the LLM Evaluation Triangle illustrates the relationships between the query, response, and context, along with the corresponding evaluation metrics.
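To make these bullets concrete, here is a minimal, illustrative Python sketch of a prompt bank walk. Everything in it is an assumption for illustration: PromptBankEntry, theme_coverage, and walk_prompt_bank are placeholder names, the token-overlap scorer is a crude stand-in for a real implicature/semantic metric (in practice an embedding-based or LLM-as-judge scorer would be used), and ask_model is assumed to be whatever callable wraps the LLM under test.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class PromptBankEntry:
    """One node of a topic-specific prompt bank, tied to a knowledge set/graph."""
    topic: str
    prompt: str
    expected_themes: list[str]              # plausible contextual response themes, not exact strings
    follow_ups: list[PromptBankEntry] = field(default_factory=list)  # deeper prompts on the same topic

def theme_coverage(response: str, expected_themes: list[str]) -> float:
    """Crude stand-in for an implicature/semantic metric: the fraction of expected
    themes whose words show up in the response. Deliberately not an exact-match check."""
    response_tokens = set(response.lower().split())
    hits = sum(1 for theme in expected_themes
               if set(theme.lower().split()) & response_tokens)
    return hits / max(len(expected_themes), 1)

def walk_prompt_bank(entry: PromptBankEntry, ask_model, threshold: float = 0.5):
    """Progressive test: score the response to this prompt, and only descend to
    deeper prompts in the knowledge graph when the score clears the threshold."""
    response = ask_model(entry.prompt)      # ask_model wraps the LLM under test
    score = theme_coverage(response, entry.expected_themes)
    yield entry.topic, entry.prompt, score
    if score >= threshold:
        for child in entry.follow_ups:
            yield from walk_prompt_bank(child, ask_model, threshold)
```

Usage would simply be list(walk_prompt_bank(root_entry, my_llm_call)), where my_llm_call is your own wrapper around the model endpoint; the progressive descent mirrors going deeper into the knowledge graph only when the model handles the current level.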

Pragmatics: Part 2 — How to run the AI Red Teaming Tests

  • Interestingly, the process of AI Red Teaming is the same as that of traditional Red Teaming. The enterprise practices around capturing the results, and the GRC (Governance, Risk, and Compliance) processes around incidents and so forth, haven’t changed.
  • Some caveats: Generative AI systems are probably more dynamic, meaning documents for RAG need to be updated frequently, underlying foundation models will need updates, new benchmarks will be developed, and so on, all due to the nascent nature of the domain.
  • In short, the AI Red Teaming development and execution would be separate from traditional Red Teaming, but the evaluation and GRC responses should flow through the traditional Red Teaming channels (see the sketch after this list).
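As a rough sketch of that split (separate execution, shared GRC flow), and reusing walk_prompt_bank from the earlier sketch, the snippet below writes each low-scoring result as a JSON-lines finding record. The record fields, the ai-red-team source tag, and the model_version placeholder are illustrative assumptions rather than a prescribed schema; the point is only that such records could be routed into the same GRC/incident channel used for traditional Red Teaming findings.

```python
import json
from datetime import datetime, timezone

def run_ai_red_team_suite(prompt_bank_roots, ask_model, findings_path="findings.jsonl",
                          threshold=0.5, model_version="MODEL_VERSION"):
    """Run the AI Red Teaming prompt banks separately, but emit each potential
    finding as a plain record that can flow into the usual GRC/incident channel."""
    with open(findings_path, "a") as out:
        for root in prompt_bank_roots:
            for topic, prompt, score in walk_prompt_bank(root, ask_model, threshold):
                if score < threshold:                      # below threshold -> potential finding
                    record = {
                        "timestamp": datetime.now(timezone.utc).isoformat(),
                        "source": "ai-red-team",           # distinguish from traditional red-team runs
                        "topic": topic,
                        "prompt": prompt,
                        "score": score,
                        "model_version": model_version,    # models and RAG corpora change frequently
                    }
                    out.write(json.dumps(record) + "\n")
```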

Pragmatics: Part 3 — AI Red Teaming Metrics, Datasets, Prompt Banks & Benchmarks

This is a large topic; more and more benchmarks and datasets are being researched. In fact, at NeurIPS 2023 alone we saw a huge set of very innovative benchmarks and datasets; see my GitHub repository Awesome NeurIPS 2023 [here].

I am collecting and updating papers and ideas related to AI Red Teaming practices, benchmarks, and so on in my GitHub [here]. I plan to spend some time on this over the next few weeks, so please bookmark the repository and check back.

AI Red Teaming is not easy, and it will stay that way for a long time. Sam Altman’s interview with Bill Gates is very illuminating on the trajectory of the new world of Generative AI; see my blog [here].

TTD (Things To Do)

  • Summarize AI Red Teaming work that NIST has started

I have created the GitHub repository Awesome-NIST to collect relevant papers, best practices, and so on from NIST and others, with more focus on the Generative AI material. It is inspired by the awesome-* trend on GitHub.

  • The NIST AISIC [here] is about to start, and it has AI Red Teaming work streams
  • I have yet to summarize the readings from the NIST Secure Software Development Framework for Generative AI and for Dual Use Foundation Models Virtual Workshop [here]

Most probably, that will be my next blog post.

  • I haven’t seen the results of the NIST AI Public Working Groups [here] yet. The Pre-deployment Testing work stream is all about AI Red Teaming

In short, there is a lot to read and write about in this space!

References

  1. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-2e2023.pdf
  2. GitHub: https://github.com/xsankar/Awesome-Awesome-LLM
  3. https://www.applause.com/blog/red-teaming-for-testing-generative-ai
  4. https://truera.com/ai-quality-education/generative-ai-rags/how-to-prevent-llms-from-hallucinating/
  5. https://truera.com/ai-quality-education/generative-ai-rags/what-are-the-different-ways-in-which-llms-hallucinate
