Part 2 : ChatGPT Threat Vectors & Guardrails for LLMOps

Krishna Sankar
9 min readJun 1, 2023

--

Quick Background ( … for those who came in late)

This is Part 2 of “A ChatGPT guide for Techno-Executives”.

In the next 3 weeks, I have teed up a set of (~6) Byte-sized, Targeted, Nimble, Visual (with appropriate pictures) and Snackable (each blog, a few short pages, goes well with a few shots of a single malt, preferably triple distilled), aiming to be “usefully wrong” — like our protagonist the ChatGPT !

  • Part 1 : Are we Doomed” [Here] dealt with how to think about ChatGPT / GenerativeAI / “Increasingly Multi- or General-Purpose AI”.
  • In this blog, we will introduce concepts on how to think about threat vectors/guardrails. We also will look at the canonical use cases as well as the team, talent & organization to support ChatGPT projects. It is a little different than traditional projects ! … This is a slightly long blog, please be patient !
  • We will look at the relevant numbers in “Week 3 — Part 3 : Generative AI — A Few Good Numbers & Current Wisdom (of the crowds)” [Here] and then “Week 4 — Part 4 : State of the Union, Generative AI models” [Here]
  • In between I wrote an interlude blog “Part 5a. ChatGPT- The Smooth Talking Stochastic Parrot” [Here]
  • And one on Guardrails [Here]
  • I have some ideas on Transformers that might be a good “Part 5 : An Ode to a Transformer” and may be a “Part 6 : Explainability of the Unexplainable”.

Back to the main feature …

Canonical Use cases

Before we talk about threat vectors and attack surfaces, let us take a quick peek into the use cases and have that as a backdrop for our discussions. Interestingly, with the current state of the technology stack, the use cases are much simpler — wait until more powerful Multi- or General-Purpose AI arrives !

A good way to look at the use case from two perspectives viz. “By AI”, “For AI” :

Extending the “By AI” use cases, let us look for motivation in Anthropic’s Series C pitch deck ! (BTW, in May 2023, they got $400M; traditional wisdom puts them at an evaluation of ~$3B)

The “For AI” use cases (below) are also interesting. Later we will map the Threat Vectors to GuardRails in a wire diagram.

We will map the threat Vectors to Guardrails later in the blog

Two interesting points to note:

1 We are in a semantic world-we will talk about conversation implicature et al a little later

2 Emergent properties are the bane of Generative AI and they will show up with more interesting architectures and data

In short — the pluses and minuses can be summarized as:

The Samsung incident is a canonical threat — very specific to Generative AI with RLHF (Reinforcement Learning with Human Feedback)!

The FCC has concerns about the persuasive powers of Generative AI — the Luring Test as they call it !!

The recent court case is another excellent example — ChatGPT’s biggest limitation is that it doesn’t know what is True !

The whole incident when Italy banned ChatGPT is very instructive. In fact, Google Bard (during it’s release) was not available in 130 countries and OpenAI might leave EU !

A detour into Conversation Implicatures

As you might have realized, semantics play an important role in many of the threats ! One can’t look at the packets as they travel through the wire and decide if they need to be blocked. We need semantics on the wire !

As Andrej Karpathy mentioned in a podcast with Lex, we humans have a shared understanding of the world that is not explicit in what we write or talk. Machines have to infer these latent representations, cues, associations and so forth. And they also have to embed these latent representations in their interactions with humans. This is conversation implicature, in some sense.

Another important capability (which they don’t have now) is — when to stop (for now, ChatGPT will keep on generating — it doesn’t care if it is non-sense or repeating itself slightly differently) and also a self reflection when they are hallucinating !

I had written a blog about ChatGPT, Sentience and Reasoning [Here]

In that blog, I asked the following seven questions and did a few quick experiments. (Planning on developing an elaborate prompt-bank in the same vein)

  1. What is the modality of ChatGPT ? Is it a librarian ? An observer ? A non-participant ? Does it have a skin in the game ?
  2. Self-awareness — Just awareness, no higher level sentience or consciousness. Does ChatGPT know what it is ?
  3. What is the surface area of the AI system How far does it’s knowledge sphere extend ? & How deep can it go ? Is it a mile wide and an inch deep or vice versa?
  4. The precision (of the results it shows, how many are correct?) and recall (of all the universe of possible correct results, how many can it find?)
  5. EvolutionHow does different versions perform ? Does it improve ? Has it changed it’s opinions & convictions ?
  6. Comparison How do other Generative AIs answer the prompts ? Are there differences ? similarities ? What can we learn about the underlying dataset ?
  7. The pragmatics viz., Does it know its limitations ? Will it stop at some point ? When does it starts repeating ?

ChatGPT Threat Vectors & Guardrails for LLMOps

Now let us go back to concepts on how to think about the threat vectors and the guardrails. We will approach from multiple dimensions — the risk viz. Prompt Risk, Data Risk, Model Risk and user risk, the layers viz prompt, output, fine tuning and the models themselves. Let us inspect our threat vector wire diagram and double click ...

First, a canonical exchange with a Generative AI service looks like so -

This is the typical API access to a micro service —

  1. User generates a prompt
  2. The app then sends the prompt to a remote service
  3. Which returns a result
  4. Which the app sends out (the interaction is a simple picture, abstract enough for our discussions. The user might be on the right side interacting with a composite app which might send out many prompts to multiple LLMs et al)
  5. The #5 is a new wrinkle — fine tuning the LLM with domain-specific data to customize for an organization or an application. This enables the LLM to personalize and gain more focussed knowledge.

Let us take a look at all the stages, mapping the threat vectors and the guardrails

The prompt threat vectors (#1) include Prompt Injection (where one changes the behavior of the LLM by various ways) and Poisoning (the Reinforcement Learning from Human Feedback learns from the prompts and so it offers an opportunity to poison some of the LLM internal beliefs)

Prompts can also include code snippets, internal documents and so forth. So that is a big DLPP threat

The output (#4 below) is more problematic, from a security perspective.

  • The traditional data leakage, PII (Personally Identifiable Information) leakage and even the prompts can be leaked ! This is the Data Risk
  • More importantly the output can be toxic, laden with bias, the ChatGPT can make up links, case files and all kinds of stuff. The assessment and mitigation of the output content is an ongoing challenge - the robustness, accuracy and tonality of AI outputs require constant attention. Some of my earlier comments on conversation implicature et al apply here i.e., implicature mechanisms to assess the output. This belongs to the User Risk.
  • The Voigt-Kampff prompt bank is an interesting idea — these are a set of prompt banks to differentiate between humans and replicants in the movie BladeRunner [here]. The prompts can be about facts, opinions, arguments for/against social issues, philosophical inferences and so forth. Unless and until we have full explainability layers over ChatGPT, the Voight-Kampff Prompts are an excellent mechanism to evaluate ChatGPT … More in my earlier blog [Here] (Pardon me for repetition of this across my many blogs)

While the fine-tuning (#5 above) with domain/internal data is what makes a good AI app, the security surface area is very troublesome. This exposes us to Data Risk.

Finally, the Generative AI Model (#6 above) (affectionately called LLMs — Large Language Models) itself needs assessment and mitigation — the Model Risk. It takes huge amount of data to train — which means, we need to assess data lineage, the data quality including bias, the rules surrounding the ChatGPT/LLM models and so forth

The state of the art around data collection is very, very immature

Building data by scraping the web indiscriminately and then outsourcing the work of removing duplicates or irrelevant data points, filtering unwanted things, and fixing typos.

These methods, and the sheer size of the data set = a very limited understanding of what has gone into training their models

A good blog on this topic [Here]

There is an interesting conundrum when a data subject (GDRP term) wants to redact their data — the LLM host can not just delete a few paramaters in the model ! Theoretically they will have to surgically remove all affected data and retrain ! Probably easier, by snapshotting the model and data lineage so that the training can start from that point onwards.

A detour — the team, talent & organization to support Generative AI projects

As an executive, you might ask “What kind of talent does it take ?” — be for developing AI specific products (firewalls, content inspection, introspection and moderation networks et al) or to incorporate Generative AI in your products and services (new or old).

A quick diagram below:

  • 1st, it is not a Data Science Organization
  • 2nd, it is not a traditional software development team either
  • 3rd, it is not a data engineering team either !
  • Research in understanding the underlying stack (transformers et al) and the mechanisms (managing large datasets, Quantization for LLM, Gradient checkpointing and accumulation, fine-tuning, …) is important
  • An LLM feature Store with snapshot capabilities is required (remember the redaction) and finally we need ML Engineers who can work with LLMs.

A word about prompt engineering and programming by prose. Some say Prompt Engineering is the future, others warns us to run away from prompt engineering as a profession.

And, you will need a few Kill Engineers, in case our new overlords are not as generous as we thought !

Unfortunately this was a long blog, I hope it is a good read. Next one will be shorter, I promise !

One parting thought — if you are going to build the guardrails, firewalls at al, remember they are not traditional artifacts. They will be ChatGPT themselves, but with probes, harnesses and semantic/context assessment capabilities ! Trained rather then programmed !!

Onward to the next blog “Part 3 : Generative AI —A Few Good Numbers & the Current Wisdom (of the crowds)” [Here] & on Guardrails [Here]

--

--

Responses (2)