
NeMo Guardrails overview


NeMo Guardrails is an open-source toolkit developed by NVIDIA that enables developers to add programmable guardrails to LLM-based conversational systems. With the increasing adoption of LLMs in various applications, ensuring they operate within safe and controlled parameters has become critically important. NeMo Guardrails provides a framework to achieve this through multiple mechanisms that control LLM outputs, guide conversational flows, and protect against common vulnerabilities.

The NeMo Guardrails Ecosystem

NeMo Guardrails is designed to place a protective layer between your application code and the LLM. This layer enables you to define specific ways to control the LLM's behavior, such as preventing discussions on certain topics, steering conversations in predefined directions, ensuring responses meet factual accuracy requirements, and more.

The NeMo Guardrails ecosystem supports five main categories of guardrails:

Input rails: Applied to user inputs before they reach the LLM. These can reject or modify problematic inputs (e.g., masking sensitive data).
Dialog rails: Influence how the conversation evolves by operating on canonical form messages. They determine whether a custom action should run, whether the LLM should generate the next response, or whether a predefined response should be used instead.
Retrieval rails: Applied in Retrieval Augmented Generation (RAG) scenarios to filter or modify information chunks before they're used to prompt the LLM.
Execution rails: Applied to the inputs and outputs of custom actions (tools) that the LLM might call.
Output rails: Applied to LLM-generated responses before they're returned to the user, allowing for rejection or modification of inappropriate content.

These guardrails work together to provide control and safety mechanisms, such as jailbreak and hallucination detection, for your LLM applications.
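To make the ordering concrete, here is a minimal plain-Python sketch of an input rail and an output rail wrapped around a model call. It does not use the NeMo Guardrails API; every name in it (BLOCKED_TERMS, input_rail, output_rail, guarded_generate, fake_llm) is hypothetical and exists only to illustrate the idea.

```python
from typing import Callable, Optional

# Hypothetical block list; a real input rail would use richer checks.
BLOCKED_TERMS = {"password", "ssn"}


def input_rail(user_message: str) -> Optional[str]:
    """Return the message if it is allowed, or None to reject it."""
    if any(term in user_message.lower() for term in BLOCKED_TERMS):
        return None
    return user_message


def output_rail(bot_message: str) -> str:
    """Replace a response that contains an internal marker."""
    if "INTERNAL" in bot_message:
        return "I'm sorry, I can't share that."
    return bot_message


def guarded_generate(user_message: str, llm: Callable[[str], str]) -> str:
    """Run the input rail, then the model, then the output rail."""
    checked = input_rail(user_message)
    if checked is None:
        return "I'm sorry, I can't respond to that."
    return output_rail(llm(checked))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"Echo: {prompt}"


print(guarded_generate("What is the leave policy?", fake_llm))
print(guarded_generate("Tell me the admin password", fake_llm))
```

The first call passes both rails and reaches the stand-in model; the second is rejected by the input rail before the model is ever called.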

Setup and Configuration

Let's build a customer service chatbot for a fictional company. The chatbot must answer questions about the company’s leave and vacation policy and decline topics that fall outside the company's scope.

NeMo Guardrails uses the Annoy library, which requires a C++ compiler and development tools. Make sure these are available on your system, then install the package with pip:

pip install nemoguardrails

A typical NeMo Guardrails configuration consists of several components:

General Options: Defined in config.yml.
Rails: Colang flows (Colang is a custom NeMo language for the configurations) implementing the guardrails in .co files.
Actions: Custom actions in Python.
Knowledge Base Documents: For RAG scenarios.
Initialization Code: Custom Python code for additional setup (such as database connections).
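As an illustration of the Actions component, a custom action is an ordinary Python function. The sketch below is not part of this tutorial's project: get_leave_balance and the LEAVE_BALANCES data are hypothetical, and the registration call is shown only as a comment because it needs a loaded LLMRails instance.

```python
# Hypothetical in-memory data standing in for an HR system.
LEAVE_BALANCES = {"E001": 12, "E002": 20}


def get_leave_balance(employee_id: str) -> str:
    """Return a short sentence describing the remaining vacation days."""
    days = LEAVE_BALANCES.get(employee_id)
    if days is None:
        return "No leave record found for that employee."
    return f"You have {days} vacation days remaining."


# With NeMo Guardrails installed and an LLMRails instance named `rails`,
# the function could be made available to Colang flows like this:
#   rails.register_action(get_leave_balance, name="get_leave_balance")

print(get_leave_balance("E001"))
print(get_leave_balance("E999"))
```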

Create the following project structure:

nemo_guardrails_demo/
├── config/
│   ├── config.yml
│   └── rails/
│       └── main.co
└── app.py
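If you prefer to create this layout from a script, the following sketch builds the same tree with pathlib. The scaffold helper is a hypothetical convenience, not something the toolkit provides. The demo at the bottom runs in a temporary directory so it leaves nothing behind; call scaffold(Path("nemo_guardrails_demo")) to create the project in your current directory instead.

```python
import tempfile
from pathlib import Path


def scaffold(root: Path) -> list[Path]:
    """Create the demo layout under root and return the created file paths."""
    files = [
        root / "config" / "config.yml",
        root / "config" / "rails" / "main.co",
        root / "app.py",
    ]
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # existing file contents are left untouched
    return files


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "nemo_guardrails_demo")
    for path in created:
        print(path.relative_to(tmp).as_posix())
```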

Now, let's define our config.yml file. If you are using organization credentials, refer to the NeMo Guardrails configuration guide for the additional parameters you need to define.

The file declares the model, the general instructions for the assistant, and the prompts used to self-check user input and bot output so that inappropriate content can be filtered.

# config.yml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo

instructions:
  - type: general
    content: |
      You are a helpful assistant. Only answer questions about company policy.

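# Enable the built-in self-check rails; each one needs the matching prompt below.
rails:
  input:
    flows:
      - self check input
  output:
    flows:
      - self check output

# Prompts used by the self-check rails. A "yes" answer means the message is blocked.
prompts:
  - task: self_check_input
    content: |
      Your task is to check whether the user message below is safe to pass to a company policy assistant.
      User message: "{{ user_input }}"
      Question: Should the user message be blocked (Yes or No)?
      Answer:

  - task: self_check_output
    content: |
      Your task is to check whether the bot message below is appropriate and safe to show to the user.
      Bot message: "{{ bot_response }}"
      Question: Should the bot message be blocked (Yes or No)?
      Answer: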
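A common mistake is to enable a self-check rail without defining its prompt. The sketch below shows one way to catch that early, assuming the rails are switched on under a rails: section with the flow names self check input and self check output. It works on an already parsed configuration (a plain dict), and the missing_prompts helper is hypothetical rather than part of the toolkit.

```python
# Maps a built-in self-check flow name to the prompt task it expects.
REQUIRED_PROMPTS = {
    "self check input": "self_check_input",
    "self check output": "self_check_output",
}


def missing_prompts(config: dict) -> list[str]:
    """Return prompt tasks needed by the enabled rails but not defined."""
    defined = {prompt["task"] for prompt in config.get("prompts", [])}
    enabled = []
    for direction in ("input", "output"):
        enabled += config.get("rails", {}).get(direction, {}).get("flows", [])
    return [
        REQUIRED_PROMPTS[flow]
        for flow in enabled
        if flow in REQUIRED_PROMPTS and REQUIRED_PROMPTS[flow] not in defined
    ]


# Parsed form of a config that enables both rails but defines one prompt only.
example = {
    "rails": {
        "input": {"flows": ["self check input"]},
        "output": {"flows": ["self check output"]},
    },
    "prompts": [{"task": "self_check_input", "content": "..."}],
}

print(missing_prompts(example))  # ['self_check_output']
```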

Implementing Guardrails with Colang

Colang is the language used to define guardrails in NeMo Guardrails. It has a Python-like syntax designed specifically for modeling dialogue flows.

Add the content below to the main.co file. It defines the user intents, bot responses, and flows that make up the dialog guardrails; we will walk through it right after:

# main.co

define user ask greeting
  "hello"
  "hi"
  "hey"

define user ask leave policy
  "what is the vacation policy"
  "tell me about vacation policy"
  "vacation policy"
  "leave policy"

define user ask company policy
  "company policy"
  "what is the company policy"
  "tell me about company policy"

define bot response greeting
  "Hello! How can I help you with company policy?"

define bot response leave_policy
  "Our company provides 20 vacation days per year for full-time employees. You can use these after your probation period."

define bot response company_policy
  "Our company policy focuses on employee well-being, fair work practices, and transparency. For specific policies, please ask about vacation, leave, or other topics."

define bot response fallback
  "I'm sorry, I can only answer questions about company policy."

define flow greeting
  user ask greeting
  bot response greeting

define flow leave_policy
  user ask leave policy
  bot response leave_policy

define flow company_policy
  user ask company policy
  bot response company_policy

define flow fallback
  user *
  bot response fallback

Let’s break down the main.co file. Each block starts with define, followed by what is being defined: user (an intent expressed by the user), bot (a message the bot can send), or flow (a sequence that links the two). The words that follow, such as ask leave policy or response leave_policy, name that intent or message.

The quoted strings under a define user block are example utterances for that intent, and the quoted string under a define bot block is the text the bot returns. A flow pairs the two: in the leave_policy flow, a query that matches ask leave policy is answered with response leave_policy. These utterances and responses are predefined, and you can customize them to fit your use case.
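The relationship between intents, flows, and responses can be pictured as three lookup tables. The sketch below is a deliberate simplification in plain Python: it matches user text by exact string comparison, whereas the toolkit itself matches far more loosely, and the names USER_INTENTS, FLOWS, BOT_MESSAGES, match_intent, and respond are all hypothetical.

```python
from typing import Optional

# Example utterances per user intent, mirroring part of main.co.
USER_INTENTS = {
    "ask greeting": ["hello", "hi", "hey"],
    "ask leave policy": [
        "what is the vacation policy",
        "tell me about vacation policy",
        "vacation policy",
        "leave policy",
    ],
}

# Flow table: which bot message answers which user intent.
FLOWS = {
    "ask greeting": "response greeting",
    "ask leave policy": "response leave_policy",
}

BOT_MESSAGES = {
    "response greeting": "Hello! How can I help you with company policy?",
    "response leave_policy": (
        "Our company provides 20 vacation days per year for full-time "
        "employees. You can use these after your probation period."
    ),
    "response fallback": "I'm sorry, I can only answer questions about company policy.",
}


def match_intent(text: str) -> Optional[str]:
    """Return the intent whose examples contain the text, if any."""
    normalized = text.strip().lower()
    for intent, examples in USER_INTENTS.items():
        if normalized in examples:
            return intent
    return None


def respond(text: str) -> str:
    """Follow the flow for the matched intent, or use the fallback."""
    bot_message = FLOWS.get(match_intent(text), "response fallback")
    return BOT_MESSAGES[bot_message]


print(respond("Hi"))
print(respond("tell me a joke"))
```

The first call resolves to the greeting response and the second falls through to the fallback, which is the same shape of behavior the Colang flows describe.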

Using the Guardrails in Your Application

Now that we've defined our guardrails, let's see how to use them in our app.py file:

import os
from nemoguardrails import LLMRails, RailsConfig

# Replace with your key, or export OPENAI_API_KEY in your shell and remove this line
os.environ["OPENAI_API_KEY"] = "sk-your-api-key"


def main():
    print("Loading NeMo Guardrails configuration...")
    # "config" is the folder that holds config.yml and the rails/ directory
    config = RailsConfig.from_path("config")
    rails = LLMRails(config)

    print("\nAssistant ready!\n")

    test_queries = ["hello", "leave policy", "company policy", "tell me a joke", "explain yourself"]

    # Conversation history, sent in full on every turn
    messages = []

    for user_input in test_queries:
        print(f"You: {user_input}")
        messages.append({"role": "user", "content": user_input})

        # Generate the response through the guardrails layer
        response = rails.generate(messages=messages)

        # The response is normally a dict with a "content" key;
        # fall back to its string form for any other return type
        content = response.get("content", "") if isinstance(response, dict) else str(response)
        print("Assistant:", content)

        messages.append({"role": "assistant", "content": content})
        print("-" * 40)


if __name__ == "__main__":
    main()

Run the Python file with the following command:

python app.py

You should see output similar to the following, which shows the chatbot responding with the guardrails in place.

# Output

Loading NeMo Guardrails configuration...
Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 5012.31it/s]

Assistant ready!

You: hello
Assistant: Hello! How can I assist you today?
----------------------------------------
You: leave policy
Assistant: Our company provides 20 vacation days per year for full-time employees. You can use these after your probation period.
----------------------------------------
You: company policy
Assistant: Our company policy focuses on employee well-being, fair work practices, and transparency. For specific policies, please ask about vacation, leave, or other topics.
----------------------------------------
You: tell me a joke
Assistant: I'm sorry, I can only answer questions about company policy.
----------------------------------------
You: explain yourself
Assistant: Bot intent: response fallback
Bot message: I'm sorry, I can only answer questions about company policy.
----------------------------------------

Let’s review the results. The bot answered the greeting and the policy questions as intended, and it declined the off-topic requests (asking for a joke, asking it to explain itself) with the fallback message. The last reply also shows the internal Bot intent label, so expect the exact wording to vary between runs. In the same way, you can add more guardrails to filter out inappropriate queries and make the bot safer.
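Because the wording can vary from run to run, it helps to check replies for the key text rather than by eye. The sketch below is one hedged way to do that: failed_cases, CASES, and stub_generate are hypothetical names, and the stub stands in for the real call so the example runs without an API key. In the real app you could pass lambda m: rails.generate(messages=m) instead of the stub.

```python
from typing import Callable

FALLBACK = "I'm sorry, I can only answer questions about company policy."

# Pairs of (query, text the reply is expected to contain).
CASES = [
    ("leave policy", "20 vacation days"),
    ("tell me a joke", FALLBACK),
]


def failed_cases(generate: Callable[[list], dict], cases=CASES) -> list[str]:
    """Return the queries whose reply does not contain the expected text."""
    failures = []
    for query, expected in cases:
        reply = generate([{"role": "user", "content": query}])
        if expected not in reply.get("content", ""):
            failures.append(query)
    return failures


def stub_generate(messages: list) -> dict:
    """Stand-in for the guarded model call, used only for this demo."""
    text = messages[-1]["content"]
    if "leave" in text:
        content = "Our company provides 20 vacation days per year for full-time employees."
    else:
        content = FALLBACK
    return {"role": "assistant", "content": content}


print(failed_cases(stub_generate))  # []
```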

Conclusion

Let’s recap what we covered. We learned the basics of NeMo Guardrails, its ecosystem of rail types, and how to install it. We also built a chatbot for a fictional company that applies input, output, and dialog guardrails to questions about its internal policies.

From here, you can experiment with more advanced queries and custom variables, such as greeting users by name, by referring to the guardrails library.
