LLM Moderation Guardrail

Jay Ozer

The output guardrail assesses the LLM's response on a scale of 1 to 5 and blocks anything scoring 3 or higher. This threshold is a common target for tuning. This version uses only three paths: error, moderation_triggered, and continue.
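Because the threshold is the main tuning knob, it can help to isolate the routing decision in one small function. The sketch below is illustrative and not part of the template: `pathForScore` and `DEFAULT_THRESHOLD` are hypothetical names, and routing an out-of-range or non-integer score to `error` is an assumption about the desired behavior.

```javascript
// Hypothetical helper: maps a moderation score (1 to 5) to one of the three paths.
const DEFAULT_THRESHOLD = 3; // scores at or above this value are blocked

function pathForScore(score, threshold = DEFAULT_THRESHOLD) {
  // Assumption: anything that is not an integer from 1 to 5 goes to 'error'
  if (!Number.isInteger(score) || score < 1 || score > 5) {
    return 'error';
  }
  return score >= threshold ? 'moderation_triggered' : 'continue';
}

console.log(pathForScore(2));    // continue
console.log(pathForScore(3));    // moderation_triggered
console.log(pathForScore(3, 4)); // continue (stricter threshold of 4)
```

Raising the threshold lets more borderline responses through; lowering it blocks more aggressively.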




Function Code Snippet

export default async function main(args) {
  const { openaiApiKey, chatResponse } = args.inputVars;

  // Both inputs are required; otherwise exit through the error path.
  if (!chatResponse || !openaiApiKey) {
    return {
      next: { path: 'error' },
      trace: [{ type: "debug", payload: { message: "Missing required input variable: chatResponse or openaiApiKey" } }]
    };
  }

  const url = `https://api.openai.com/v1/chat/completions`;
  const domain = "pediatric dentistry and children's dental health";

  const dental_health_criteria = `
Assess the presence of misinformation, harmful advice, or non-related content in the context of pediatric dentistry and children's dental health. The content should:
1. Contain accurate and safe advice about pediatric dentistry and children's dental health.
2. Be relevant to pediatric dentistry and children's dental health.`;

  const dental_health_steps = `
1. Read the content and the criteria carefully.
2. Assess how much misinformation, harmful advice, or non-related content is contained in the content.
3. Assign a dental health score from 1 to 5, with 1 being no misinformation, harmful advice, or non-related content, and 5 being multiple instances of misinformation, harmful advice, or non-related content.`;

  // Asks the model to score chatResponse and returns its raw text reply.
  async function checkOutputGuardrail() {
    const moderationSystemPrompt = `You are a moderation assistant. Your role is to detect content about {domain} in the text provided, and mark the severity of that content.

## {domain}

### Criteria
{scoring_criteria}

### Instructions
{scoring_steps}

### Content
{content}

### Evaluation (score only!)`;

    const modMessages = [
      {
        "role": "user",
        // {domain} appears twice in the template, so it is replaced globally.
        // {content} is filled last with a function replacer so the response
        // text is inserted literally, even if it contains "$" sequences.
        "content": moderationSystemPrompt
          .replace(/\{domain\}/g, domain)
          .replace('{scoring_criteria}', dental_health_criteria)
          .replace('{scoring_steps}', dental_health_steps)
          .replace('{content}', () => chatResponse)
      },
    ];

    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${openaiApiKey}`
      },
      body: JSON.stringify({ model: "gpt-4o", messages: modMessages, temperature: 0 })
    });

    if (!response.ok) {
      throw new Error('HTTP error! status: ' + response.status);
    }

    // json is a method and must be called; the original read it as a property.
    const data = await response.json();
    return data.choices[0].message.content;
  }

  try {
    const moderationResponse = await checkOutputGuardrail();
    const score = parseInt(moderationResponse, 10);

    // Without this check, an unparseable reply (NaN) would fail the
    // comparison below and silently pass moderation.
    if (Number.isNaN(score)) {
      throw new Error(`Could not parse a score from: ${moderationResponse}`);
    }

    let message;
    if (score >= 3) {
      message = `Moderation guardrail flagged with a score of ${score}.`;
      return { next: { path: 'moderation_triggered' }, trace: [{ type: "debug", payload: { message } }] };
    } else {
      message = "Passed moderation.";
      return { next: { path: 'continue' }, trace: [{ type: "debug", payload: { message } }] };
    }
  } catch (error) {
    return {
      next: { path: 'error' },
      trace: [{ type: "debug", payload: { message: `Error in moderation guardrail: ${error.message}` } }]
    };
  }
}
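The prompt asks for "score only", but a model may still reply with something like "Score: 4", which `parseInt` cannot read. The sketch below shows one possible, more tolerant way to extract the score. `parseScore` is a hypothetical helper, not part of the template, and taking the first standalone digit from 1 to 5 is an assumption about how such replies should be interpreted.

```javascript
// Hypothetical helper: extracts a 1-to-5 score from the moderation model's reply.
// Returns null when no standalone digit in that range is found.
function parseScore(reply) {
  if (typeof reply !== 'string') return null;
  // A digit 1-5 that is not part of a longer number, e.g. "3", "Score: 4", "4/5"
  const match = reply.match(/(?:^|\D)([1-5])(?!\d)/);
  return match ? Number(match[1]) : null;
}

console.log(parseScore('3'));        // 3
console.log(parseScore('Score: 4')); // 4
console.log(parseScore('10'));       // null
console.log(parseScore('unclear'));  // null
```

A `null` result would most naturally map to the error path, so that an unreadable reply never counts as a pass.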
