23 September 2024

I Set Out to Add a Comments Section, Now an LLM Moderates My Site

Written by
Prompt Engineering

I didn’t expect to plug in a comment system and move on—I knew it wouldn't be that. Still, I didn't want to pay for an ongoing SaaS for comment management or add third-party cookies to my site (I'm trying to minimise using cookies at all), so I explored and tested different options.

So, let me share my little excursion...

DIY Comments Section... Really?! but Why?

I spent a fair amount of time trying to avoid it. I tried open-source projects like Remark42, Cusdis, and Isso. I tried privacy-friendly SaaS projects like Hyvor, Commento, and Commentbox.io.

Each option has its strengths, but none fit 'seamlessly well' with my current site, skill level, patience or appetite for SaaS subscriptions.

After spending a couple of days setting up and trying out these tools, I did a 180 and started to build my own simple comments backend. It was worth it.

Side Note on the Comments Backend

It wasn't too hard because, at that stage, I had become familiar with how open-source projects work. My setup is basic and rudimentary (I'm not an engineer), but I'm happy I set it up. It allows me to experiment further, for example, by adding an AI agent to moderate my comments.

Here is what the setup looks like:

Two key components:

  1. Profanity filter: I'm using this 'obscenity' npm package to do some basic filtering before sending it to the LLM

  2. LLM Moderation: I'm using the Google Vertex npm package to interact with a Gemini model.

I know it's missing a key feature: Blocking known spam IP addresses.

The Choice to Use an LLM to Moderate My Site

During my exploration, I learned about established tools and APIs—such as Google Natural Language Text Moderation—which are solid solutions for moderation at scale. Still, I wasn’t looking for an engineered solution. This blog is about AI, so I decided to use an LLM to handle it—to learn by trying.

I went with Google Gemini as the LLM for this project because:

  • I've been using Open AI APIs and wanted to try other providers

  • We use GCP/Vertex at work, so it's good to get acquainted

  • There's a free usage tier for development

Working with Gemini and Google Cloud’s Vertex AI API wasn’t tricky. One notable difference from OpenAI’s API is that Gemini doesn’t provide a ‘thread ID’ to keep track of conversations—so, at the time of writing, you have to store the conversation history yourself if you want continuity.

With that said, it was pretty simple. I installed the SDK, wrote system prompts, and created a JSON schema so that the LLM would respond in the correct format.

I'll explain the implementation at a high level and share my system prompt and process.

Creating the LLM Agent to Moderate Comments

I began by googling articles to see what people had done. Then, I went to the Vertex AI Studio and used the UI to start working on the agent. As I mentioned, I'm currently interested in using Google services for this site to experiment with the Google AI stack, so Vertex made sense.

I set the temperature to 0.1, and added a basic initial system prompt, something along the lines of: "You're a comment moderation assistant. You will respond with either 'approved' or 'disapproved'", and off I went...

Vertex AU Studio has a lot going on. There are many features I don't know how or when to use yet, but it seems to have a lot to offer. The Studio and its features appear to be under heavy development, so I'm sure I'll be getting into it a lot more.

Engineering the System Prompt

This screenshot comes from a tool called Promptfoo, which I would describe as an LLM app testing framework.

I've begun using Promptfoo to work on prompts. It helps me test and score the prompts programmatically and more methodically.

The idea is to create a series of tests and re-run them as you modify and develop the prompt. That way, you can track how changes to the prompt impact the tests' results.

With tools like Promptfoo, we can collect the data to show that our prompts improve over time.

In this case, after some trial and error, I got the Comment Moderation LLM to pass 96% of my tests:

Sidenote: During development, I include extra 'notes' in the LLMs response to better understand the results. I use this information to help me with prompt engineering. This means that the token count is usually higher during development than in production.

More on the Testing Methodology

A 'test' is what it sounds like: you ask the LLM to do what you want and measure the result.

Each test results in a pass or a fail. A pass is when the LLM output provides a valid result, and a fail is when it doesn't.

What makes a 'valid' result will vary, but it could include output accuracy, cost, quality, etc.

In this case, the test is a comment, and the measurable result is whether a comment is correctly approved or disapproved.

The goal is to get the LLM to pass as many tests as possible with each round of iterations.

Creating the Test Dataset

Tests help us quantify and measure our prompt engineering progress, so having good-quality tests is critical.

In a 'real' website, you would look at your comments history to create the testing dataset. Unfortunately, I don't have any comments, so I need to make the dataset from scratch.

Promptfoo can generate test data, but I couldn't get the feature to work. Instead, I wrote some initial examples and then used an LLM to create more. I ensured a broad variety of comment types, styles, and outcomes. I included comments generated using my articles and comments on other websites like mine.

The approval statuses I gave the test comments reflect whether I would have those specific comments approved or not, and I also worked through some edge cases.

For example, due to the nature of this blog, the LLM has to approve comments about system prompts in general but disapprove of comments about its own system prompt.

After an hour or so, I had 111 test comments of various types, lengths, topics, approval statuses and levels of ambiguity.

V0 of the Prompt: 75% Pass

This was the first prompt I ran tests on, it is just a starter with a json schema.

Starting with a basic prompt like the one above is a good idea because it creates a benchmark and helps provide a sense of how far off your objective you are.

Since my moderation guidelines were quite relaxed from the beginning, the LLM had already passed 75% of the tests with just that simple prompt. Not bad! But I want to get it to 95%.

Looking through the failed tests, I could spot that the LLM was too thin-skinned and had flagged several comments as offensive, which I would have wanted approved.

This is an example of a test that failed because the LLM moderator was too quick to take offence.

In the screenshot, the comment is on the left, and the correct answer is in the 'expected' column (the correct answer is true). In the column to the right, you can see that the LLM failed the test because it set the approval status incorrectly (it set it to false).

The LLM rejected the comment because it was offensive, but I would have wanted it to be approved. Even if the comment is rude, offensive, or unconstructive, I'll probably still want to approve it. That's just the flavour of automated moderation I want—I only want to block spam, hate speech, and other dangers.

So, one of the first things I wanted to 'prompt engineer' was to make the LLM moderator a little more chill and open to criticism.

Other examples of failed tests included comments with 'controversial' opinions, for example, this one about hacking LLMs:

After reviewing more failed tests, I updated the prompt in a way that I hoped would improve the results. This is the iterative loop we use to enhance and improve the prompt over time.

Once I updated the prompt, it was time to re-run the tests.

V1 of the Prompt: 88% Pass

In this version of the prompt (v1), I was a lot more deliberate with the types of comments to approve and disapprove and spelled out that the approach to moderation should be pretty relaxed.

For example, I included instructions that allow unconstructive and negative comments.

The result was pretty good—the changes helped the LLM get a thicker skin! It's the internet, after all.

Barely 'offensive' comments, like this one, are now correctly approved:

88% is good, but not quite 95% yet. I started going through the failed tests again.

Even though the new prompt made the LLM Moderator more tolerant of rudeness, it seemed that straight-up insults were still getting disapproved. For example, this comment calling me an idiot was disapproved, even though I wanted comments like it approved:

When I put the tests together, I thought about this for a second before I decided to approve these petty insults. Again, this is the internet. To achieve my desired result, I would need to ask the LLM to allow for even MORE potentially offensive commentary.

Other failed tests I felt I could work on included comments like this one:

I want to accept this type of comment, but the language and the fact that hacking is illegal trigger the LLM Moderator. Even though hacking is illegal, discussions about it make sense in this blogand so do comments about hacking and security in general.

I'd need to make more changes to allow for comments like that.

V2 of the Prompt: 96% Pass

This version of prompt above introduced a few new vital instructions. In particular, we're providing specific context about the blog's focus (AI, LLMs, innovation, tech) and its intended audience.

The prompt allows discussion of topics like prompt engineering, LLM hacking, and AI security. We also ask the LLM to expect memes and internet culture to be a part of the comments.

With this prompt, it started approving comments about LLM red teaming, such as this one:

As well as correctly identifying and allowing hyperbolic internet comments like this:

I also updated the system prompt to allow "potentially offensive and mildly offensive" content. The previous prompt did not explicitly address levels of offensiveness.

Now, people can insult me without fear of getting censored:

With this iteration, I passed my target for the number of tests I had set out to achieve: 95%

I was satisfied and ready to deploy the prompt into the comment moderation agent.

One More Test and Screenshot

The last screenshot I took was 98% at 150 tests:

At this stage, the next step would be to increase the dataset and test more data. I might revisit this and improve the dataset when I can get the Promptfoo data gen feature working. For real REAL data, I'll have to wait until I get some comments or find a source of comments.

The Beginning

Now, I have an AI agent doing 'something' that I can improve on in the future. It has been an excellent project for me and was pretty fun.

  • I learned to work with Google Vertex and set up my first Google Gemini agent.

  • I forced myself to do simple prompts instead of going straight for the chain-of-thought, long, complex setups.

  • I learned how to do some things with Promptfoo, which I didn't know how to do before

  • I made this blog post

  • I avoided the SaaS fee for the comments system and moderation

I plan to work on this some more. I wrote down a much more complex prompt, with CoT and examples, it was more accurate but 2.5K tokens to run each time. For now, I think this one is good enough.

As I mentioned at the beginning of the article, LLM-based comment moderation is not the best solution currently available at scale. But it is the most fun solution I could set up.

Thanks for reading.

I wrote this article using Grammarly, made the image for the article with Midjourney and created this website using v0.dev. I also used ChatGPT throughout the process. That's what this blog is all about.