I went looking for this information this morning, and although it’s nice that everyone wants to give you the entire history of robots.txt and web crawlers and the like, sometimes (okay, most of the time) I just want the info I came for.
Now I’m going to share what I found, without all the extraneous history lessons.
Block OpenAI’s GPTBot
Straight from the source: https://platform.openai.com/docs/gptbot
User-agent: GPTBot
Disallow: /
This goes in the robots.txt file at the root of your site.
I don’t need more than the basics, because I’m not interested in allowing access to certain directories and whatnot. But you can follow the link for more details if you want something else.
On a different page, there was also information on how to block ChatGPT plugins from accessing my site.
Block OpenAI’s ChatGPT plugins
Straight from the source: https://platform.openai.com/docs/plugins/bot
User-agent: ChatGPT-User
Disallow: /
To be honest, I’m not sure whether I want to block the plugins or not, but better safe than sorry. I can always undo it later simply by deleting the lines from my robots.txt file.
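If you want to double-check that the rules do what you expect before trusting them, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It only tells you how a well-behaved parser reads the rules; it says nothing about whether any particular bot actually honors them. The `is_allowed` helper, the example.com URL, and the "SomeOtherBot" name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two rule groups from this post, as they would sit together in robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-post/"  # hypothetical URL, swap in your own
    for agent in ("GPTBot", "ChatGPT-User", "SomeOtherBot"):
        status = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {status}")
```

The two OpenAI user agents should come back blocked, while anything not named in the file is still allowed, since there is no catch-all `User-agent: *` group.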
Why I’m blocking them
You could say this is a statement from me about copyright and you might be somewhat right. I own the content I write. But, in reality, it’s more about me making a statement about controlling where my writing and ramblings end up.
I ramble here because I want to share, but that doesn’t mean I want everyone in the world to just grab it up and put it wherever. I’ve blocked Google’s image bot for years and years, and the same goes for the Internet Archive (when they honor it).
The fact is, my blog is my house, and I don’t like the idea of people coming onto my lawn, grabbing my signs, and running with them. Especially not so they can build a commercial product that benefits them financially and gives me nada.
Money hungry? Why, yes, I am. They’re more than welcome to pay me $$$ and license my content for training. I’d probably say yes. :D