Blocking Google’s AI bot crawlers

Google released some news about a new token that can be used to block their Bard and Vertex AI crawlers.

Google-Extended: A standalone product token that web publishers can use to manage whether their sites help improve Bard and Vertex AI generative APIs, including future generations of models that power those products.

Time to edit my robots.txt file again.

(See “Here’s how to block OpenAI’s bot crawlers in your robots.txt file” for why I’m blocking them.)

Block Google’s AI bot

Straight from the source: https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers

User-agent: Google-Extended
Disallow: /

That’s the important bit. The page doesn’t show it as an example, but at least it lists the user-agent token.
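If you want to double-check that the rule does what you expect before uploading it, here is a minimal sketch using Python's standard-library robots.txt parser. It only shows how Python's parser reads the two lines above; it is an assumption that Google's crawler interprets them the same way, and the helper name is_blocked is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the post, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str = "/") -> bool:
    """Return True if the given robots.txt text disallows url for user_agent.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The AI token is blocked; a crawler with no matching rule is not.
    print("Google-Extended blocked:", is_blocked(ROBOTS_TXT, "Google-Extended"))
    print("Googlebot blocked:", is_blocked(ROBOTS_TXT, "Googlebot"))
```

Under Python's parser, this rule set blocks Google-Extended and leaves Googlebot alone, which is the point of a separate token: regular search crawling is unaffected.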

Happy times.

I don’t mind opting in to things I consider helpful to the world at large. But this opting-out business is ridiculous. Businesses take intellectual property seriously when it’s other people trying to benefit from their property. But when they want to benefit commercially from other people’s property, they have no problem skipping the permission phase and hoping no one cares later.

Here’s how to block OpenAI’s bot crawlers in your robots.txt file

I went looking for this information this morning, and although it’s nice that everyone wants to give you the entire history of robots.txt and web crawlers and the like, sometimes (okay, most of the time) I just want the info I came for.

Now I’m going to share, without all the extraneous history lessons.

Block OpenAI’s GPTBot

Straight from the source: https://platform.openai.com/docs/gptbot

User-agent: GPTBot
Disallow: /

This goes in the robots.txt file.

I don’t need more than the basics, because I’m not interested in allowing access to certain directories and whatnot. But you can follow the link for more details if you want something else.
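For anyone who does want the per-directory version, here is a rough sketch of what that could look like, again checked with Python's standard-library parser. The /public/ and /drafts/ paths are hypothetical placeholders, not anything from my site, and the check reflects how Python reads the rules, which may differ in edge cases from how GPTBot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: let GPTBot into one directory, keep it out of another.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /public/
Disallow: /drafts/
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/public/post", "/drafts/wip", "/about"):
        print(path, "->", can_fetch(ROBOTS_TXT, "GPTBot", path))
```

Note that paths not covered by any rule stay allowed, so a partial setup like this is opt-out per directory, not a blanket block.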

On a different page, there was also information on how to block ChatGPT plugins from accessing my site.

Block OpenAI’s ChatGPT plugins

Straight from the source: https://platform.openai.com/docs/plugins/bot

User-agent: ChatGPT-User
Disallow: /

To be honest, I’m not sure whether I want to block the plugins or not, but better safe than sorry. I can always undo it later simply by deleting the lines from my robots.txt file.

Why I’m blocking them

You could say this is a statement from me about copyright and you might be somewhat right. I own the content I write. But, in reality, it’s more about me making a statement about controlling where my writing and ramblings end up.

I ramble here because I want to share, but that doesn’t mean I want everyone in the world to just grab it up and put it wherever. I’ve blocked Google’s image bot for years and years, and the same goes for the Internet Archive (when they honor it).

The fact is, my blog is my house, and I don’t like the idea of people coming onto my lawn, grabbing my signs, and running with them. Especially not so they can build a commercial product that benefits them financially and gives me nada.

Money hungry? Why, yes, I am. They’re more than welcome to pay me $$$ and license my content for training. I’d probably say yes. :D

Regular posting and this theme

I’m planning to start posting regularly again, but in the meantime, I’m looking for a new theme for the site. I’m not really happy with the current one. I like it a lot as far as aesthetics go, but I don’t think it works for easy access to the site’s content. I like columns. Sidebars. On my phone, it works, but most of the time, I’m looking at the site on my computer, and for that, it just wastes a lot of space.

1,499 posts about nothing?

Well, I’m passing a milestone with this post. This is post number 1,500. :)

Most of the posts on Perpetualized.com are just my ramblings about my writing days. Considering it’s been about 7,000 days since I started this site, and about 3,550 since I started writing to self-publish (as opposed to writing to send to a publisher or hobby writing), that’s not so bad.

Most of the posts don’t really have any meaning for anyone but me. There are a few gems scattered around the site, though I admit they’re hard to find.

:)

Feeds are back—even though I didn’t know they were gone

A long time ago, I added a cleanup function to my theme functions file and deactivated the RSS feed links that usually appear in the header of a WordPress site. I didn’t really think much about this, but it has been brought to my attention that maybe I shouldn’t have done that.

I checked in Feedly, and sure enough, without those links in the <head> of the site, Feedly doesn’t even think there’s a feed here. I doubt any feed reader is finding the feed.

Oops.

I commented out the line of code that removed the feed links from the <head> of the site, and lo and behold, Feedly now recognizes a feed for the site. :-)
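To see for yourself whether a page advertises a feed, here is a small sketch that looks for feed links in a page's HTML using only Python's standard library. It is a rough approximation of feed autodiscovery, not Feedly's actual logic, and the sample markup and the example.com URL are made up for illustration.

```python
from html.parser import HTMLParser

# MIME types commonly used for feed autodiscovery links.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> tags that point at feeds."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (bare attributes), so normalize them.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        if "alternate" in rels and attr.get("type", "").lower() in FEED_TYPES:
            self.feeds.append(attr.get("href", ""))


def find_feeds(html: str) -> list[str]:
    """Return the feed URLs advertised in the given HTML (hypothetical helper)."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# Made-up sample pages: one with the feed link in <head>, one without.
WITH_LINK = (
    '<html><head><title>Blog</title>'
    '<link rel="alternate" type="application/rss+xml" '
    'href="https://example.com/feed/" />'
    '</head><body></body></html>'
)
WITHOUT_LINK = "<html><head><title>Blog</title></head><body></body></html>"

if __name__ == "__main__":
    print(find_feeds(WITH_LINK))
    print(find_feeds(WITHOUT_LINK))
```

An empty result for your own page would match what I saw: the feed itself can still exist at its URL, but a reader that relies on these links has nothing to find.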

As for why I’m posting this now when I had planned to be writing, I think I’ll skip the admission that I delayed writing so I could read in the sun instead. :D What an awesome way to start the day.

Ah well. I’m ready now to dig in. That’s good enough. ;-)

I also used my time in the sun to read back through my last five-ish pages (I send my book to myself as an EPUB every time I run my backups, which I’ve mentioned I do obsessively) and highlighted a couple of typos and a paragraph to switch order, so I kinda started the writing already.

Man, this story has really taken off. I’m looking forward to seeing where the heck it’s going! I ended last night on a sudden (shortish) time jump that I wasn’t expecting but that makes total sense. I’m excited for the characters and that’s always a good thing.