@ctag@lemmy.sdf.org to Selfhosted@lemmy.world • English • 4 months ago

Non-Cloudflare AI blocking?

Now that we know AI bots will ignore robots.txt and churn through residential IP addresses to scrape websites, does anyone know of a method to block them that doesn’t entail handing over your website to Cloudflare?

  • Mojeek Search Engine • English • 4 months ago

    Why MojeekBot? We’re a search engine.

    • r00ty • 4 months ago

      Hmm, I took an original list and added to it. You got a website I can check? If so, I’ll happily remove it. I don’t mind slow web crawlers at all.

      • Mojeek Search Engine • English • 4 months ago

        If you recall where the list came from, that would also be useful to us. Here’s our Bot page: https://www.mojeek.com/bot.html and some external info: https://en.wikipedia.org/wiki/Mojeek

        • r00ty • 4 months ago

          Didn’t have the link to hand, but a search turned this one up: https://reggiodigital.com/blog/nginx-rule-blocking-bad-bots/
          It looks to be the same list, and you can see the ones I’ve added at the end of that list.

          • Mojeek Search Engine • English • 4 months ago

            Thanks a lot for providing this 🙏
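
For readers following the exchange above, here is a minimal sketch of how a user-agent blocklist with an allowlist override might work. It is not the nginx rule or the list discussed in the thread: the names `BLOCKED_AGENTS`, `ALLOWED_AGENTS`, and `is_blocked` are hypothetical, and the entries are illustrative only. Matching on the User-Agent header only catches bots that identify themselves honestly, so it does not address scrapers that spoof a browser string.

```python
import re

# Hypothetical, illustrative entries; not the list referenced in the thread.
BLOCKED_AGENTS = ["GPTBot", "CCBot", "MojeekBot"]
# Crawlers that should pass even if they also appear in the blocklist.
ALLOWED_AGENTS = ["MojeekBot"]


def build_pattern(names):
    """Compile a case-insensitive pattern matching any of the given substrings."""
    return re.compile("|".join(re.escape(name) for name in names), re.IGNORECASE)


def is_blocked(user_agent, blocked=BLOCKED_AGENTS, allowed=ALLOWED_AGENTS):
    """Return True if the user agent matches the blocklist and not the allowlist."""
    if allowed and build_pattern(allowed).search(user_agent) is not None:
        return False  # the allowlist takes priority over the blocklist
    if not blocked:
        return False
    return build_pattern(blocked).search(user_agent) is not None


if __name__ == "__main__":
    # Example user-agent strings, written for illustration.
    for ua in (
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; MojeekBot/0.11; +https://www.mojeek.com/bot.html)",
        "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
    ):
        print(is_blocked(ua), ua)
```

Keeping the allowlist separate from the blocklist means an inherited list can stay untouched while individual crawlers are exempted, which matches the "took an original list and added to it" workflow described above.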
