Black Mirror AI

fossilesque@mander.xyz · 11 months ago

kassiopaea@lemmy.blahaj.zone · 11 months ago

Wouldn’t Google’s crawlers respect robots.txt though? Is it naive to assume that anything would?

jaschen@lemm.ee · 11 months ago

It’s naive to assume that Google’s crawlers respect robots.txt.

rosco385@lemm.ee · 11 months ago

It’d be more naive to have a robots.txt file on your webserver and be surprised when web crawlers don’t stay away. 😂

Zexks@lemmy.world · 11 months ago

Lol. And they’ll delist you. Unless you’re really important, good luck with that.

robots.txt

User-agent: *
Disallow: /some-page.html

If you disallow a page in robots.txt, Google won’t crawl it. Even when Google finds links to the page and knows it exists, Googlebot won’t download the page or see its contents. Google will usually not index the URL, but that isn’t guaranteed: it may include the URL in the search index, along with words from the anchor text of links pointing to it, if it judges the page to be important.
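
For anyone who wants to see how a well-behaved crawler would read that rule, here is a minimal sketch using Python's standard `urllib.robotparser`. The `example.com` URLs, the `User-agent: *` line, and the `is_allowed` helper are assumptions made up for illustration. It only shows what a compliant client would decide; it says nothing about whether any particular crawler actually honors the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The "User-agent: *" line is an assumption
# added so the Disallow rule belongs to a group that applies to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /some-page.html
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain, not a site from the thread.
blocked = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/some-page.html")
allowed = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/other-page.html")
print(blocked, allowed)
```

The check is purely advisory: robots.txt is a request, so a crawler that never runs logic like this will fetch the page anyway.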