An HTTP request is a request. Servers are free to rate limit or deny access.
And Wikimedia, in particular, is all about publishing data under open licenses. They want the data to be downloaded and used by others. That’s what it’s for.
Even so, I think it would be totally reasonable for them to block web scrapers, as they provide better ways to download all their data.
At the root of this comment chain is a proposal to have laws passed about this.
People can set up their web servers however they like. They're their web servers, so it's on them to do that. I don't think there should be legislation about whether you're allowed to issue perfectly ordinary HTTP requests to a public server; let the server decide how to respond to them.
Rate limiting in itself requires resources that are not always available. For one thing, you can only rate limit clients you can identify, so you need to keep data about past requests in memory and attach counters to them. Even then, that won't help if the requests come from IPs that are easily changed.
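To make the bookkeeping concrete, here is a minimal sketch of a per-client limiter. It assumes clients are keyed by IP address and uses a fixed-window counter, which is only one of several common strategies (token bucket and sliding window are others). The class name `FixedWindowRateLimiter` and the example addresses are hypothetical, chosen for illustration. It shows both points from the comment: every client seen needs its own stored counter, and a client that rotates IPs gets a fresh counter each time.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client key in each `window`-second window.

    Hypothetical illustration: keys are assumed to be client IP addresses.
    """

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        # Per-client state: key -> (window_start, request_count).
        # This dict grows with every distinct client seen; nothing here evicts
        # old entries, which is exactly the memory cost being described.
        self.counters = {}

    def allow(self, client_key):
        now = self.clock()
        start, count = self.counters.get(client_key, (now, 0))
        if now - start >= self.window:
            # The previous window has expired, so start counting afresh.
            start, count = now, 0
        count += 1
        self.counters[client_key] = (start, count)
        return count <= self.limit


if __name__ == "__main__":
    # A frozen clock keeps the demo deterministic: all requests land in one window.
    limiter = FixedWindowRateLimiter(limit=3, window=60.0, clock=lambda: 0.0)

    # One IP making five requests: the last two are rejected.
    print([limiter.allow("203.0.113.7") for _ in range(5)])
    # → [True, True, True, False, False]

    # Five requests, each from a different IP: none are rejected,
    # and each one adds another entry to the counter table.
    print([limiter.allow(f"198.51.100.{i}") for i in range(5)])
    # → [True, True, True, True, True]
    print(len(limiter.counters))
    # → 6
```

Keying on something other than the IP (an API key, a session cookie) avoids the rotation problem only for clients willing to identify themselves, which is the crux of the disagreement here.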
Bots lie about who they are, ignore robots.txt, and come from a gazillion different IPs.
That's what DDoS protection is for.