Replacing missing web site content with Varnish
In one of my web server setups, I’m using Varnish as a reverse proxy in front of NGINX. The server hosts multiple web sites and CMSes, and I wanted it to deliver a default robots.txt if a web site did not already provide one.
Varnish makes this pretty easy by reviewing the response from each backend. If the backend says it hasn’t got a robots.txt file, Varnish will create one on the fly.
The subroutine vcl_backend_response sees the backend’s response status. If the backend answers a request for robots.txt with a 404, Varnish sets an internal status code of 700 and hands over to the vcl_backend_error subroutine.
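sub vcl_backend_response {
    # Anchored so that only /robots.txt at the site root matches
    # (optionally with a query string), not any URL containing it.
    if (bereq.url ~ "^/robots\.txt(\?.*)?$" && beresp.status == 404) {
        # 700 is an arbitrary internal marker, picked up in vcl_backend_error.
        return (error(700, "OK"));
    }
}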
Then, in vcl_backend_error, the status code 700 is picked up and a text/plain response containing generic robots.txt content is created. The status code is set back to 200 before the response is served to the client. A Cache-Control header tells clients and proxies that they may cache the response. Finally, the object lifetime is set to 1 hour so that Varnish doesn’t need to ask the backend every time.
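sub vcl_backend_error {
    if (beresp.status == 700) {
        # Turn the internal marker back into a normal, successful response.
        set beresp.status = 200;
        unset beresp.http.set-cookie;
        set beresp.http.cache-control = "Public";
        # Keep the generated object in the cache for one hour.
        set beresp.ttl = 1h;
        set beresp.uncacheable = false;
        set beresp.http.Content-Type = "text/plain; charset=utf-8";
        # The long string is sent verbatim, so its lines are not indented.
        set beresp.body = {"User-agent: EvilUserAgent
Disallow: /
User-agent: *
Allow: /
"};
        return (deliver);
    }
}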
The response to a request for robots.txt on a web site where it would otherwise not exist now looks like this:
# GET https://example.com/robots.txt
User-agent: EvilUserAgent
Disallow: /
User-agent: *
Allow: /
And these are the response headers, with some details redacted:
# GET -USed https://example.com/robots.txt
200 OK
Cache-Control: Public
Connection: close
Date: Thu, 30 Jan 2025 06:21:49 GMT
Via: 1.1 example.com (Varnish/7.6)
Accept-Ranges: bytes
Age: 1208
Server: Varnish
X-Cacheable: YES
X-Varnish: 360768 360674
The principle can of course be extended to any missing or failing backend content, e.g. for hiding detailed backend responses or branding error pages with a company logo.
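As a rough illustration of that last idea, here is a minimal sketch that swaps any 5xx backend response for a generic error page. It follows the same two-step pattern as the robots.txt example. The internal marker 701, the backend address, the 10 second lifetime and the HTML body are all assumptions made up for this example, not part of my actual setup, and I have not run this exact fragment in production.

```vcl
vcl 4.1;

# Hypothetical backend; adjust host and port to your own setup.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Any 5xx from the backend is replaced rather than passed through.
    # 701 is an arbitrary internal marker, chosen to differ from 700.
    if (beresp.status >= 500 && beresp.status < 600) {
        return (error(701, "Service Unavailable"));
    }
}

sub vcl_backend_error {
    if (beresp.status == 701) {
        # Present a uniform 503 to the client, whatever the backend said.
        set beresp.status = 503;
        set beresp.reason = "Service Unavailable";
        set beresp.http.Content-Type = "text/html; charset=utf-8";
        set beresp.http.Cache-Control = "no-store";
        # Assumed short lifetime, so a recovered backend is noticed quickly.
        set beresp.ttl = 10s;
        set beresp.body = {"<!DOCTYPE html>
<html>
<head><title>Service unavailable</title></head>
<body>
<h1>Sorry, something went wrong</h1>
<p>Please try again in a moment.</p>
</body>
</html>
"};
        return (deliver);
    }
}
```

If both handlers are wanted, the two conditions would go into the same vcl_backend_response and vcl_backend_error subroutines as the robots.txt checks, each keyed on its own marker. A logo could be referenced from the HTML body, provided the image itself is served from somewhere that does not depend on the failing backend.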