In today’s digital landscape, not all website traffic is created equal. The rise of automated bots, from beneficial search engine crawlers to resource-draining scrapers, presents a complex challenge for website owners and hosting platforms alike. Over the past year, traffic from some popular bot services has increased by 96 percent.

Here at Pantheon, we’re actively working to manage this challenge in partnership with our customers. This post will shed light on our current approach to bot traffic, how we’re listening to your concerns, and what the future holds for traffic management on our platform.

The current state of bots at Pantheon

Pantheon is committed to partnering with all of our customers, through support tickets and proactive work, to manage the impact of automated traffic on your site portfolio. Pantheon actively removes well-known bots from a site’s billable totals. We have also paused overage charges while we develop a solution that balances Pantheon’s need to cover the cost of bot-related traffic with customers’ lack of control over much of that traffic.

Pantheon defines a “well-known bot” as automated traffic that is routinely identifiable through the user-agent header on the request. Today, Pantheon relies on Fastly, our CDN partner, to help identify these bots, and maintains a second layer of defense so we can respond quickly to emerging agents not yet added to Fastly’s catalog of bots. We do not block identified bot traffic from entering the platform. Instead, we detect it and exclude it from billable usage, ensuring customers are not charged for that traffic.
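For illustration only (this is not Pantheon’s production code), a minimal sketch of user-agent-based classification and billing exclusion could look like the following, with a hypothetical signature list standing in for Fastly’s catalog:

```python
import re

# Hypothetical catalog of well-known bot signatures; in practice the list is
# maintained by Fastly and supplemented with Pantheon's own rules.
KNOWN_BOT_PATTERNS = [
    re.compile(r"Googlebot", re.IGNORECASE),
    re.compile(r"bingbot", re.IGNORECASE),
    re.compile(r"GPTBot", re.IGNORECASE),
]

def is_well_known_bot(user_agent: str) -> bool:
    """Return True if the request's user-agent matches a known bot signature."""
    return any(pattern.search(user_agent) for pattern in KNOWN_BOT_PATTERNS)

def record_request(user_agent: str, usage: dict) -> None:
    """Count every request, but exclude identified bot traffic from billable usage."""
    usage["total"] += 1
    if not is_well_known_bot(user_agent):
        usage["billable"] += 1

# The bot request is still served; it simply never shows up in billable totals.
usage = {"total": 0, "billable": 0}
record_request("Mozilla/5.0 (compatible; Googlebot/2.1)", usage)
record_request("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", usage)
print(usage)  # {'total': 2, 'billable': 1}
```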

We hear you: the customer’s perspective and the Pantheon approach to bot traffic

We understand our customers’ pain well. In the ever-changing world of traffic and the LLM arms race, some customers are negatively affected by bot traffic: slower application response times, short outages, and general compute resourcing issues (for example, cache exhaustion from excessive traffic on atypical paths). Because bot traffic that one customer considers “good” may be harmful to another, this problem is extremely difficult to solve at the platform level. It is something Pantheon’s internal engineering and product teams have been planning to address.

The solution will come in two parts: at the platform level for all customers via our Global CDN, and through our Advanced Global CDN product for customers who want to control and restrict bot traffic more closely.

One particularly problematic pattern involves bots that get caught in unintended “spider traps” on sites with faceted search pages. Many sites use facets to provide filters in search results. Think of an online clothing store that lets you limit your search for “beach shirt” by size, color, and gender. In our analyses, humans rarely go beyond four filters in any search, while bots iterate through every possible combination of filters across a series of requests, often with dozens of distinct filters on a single request. These requests are useless to the bot, as no new results are provided by the variations, but the bots aren’t smart enough to stop cycling through the available options. Each variation is a request the site has to process, and given the nature of these requests, they are particularly resource-intensive and often bypass the caches the application has in place. These bots then overwhelm the site’s resources and sometimes take the site down entirely.
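To see why these traps are so costly, it helps to look at the combinatorics. The sketch below uses hypothetical facet counts for an imaginary storefront and simply multiplies out the filter combinations a bot could cycle through:

```python
from math import prod

# Hypothetical facets for an imaginary storefront and the options each one offers.
facets = {"size": 6, "color": 12, "gender": 3, "brand": 25, "price": 5, "material": 8}

# A bot that cycles through every combination (the +1 covers "filter not set")
# issues one origin request per combination, almost all of them uncacheable.
combinations = prod(count + 1 for count in facets.values())
print(f"{combinations:,} distinct filter combinations for a single search term")
# A human shopper picking at most four filters touches only a tiny fraction of that space.
```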

The bots keep their requests well below typical Denial of Service (DoS) attack patterns, typically crawling at just a handful of requests per second. Rate limiting, one of our go-to solutions for mitigating DoS attacks, cannot reliably protect a site below 10 requests per second. IP and fingerprint blocking can work in some circumstances, but many of the crawlers operate in a highly distributed fashion, which makes them difficult to target.

Our best approach to mitigating these outages has come from collaboratively developing a plan with the customer to identify traffic patterns that defy typical site usage and don’t provide tangible business or SEO value. We apply custom edge rules to these patterns to block the unwanted traffic and, by extension, help ensure the site’s uptime.
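As a rough sketch of what such an edge rule can look like, the example below blocks search requests that stack more facet parameters than any real visitor would. The parameter names and threshold are hypothetical, and Pantheon’s actual rules are written per customer at the CDN edge:

```python
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EdgeRule:
    """A hypothetical edge rule: respond 403 when the predicate matches the request."""
    name: str
    matches: Callable[[str], bool]

# Illustrative facet parameters and threshold; real sites and rules differ.
FACET_PARAM = re.compile(r"(?:^|&)(?:size|color|gender|brand|price)=")
FACET_LIMIT = 6

block_facet_abuse = EdgeRule(
    name="block-facet-spider-trap",
    matches=lambda url: len(FACET_PARAM.findall(url.partition("?")[2])) >= FACET_LIMIT,
)

def handle_request(url: str, rules: List[EdgeRule]) -> int:
    """Return the HTTP status the edge would serve for this request."""
    for rule in rules:
        if rule.matches(url):
            return 403  # blocked at the edge; the request never reaches the application
    return 200  # passed through to the site as usual

print(handle_request("/search?q=beach+shirt&size=M&color=blue", [block_facet_abuse]))  # 200
print(handle_request("/search?q=beach+shirt&" + "&".join(f"color=c{i}" for i in range(8)),
                     [block_facet_abuse]))  # 403
```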

Looking ahead

Our goal is to identify and manage bot traffic at the Global CDN for all customers. As more bot traffic comes into the platform, we are working to differentiate between beneficial and malicious bots, ensuring the security and performance of the sites we host. 

To achieve this, we are implementing a multi-layered strategy that operates at the CDN layer of our platform. This approach allows us to analyze incoming traffic in real time, long before it reaches your application, and will include some of the following:

  • Request analysis at the edge: Our GCDN will monitor and identify bot activity based on signals such as request types, user agents, and query strings. 

  • Taking proactive action on traffic: Depending on the type of traffic, we will be able to either outright block (403) malicious traffic or issue a client challenge to suspicious traffic (see the triage sketch after this list). 

  • Edge security is a moving target: With the strategies outlined above in place, we will be able to adjust our approach based on the traffic we see at the edge. 
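A simplified sketch of the triage described above follows. All signals and thresholds here are hypothetical stand-ins; the real analysis runs on the Global CDN and weighs far more inputs:

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"  # serve a client challenge before letting the traffic through
    BLOCK = "block"          # respond 403 at the edge

# Hypothetical deny list; real classification draws on Fastly's catalog and edge telemetry.
KNOWN_BAD_AGENTS = {"BadBot/1.0"}

def triage(method: str, user_agent: str, query_string: str) -> Verdict:
    """Toy triage of one request using its request type, user agent, and query string."""
    if user_agent in KNOWN_BAD_AGENTS:
        return Verdict.BLOCK
    # A missing user agent, or a write method carrying an unusually long query string,
    # is suspicious but not proven malicious, so it earns a challenge rather than a block.
    if not user_agent or (method == "POST" and len(query_string) > 500):
        return Verdict.CHALLENGE
    return Verdict.ALLOW

print(triage("GET", "Mozilla/5.0", "q=beach+shirt"))  # Verdict.ALLOW
print(triage("GET", "", "q=beach+shirt"))             # Verdict.CHALLENGE
print(triage("GET", "BadBot/1.0", ""))                # Verdict.BLOCK
```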

By focusing on the above principles, we are building a more resilient edge against bot traffic on the internet. 

Your feedback is the cornerstone of our product development. As we roll out new features and enhancements, we count on you to let us know what’s working and what can be improved. Your insights are crucial to our growth as a platform and our direction as a product-driven organization. Together, we can build a more secure and efficient web for everyone.
