This guide explains the SemrushBot.
Below, you’ll find out what SemrushBot is, how it works, and how to control this web crawler on your website using directives that the bot obeys.
What Is SemrushBot?
SemrushBot is a web crawler that compiles and indexes website data for the Semrush database. The data collected by SemrushBot is used to provide up-to-date information for the backlink index and a variety of SEO and marketing analysis tools in the Semrush software.
SemrushBot is considered a good bot: it is used for marketing purposes and obeys robots.txt rules and Crawl-Delay directives. You can try Semrush for free using my affiliate link to test out all of its capabilities for your search engine optimization, content marketing, and pay-per-click (PPC) advertising campaigns.
How Does SemrushBot Work?
SemrushBot works by automatically visiting publicly accessible web pages to discover and collect new and updated web data. The process of crawling web pages enables SemrushBot to find new URLs and dead links on the Internet to keep its database fresh with link data.
Based on the crawl rate limit and crawl demand it assigns to a website, SemrushBot crawls a different number of web pages at set intervals during each visit, fetching link data according to the current server load. The SemrushBot crawler can be instructed to wait up to 10 seconds between requests to a website; higher values are cut down to this 10-second limit.
SemrushBot is programmed so it does not crawl a website too fast to avoid overloading it, which can lead to timeouts and server errors. SemrushBot also does not trigger ad views or show up as visitor traffic in Google Analytics.
The crawl process for SemrushBot starts with the bot crawling a website and making a list of hyperlinks on each web page to use for further crawling. Semrush refers to this list as the “crawl frontier”, which is repeatedly visited by SemrushBot to discover new web pages, dead links, and content updates.
According to the Imperva Incapsula Bot Traffic Report, SemrushBot is one of the most active web spiders used by commercial enterprises to crawl websites and retrieve information for digital marketing purposes. SemrushBot works continuously to crawl the web to give online marketers better insight into the factors that affect search engine indexing and ranking algorithms so users can better optimize their websites and SEO campaigns.
Tools Powered By SemrushBot
Data collected by SemrushBot is used to power the following tools in the Semrush software:
- Backlink Analytics
- Backlink Audit
- Link Building
- Site Audit
- Content Analyzer
- SEO Writing Assistant
- Post Tracking
- On Page SEO Checker
- SEO Content Template
- Content Outline Builder
- Topic Research
You can test all of these tools at no charge using my affiliate link here: try Semrush for free.
Controlling SemrushBot On Your Website
SemrushBot can be controlled by your website’s robots.txt file to change the frequency of the crawler visiting your website, prevent specific Semrush tools from accessing your website’s data, or block the bot completely from crawling your domain.
Changing SemrushBot Crawl Frequency
SemrushBot crawl frequency can be changed by specifying the minimum acceptable delay between two consecutive requests in the robots.txt file using this markup:
User-agent: SemrushBot
Crawl-Delay: [value]
The Crawl-Delay value is the time in seconds between requests. For example, Crawl-Delay: 5.
Note: SemrushBot can only take intervals of up to 10 seconds between requests to a website. Any Crawl-Delay value that is assigned higher than 10 seconds will be cut down to this limit.
Blocking SemrushBot from Your Website
SemrushBot can only be blocked from crawling your website by adding specific rules to the robots.txt file. You cannot prevent SemrushBot from accessing your website through IP addresses because it does not use consecutive IP blocks.
Also, Semrush uses different User-Agents for the various tools in its software. Therefore, you can choose specific SemrushBot crawlers to prevent from accessing your domain, or block them all by adding a Disallow rule for each User-Agent in the robots.txt file.
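As a sketch of what blocking more than one crawler looks like, the robots.txt fragment below disallows the main SemrushBot plus one tool-specific agent. The second User-Agent name here is a hypothetical placeholder, not a real Semrush agent; check Semrush’s own documentation for the current list of tool-specific User-Agents:

```
# Block the main Semrush backlink crawler
User-agent: SemrushBot
Disallow: /

# Block a tool-specific crawler (placeholder name shown;
# substitute an actual User-Agent from Semrush's documentation)
User-agent: SemrushBot-EXAMPLE
Disallow: /
```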
Below is an example rule you can add to your robots.txt file to stop SemrushBot from crawling your site to build the webgraph of links reported in the Backlink Analytics tool:
User-agent: SemrushBot
Disallow: /
If you want to prevent any of the other tools in the Semrush software from accessing your website, then see this other guide on how to block SemrushBot. It includes the complete list of rules you can copy and paste into your website’s robots.txt file to stop all of the User-Agents from crawling your site.
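If you want to confirm that your rules parse the way you intend before SemrushBot picks them up, Python’s standard-library robots.txt parser applies the same User-agent matching and Crawl-delay parsing that well-behaved crawlers use. This is a minimal sketch using the example rules from this guide (the example.com URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

# robots.txt rules as shown in this guide: a crawl delay
# plus a full block for SemrushBot
rules = """
User-agent: SemrushBot
Crawl-delay: 5
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.modified()   # mark rules as loaded so can_fetch() answers queries
parser.parse(rules)

# SemrushBot is blocked from every URL on the site
print(parser.can_fetch("SemrushBot", "https://example.com/any-page"))  # False

# Other crawlers are unaffected by this group
print(parser.can_fetch("Googlebot", "https://example.com/any-page"))   # True

# The delay directive parses to 5 seconds
print(parser.crawl_delay("SemrushBot"))  # 5
```

Running a check like this is a quick way to catch typos in a Disallow path or User-Agent name before waiting for the bot to re-read your file.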
You’ll also want to disconnect Google Analytics and Search Console from your Semrush account if you’ve set those up. Otherwise, Semrush can still access your private website data for reporting purposes.
You can also visit this related tutorial on how to block AhrefsBot if you want to prevent that bot from crawling your website. Or read this introductory guide on AhrefsBot to find out how it works.
Note: It can take up to one hour or 100 requests for SemrushBot to discover changes made to your robots.txt file and honor those directives when crawling the website. If you want to confirm that SemrushBot is obeying your rules, then you can try Semrush for free and test out the various tools yourself to see if they still return data for your site.
I hope you enjoyed this guide on SemrushBot.
As you discovered, SemrushBot is a web crawler that compiles and indexes website data for the Semrush database, which provides up-to-date information for the backlink index and a variety of SEO and marketing analysis tools in the Semrush software. You can control how SemrushBot crawls your site by changing its crawl frequency or by preventing its various User-Agents from accessing your site through the robots.txt file.