Am I getting DDOS from crawlers?
My website is currently getting hit by spamming bots (example: 66.249.73.*) that are causing high CPU usage. Is it common for Google/Bing to crawl a website at 10 requests per second?
I have done reverse lookups on the IPs and they appear to be valid crawlers according to https://support.google.com/webmasters/answer/80553?hl=en.
Because I have blocked some IPs, Google and Bing Search Console are reporting errors, which is hurting my index.
This has been happening this month (April). Is this a referral attack? Is it possible someone is spoofing the IPs? Is there something I can do to limit the amount of crawling?
I am currently in the process of setting up server-side rendering for crawlers, but this is a tedious task for something that just started happening randomly.
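The reverse-lookup check mentioned above is a two-step verification described on Google's support page: reverse-resolve the IP, check that the hostname falls under a known crawler domain, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal sketch of that logic follows; the lookup functions are injected so it can be shown offline (in practice you would back them with `socket.gethostbyaddr`/`socket.getaddrinfo`), and the specific IPs and hostnames in the example are hypothetical.

```python
# Sketch of two-step crawler verification: reverse DNS, domain check,
# then forward DNS confirming the hostname maps back to the same IP.
# Lookup functions are passed in so the logic can be demonstrated offline.

def is_verified_crawler(ip, reverse_dns, forward_dns,
                        allowed_suffixes=(".googlebot.com", ".google.com",
                                          ".search.msn.com")):
    """Return True if `ip` passes reverse + forward DNS verification."""
    hostname = reverse_dns(ip)          # e.g. crawl-66-249-73-10.googlebot.com
    if not hostname or not hostname.endswith(allowed_suffixes):
        return False                    # rDNS missing or not a crawler domain
    return ip in forward_dns(hostname)  # forward lookup must match the IP

# Stubbed lookup tables (hypothetical records, for illustration only):
rdns = {"66.249.73.10": "crawl-66-249-73-10.googlebot.com"}.get
fdns = lambda host: {"crawl-66-249-73-10.googlebot.com": ["66.249.73.10"]}.get(host, [])

print(is_verified_crawler("66.249.73.10", rdns, fdns))  # True
print(is_verified_crawler("66.249.73.99", rdns, fdns))  # False (no rDNS record)
```

Note that a spoofed User-agent alone passes neither step, which is why this check is stronger than trusting the request headers.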
web-crawlers
If your CPU is heavily loaded by legit, or even rogue, bots, it's time to upgrade your hosting. Your website hosting should be able to handle users, legit bots, and even third-party bots.
– Simon Hayter♦
4 hours ago
asked 8 hours ago by Matt · edited 6 hours ago by John Conde♦
3 Answers
Google uses many IP ranges. From the one you posted, any IP in the range 66.249.64.0 – 66.249.95.255 that identifies itself as Googlebot should be a legitimate bot.
There are many possible reasons for the increase in crawl rate: maybe some of your content has gone viral, or their bot wants to refresh the data from your site sooner. It usually IS a good thing.
I would NEVER block a Google IP range, unless you don't want visitors to reach your site. What you CAN do, if it's hammering your resources, is specify a crawl delay for other search bots in robots.txt.
Google does not support the Crawl-delay directive. However, it does support defining a crawl rate in Google Search Console.
– Luis Alberto Barandiaran, answered 6 hours ago
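The robots.txt suggestion above can be sketched as follows. This is illustrative only: the 10-second delay is an arbitrary example value, Crawl-delay is honored by bots such as Bing and Yandex but ignored by Googlebot (whose rate must be set in Google Search Console), and any bot is free to ignore robots.txt entirely.

```
# robots.txt — illustrative sketch; Googlebot ignores Crawl-delay,
# so its rate must be set in Google Search Console instead.
User-agent: bingbot
Crawl-delay: 10

User-agent: Yandex
Crawl-delay: 10

User-agent: *
Disallow:
```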
Thanks for the response. This info was helpful.
– Matt
4 hours ago
It's not likely that Google is crawling your site ten times a second. Google addresses have been used by hackers before for various reasons; see https://www.abuseipdb.com/user/23706.
You can block this range using your .htaccess file; see https://stackoverflow.com/questions/18483068/how-to-block-an-ip-address-range-using-the-htaccess-file/18483210
What I don't know is whether the range you have is from Google's servers or their client network.
– Trebor, answered 7 hours ago
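For completeness, a minimal .htaccess sketch of an IP-range block (Apache 2.4 `Require` syntax; 66.249.64.0/19 is the CIDR form of the 66.249.64.0 – 66.249.95.255 range discussed here). Bear in mind the caveat from the other answers: this range belongs to Google's published crawlers, so only block it if you have confirmed the traffic is not legitimate.

```
# .htaccess — Apache 2.4 syntax; blocks the 66.249.64.0/19 range.
# Only do this if you have confirmed the traffic is NOT a legitimate crawler.
<RequireAll>
    Require all granted
    Require not ip 66.249.64.0/19
</RequireAll>
```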
1
You should NEVER block Google IP addresses, unless you don't want them to send visitors to your site.
– Luis Alberto Barandiaran
6 hours ago
@LuisAlbertoBarandiaran, I would agree. But not every address assigned to Google is assigned to their servers. That's why I said I don't know if the range is from servers or client network.
– Trebor
6 hours ago
If the IP addresses the requests are originating from are identified as Google's or Bing's, then you should not block them, as this will impact your SEO. Instead of blocking them, you should adjust the crawl rate in their respective webmaster tools.
Both Google and Bing offer the ability to adjust the crawl rate with good flexibility:
Change Googlebot crawl rate
Bing Crawl Control
If Yandex is crawling your site, you can add a Crawl-delay directive to slow down its crawl rate.
If you think there are spam bots affecting your servers (which can be determined by observing the server logs), consider using a Web Application Firewall that blocks suspicious IP addresses. Cloudflare can allow known bots and block suspicious ones based on the threat score it calculates. Additionally, you can block certain user agents from crawling the website.
– Shahzeb Qureshi, answered 1 hour ago
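The "observe the server logs" step above can be sketched in a few lines of Python that count hits per client IP in an access log. It assumes the client IP is the first whitespace-separated field, as in the Apache/Nginx combined log format; the sample log lines are synthetic, and you would adjust the field position for other formats.

```python
# Count requests per client IP in an access log to spot hammering bots.
# Assumes the client IP is the first field (Apache/Nginx combined format).
from collections import Counter

def top_talkers(log_lines, n=3):
    """Return the n most frequent client IPs as (ip, count) pairs."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return counts.most_common(n)

# Tiny synthetic log for illustration:
sample = [
    '66.249.73.10 - - [05/Apr/2019:10:00:01] "GET / HTTP/1.1" 200 512',
    '66.249.73.10 - - [05/Apr/2019:10:00:01] "GET /a HTTP/1.1" 200 128',
    '203.0.113.7 - - [05/Apr/2019:10:00:02] "GET / HTTP/1.1" 200 512',
    '66.249.73.10 - - [05/Apr/2019:10:00:02] "GET /b HTTP/1.1" 200 256',
]
print(top_talkers(sample, n=2))  # [('66.249.73.10', 3), ('203.0.113.7', 1)]
```

Any IP that dominates this count is a candidate for the reverse-DNS verification described in the question, and then for rate limiting or a WAF rule if it turns out not to be a legitimate crawler.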