mercman
Regular
Joined: Nov 29, 2006
Posts: 64
Location: TN, USA

Posted: Wed Dec 20, 2006 4:39 pm
Hi all!
I am running Raven's 76 v2.02.02 distro with Sentinel v2.5.03 and have been getting abuse-flood reports from Sentinel for what I believe are Google bot crawlers. See the sample below.
I reviewed a few posts on this subject, but nothing really seems to fit the bill.
I've checked my "robots.txt" file and it seems OK, but...?
My "robots.txt":
Code:
User-agent: *
Disallow: /abuse/
Disallow: /admin/
Disallow: /blocks/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /modules/
Disallow: /themes/
Disallow: /admin.php
Disallow: /config.php
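As a quick sanity check, a file like the one above can be exercised with Python's standard-library robotparser before blaming a crawler. This is only an illustrative sketch: the inlined rules are a subset of the file above, and the paths being tested are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A subset of the robots.txt above, inlined for the test.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /includes/
Disallow: /admin.php
Disallow: /config.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler (any user agent, since the group is "*")
# should skip the disallowed paths but may fetch everything else.
print(parser.can_fetch("Googlebot", "/admin/users.php"))  # blocked -> False
print(parser.can_fetch("Googlebot", "/index.php"))        # allowed -> True
```

If both lines come back as expected, the file itself parses fine, which points the finger at crawl *rate* rather than crawl *scope*.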
Can someone point me in the right direction to correct this please?
Thank you,
-Merc
Guardian2003
Site Admin
Joined: Aug 28, 2003
Posts: 6799
Location: Ha Noi, Viet Nam

Posted: Wed Dec 20, 2006 6:35 pm
I have yet to see Google flood a site with HTTP requests (unlike Slurp).
What is your flood level set to? You might want to knock it down a peg or two.
mercman

Posted: Wed Dec 20, 2006 6:44 pm
Guardian,
Thank you for your reply.
The flood delay is set to 2 seconds (the default?).
-Merc
Guardian2003

Posted: Wed Dec 20, 2006 6:49 pm
Out of curiosity, what's the 'default' blocker template set to?
I'm not saying Google didn't cause the flood blocker to trip; it's just the first time I have known it to happen. Google is usually pretty well behaved.
I'm not sure of the default flood setting; mine has always been set to 3 seconds.
mercman

Posted: Wed Dec 20, 2006 6:54 pm
Activate: Email Admin
Write to htaccess: No
Forward to: ""
IP Block Type: Full IP (127.2.3.4)
Default Page: Flood
Email IP Lookup: Off
Reason: Abuse-Flood
Block Duration: Permanent
Guardian2003

Posted: Wed Dec 20, 2006 7:06 pm
Interesting. So it might not have been a *real* flood after all, as that is the default template.
I would be tempted to disregard the warning email on this occasion.
mercman

Posted: Wed Dec 20, 2006 7:12 pm
Yeah, I sort of figured so.
I am making some changes in the templates now (Write to htaccess: On, Email IP Lookup: On, etc.).
Do you think I should "open up" the flood delay to maybe 3 or 4 seconds?
-Merc
evaders99
Former Moderator in Good Standing
Joined: Apr 30, 2004
Posts: 3221

Posted: Wed Dec 20, 2006 10:41 pm
I suggest using the Crawl-delay parameter in robots.txt.
Guardian2003

Posted: Thu Dec 21, 2006 2:54 am
From the Yahoo help pages:
Quote:
Frequency of Access
There is a Yahoo-Blogs/v3.9-specific extension to robots.txt which allows you to set a lower limit on our crawler request rate.
You can add a "Crawl-delay: xx" instruction, where "xx" is the minimum delay in seconds between successive crawler accesses. Our default crawl-delay value is 1 second. If the crawler rate is a problem for your server, you can set the delay up to 5 or 20 seconds, or any value comfortable for your server.
Setting a crawl-delay of 20 seconds for Yahoo-Blogs/v3.9 would look something like:
User-agent: Yahoo-Blogs/v3.9
Crawl-delay: 20
Not all bots will obey the directive, but Slurp, Google and MSNBot should.
For Google (whose user-agent token is Googlebot) you would use something like:
Code:
User-agent: Googlebot
Crawl-delay: 20
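A quick way to confirm how a Crawl-delay file reads back is Python's standard-library robotparser, whose `crawl_delay()` helper exists in Python 3.6+. A minimal sketch, with the rules inlined and the "SomeBot" agent purely hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A per-agent delay plus a catch-all fallback group.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 20

User-agent: *
Crawl-delay: 3
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("Slurp"))    # 20 - Yahoo's crawler gets its own rule
print(parser.crawl_delay("SomeBot"))  # 3  - anything else falls back to "*"
```

Note the blank line between groups: robotparser treats it as a record separator, so each User-agent keeps its own delay.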
montego
Site Admin
Joined: Aug 29, 2004
Posts: 9457
Location: Arizona

Posted: Thu Dec 21, 2006 9:17 am
Google ignores the Crawl-delay parameter (so says their robots.txt validation checker), BUT I still use it for other bots, specifically Yahoo! It was tripping the flood all the time on me.
jakec
Site Admin
Joined: Feb 06, 2006
Posts: 3048
Location: United Kingdom

Posted: Thu Dec 21, 2006 2:46 pm
If you have registered and verified your site with Google, there is an option to set the crawl speed: Normal or Slower.
Once you have selected your domain under Webmaster Tools, the option is located under Diagnostic and then Crawl rate.
I hope this helps.
Jakec
montego

Posted: Fri Dec 22, 2006 6:48 am
jakec, I had forgotten about that. Thanks!
I am in agreement with Guardian, though, in that I have not had an issue with Google's bot tripping the flood. Maybe it was a one-off issue?
mercman

Posted: Fri Dec 22, 2006 5:39 pm
Everyone, thank you so much for all the input and suggestions.
I too think this was simply a "one-off" issue; I had just that day placed a new Google ad on the site. (Thanks go out to hitwalker for fixing my blocks!)
My site is registered with Google, so I'll check out that setting in the Webmaster Tools. Thank you jakec!
I'll definitely check/add that setting in the robots.txt file.
Thank you all,
-Merc
montego

Posted: Sun Dec 24, 2006 9:02 am