Ravens PHP Scripts: Forums
 

 

Ravens PHP Scripts And Web Hosting Forum Index -> Security - PHP Nuke
Serafim
Worker

Joined: Mar 25, 2006
Posts: 109
Location: Delaware, USA

Posted: Tue May 09, 2006 12:08 pm

Hey all, looking for a few suggestions from the veteran nukers. I have been bombarded by the Slurp bot and am becoming very annoyed. I have blocked them at the Sentinel level and many of their IPs are hitting the .htaccess, but I have blocked at least 10 in the past two days, and I think they should abide by robots.txt. Besides banning them, what can I do to stop this?

I have currently disallowed all Yahoo email accounts, all Yahoo search, and anything Yahoo-related on my site.

Guardian2003
Site Admin

Joined: Aug 28, 2003
Posts: 6799
Location: Ha Noi, Viet Nam

Posted: Tue May 09, 2006 12:44 pm

Yahoo has its own bots, which I believe use the user agent 'Slurp'. The problem is that almost anyone can use a Yahoo bot for their own purposes, which is why there are so many of the darn things. They will show up with either the 'Slurp' user agent or 'inktomisearch.com' as the user agent, depending on the IP and which script you are using to monitor them.

There are several things you can do here.

You can slow the crawling process by adding this to your robots.txt file:
Code:
User-agent: Slurp
Crawl-delay: 30
This will create a delay between successive accesses; the higher the number, the greater the delay.
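If you want to sanity-check the file before uploading it, a quick sketch with Python's standard-library robots.txt parser can read the directive back (the two rule lines below are just the example above, inlined as a string):

```python
from urllib.robotparser import RobotFileParser

# Parse the two-line robots.txt fragment from the example above.
rp = RobotFileParser()
rp.parse("User-agent: Slurp\nCrawl-delay: 30".splitlines())

# crawl_delay() returns the delay declared for the matching user agent.
print(rp.crawl_delay("Slurp"))  # 30
```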

A genuine Slurp bot should obey the directives in robots.txt - but don't forget, it may take some time for it to re-crawl that file and thus pick up any changes you have made.

You could ban them completely, if you feel it is getting out of control, with something like this at the top of your .htaccess (this returns a 403 Forbidden to anything identifying itself as Slurp):
Code:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Slurp
RewriteRule ^.*$ - [F,L]
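For anyone doing this outside Apache, here is a minimal sketch of the same user-agent check as a hypothetical WSGI middleware; the function names and the trivial demo app are illustrative only, not part of mod_rewrite or PHP-Nuke:

```python
def block_slurp(app):
    """Wrap a WSGI app and return 403 Forbidden to any client whose
    User-Agent header contains 'Slurp' (case-insensitive)."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if "slurp" in ua.lower():
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware

# A trivial app to demonstrate the wrapper.
def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

wrapped = block_slurp(hello)
```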
 
Serafim

Posted: Tue May 09, 2006 12:49 pm

The number 30, does that equal days, minutes, or what? It's not that it's out of control; Sentinel gets them and bans them, it's just annoying. I can expect at least three per day to be banned. Beyond adding those lines of code, I guess I will just need to set my email to not be forwarded on a harvest. Thanks for the help.
 
Guardian2003

Posted: Tue May 09, 2006 3:13 pm

As far as I can tell, the delay value is in seconds.
Three Slurps per day is not actually that many; I have had dozens of them at any given time before I finally got fed up with them and banned them.
If you Google for 'Crawl-delay' you will probably find Yahoo's help pages, which give more information, including an online form for requesting that they not crawl your site.
 
Serafim

Posted: Tue May 09, 2006 4:05 pm

OK, I added the delay to my robots.txt. I know three is not many, but that's just an average; before I added Slurp to my harvest list they would hit the site 20 or so times, so I figure even three hits per day is a waste of bandwidth. I don't use Yahoo for anything, and I had to email them to request that certain content be removed from their search engine. It's just the principle of the matter: if they cannot play by the rules, I do not want them near my site. If they just visited my index page it would be no issue, but I'm seeing my modules' admin login attempts and even posts from these forums in their search engine. It's just creepy, lol. Thanks again.
 
guidyy
Worker

Joined: Nov 22, 2004
Posts: 208
Location: Italy

Posted: Tue May 09, 2006 11:07 pm

That's a problem with all search engines: they follow links and do not know what's for admin, what's for posting, and so on. The only thing you can do is control them via robots.txt.
 
Guardian2003

Posted: Wed May 10, 2006 12:41 am

Serafim - please post the contents of your robots.txt file.
 
guidyy

Posted: Wed May 10, 2006 5:44 am

This is mine for a googleTapped site.

User-agent: *
Disallow: /admin/
Disallow: /images/
Disallow: /includes/
Disallow: /themes/
Disallow: /blocks/
Disallow: /modules/
Disallow: /language/
Disallow: /admin.php
Disallow: /config.php
Disallow: /cgi-bin/
Disallow: /feedback.html
Disallow: /reccomend.html
Disallow: /members.html
Disallow: /messages.html
Disallow: /account.html
Disallow: /submit.html
Disallow: /top.html
Disallow: /stats.html
Disallow: /fsearch-newposts.html
Disallow: /fsearch-egosearch.html
Disallow: /fsearch-unanswered.html
Disallow: /forums-group6.html
Disallow: /forums-group7.html
Disallow: /forums-groupcp.html
Disallow: /forums-search.html
Disallow: /forum-editprofile.html
Disallow: /message-post
Disallow: /ftopic-new
Disallow: /ftopic-reply
Disallow: /ftopic-quote
Disallow: /forum-userprofile
Disallow: /ratelink-
Disallow: /linkop-AddLink.html
Disallow: /messages-inbox.html
Disallow: /messages-post
Disallow: /modules.php

The last directive tells spiders they do not need to touch anything that is not tapped. If you don't use GoogleTap you need to drop it.
So far, it's working pretty well.
Guido
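A list that long is easy to get wrong, so here is a hedged sketch of how you could verify it with Python's standard-library parser; the fragment below uses only a few of the entries above, and the test paths are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A small subset of the robots.txt rules listed above.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /admin.php
Disallow: /modules.php
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Disallowed prefixes are refused for any agent the '*' group covers...
print(rp.can_fetch("Slurp", "/admin/index.php"))  # False
# ...while anything not listed stays crawlable.
print(rp.can_fetch("Slurp", "/index.php"))        # True
```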
 
Serafim

Posted: Wed May 10, 2006 2:33 pm

User-agent: *
Crawl-delay: 30
Disallow: /modules.php?name=edited****************************
Disallow: /abuse/
Disallow: /admin/
Disallow: /blocks/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /modules/
Disallow: /themes/
Disallow: /admin.php
Disallow: /config.php
Disallow: /conf/
Disallow: /chat/
Disallow: /other/
Disallow: /scripts/
Disallow: /ebot/
Disallow: /botsv/
Disallow: /botsi/


I received a letter from Yahoo this afternoon stating that their robots are in compliance with the robots standard. Perhaps it is me who messed up; I assumed that listing /modules/ would cover everything under modules, including modules.php. They said that in order to keep the robots from crawling that specific page I would have to list it as modules.php. I'll paste the letter they sent:


I have investigated the issue but was not able to find any exclusions
for the following URLs in your robots.txt:

www.disciplesofcain.com/modules.php?name=Forums
www.disciplesofcain.com/modules.php?name=Forums
www.disciplesofcain.com/modules.php

It seems that Slurp is acting in compliance with the robots.txt
exclusion standard of 1994. Your current robots.txt reads:

see above
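Yahoo's point is that robots.txt exclusions under the 1994 standard are plain path-prefix matches: `Disallow: /modules/` only covers URLs that begin with `/modules/`, so `/modules.php?name=Forums` slips through. A small sketch with Python's stdlib parser (the helper function is just for this illustration) shows the difference:

```python
from urllib.robotparser import RobotFileParser

def allowed(rules, url, agent="Slurp"):
    """Return True if the given robots.txt text permits 'agent' to fetch 'url'."""
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch(agent, url)

# 'Disallow: /modules/' is a prefix match; it does NOT cover /modules.php.
print(allowed("User-agent: *\nDisallow: /modules/", "/modules.php?name=Forums"))    # True
# An explicit '/modules.php' entry is what actually blocks that page.
print(allowed("User-agent: *\nDisallow: /modules.php", "/modules.php?name=Forums"))  # False
```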
 
Serafim

Posted: Sun May 14, 2006 12:07 am

Just an update about those spiders: I added the whole /modules.php entry to my robots.txt. Funny thing though, after Yahoo contacted me about my loathing of their spiders, all the spiders stopped; not one trace of them has hit me in days. Just thought I would share that. Thanks for all the help again.
 
Guardian2003

Posted: Sun May 14, 2006 1:27 am

No problem, glad they are leaving you in peace now.
 

Powered by phpBB © 2001-2007 phpBB Group