Author |
Message |
Serafim
Worker


Joined: Mar 25, 2006
Posts: 109
Location: Delaware Usa
|
Posted:
Sat Apr 01, 2006 10:05 am |
|
Well, being a newb, I once again assumed that this file would work. Problem is, I have to set it up. What I would like to do is get this site's input, since you understand exactly how these things work. I would like to have a robots.txt that keeps bots out of all areas except the main index. How would this be accomplished? TIA
Raven
Site Admin/Owner

Joined: Aug 27, 2002
Posts: 17088
|
Posted:
Sat Apr 01, 2006 11:17 am |
|
Here is pretty much the base standard. Note that you will only have /abuse/ if you have NukeSentinel(tm) installed.
User-agent: *
Disallow: /abuse/
Disallow: /admin/
Disallow: /blocks/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /modules/
Disallow: /themes/
Disallow: /admin.php
Disallow: /config.php
Serafim

|
Posted:
Sat Apr 01, 2006 11:35 am |
|
Ok this is what i have
User-agent: Mediapartners-Google*
Disallow:
User-agent: *
Disallow: admin.php
Disallow: /admin/
Disallow: /images/
Disallow: /includes/
Disallow: /themes/
Disallow: /blocks/
Disallow: /modules/
Disallow: /language/
Can something be added here to make this better? Or how can I just disallow all bots in general from everything but index.php? I would assume that for search engines they would need to have something to probe.
And thanks, Raven, for responding
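To answer the "everything but index.php" question directly, one possibility is the sketch below. Note this is only a sketch: Allow is a Google extension supported by most major crawlers, not part of the original robots.txt standard, so strictly standard bots will simply stay out of everything.

```
User-agent: *
Disallow: /
Allow: /index.php
```

For Google, the most specific (longest) matching rule wins, so the order of the lines does not matter to it; crawlers that ignore Allow will just skip the whole site.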
kguske
Site Admin

Joined: Jun 04, 2004
Posts: 6437
|
Posted:
Sat Apr 01, 2006 3:22 pm |
|
I'd go with Raven's suggestion. You don't want spiders hitting your admin and config files. Google will respect the robots.txt file, so there's no need to have a special user-agent for that. |
_________________ I search, therefore I exist...
Serafim

|
Posted:
Sat Apr 01, 2006 3:28 pm |
|
So the code that he posted, I just copy and paste into the file called robots.txt?
kguske

|
Posted:
Sat Apr 01, 2006 3:57 pm |
|
That will work. If you don't have all the directories, it won't matter since you're telling spiders not to look there anyway... |
Serafim

|
Posted:
Sat Apr 01, 2006 4:05 pm |
|
Ok, thanks, I will use that instead of what's there. And I assume that when you list something as a folder and want to protect all its contents, you use /folder/?
I have a few other areas in my root for test sites and things that I wish to include.
Guardian2003
Site Admin

Joined: Aug 28, 2003
Posts: 6799
Location: Ha Noi, Viet Nam
|
Posted:
Sat Apr 01, 2006 4:33 pm |
|
Serafim - you may need to look at your file again; it is missing the slash before admin.php.
If you copied and pasted the example Raven gave, that one is correct.
If you want to automatically block bots that ignore the robots.txt file, that is slightly more complicated.
There are example scripts if you Google, but a method I have found which is very dirty but effective is to place a URL in your robots.txt that will trigger Sentinel - so when a bot ignores the robots.txt instruction not to visit that URL, Sentinel is triggered and blocks the IP.
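Guardian does not post his actual trap string in this thread, but a hypothetical illustration of the idea (the path and query string below are invented for illustration, not real PHP-Nuke URLs) would be a Disallow line whose URL contains something one of Sentinel's blockers fires on:

```
User-agent: *
Disallow: /index.php?op=honeypot+UNION+SELECT
```

A well-behaved crawler never requests that URL; a bot that ignores robots.txt and fetches it sends a request containing "UNION SELECT", which trips NukeSentinel's UNION blocker and gets its IP banned.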
Serafim

|
Posted:
Sat Apr 01, 2006 6:17 pm |
|
Sweet. Well, since they got dirty and ignored robots.txt, fair is fair. Can you give an example that I may use? Thanks for all the help
Guardian2003

|
Posted:
Sun Apr 02, 2006 3:57 am |
montego
Site Admin

Joined: Aug 29, 2004
Posts: 9457
Location: Arizona
|
Posted:
Sun Apr 02, 2006 7:12 am |
|
That is just too funny! How many "bots" have you caught this way?
I personally would rather know about a true exploit vs. this "dirty bot" one, so I'd have to keep a close eye on the bans, and if I find one is from a bot, I would probably just add it to my .htaccess file in my "bad bot" section, then unban it.
I would never have thought of that solution! Thanks, Guardian
Guardian2003

|
Posted:
Sun Apr 02, 2006 7:35 am |
|
Montego - yes, I see your point, and a very valid one it is too.
For your own purposes, you can add something like ?id=TRAPPED to the end of that string so it can be identified.
i.e. when the URL string is emailed via Sentinel or viewed in your server logs, if the URL does not have the ?id=TRAPPED at the end, it was a true exploit attempt.
I have only 'trapped' about 6 bots using this method. Two of them were mass 'website downloader' type progs, so it has certainly been a worthwhile experiment.
I know 6 isn't that many, but when you consider that Sentinel is blocking a lot by default, 6 is quite a lot in the three months I have been using this.
I suppose one could even attach a unique string to an image file, which you could trap by creating a 'script blocker' in Sentinel for bad bots that are looking specifically for image extensions.
Hmm, now that's a thought............
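Guardian's closing image-file idea might be sketched like this (the filename is invented, and the matching 'script blocker' entry in Sentinel is an assumption, since no exact recipe is given in the thread):

```
User-agent: *
Disallow: /images/do-not-fetch-trap.gif
```

An image-harvesting bot that ignores robots.txt will request the file; a Sentinel blocker configured to match that filename then bans the offender, while compliant crawlers never touch it.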
montego

|
Posted:
Sun Apr 02, 2006 7:42 am |
|
This is just so simplistic! I absolutely love it! |
Guardian2003

|
Posted:
Sun Apr 02, 2006 7:53 am |
|
A new tag line.........
Sentinel does the hard work, so you don't have to. |
Serafim

|
Posted:
Sun Apr 02, 2006 9:42 am |
|
LOL, ok, you lost me, but I will add that to my robots.txt and see what happens. And just add the ?id=TRAPPED to the end of the string, and I will know the dirty bot got trapped... When you do catch one, can they be reported, or is that a moot point?
Guardian2003

|
Posted:
Sun Apr 02, 2006 9:59 am |
|
You can report them if you wish, but unless you suddenly start getting lots of bans, I wouldn't worry about it.
The main thing is, you are now automatically banning bad bots and saving precious (to some) bandwidth.
If you get the time, it is always worth following up by doing an IP trace and noting the results somewhere.
As I am quite lazy at times, I just open the email notification from Sentinel, reply to myself adding any notes, then save that email into a special folder in my mail software (Outlook).
Serafim

|
Posted:
Sun Apr 02, 2006 12:26 pm |
|
I wish to thank you for all your helpful tips and tricks. Within moments of installing that string, I busted 2 dirty bots and they were banned. That is too funny
kguske

|
Posted:
Sun Apr 02, 2006 12:33 pm |
|
Elegant idea, Guardian. Well done! |
Guardian2003

|
Posted:
Sun Apr 02, 2006 2:29 pm |
|
Thank you kguske. If it helps this community and other nukers fight back in their war against such 'visitors' then I'm a happy chappy. |
montego

|
Posted:
Wed Apr 05, 2006 7:06 am |
|
A word of caution that I thought of after posting to this thread: you MUST have NukeSentinel's UNION blocker turned ON at ALL times, with "Block" turned on. Otherwise, you may have just had that "bot" cache your superadmin password!
Use with extreme caution or "dummy down" the "exploit" so that it still trips NukeSentinel but does NOT display anything meaningful to the bot if for some odd reason you accidentally have this turned off. |
Serafim

|
Posted:
Wed Apr 05, 2006 2:25 pm |
|
"Dummy down"? Was that a crack at me, lol? No, really, could you explain the dummy-down thing or give some sort of example? I have the union blocker on and set to email, block and forward. The forward goes to pc killer.
zzb
New Member


Joined: Jun 05, 2005
Posts: 22
Location: USA
|
Posted:
Sat Apr 29, 2006 7:09 pm |
|
Here is a link that involves using the rewrite engine and two trapping directories; in addition, it traps mail harvesters....
I have caught a few with this method as well !!
Cheers.
ZZ |
montego

|
Posted:
Sun Apr 30, 2006 9:04 am |
|
Quote:
Use with extreme caution or "dummy down" the "exploit" so that it still trips NukeSentinel but does NOT display anything meaningful to the bot if for some odd reason you accidentally have this turned off.
|
Sorry, Serafim, I must have missed your original question above about my comment. What I was referring to was not having the union select show your nuke_authors table data. Instead of doing a union on that table, I'd try something "benign", like one of your empty tables, or create a new table with nothing in it and have the union select go against that table instead.
It is just a cautionary measure to ensure you are not inadvertently giving up admin users and passwords to be cached by the search engine. (Not sure if it will, but why take the chance? In fact, a human hacker could use this as an exploit if one happens to forget and leave that blocker off.)
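A sketch of montego's "benign" variant (the table and column names below are hypothetical, and the real exploit string is deliberately not reproduced): the decoy table is empty, so even if the UNION blocker were somehow off, the injected query would return nothing sensitive.

```sql
-- one-time setup: an empty decoy table with throwaway columns
CREATE TABLE nuke_decoy (aid VARCHAR(25), pwd VARCHAR(40));

-- the trap URL's UNION then targets the decoy instead of nuke_authors,
-- e.g. ...UNION SELECT aid, pwd FROM nuke_decoy... which returns zero rows
```

The trap still contains the "UNION SELECT" pattern that trips Sentinel, but the data behind it is worthless.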
Serafim

|
Posted:
Sun Apr 30, 2006 9:14 am |
|
No problem, Montego. I have all blockers active except flood protection. (Still waiting on the fix for that.) I am the only admin that has access to Sentinel, so the chances that the blockers will be shut off are slim. However, I am still new to the whole PHP-Nuke world. Do you perhaps have an alternate code that I may add to my robots.txt that poses less of a threat to my database info? TIA
zzb

|
Posted:
Sun Apr 30, 2006 9:14 am |
|
This thread is very interesting. I suspect that by combining the power of Apache with some of the trapping methods above for violations of the robots.txt protocol, one might come up with a method of also notifying the admin email when the trap has been sprung, rather than checking through server logs. That would confirm a bad robot indeed! In other words, if the robot is trapped, it deserves to be banned regardless of what it is used for. At least that is my opinion.
There was an Apache script I recall that I will try and share here at this site. Perhaps those here with a better understanding of the internals of NukeSentinel might be able to customize it to set up a foolproof set of traps that would leave no doubt you would want to ban the offending bot. If I can find it, I will post the code for you guys.
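A minimal sketch of the Apache-side trap zzb is describing (this is not the script he mentions; the directory name is invented, and it assumes mod_rewrite is enabled): list a decoy directory in robots.txt, then have mod_rewrite refuse anything that requests it.

```
# robots.txt: tell compliant crawlers to stay out of the decoy
# User-agent: *
# Disallow: /bot-trap/

# .htaccess: anything that fetches the decoy anyway is a bad bot
RewriteEngine On
RewriteRule ^bot-trap/ - [F,L]
```

The [F] flag returns 403 Forbidden; the requests still land in the Apache error/access logs, which could be watched to drive email notification or banning.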