Author |
Message |
RavenNuke(tm) Development Team

Joined: Mar 18, 2007
Posts: 1242
Thu Apr 22, 2010 11:59 am |
A few weeks ago I blocked "betaBot" for example, but it is still accessing and harvesting - 25 hits today. What am I missing? |
Site Admin

Joined: Aug 29, 2004
Posts: 9457
Location: Arizona
Sun May 09, 2010 10:00 am |
Can't think of a code reason why it wouldn't be blocked unless:
1) you either didn't spell it correctly,
2) these are coming from different IP addresses and you have your blocker set to block at an individual IP address (rather than a higher subnet),
3) you forgot to have the setting set to "Block"
But, knowing you, I am sure none of the above apply, so, yes, that is puzzling. |

Sun May 09, 2010 10:07 am |
Ok, what I do/did:
1. Bring up tracked IP Addresses
2. Tracked Agents
3. Select a UA string
4. Check IP - same IP but don't know why that would be important as it's the "agent" that's being blocked, not the IP - unless I'm wrong on this.
5. Click the block icon (red shield)
Check back a few days later and the supposed blocked agent is in the list again showing more than just a few access points. This happens with more than just one pesky agent.
I also noticed that if I remove an agent from the list, some are still being blocked. I'll have to gather more specifics on this, possibly in the DB, to see if those agents are actually removed, etc.
Cheers |

Sun May 09, 2010 10:15 am |
More ....
Just checked the DB for NSNST/harvesters and my example betaBot was spelled in all lowercase - betabot - edited it to betaBot which is the actual spelling in the agent. I did not add this entry manually btw, just clicked on the red shield initially.
I also had php/5.2.1 blocked, but just "php" in the agent was being blocked, and "php" alone was NOT in the DB list; I removed the entry in the DB. Interesting ...
Cheers |

Sun May 09, 2010 10:34 am |
I removed betaBot from the DB list, and when adding it back again it came up as "betabot", lowercase again. I noticed that after clicking on the "gold" shield to block it, the entry box was auto-filled in lowercase, and hence the same was added into the DB.
To verify this a bit further, one of the agents was PycURL .. Clicking on the shield to block it, "pycurl" was the entry in the box and was entered as lowercase in the DB.
Do I need to enter this as a bug or more discussion/testing needed beforehand?
Cheers |

Tue May 11, 2010 7:16 am |
I was pretty certain that the user agent blocker is not case sensitive. Going back to the very beginning, is it possible that these agents were getting blocked, but since so many are coming from different IP addresses, NS is having to block each one it sees as it sees it? I have never blocked a user agent from the tracking. I ALWAYS add it in via the Harvester Menu. |

Tue May 11, 2010 7:59 am |
betaBot - my example is coming from only one IP address. If I use the agent blocking function it adds it to the harvester database but in lowercase - betabot. Next day, betaBot is listed with more than one or two hits and the agent UA is betaBot, not betabot. Now, if I edit it to betaBot in the DB that takes care of it, no more visits. This is also true of other bots as well. It appears to not be a case of multiple IP addresses but rather the agent string is, in actuality, case sensitive. I'm running FreeBSD if that matters .. it may.
Here's another one:
BabalooSpider is in the string but babaloospider is what is entered in the DB and return visits are not denied, no email.
When either of these two, for instance, access the site, I don't get an email but when I edit the string to change the case to what it is supposed to be then I DO get the email that the bot(s) have been blocked.
Cheers |
Site Admin

Joined: Mar 30, 2006
Posts: 2583
Location: Pittsburgh, Pennsylvania
Tue May 11, 2010 8:44 am |
Sounds like a regex problem. |
_________________ "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan. |

Tue May 11, 2010 9:14 am |
In summary ..
The actual UA that shows up in the list of Tracked Agents:
WWW-Mechanize/0.9.2 (
Click on the shield to block this agent
What shows up in the field is:
www-mechanize/0.9.2 (
Click on Save Changes
The above is what is saved to the harvester db
Next time that WWW-Mechanize/0.9.2 ( accesses the site, it will not be blocked.
The end result of this anomaly is that if a bot programmer uses upper/lowercase in the string then it will not be blocked.
Cheers |
Spouse Contemplates Divorce

Joined: Jan 02, 2003
Posts: 2496
Tue May 11, 2010 9:38 am |
I haven't looked at the code at all for a long time, but it sure sounds like saved agents are getting strtolower'd. I wouldn't go changing anything; let the dev team dig into this a little. Changing one thing can break other integrity tests. |
_________________ openSUSE 11.4-x86 | Linux i686 | KDE: 4.6.41>=4.7 | XFCE 4.8 | AMD Athlon(tm) XP 3000+ | MSI K7N2 Delta-L | 3GB Black Diamond DDR | GeForce 6200@433Mhz 512MB | Xorg 1.9.3 | NVIDIA 270.30 |

Sat May 15, 2010 7:37 am |
The tests are definitely meant to be case insensitive, so it shouldn't matter if the string is forced to all lower case. However, I just recently found similar cases on my own site, where I have my blocker set to "Write to .htaccess" at the "12.0.0.*" level: I am seeing several block messages at the last node level, and not one immediately after the other. (Btw, if the accesses come too quickly, the system will take a bit of time to get the IP written, and then eventually future ones should get blocked; this applies to all the blockers in general.)
Therefore, I am convinced NS does have an issue here. |

Sat May 15, 2010 7:52 am |
When I look at the auto-entry when choosing to block an agent and if it's all lowercase where some characters should be upper then I manually change it before saving the change to the DB. Although it works it's a band-aid for now.
Thanks for looking at this a little closer.
Cheers |

Sat May 15, 2010 7:56 am |
Odd. My symptoms are that I have a string of "webcapture" and I am getting block notices for an agent with "Webcapture" in it. Therefore, the string check is case insensitive, but maybe somehow the actual blocking is the problem... unfortunately I won't have any time to look at this for quite some time. |

Sat May 15, 2010 8:06 am |
It may be OS related. I'm running FreeBSD, which is case-sensitive in most respects. Could be the issue, who knows. But at present I can't see UNIX interfering with the MySQL DB in that respect, dunno at this point.
Haven't seen this issue presented here before, maybe it's going unnoticed or maybe it's so selective that it works in 99% of cases, just that mine fits the 1% case - as usual.
Cheers |
Site Admin

Joined: Aug 28, 2003
Posts: 6799
Location: Ha Noi, Viet Nam
Sun May 16, 2010 2:15 pm |
*nix is, for the most part, case sensitive, and that is common to your FreeBSD OS as well, so I'm inclined to write that off as a culprit (for now).
I would imagine that the UA is forced to lower case (strtolower) and stored that way in the DB for consistency. Likewise, I'm pretty sure that data strings for an incoming UA are similarly altered, in effect the 'compare' is being done with the case of all alpha characters in lowercase.
What I find interesting and somewhat puzzling is that if you are manually altering the UA text in the DB to match the original mixed case string it seems to have been beneficial for you; when in reality it shouldn't make any difference.
It is certainly something worth investigating further though. |
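The comparison described above, with both the stored string and the incoming UA forced to lowercase before they are compared, could look roughly like the sketch below. This is only an illustration of that assumption, not the actual NukeSentinel code; the function name `ua_is_blocked` and the sample entries are hypothetical.

```php
<?php
// Hypothetical helper, NOT taken from NukeSentinel: lowercases both the
// stored entry and the incoming user agent, then does a substring check.
function ua_is_blocked($userAgent, $blockedAgents)
{
    $ua = strtolower($userAgent);
    foreach ($blockedAgents as $entry) {
        $needle = strtolower(trim($entry));
        // Skip empty entries so a blank row cannot match everything.
        if ($needle !== '' && strpos($ua, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Sample entries as they would look after being stored in lowercase.
$blocked = array('betabot', 'webcapture');

var_dump(ua_is_blocked('betaBot/1.0', $blocked));
var_dump(ua_is_blocked('Mozilla/5.0 (compatible)', $blocked));
```

If the real check worked this way, editing the DB entry back to mixed case should make no difference, which is why the reported behaviour is puzzling.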

Sun May 16, 2010 2:42 pm |
If I block an uppercase UA and it stays lowercase without modification, it will not be blocked. If I edit the UA prior to saving then it is blocked, already verified that. What actually led me to this discovery is the UA with Webcapture in the string. When blocked it is saved as webcapture and subsequent hits are not blocked. Editing the string to Webcapture prior to saving will block any further hits. Strange but that's the case.
Cheers |

Sun May 16, 2010 5:43 pm |
I am not seeing how case would matter no matter what OS you are using since it is PHP that is handling everything. |

Sun May 16, 2010 6:17 pm |
Say the UA string contains, for example, "RoBot".
When the shield is clicked to block the UA containing "RoBot", what actually gets entered in the DB is "robot", and therefore it isn't blocked. I would think that since *nix is the underlying handler, case does matter when the DB is polled. All I know is that if the agent blocker is "robot" and the UA is "RoBot", the agent isn't blocked here unless it is represented in the DB exactly as it is in the UA string. |

Sun May 16, 2010 6:29 pm |
I understand the problem. Just saying that I don't see how OS could be the culprit. |

Sun May 16, 2010 6:46 pm |
Ok, I may need some instruction here ....
Does MySQL running on FreeBSD adhere to the case sensitive rules of UNIX?
I have mod_speling enabled, does that come into play here?
Cheers |
The Mouse Is Extension Of Arm

Joined: Mar 06, 2004
Posts: 1164
Mon May 24, 2010 6:27 pm |
I thought I should mention that UA discovery and isolation is in fact case-sensitive.
UA discovery and blocking could be made case-insensitive if, for example, the NC flag were used in the .htaccess. |
_________________ Steph Benoit
100% Section 508 and W3C HTML5 and CSS Compliant (Truly) Code, because I love compliance. |

Thu Jun 03, 2010 3:14 pm |
Well, the real question at hand is how NukeSentinel(tm) is handling this. I have checked the Harvester code within includes/nukesentinel.php and it is case insensitive, using this code (only a part of the code):
stristr($nsnst_const['user_agent'], $harvest)
Therefore, if the string in the Harvester list is "robot", it will catch "RoBot" in the User Agent string. I suspect there is a bug somewhere else in the NS code causing the originally noted issue that the RN team is going to have to hunt down. |
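To show why stristr() should catch "RoBot" with a stored value of "robot", here is a small standalone comparison against the case-sensitive strstr(). The variable names and the sample UA string are made up for this example and are not taken from nukesentinel.php.

```php
<?php
// Sample values for illustration only.
$userAgent = 'Mozilla/5.0 (compatible; RoBot/1.0)';
$harvest   = 'robot';

// strstr() is case-sensitive, so "robot" is not found in "RoBot".
$sensitive = strstr($userAgent, $harvest);

// stristr() ignores case and returns the haystack from the match onward.
$insensitive = stristr($userAgent, $harvest);

var_dump($sensitive !== false);   // bool(false)
var_dump($insensitive !== false); // bool(true)
```

So a case-sensitive failure would have to come from some other comparison in the blocking path, not from this particular call.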

Thu Jun 03, 2010 3:53 pm |
Not flying completely blind here, but would it depend on the server OS being case-sensitive? I'm running FreeBSD with Apache, and if the string is "robot" it does NOT catch RoBot. I'm also running with mod_speling activated, if that makes any difference, dunno.
Cheers |

Mon Jun 07, 2010 6:32 am |
No, as Guardian mentioned already, server OS has nothing to do with this. Also as noted previously, I experienced a similar issue leading me to believe this to be a NS issue. |

Mon Jun 07, 2010 6:48 am |
Good, since you have similar issues then either BOTH of us are going nutz together OR there is a real issue. I'm on your side re: real issue !!
Cheers |