Ravens PHP Scripts: Forums
 

 

Search found 10 matches
Author Message
 Topic: Majestic12.co.uk
majestic-12

Replies: 25
Views: 15172

PostForum: General/Other Stuff   Posted: Fri Oct 28, 2005 9:02 am   Subject: Re: Majestic12.co.uk
It's okay - I am not happy with this solution, but in the short term it's probably best -- I _DID_ change the URL loading software to fix your URLs, but some from the previous load may have slipped in.

If you do ...
 Topic: Majestic12.co.uk
majestic-12

Replies: 25
Views: 15172

PostForum: General/Other Stuff   Posted: Thu Oct 27, 2005 7:27 pm   Subject: Re: Majestic12.co.uk
Would it be a good idea to ban all your IPs for the next 1-3 months, or for how long?

No, it would not be a good idea, because we use a distributed model and the number of IPs is very high, with new ones cons ...
 Topic: Majestic12.co.uk
majestic-12

Replies: 25
Views: 15172

PostForum: General/Other Stuff   Posted: Thu Oct 27, 2005 5:17 pm   Subject: Re: re: Majestic12.co.uk
Susann -- these are likely to be old URLs - the changes you make on the site do not have an immediate effect on old crawled data -- note that the URLs you referenced exhibit the same error we discussed above - n ...
 Topic: Majestic12.co.uk
majestic-12

Replies: 25
Views: 15172

PostForum: General/Other Stuff   Posted: Mon Sep 26, 2005 3:53 pm   Subject: Re: Majestic12.co.uk
Yes, I understand, but how do I do this exactly?

Well, you will need to edit the code that appends those SIDs. From what I can see, you must be using some internal rewriting to have nice ".html"' ...
 Topic: Majestic12.co.uk
majestic-12

Replies: 25
Views: 15172

PostForum: General/Other Stuff   Posted: Mon Sep 26, 2005 2:40 pm   Subject: Re: Majestic12.co.uk
just the SID disable it completely That isn't so easy.

You are probably right about this -- but you definitely need to fix your URLs by changing & to ?, because without it a URL parsing routine wi ...
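To illustrate why the & versus ? point matters, here is a minimal sketch using Python's standard URL parser. The host, path and the parameter name "sid" are made up for illustration; this is not the bot's actual code, just how a typical standards-following parser sees the two forms.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URLs: the host, path and "sid" value are assumptions.
malformed = "http://example.com/forum/topic-12.html&sid=abc123"
wellformed = "http://example.com/forum/topic-12.html?sid=abc123"


def query_params(url):
    """Return the query parameters a standard parser finds in the URL."""
    return parse_qs(urlsplit(url).query)


for url in (malformed, wellformed):
    parts = urlsplit(url)
    # Without "?", everything after the host is treated as the path,
    # so there is no query string for a SID filter to inspect.
    print(repr(parts.path), query_params(url))
```

With the & form the parser reports no query parameters at all, and the session ID ends up as part of the path, so each session looks like a different page.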
 Topic: Majestic12.co.uk
majestic-12

Replies: 25
Views: 15172

PostForum: General/Other Stuff   Posted: Mon Sep 26, 2005 1:32 pm   Subject: Re: Majestic12.co.uk
Your rewrite is fine; it's just the SID bit that's the problem. If I were you I'd disable it completely, because even though my bot understands it (provided the URL is properly formatted), others won't.
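As a rough idea of what "understanding" a SID can mean on the crawler side, the sketch below drops session parameters from a URL's query string. It is only an illustration under assumptions: the parameter names in SESSION_KEYS and the example URLs are hypothetical, and the bot's real filter may work quite differently.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed session parameter names; a real crawler may use another list.
SESSION_KEYS = {"sid", "phpsessid", "sessionid"}


def strip_session_id(url):
    """Return the URL with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_KEYS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Properly formatted URL: the SID is found in the query and removed.
print(strip_session_id("http://example.com/viewtopic.php?t=5&sid=abc123"))
# No "?" in the URL: the SID sits in the path, so nothing is removed.
print(strip_session_id("http://example.com/topic-12.html&sid=abc123"))
```

A filter like this only works when the URL is properly formatted, which is why disabling the SID on the site side is the safer option for crawlers in general.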
 Topic: Majestic12.co.uk
majestic-12

Replies: 25
Views: 15172

PostForum: General/Other Stuff   Posted: Mon Sep 26, 2005 8:49 am   Subject: Re: Majestic12.co.uk
I am back!

As promised, I tested my code to see if there was a bug. Now, my code was NOT removing your session ID, but there is a good reason for it -- your URL is actually not correct because you us ...
 Topic: Majestic12.co.uk
majestic-12

Replies: 25
Views: 15172

PostForum: General/Other Stuff   Posted: Thu Sep 22, 2005 6:36 pm   Subject: Re: Majestic12.co.uk
Thanks - best wishes for whatever you do in cyber life too! :)

I did have a friendly discussion with the Yacy people, but they did not agree with me, which is fine -- perhaps I am wrong and P2P is possible ...
 Topic: Majestic12.co.uk
majestic-12

Replies: 25
Views: 15172

PostForum: General/Other Stuff   Posted: Thu Sep 22, 2005 6:20 pm   Subject: Re: Majestic12.co.uk
The bot does NOT ignore robots.txt, and it supports the Crawl-Delay parameter for a longer delay between requests than the normal one (1 sec).

I do have SID filtering implemented; however, I am going to recheck ...
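For anyone wondering how a crawler can honour a robots.txt delay, here is a small sketch using Python's standard robotparser. The agent name "ExampleBot", the rules and the 5 second value are invented for illustration (Crawl-delay is a non-standard extension that only some bots support); only the 1 second default comes from the post above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "ExampleBot" is a placeholder agent name.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 5
Disallow: /admin/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

DEFAULT_DELAY = 1  # the "normal" delay from the post, in seconds


def delay_for(agent):
    """Use the site's Crawl-delay if one is set, else the default."""
    delay = parser.crawl_delay(agent)
    return delay if delay is not None else DEFAULT_DELAY


print(delay_for("ExampleBot"))
print(parser.can_fetch("ExampleBot", "http://example.com/admin/"))
```

A crawler built this way would sleep for delay_for(agent) seconds between requests to the same site and skip any URL for which can_fetch returns False.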
 Topic: Majestic12.co.uk
majestic-12

Replies: 25
Views: 15172

PostForum: General/Other Stuff   Posted: Thu Sep 22, 2005 5:55 pm   Subject: Re: re: Majestic12.co.uk
Hi there,

I am the creator of the bot -- I found this forum just like you found my bot - from the log file :)

I am surprised a session ID was present in the URL, because a few months ago I implemented ...
 


All times are GMT - 6 Hours
 