Ravens PHP Scripts: Forums
 

 

NoFantasy
Worker
Joined: Apr 26, 2005
Posts: 114

Posted: Thu Dec 14, 2006 8:11 am

Some time back my web server provider messed up my site, and now I'm stuck with loads of links looking like
Code:
backend.php?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f

forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f

in the Google search results... and probably in the other engines as well.
A 301 redirect should take care of this, but I don't know how to remove the PHPSESSID from ALL cached links to my site in one go.

A "normal" way to implement a 301 is with
Code:
RedirectMatch 301 ^/someoldlink-([0-9]+)\.html$ http://www.domain.com/goodlink-$1.html

...right? OK, but this won't work when a ? is involved in the link, because RedirectMatch only looks at the URL path and never sees the query string, so I figured I have to use a format similar to
Code:
RewriteCond %{QUERY_STRING} (name=Forums&file=index) [NC]

RewriteRule ^.*$ /forums.html [R=301,L]

...but what should the format be for this to work properly for any cached link containing PHPSESSID=(some ID), without messing up anything else?
 
montego
Site Admin
Joined: Aug 29, 2004
Posts: 9457
Location: Arizona

Posted: Fri Dec 15, 2006 6:59 am

Maybe try something like this:

RewriteRule ^([a-zA-Z0-9._- ]*)?PHPSESSID=([a-z0-9]*)$ $1 [R=301,L]

It looks like you are already using some form of URL rewriting (such as GoogleTap, GTNG, or ShortLinks). You may need to add more characters to the first () pair, and you may have to review your cached links to see if all of them will be covered by this.

evaders99
Former Moderator in Good Standing
Joined: Apr 30, 2004
Posts: 3221

Posted: Fri Dec 15, 2006 9:05 am

I don't think RewriteRule works with parameters. I think you're going to have to use RewriteCond on QUERY_STRING. That has never really worked for me, so I use THE_REQUEST instead.
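
A rough sketch of why the rule above can't match, assuming it sits in a .htaccess file: mod_rewrite tests the RewriteRule pattern against the URL path only and hands the query string over separately as %{QUERY_STRING}. The Python below only imitates that split. It is not Apache code, and the function name and both sample patterns are made up for the illustration.

```python
import re
from urllib.parse import urlsplit

def split_like_mod_rewrite(request_uri):
    """Return (path, query) roughly the way a .htaccess rule would see them."""
    parts = urlsplit(request_uri)
    # In per-directory context the leading slash is stripped from the path
    return parts.path.lstrip("/"), parts.query

path, query = split_like_mod_rewrite(
    "/forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f"
)

# A cleaned-up version of a pattern that expects "?PHPSESSID=" inside the path
rule_pattern = re.compile(r"^(.*)\?PHPSESSID=([a-z0-9]*)$")
# A hypothetical condition tested against the query string instead
cond_pattern = re.compile(r"(^|&)PHPSESSID=[a-z0-9]+", re.IGNORECASE)

print(path)                               # forums.html
print(bool(rule_pattern.search(path)))    # False: the path holds no "?"
print(bool(cond_pattern.search(query)))   # True: the session ID is in the query
```

So any test for PHPSESSID has to go through a RewriteCond on %{QUERY_STRING} or %{THE_REQUEST}, not through the RewriteRule pattern itself.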

hitwalker
Sells PC To Pay For Divorce
Posts: 5661

Posted: Fri Dec 15, 2006 9:11 am

Well, let me be the messenger of bad news...
A friend and I built a new site that came with new links as well, and Google is only now replacing the old links with the new ones.
That took 5 months...
Yeah, Google is a lazy engine.
 
NoFantasy
Posted: Fri Dec 15, 2006 10:06 am

Yes, I'm using the ShortLinks mod... which of course does the job very well.
The initial problem with the PHPSESSID was this post:
http://www.ravenphpscripts.com/postt12000.html
...now, I solved that problem, but Google did pick up quite a few of those links.

I'll go google for THE_REQUEST and see if anyone already has a solution. Thanks for the suggestion.

Oh, lol... 5 months? I have three-year-old links in Google from before I started out with PHP-Nuke, and Google still believes those pages are around.
 
hitwalker
Posted: Fri Dec 15, 2006 10:12 am

yeah google is terrible...
 
montego
Posted: Sat Dec 16, 2006 6:18 am

Well, Evaders, I am going to have to research that, because I could have sworn it worked for me. I even tested it locally and reviewed the Apache access logs to see what status codes were returned, etc. But one thing I did not do was actually check whether I could still find the "offending links" in Google's cache.
 
NoFantasy
Posted: Sun Dec 17, 2006 9:07 am

Code:
RewriteCond %{QUERY_STRING} ^phpsessid=.*$ [NC]

RewriteRule .* %{REQUEST_URI}? [R=301,L]

Don't ask me what it actually does, but it does remove the crap from inbound Google links! Let's hope I didn't break something else.

Now I'm eager to see if they are gone from the Google cache in a month or two!

By the way, how will this entry in robots.txt work?
Code:
Disallow: /*phpsessid

I was thinking it can't hurt to have the block in robots.txt, right? Or will it work the other way, so that the crawler refuses to even follow my redirect because of the block in robots.txt?
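
For anyone wondering what the two rewrite lines above actually do, here is a rough reading, sketched in Python rather than Apache config. The RewriteCond fires only when the query string starts with phpsessid= (the [NC] flag makes the test case-insensitive), and the trailing ? in the substitution sends the visitor to the same path with an empty query string. The function name redirect_target and the sample URLs are made up for the illustration.

```python
import re
from urllib.parse import urlsplit

# Mirrors: RewriteCond %{QUERY_STRING} ^phpsessid=.*$ [NC]
COND = re.compile(r"^phpsessid=.*$", re.IGNORECASE)

def redirect_target(request_uri):
    """Return the 301 target for a request, or None when the rule does not fire."""
    parts = urlsplit(request_uri)
    if COND.search(parts.query) is None:
        return None
    # Mirrors: RewriteRule .* %{REQUEST_URI}? [R=301,L]
    # The trailing "?" empties the query string, so only the path survives.
    return parts.path

print(redirect_target("/backend.php?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f"))  # /backend.php
print(redirect_target("/forums.html"))                                # None
print(redirect_target("/modules.php?name=Forums&PHPSESSID=abc"))      # None
print(redirect_target("/modules.php?phpsessid=abc&name=Forums"))      # /modules.php
```

Two side effects worth knowing, if this reading is right: a link where PHPSESSID is not the first parameter is left alone, and when the rule does fire, any other parameters are dropped along with the session ID.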
 
montego
Posted: Mon Dec 18, 2006 5:46 am

Quote:

Now I'm eager to see if they are gone from the Google cache in a month or two!


Me too.

Regarding robots.txt, I don't know enough about it to know if it can block query strings...
 
hitwalker
Posted: Mon Dec 18, 2006 6:35 am

How about ...

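// See if the user agent is Googlebot (the header can be missing, so fall back to an empty string)
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isGoogle = stripos($userAgent, 'Googlebot');
// If it is, use ini_set to only allow cookies for the session ID.
// This has to run before session_start() to have any effect.
if ($isGoogle !== false) {
    ini_set('session.use_only_cookies', '1');
}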
 
hitwalker
Posted: Mon Dec 18, 2006 6:46 am

And...

Google’s Hidden Protocol
Google’s URL removal page contains a little bit of handy information that’s not found on their webmaster info pages where it should be.

Google supports the use of “wildcards” in robots.txt files.
This isn’t part of the original 1994 robots.txt protocol, and as far as I know, is not supported by other search engines.
To make it work, you need to add a separate section for Googlebot in your robots.txt file.
An example:

User-agent: Googlebot
Disallow: /*sort=

This would stop Googlebot from reading any URL that included the string “sort=” no matter where that string occurs in the URL.
So if you have a shopping cart, and use a variable called “sort” in some URLs, you can stop Googlebot from reading the sorted (but basically duplicate) content that your site produces for users.
Every search engine should support this. It would make real life a lot easier for folks with dynamic sites, and artificial life a lot easier for spiders.
So you could easily use "PHPSESSID" the same way.
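
A small sketch of how that wildcard matching can be read, assuming the behaviour Google documents: * stands for any run of characters, a trailing $ anchors the end of the URL, and the rule is compared from the start of the path in a case-sensitive way. The helper names are made up for the illustration, and this is not how any crawler is actually implemented.

```python
import re

def robots_pattern_to_regex(pattern):
    """Turn a Disallow value with * and $ wildcards into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every piece literally, then let each * match any run of characters
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(pattern, url):
    """True when the rule matches the URL (path plus query) from its start."""
    return robots_pattern_to_regex(pattern).match(url) is not None

print(is_disallowed("/*sort=", "/shop.php?cat=1&sort=price"))      # True
print(is_disallowed("/*PHPSESSID", "/forums.html?PHPSESSID=abc"))  # True
print(is_disallowed("/*PHPSESSID", "/forums.html"))                # False
print(is_disallowed("/*phpsessid", "/forums.html?PHPSESSID=abc"))  # False
```

Note the last line: with case-sensitive matching, a lowercase phpsessid rule would not catch URLs that carry PHPSESSID in uppercase.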
 
NoFantasy
Posted: Mon Dec 18, 2006 7:48 am

Hm, thanks Hitwalker, good information.
I did a bit of research based on this and found these:
http://www.ysearchblog.com/archives/000372.html
http://www.google.com/support/webmasters/bin/answer.py?answer=40367
http://www.seoegghead.com/blog/seo/wildcard-robotstxt-matching-is-now-almost-standard-p158.html
http://www.mcanerin.com/search-engine/robots-txt.htm

Basically they say that wildcards (and other extensions) ARE supported by at least the three bigger engines: Google, Yahoo and MSN. Guess your wish just came true (happy x-mas, lol)

...shopping cart? Yeah, and what about reviews, web links and calendar modules? Lol, they suck... now we know how to actually stop the engines from indexing 12,000 pages when they should only index 100.

The worst part seems to be getting rid of the duplicates already indexed, which show up as supplemental results.
When the time comes, and this hopefully works, I really, really hope someone (that means Montego) implements this in a mod_rewrite package... and I'm more than willing to help out as best I can, even if my knowledge of PHP and programming is rather limited.
 
hitwalker
Posted: Mon Dec 18, 2006 7:56 am

Nobody said this is easy, but the fact is that Google may be good, yet it's the world's number 1 lazy engine.
Others like Yahoo are much faster at updating...
I've seen links to sites that were closed 6 months ago but still exist in Google...
But that's what you get when you gamble on one stupid lazy horse...
 
montego
Posted: Tue Dec 19, 2006 6:25 am

Quote:

When the time comes, and this hopefully works, I really, really hope someone (that means Montego) implements this in a mod_rewrite package


Not quite sure what you are looking for? The problem being addressed in this thread is around an "oops" by your hosting company, where they forced all the URLs to show the session var/ID. You are the only one I have heard of this happening to.

The rest of the thread is devoted to trying to get these "bogus" URLs removed from the search engine cache. Again, all due to this one issue.

I'd be glad to discuss specifics on any enhancements that you might like to see in ShortLinks. Just add them to my forum and we'll talk through them.
 
NoFantasy
Posted: Tue Dec 19, 2006 8:24 am

...yeah, I'm fully aware that I went off-topic somewhere up there; it should have been separate threads. Getting rid of duplicates is a general topic; everyone who implements any mod_rewrite package will suffer from it.

Anyway, I feel like continuing this on your forum, which is probably just as good, since all of this really concerns the way we rewrite links.
 
montego
Posted: Wed Dec 20, 2006 7:49 am

Quote:

Getting rid of duplicates is a general topic; everyone who implements any mod_rewrite package will suffer from it.


According to what you have posted even on my site, even without mod_rewrite rewriting of URLs, you are proposing that there is still an issue with duplicate content.

I don't mind talking about the duplicate content issue here on Raven's site. You had mentioned a possible enhancement for ShortLinks, which is why I suggested discussing that specifically over at my site. No problem either way, only that discussing it here on Raven's site will get more traffic and more people to weigh in.
 
NoFantasy
Posted: Wed Dec 20, 2006 8:22 pm

montego wrote:
According to what you have posted even on my site, even without mod_rewrite rewriting of URLs, you are proposing that there is still an issue with duplicate content.

Yes, indeed... with or without rewritten links, it will create duplicates like mad, so it's not an issue that comes from ShortLinks; it's an issue in general.
 