Author |
Message |
NoFantasy
Worker
Joined: Apr 26, 2005
Posts: 114
|
Posted:
Thu Dec 14, 2006 8:11 am |
|
Some time back my web host messed up my site, and now I'm stuck with loads of links that look like this:
Code:backend.php?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f
forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f
|
in the Google search results... and probably in the other engines as well.
A 301 redirect should take care of this, but I don't know how to remove the PHPSESSID from ALL cached links to my site in one go.
A "normal" way to implement a 301 is with
Code:RedirectMatch 301 /someoldlink-([0-9]*).html http://www.domain.com/goodlink-$1.html
|
...right? OK, but this won't work when a ? is involved in the link, so I figured I have to use a format similar to
Code:RewriteCond %{QUERY_STRING} (name=Forums&file=index) [NC]
RewriteRule ^.*$ /forums.html [R=301,L]
|
...but what should the rule look like so that it works properly for any cached link containing PHPSESSID=(number) without messing up anything else? |
|
|
|
montego
Site Admin
Joined: Aug 29, 2004
Posts: 9457
Location: Arizona
|
Posted:
Fri Dec 15, 2006 6:59 am |
|
Maybe try something like this:
RewriteRule ^([a-zA-Z0-9._- ]*)?PHPSESSID=([a-z0-9]*)$ $1 [R=301,L]
It looks like you are already using some form of URL rewriting (such as GoogleTap, GTNG, or ShortLinks). You may need to add more characters to the first () pair, and you should review your cached links to check that all of them are covered by this. |
|
|
evaders99
Former Moderator in Good Standing
Joined: Apr 30, 2004
Posts: 3221
|
Posted:
Fri Dec 15, 2006 9:05 am |
|
I don't think RewriteRule works with parameters: its pattern is matched against the URL path, not the query string. I think you're going to have to use RewriteCond on QUERY_STRING. That has never really worked for me, so I use THE_REQUEST instead. |
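A rough way to picture the difference, assuming the usual mod_rewrite behaviour: a RewriteRule pattern is matched against the URL path only, %{QUERY_STRING} holds whatever follows the ?, and %{THE_REQUEST} is the full request line as the client sent it. The Python sketch below only models that split and is not Apache itself; `rewrite_views` is a made-up helper name, and in .htaccess context Apache also strips the directory prefix from the path, which the sketch ignores.

```python
from urllib.parse import urlsplit

def rewrite_views(request_line):
    """Split a raw HTTP request line into the three strings discussed here."""
    # A request line looks like: "GET /path?query HTTP/1.1"
    method, target, protocol = request_line.split(" ")
    parts = urlsplit(target)
    return {
        "THE_REQUEST": request_line,   # whole line, query string included
        "rule_input": parts.path,      # what a RewriteRule pattern is matched against
        "QUERY_STRING": parts.query,   # what RewriteCond %{QUERY_STRING} sees
    }

views = rewrite_views(
    "GET /forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f HTTP/1.1"
)
for name, value in views.items():
    print(f"{name}: {value}")
```

Since `rule_input` never contains PHPSESSID, a pattern that looks for it in the RewriteRule itself has nothing to match; the condition has to be placed on QUERY_STRING or THE_REQUEST.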
|
|
hitwalker
Sells PC To Pay For Divorce
Joined:
Posts: 5661
|
Posted:
Fri Dec 15, 2006 9:11 am |
|
Well, let me be the messenger of bad news...
A friend and I built a new site, which came with new links as well, and Google is only now replacing the old links with the new ones.
That took 5 months...
Yeah, Google is a lazy engine. |
|
|
|
NoFantasy
|
Posted:
Fri Dec 15, 2006 10:06 am |
|
Yes, I'm using the ShortLinks mod... which of course does the job very well.
The initial problem with the PHPSESSID is described in this post:
http://www.ravenphpscripts.com/postt12000.html
...now, I solved that problem, but Google had already picked up quite a few of those links.
I'll go and google THE_REQUEST to see if anyone already has a solution. Thanks for the suggestion.
Oh, lol... 5 months? I have three-year-old links in Google from before I started out with phpnuke; Google still believes those pages are around. |
|
|
|
hitwalker
|
Posted:
Fri Dec 15, 2006 10:12 am |
|
yeah google is terrible... |
|
|
|
montego
|
Posted:
Sat Dec 16, 2006 6:18 am |
|
Well, Evaders, I am going to have to research that because I could have sworn it worked for me. I even tested it locally and reviewed the Apache access logs to see what status codes were returned, etc. But one thing I did not do was actually check whether I could still find the "offending links" in Google's cache. |
|
|
|
NoFantasy
|
Posted:
Sun Dec 17, 2006 9:07 am |
|
Code:RewriteCond %{QUERY_STRING} ^phpsessid=.*$ [NC]
RewriteRule .* %{REQUEST_URI}? [R=301,L]
|
Don't ask me what it actually does, but it does remove the crap from inbound Google links! Let's hope I didn't break something else.
Now I'm eager to see if they are gone from the Google cache in a month or two!
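For anyone wondering what those two lines actually do: the RewriteCond only fires when the query string starts with phpsessid= (the [NC] flag makes the match case-insensitive), and the trailing ? in the RewriteRule target drops the whole query string before the 301 is sent. Below is a rough Python model of that logic, not Apache itself; the function name `redirect_target` is made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Mirrors: RewriteCond %{QUERY_STRING} ^phpsessid=.*$ [NC]
SESSION_FIRST = re.compile(r"^phpsessid=.*$", re.IGNORECASE)

def redirect_target(url):
    """Return the path the 301 would point to, or None if the rule does not fire."""
    parts = urlsplit(url)
    if SESSION_FIRST.match(parts.query):
        # "%{REQUEST_URI}?" keeps the path and discards the entire query string.
        return parts.path
    return None

print(redirect_target("/forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f"))  # /forums.html
print(redirect_target("/forums.html?page=2&PHPSESSID=c6ee420bed2deb5b047ac722e46a440f"))  # None
print(redirect_target("/forums.html"))  # None
```

Two things follow from this sketch: a link where PHPSESSID is not the first parameter is left alone, and a link with other parameters after PHPSESSID loses those as well, because the whole query string is dropped.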
Btw, how would this addition to robots.txt work?
Code:Disallow: /*phpsessid
|
I was thinking it can't hurt to have the block in robots.txt, right? Or will it work the other way, with Google refusing to even visit my redirect because of the block in robots.txt? |
|
|
|
montego
|
Posted:
Mon Dec 18, 2006 5:46 am |
|
Quote: |
Now I'm eager to see if they are gone from the Google cache in a month or two!
|
Me too.
Regarding robots.txt, I don't know enough about it to know if it can block query strings... |
|
|
|
hitwalker
|
Posted:
Mon Dec 18, 2006 6:35 am |
|
How about ...
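// See if the user agent is Googlebot (the header may be missing, so default to '')
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isGoogle = stripos($userAgent, 'Googlebot');
// If it is, use ini_set to only allow cookies for the session variable
// (this has to run before session_start() to have any effect)
if ($isGoogle !== false) {
    ini_set('session.use_only_cookies', '1');
} |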
|
|
|
hitwalker
|
Posted:
Mon Dec 18, 2006 6:46 am |
|
And...
Google’s Hidden Protocol
Google’s URL removal page contains a little bit of handy information that’s not found on their webmaster info pages where it should be.
Google supports the use of “wildcards” in robots.txt files.
This isn’t part of the original 1994 robots.txt protocol, and as far as I know, is not supported by other search engines.
To make it work, you need to add a separate section for Googlebot in your robots.txt file.
An example:
User-agent: Googlebot
Disallow: /*sort=
This would stop Googlebot from reading any URL that included the string “sort=” no matter where that string occurs in the URL.
So if you have a shopping cart, and use a variable called “sort” in some URLs, you can stop Googlebot from reading the sorted (but basically duplicate) content that your site produces for users.
Every search engine should support this. It would make real life a lot easier for folks with dynamic sites, and artificial life a lot easier for spiders.
So you could easily use "phpsessid" the same way. |
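A small sketch of how such a wildcard rule could be evaluated, assuming each * stands for any run of characters, rules are prefix matches, and matching is case-sensitive (which is how robots.txt paths are normally treated). The helper name `disallow_matches` and the /shop.php URL are made up for illustration; this is a model of the matching, not Googlebot's actual code.

```python
import re

def disallow_matches(rule, path_and_query):
    """Return True if a Disallow value with '*' wildcards covers the URL."""
    # Each '*' stands for any run of characters; everything else is literal.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    # Rules are prefix matches, so anchor at the start only (re.match does that).
    return re.match(pattern, path_and_query) is not None

url = "/forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f"
print(disallow_matches("/*sort=", "/shop.php?cat=3&sort=price"))  # True
print(disallow_matches("/*PHPSESSID", url))                       # True
print(disallow_matches("/*phpsessid", url))                       # False
```

Under these assumptions the lowercase form would miss the uppercase PHPSESSID links shown at the top of the thread. Also worth keeping in mind: a crawler that obeys a matching Disallow never requests the URL at all, so it would not get to see a 301 redirect set up for that same URL.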
|
|
|
NoFantasy
|
Posted:
Mon Dec 18, 2006 7:48 am |
|
|
|
hitwalker
|
Posted:
Mon Dec 18, 2006 7:56 am |
|
Nobody said this is easy, but the fact is that Google may be good, yet it's the world's no. 1 lazy engine.
Others like Yahoo are much faster at updating...
I've seen links to sites that were closed 6 months ago but still exist in Google...
But that's what you get when you gamble on one stupid lazy horse... |
|
|
|
montego
|
Posted:
Tue Dec 19, 2006 6:25 am |
|
Quote: |
When the times come, and this hopefully works, i really really hope someone (that means Montego ) implement this into a mod_rewrite package
|
Not quite sure what you are looking for? The problem being addressed in this thread is an issue caused by your hosting company, where they forced all the URLs to show the session var/id. You are the only one I have heard of this happening to.
The rest of the thread is devoted to trying to get these "bogus" URLs removed from the search engine cache. Again, all due to this one issue.
I'd be glad to discuss the specifics of any enhancements that you might like to see in ShortLinks. Just add them to my forum and we'll talk through them. |
|
|
|
NoFantasy
|
Posted:
Tue Dec 19, 2006 8:24 am |
|
...yeah, I'm fully aware that I went off-topic somewhere up there; it should have been separate threads. Getting rid of duplicates is a general topic: everyone who implements any mod_rewrite package will suffer from it.
Anyway, I feel that continuing this on your forum is probably just as good, since all of this concerns the way we rewrite links and things related to it. |
|
|
|
montego
|
Posted:
Wed Dec 20, 2006 7:49 am |
|
Quote: |
Getting rid of duplicates is a general topic: everyone who implements any mod_rewrite package will suffer from it.
|
According to what you have posted even on my site, even without mod rewriting of URL's, you are proposing that there is still an issue with duplicate content.
I don't mind talking about the duplicate content issue here on Raven's site. You had mentioned a possible enhancement for ShortLinks and so that is why I suggested discussing that specifically over at my site. No problem either way. Only that discussing it here on Raven's site will get more traffic and more people to weigh in. |
|
|
|
NoFantasy
|
Posted:
Wed Dec 20, 2006 8:22 pm |
|
montego wrote: | According to what you have posted even on my site, even without mod rewriting of URL's, you are proposing that there is still an issue with duplicate content. |
Yes, indeed...with or without rewritten links, it will create duplicates like a mad man, so it's not an issue that comes from ShortLinks, it's an issue in general. |
|
|
|
|