NoFantasy
Worker

Joined: Apr 26, 2005
Posts: 114

Posted: Thu Dec 14, 2006 8:11 am

Some time back my web server provider messed up my site, and now I'm stuck with loads of links that look like
Code:
backend.php?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f
forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f

in the Google search results...and probably in the other engines as well.
A 301 redirect should take care of this, but I don't know how to remove the PHPSESSID from ALL cached links to my site in one go.

A "normal" way to implement a 301 is with
Code:
RedirectMatch 301 /someoldlink-([0-9]*).html http://www.domain.com/goodlink-$1.html

...right? OK, but this won't work when a ? is involved in the link, so I figured I have to use a format similar to
Code:
RewriteCond %{QUERY_STRING} (name=Forums&file=index) [NC]
RewriteRule ^.*$ /forums.html [R=301,L]

...but what should the rule look like so that it works for any cached link containing PHPSESSID=(session id) without messing up anything else?
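
For reference, the transformation being asked for can be written down outside Apache. The following is only a Python sketch of the intended result, not a rewrite rule: the helper name strip_session_id is made up for illustration, and it assumes the session parameter is named PHPSESSID as in the links above. It removes that one parameter and leaves the rest of the URL alone, so it can serve as a yardstick for what any candidate rule should produce.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="PHPSESSID"):
    """Return url without the session parameter (hypothetical helper).

    The parameter name is compared case-insensitively; all other
    query parameters are kept in their original order.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() != param.lower()
    ]
    # urlencode() re-encodes the remaining values, so unusual escaping
    # in the original query string may come back in a normalised form.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_session_id("/forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f"))
print(strip_session_id("/modules.php?name=Forums&file=index&PHPSESSID=abc123"))
```

The first call prints the bare /forums.html path; the second keeps name=Forums&file=index and drops only the session ID.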

montego
Site Admin

Joined: Aug 29, 2004
Posts: 7487
Location: Arizona

Posted: Fri Dec 15, 2006 6:59 am

Maybe try something like this:

RewriteRule ^([a-zA-Z0-9._- ]*)?PHPSESSID=([a-z0-9]*)$ $1 [R=301,L]

It looks like you are already using some form of URL rewriting (such as GoogleTap, GTNG, or ShortLinks). You may need to add more characters to the first () pair, and you should review your cached links to see whether all of them are covered by this.

evaders99
Moderator

Joined: Apr 30, 2004
Posts: 2853

Posted: Fri Dec 15, 2006 9:05 am

I don't think RewriteRule works with parameters: its pattern never sees the query string. I think you're going to have to use RewriteCond on %{QUERY_STRING}. That has never really worked for me, so I use %{THE_REQUEST} instead.
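
The reason a plain pattern cannot catch these links is that Apache tests a RewriteRule pattern (and a RedirectMatch pattern) against the URL path only; the part after the ? is held separately and is only reachable through a RewriteCond on %{QUERY_STRING} or %{THE_REQUEST}. Here is a rough Python model of that split. The helper rule_pattern_matches is hypothetical, and stripping the leading slash assumes the rule lives in a .htaccess file.

```python
import re
from urllib.parse import urlsplit


def rule_pattern_matches(pattern, url):
    """Model of RewriteRule matching (hypothetical helper).

    Only the path is offered to the pattern; the query string is
    never part of the text being matched.
    """
    path = urlsplit(url).path.lstrip("/")  # .htaccess context: no leading slash
    return re.search(pattern, path) is not None


url = "/forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f"

# A pattern that looks for the session ID never matches ...
print(rule_pattern_matches(r"PHPSESSID=", url))
# ... while a pattern for the file name does.
print(rule_pattern_matches(r"^forums\.html$", url))
# The session ID is only visible in the separate query string.
print(urlsplit(url).query)
```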

persona_non_grata

Joined:
Posts: 0

Posted: Fri Dec 15, 2006 9:11 am

Well, let me be the bearer of bad news...
A friend and I built a new site that came with new links as well, and Google is only now replacing the old links with the new ones.
That took 5 months...
Yeah, Google is a lazy engine.

NoFantasy
Worker

Joined: Apr 26, 2005
Posts: 114

Posted: Fri Dec 15, 2006 10:06 am

Yes, I'm using the ShortLinks mod...which of course does the job very well.
The initial problem with the PHPSESSID is described in this post:
[link hidden from unregistered users]

...now, I solved that problem, but Google had already picked up quite a few of those links.

I'll go google for THE_REQUEST and see if anyone already has a solution. Thanks for the suggestion.

Oh, lol...5 months? I have three-year-old links in Google from before I started out with PHP-Nuke, and Google still believes those pages are around.

persona_non_grata

Joined:
Posts: 0

Posted: Fri Dec 15, 2006 10:12 am

yeah google is terrible...

montego
Site Admin

Joined: Aug 29, 2004
Posts: 7487
Location: Arizona

Posted: Sat Dec 16, 2006 6:18 am

Well, Evaders, I am going to have to research that, because I could have sworn it worked for me. I even tested it locally and reviewed the Apache access logs to see what status codes were returned. The one thing I did not do was check whether I could still find the "offending links" in Google's cache.

NoFantasy
Worker

Joined: Apr 26, 2005
Posts: 114

Posted: Sun Dec 17, 2006 9:07 am

Code:
RewriteCond %{QUERY_STRING} ^phpsessid=.*$ [NC]
RewriteRule .* %{REQUEST_URI}? [R=301,L]

Don't ask me what it actually does, but it does remove the crap from inbound Google links! Let's hope I didn't break something else.
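
As for what those two lines do: the RewriteCond only lets the rule fire when the query string starts with phpsessid= (the [NC] flag makes the comparison case-insensitive), and the RewriteRule then answers with a 301 to the same path, where the trailing ? in the target tells mod_rewrite to drop the query string entirely. Below is a small Python model of that behaviour. The name apply_posted_rule is made up, and the model ignores everything else in the .htaccess file.

```python
import re
from urllib.parse import urlsplit


def apply_posted_rule(url):
    """Return the 301 target, or None if the rule does not fire.

    Hypothetical model of:
        RewriteCond %{QUERY_STRING} ^phpsessid=.*$ [NC]
        RewriteRule .* %{REQUEST_URI}? [R=301,L]
    """
    parts = urlsplit(url)
    # The condition is tested against the query string, ignoring case.
    if not re.match(r"^phpsessid=.*$", parts.query, re.IGNORECASE):
        return None
    # %{REQUEST_URI} is the path; the trailing "?" discards the whole query.
    return parts.path


for link in (
    "/forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f",
    "/modules.php?name=Forums&PHPSESSID=abc123",
    "/modules.php?PHPSESSID=abc123&name=Forums",
):
    print(link, "->", apply_posted_rule(link))
```

Two consequences are visible in the model: a link where PHPSESSID is not the first parameter is left untouched, and a link that carries further parameters after PHPSESSID loses those as well. For the two example links from the first post neither case applies.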

Now I'm eager to see if they are gone from the Google cache in a month or two!

By the way, how would this addition to robots.txt work?
Code:
Disallow: /*phpsessid

I was thinking it can't hurt to have the block in robots.txt, right? Or will it work the other way, so that the crawler refuses to even request the URL and never sees my redirect because of the block in robots.txt?

montego
Site Admin

Joined: Aug 29, 2004
Posts: 7487
Location: Arizona

Posted: Mon Dec 18, 2006 5:46 am

Quote:

Now I'm eager to see if they are gone from the Google cache in a month or two!


Me too.

Regarding robots.txt, I don't know enough about it to know if it can block query strings...

persona_non_grata

Joined:
Posts: 0

Posted: Mon Dec 18, 2006 6:35 am

How about ...

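Code:
// See if the user agent is Googlebot (stripos() returns false when there is no match)
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isGoogle = stripos($userAgent, 'Googlebot');
// If it is, use ini_set to only allow cookies for the session ID.
// This has to run before session_start().
if ($isGoogle !== false) {
    ini_set('session.use_only_cookies', '1');
}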

persona_non_grata

Joined:
Posts: 0

Posted: Mon Dec 18, 2006 6:46 am

And...

Google’s Hidden Protocol
Google’s URL removal page contains a little bit of handy information that’s not found on their webmaster info pages where it should be.

Google supports the use of “wildcards” in robots.txt files.
This isn’t part of the original 1994 robots.txt protocol, and as far as I know, is not supported by other search engines.
To make it work, you need to add a separate section for Googlebot in your robots.txt file.
An example:

User-agent: Googlebot
Disallow: /*sort=

This would stop Googlebot from reading any URL that included the string “sort=” no matter where that string occurs in the URL.
So if you have a shopping cart, and use a variable called “sort” in some URLs, you can stop Googlebot from reading the sorted (but basically duplicate) content that your site produces for users.
Every search engine should support this. It would make real life a lot easier for folks with dynamic sites, and artificial life a lot easier for spiders.
So you could easily use "phpsessid" the same way.
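
To get a feel for how such a wildcard line is evaluated, here is a minimal Python sketch of the matching as Google documents it: * stands for any run of characters, a trailing $ anchors the end of the URL, and everything else is a prefix match against the path plus query string. The function name robots_rule_matches is hypothetical, and the sketch assumes the comparison is case-sensitive, which is how Google describes it; other crawlers may differ.

```python
import re


def robots_rule_matches(rule, url_path):
    """Check a Disallow value against a path (hypothetical helper).

    '*' matches any run of characters, a trailing '$' anchors the
    end of the URL, and the comparison is case-sensitive.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    regex = ".*".join(re.escape(piece) for piece in rule.split("*"))
    if anchored:
        regex += "$"
    # re.match() anchors at the start, which gives the prefix behaviour.
    return re.match(regex, url_path) is not None


print(robots_rule_matches("/*sort=", "/shop.php?sort=price"))
print(robots_rule_matches("/*phpsessid", "/forums.html?PHPSESSID=abc123"))
print(robots_rule_matches("/*PHPSESSID", "/forums.html?PHPSESSID=abc123"))
```

Under that assumption a lowercase rule such as /*phpsessid would not catch links that carry PHPSESSID in capitals, so the rule would need to be written the way the parameter appears in the URLs. Also worth keeping in mind: a crawler that obeys the Disallow line does not request the URL at all, so it presumably never gets to see a 301 redirect configured for it.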

NoFantasy
Worker

Joined: Apr 26, 2005
Posts: 114

Posted: Mon Dec 18, 2006 7:48 am

Hm, thanks persona_non_grata, good information.
I did a bit of research based on this and found these:
[four links hidden from unregistered users]


Basically they say that wildcards (and other extensions) ARE supported by at least the three biggest engines: Google, Yahoo and MSN. Guess your wish just came true (happy x-mas, lol).

...shopping cart? Yeah, and what about the reviews, web links and calendar modules? Lol, they suck...now we know how to actually stop the engines from indexing 12,000 pages when they should only index 100.

The worst part seems to be getting rid of the duplicates that are already indexed and show up as supplemental results.
When the time comes, and this hopefully works, I really, really hope someone (that means Montego) implements this in a mod_rewrite package...and I'm more than willing to help out as best I can, even if my knowledge of PHP and programming is rather limited.

persona_non_grata

Joined:
Posts: 0

Posted: Mon Dec 18, 2006 7:56 am

Nobody said this is easy, but the fact is that Google may be good, but it's the world's number one lazy engine.
Others like Yahoo are much faster at updating...
I've seen links to sites that were closed 6 months ago but still exist in Google...
But that's what you get when you gamble on one stupid lazy horse...

montego
Site Admin

Joined: Aug 29, 2004
Posts: 7487
Location: Arizona

Posted: Tue Dec 19, 2006 6:25 am

Quote:

When the time comes, and this hopefully works, I really, really hope someone (that means Montego) implements this in a mod_rewrite package


I'm not quite sure what you are looking for. The problem being addressed in this thread stems from an "oops" by your hosting company, where they forced all the URLs to show the session var/ID. You are the only one I have heard of this happening to.

The rest of the thread is devoted to trying to get these "bogus" URLs removed from the search engine cache. Again, all due to this one issue.

I'd be glad to discuss the specifics of any enhancements that you might like to see in ShortLinks. Just add them to my [link hidden from unregistered users] forum and we'll talk through them.

NoFantasy
Worker

Joined: Apr 26, 2005
Posts: 114

Posted: Tue Dec 19, 2006 8:24 am

...yeah, I'm fully aware that I went off-topic somewhere up there; it should have been separate threads. Getting rid of duplicates is a general topic, and everyone who implements any mod_rewrite package will suffer from it.

Anyway, I feel like continuing this on your forum, which is probably just as good, since all of this is related to the way we rewrite links.

montego
Site Admin

Joined: Aug 29, 2004
Posts: 7487
Location: Arizona

Posted: Wed Dec 20, 2006 7:49 am

Quote:

Getting rid of duplicates is a general topic, and everyone who implements any mod_rewrite package will suffer from it.


According to what you have posted on my site as well, you are suggesting that there is still an issue with duplicate content even without mod_rewrite rewriting of URLs.

I don't mind talking about the duplicate content issue here on Raven's site. You had mentioned a possible enhancement for ShortLinks, and that is why I suggested discussing that specifically over at my site. No problem either way; it's only that discussing it here on Raven's site will get more traffic and more people weighing in.

NoFantasy
Worker

Joined: Apr 26, 2005
Posts: 114

Posted: Wed Dec 20, 2006 8:22 pm

montego wrote:
According to what you have posted on my site as well, you are suggesting that there is still an issue with duplicate content even without mod_rewrite rewriting of URLs.

Yes, indeed...with or without rewritten links, it will create duplicates like mad, so it's not an issue that comes from ShortLinks; it's an issue in general.
All logos and trademarks in this site are property of their respective owner.
The comments are property of their posters, all the rest © 2002-2008 by Raven