Detecting Bad Links


fivefeet8
08-28-07, 04:09 PM
Hello guys.

I need the server to detect a bad (unreachable) link on a webpage. One server is running Apache 2 and another is running Apache 1, both with PHP/MySQL installed. If a link is bad, the visitor should be redirected to another site; if it's good, they should be sent to the link itself. Any ideas on how to go about this?

I'm open to suggestions in PHP, JavaScript, or server-side configuration.

Thanks.

supra
08-29-07, 12:56 AM
Look in your httpd.conf file and check out the ErrorDocument 404 setting.

I suppose you could just put some JavaScript in your 404 page to redirect to whatever link you want.
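For what it's worth, a minimal sketch of that idea (the page name and fallback URL are only examples): in httpd.conf, something like ErrorDocument 404 /missing.html, and then in /missing.html:

<html>
<body>
<p>Page not found, redirecting...</p>
<script type="text/javascript">
// send the visitor to a fallback page instead of the bare 404
window.location.href = 'http://www.example.com/fallback.html';
</script>
</body>
</html>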

fivefeet8
08-29-07, 01:30 AM
Look in your httpd.conf file and check out the ErrorDocument 404 setting.

I suppose you could just put some JavaScript in your 404 page to redirect to whatever link you want.

Hmm, I wasn't clear enough. The clickable links will be on a webpage hosted on my servers, but they point to pages outside the server. I want to be able to check whether a link on the webpage that points to another site works or not.

supra
08-29-07, 01:49 AM
Hmm, I wasn't clear enough. The clickable links will be on a webpage hosted on my servers, but they point to pages outside the server. I want to be able to check whether a link on the webpage that points to another site works or not.

Ohhhh. Well, I suppose you could do something like this:

$test = get_meta_tags('http://www.example.com/');

and test whether $test is false or not; false means the website is not working.
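Fleshed out a bit (a sketch only; get_meta_tags() needs allow_url_fopen enabled, and the URLs are placeholders):

<?php
// probe the remote site from this server; get_meta_tags() returns false
// (with a warning, suppressed here by @) when the URL can't be fetched
$url  = 'http://www.example.com/';
$test = @get_meta_tags($url);

if ($test === false) {
    header('Location: http://www.example.com/fallback.html'); // site is down
} else {
    header('Location: ' . $url);                              // site is up
}
exit;
?>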

fivefeet8
08-29-07, 11:01 AM
Ohhhh. Well, I suppose you could do something like this:

$test = get_meta_tags('http://www.example.com/');

and test whether $test is false or not; false means the website is not working.

Thanks for your help. I think I'm not being specific enough in my question, though. PHP is server-side, so that function will run from the server it's hosted on. There's a link that the server will never be able to reach, but the client can, depending on where they are connected to the internet.

What I need is a way to detect non-working links on the client side. Essentially, if the client can't connect to a certain webpage, they get directed to another; if they can, they get sent to it.

To be more specific the scenario is this:

A client connects to the web server; the web server probably needs to send some type of client-side script that, upon page load, checks whether the client can connect to another website. If they can connect to that other website, they get sent there. If they can't, they stay at the original web server they connected to.

I know how to send them to other URLs with JavaScript, but I need a way to check whether they can connect to certain URLs from where they are connected.

evilghost
08-29-07, 11:20 AM
You may run into some security restrictions in the DOM model, but you could try loading your target site into a hidden IFRAME and detecting whether there's a body in the IFRAME; if not, keep them on the page, else redirect them with parent.window.location.href.
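A rough sketch of that, with the security caveat above in mind (the URLs are placeholders):

// load the target site into a hidden IFRAME and probe it for a body
var frame = document.createElement('iframe');
frame.style.display = 'none';
frame.onload = function () {
    var body = null;
    try {
        // this line throws an access-denied error for a foreign domain
        body = frame.contentWindow.document.body;
    } catch (e) {}
    if (body && body.innerHTML.length > 0) {
        parent.window.location.href = frame.src;   // reachable: send them there
    }
    // otherwise leave them on the current page
};
document.body.appendChild(frame);
frame.src = 'http://www.example.com/';   // site to test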

fivefeet8
08-29-07, 01:10 PM
You may run into some security restrictions in the DOM model, but you could try loading your target site into a hidden IFRAME and detecting whether there's a body in the IFRAME; if not, keep them on the page, else redirect them with parent.window.location.href.

Thanks for the help, but the document loaded in the IFRAME is from another web server/domain. I get an access-denied error when I try to count the number of body tags in the document.

Is there a way to detect 404 errors in an iframe?

fivefeet8
08-29-07, 04:56 PM
I've figured out a way to do it, but I'm not sure it will work in all instances. I've written a simple JavaScript that creates an image object whose source points to an image on the server I need to check for connectivity. Once the image has loaded, I check whether it has a height greater than zero. Seems to work so far.
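Roughly like this (the image path and target URL are placeholders for my setup):

// probe the private server by loading a known image from it
var probe = new Image();
probe.onload = function () {
    // the image loaded, so this client can reach the private server
    if (probe.height > 0) {
        window.location.href = 'http://private.example.com/';
    }
};
// if the server is unreachable, onload never fires and the client stays here
probe.src = 'http://private.example.com/probe.gif';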

I also welcome other suggestions or a better approach.

evilghost
08-29-07, 06:48 PM
I've figured out a way to do it, but I'm not sure it will work in all instances. I've written a simple JavaScript that creates an image object whose source points to an image on the server I need to check for connectivity. Once the image has loaded, I check whether it has a height greater than zero. Seems to work so far.

I also welcome other suggestions or a better approach.

That was exactly what I was going to suggest. You were running into the DOM security model, which prevents one parent domain from accessing content outside of its own domain or a child domain of the parent domain.

radekhulan
08-31-07, 11:05 AM
Create a simple table with the URL, the query time, and its status. When the query time is older than (e.g.) one week, try to get the URL's headers via the PHP get_headers() function (http://us.php.net/manual/en/function.get-headers.php) and see whether the page exists, i.e. returns "HTTP/1.1 200 OK" (or "301 Moved Permanently", in which case you can auto-change the link, or "404 Not Found", etc.). There's no need to query each link a million times a day via JavaScript; that would be a very silly approach...
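A minimal sketch of the check itself (the helper name is mine; get_headers() needs allow_url_fopen enabled):

<?php
// true if the URL answers with a usable status line; store the result and
// a timestamp in your table so each link is re-checked at most once a week
function link_is_alive($url) {
    $headers = @get_headers($url);   // false if the request fails entirely
    if ($headers === false) {
        return false;
    }
    // $headers[0] is the status line, e.g. "HTTP/1.1 200 OK"
    return strpos($headers[0], '200') !== false
        || strpos($headers[0], '301') !== false;
}
?>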

evilghost
08-31-07, 11:26 AM
Create a simple table with the URL, the query time, and its status. When the query time is older than (e.g.) one week, try to get the URL's headers via the PHP get_headers() function (http://us.php.net/manual/en/function.get-headers.php) and see whether the page exists, i.e. returns "HTTP/1.1 200 OK" (or "301 Moved Permanently", in which case you can auto-change the link, or "404 Not Found", etc.). There's no need to query each link a million times a day via JavaScript; that would be a very silly approach...

He wants a client-side function to see whether the client can access the page, not the web server serving the content.

radekhulan
08-31-07, 11:52 AM
He wants a client-side function to see whether the client can access the page, not the web server serving the content.

Sure. And the best way to do it, if he has PHP/MySQL, is to check the page before it is output to the client, once a week (per page/link) or so. Even if it must be done via JavaScript (very silly; what about search engines and people with JS off?), the best approach would be to issue an XMLHttpRequest, process the URL on the server (caching the results), and return the result. The server could do its verification using the get_headers() function.
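Sketched out, assuming a hypothetical check.php on the server that looks the URL up in the cache (or calls get_headers()) and prints "ok" or "dead":

// ask our own server whether a link is alive
function checkLink(url, onResult) {
    var xhr = window.XMLHttpRequest
        ? new XMLHttpRequest()
        : new ActiveXObject('Microsoft.XMLHTTP');   // older IE
    xhr.open('GET', '/check.php?url=' + encodeURIComponent(url), true);
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4) {
            onResult(xhr.responseText === 'ok');
        }
    };
    xhr.send(null);
}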

evilghost
08-31-07, 11:56 AM
Sure. And the best way to do it, if he has PHP/MySQL, is to check the page before it is output to the client, once a week (per page/link) or so. Even if it must be done via JavaScript (very silly; what about search engines and people with JS off?), the best approach would be to issue an XMLHttpRequest, process the URL on the server (caching the results), and return the result.

Please carefully re-read his posts. And allow_url_fopen is a security concern.

There's a link that the server will never be able to reach, but the client can, depending on where they are connected to the internet.

What I need is a way to detect non-working links on the client side.

radekhulan
08-31-07, 11:59 AM
Please carefully re-read his posts. And allow_url_fopen is a security concern.

OK, that was in his third follow-up (in his first post he was asking for a PHP solution, i.e. a server-side one). Well, he'd be better off "fixing" the server so that it can access what clients can...

fivefeet8
09-01-07, 08:07 AM
Well, he'd be better off "fixing" the server so that it can access what clients can...

The web server I'm talking about resides on a private intranet LAN behind a firewall. People who connect to the internet from this LAN can also reach this server, but anyone outside the LAN can't. I needed a way to detect whether the client is on the LAN and send them to the local private web server instead of the public internet server. I did suggest to my IT admin that we set up local DNS that routes all requests for our hosted internet web server to our local web server, but he doesn't want that.

Essentially, there will be two web servers: one public on the internet, the other private on the LAN. They will both host the same PHP/JavaScript, but the private server will have access to private, secured data from a private MySQL server and to public data on another MySQL server. The public internet server will have access to the public data but not the private data.

It was essentially a security issue in that we wanted a way to keep some data private and never accessible from the public internet.

radekhulan
09-01-07, 08:17 AM
Why not add that web server's IP address to the "trusted IPs"? Then it can access it as well and verify links via PHP...

evilghost
09-01-07, 09:43 AM
You're missing the point; he's intentionally segregating the databases and web servers due to security concerns. Leave the man alone.

radekhulan
09-01-07, 02:14 PM
..

Nope, you're missing the whole picture ;-) Verifying links via JavaScript is the worst possible solution I can think of: slow, using too much bandwidth, checking the same link (e.g.) a thousand times a second, unreliable (JavaScript off), etc. Creating a secure link between the web server and the internal data servers is not a problem, and it allows him to build the verification in a way that makes sense.

Building on a wrong concept cannot bring anything good...

PS: Unlike you and your silly personal invectives, I have very strong experience in web design: http://hulan.cz/portfolio/

evilghost
09-01-07, 03:28 PM
fivefeet8, sorry for allowing this thread to be derailed. I won't respond unless you have further questions, as radekhulan, our resident troll, will freely proliferate his gems of wisdom concerning your security infrastructure and how you should segregate critical data. You'll have to forgive his lack of experience in the enterprise; he's a Microsoft propaganda agent.