Google’s Official Profanity API

Recently a list of “bad words” was made available through Google’s new website, which asks: “What do you love?” (wdyl.com).

So, perhaps I was a bit hasty writing off WhoisX.

The list of profanities was discovered in the JavaScript on the website. Google reacted quickly and switched to a URL lookup instead of a JavaScript one, keeping the list away from prying eyes.

However, what this does mean is now we have an API to play with:

http://www.wdyl.com/profanity?q=xxx

This will return a JSON response of true:

{"response": "true"}

While looking up something like “lol” will return a false response:

http://www.wdyl.com/profanity?q=lol

{"response": "false"}

All in all a pretty simple and easy way to figure out whether Google thinks it is a bad word or not.
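Note that the value comes back as the string "true" or "false" rather than as a JSON boolean, so it needs an explicit comparison after decoding. Here is a minimal sketch of that, using hard-coded sample bodies like the ones above instead of a live request. The helper name parse_profanity_response is made up for illustration:

```php
<?php

// Hypothetical helper: turn a raw response body into a PHP boolean.
function parse_profanity_response($body) {
	$data = json_decode($body, true);
	// Anything that is not valid JSON with a "response" key counts as "not profane".
	if (!is_array($data) || !isset($data['response'])) {
		return false;
	}
	// The API sends the strings "true"/"false", not JSON booleans.
	return $data['response'] === 'true';
}

var_dump(parse_profanity_response('{"response": "true"}'));  // bool(true)
var_dump(parse_profanity_response('{"response": "false"}')); // bool(false)
```

Treating a broken or empty body as "not profane" is just one choice; you may prefer to fail the other way.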

On the downside, you can’t easily use it on your site: if you try to call it via AJAX, you’ll be faced with the following error:

XMLHttpRequest cannot load http://www.wdyl.com/profanity. Origin http://example.org is not allowed by Access-Control-Allow-Origin.

On the upside, we don’t want to do that anyway; we can use it from PHP.

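<?php

// Proxy the query to Google so the browser only ever talks to our own domain.
$q=isset($_REQUEST['q'])?urlencode($_REQUEST['q']):'';
header('Content-Type: application/json');
// Needs allow_url_fopen enabled in php.ini.
readfile('http://www.wdyl.com/profanity?q='.$q);

?>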

That’s right, that’s all there is to it. Simply copy that into a file called “profanity.php” and you’re ready to go!

Now you can do a JSON call to your ‘profanity.php’ to tell you whether the word is bad or not.

If that’s not enough, you could try using the Google Profanity API to avoid displaying ads on pages that contain profanities.

Now if only there was a way to avoid the Scunthorpe problem

Going back to the reported Adsense problem on WhoisX.co.uk, the example query was:

adult-hardcore-sex.cuntspace.me.uk

Passing this to the API returned false.

But, changing all the non-word characters to spaces has a different effect:

adult hardcore sex cuntspace me uk

{"response": "true"}

This is done in PHP using a simple regular expression:

echo preg_replace('/\W+/',' ','adult-hardcore-sex.cuntspace.me.uk');

I’m not sure this exactly solves the Scunthorpe problem, but it definitely does identify “sex” as an adult theme, which is perfect.
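For anyone who hasn’t met it, the Scunthorpe problem is what happens when a filter matches bad words anywhere inside a string, so innocent names get flagged. Here is a rough local illustration of the difference between substring matching and whole-word matching. The one-entry word list and both function names are stand-ins for the demo; this says nothing about how Google’s lookup works internally:

```php
<?php

// Hypothetical stand-in word list; the real list lives on Google's side.
$bad_words = array('sex');

// Naive check: flags any string that merely contains a listed word.
function contains_substring($text, $words) {
	foreach ($words as $w) {
		if (stripos($text, $w) !== false) { return true; }
	}
	return false;
}

// Whole-word check: split on non-word characters, then compare tokens.
function contains_word($text, $words) {
	$tokens = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
	return count(array_intersect($tokens, $words)) > 0;
}

var_dump(contains_substring('essex.example.org', $bad_words));         // bool(true), a false positive
var_dump(contains_word('essex.example.org', $bad_words));              // bool(false)
var_dump(contains_word('adult-hardcore-sex.example.org', $bad_words)); // bool(true)
```

Splitting on non-word characters first is the same idea as the preg_replace above: it gives the checker separate words to look at instead of one long string.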

Now to turn it into a usable function:

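<?php

// Returns 1 if Google flags $q as profane, otherwise 0.
// Pass a true value for $json to get the raw JSON response instead.
function is_profanity($q,$json=0) {
	// Replace runs of non-word characters with a single space.
	$q=urlencode(preg_replace('/\W+/',' ',$q));
	$p=file_get_contents('http://www.wdyl.com/profanity?q='.$q);
	if ($json) { return $p; }
	// A failed request or an unexpected body is treated as "not profane".
	$p=json_decode((string)$p);
	return (isset($p->response) && $p->response=='true')?1:0;
}

$q=isset($_REQUEST['q'])?$_REQUEST['q']:'';
echo is_profanity($q);

?>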

We can use this function to check whether the query is considered a profanity or not. If it is, it will return a 1, otherwise a 0 is returned.

This function can now be used to include kittens (or whatever) rather than Google AdSense ads on pages with “adult themed” content.
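As a sketch of how that swap might look: the helper name ad_slot_html, the kitten image path, and the AdSense placeholder comment are all made up for illustration. The checker is passed in as a callable so the example runs offline with a stub; on a real page you would pass 'is_profanity' instead:

```php
<?php

// Hypothetical helper: pick what to show in the ad slot for a given query.
// $checker is any callable returning 1 (or true) for profane input,
// for example the is_profanity() function above.
function ad_slot_html($q, $checker) {
	if (call_user_func($checker, $q)) {
		// Placeholder markup; swap in whatever you like.
		return '<img src="/images/kitten.jpg" alt="A kitten">';
	}
	// Placeholder for the usual AdSense snippet.
	return '<!-- AdSense code goes here -->';
}

// Offline stub so the example runs without calling the API.
$stub = function ($q) { return $q === 'xxx' ? 1 : 0; };

echo ad_slot_html('xxx', $stub), "\n";
echo ad_slot_html('lol', $stub), "\n";
```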

Brilliant!

Now to submit for reconsideration