Innards: Fixing the Famousness Problem

As the author of the WWW FAQ, I regularly answer questions about the workings of the Web. If a question is frequently asked, I simply add an article to the FAQ. But sometimes a question is more detailed, more in-depth: not really a FAQ, but still of interest to others. You'll find those questions, with my answers, here in Innards, along with commentary on other web-technology-related topics.

2003-12-07

As I mentioned recently, there are only a few things a webmaster can do to improve the popularity of a site without spending money. One of them is to provide more and better content, of course. Improvements in design and navigation are another. A third approach is to actively encourage users to link to it or otherwise inform other users about it, rather than passively hoping that will happen.

Today I added a "tell a friend about this" link to my site. You can see it on the suncats page, or nearly any other page of the site. Setting this up would have been trivial if only certain people weren't such bastards.

Unfortunately, those people are out there, looking for anything that can be exploited as a conduit for spam. I designed the "tell a friend" feature accordingly.

Users provide their full name, their email address, the name and email address of the friend in question, and a one-line personal message. Then the validation starts: not just rejecting obvious security risks, not just rejecting commas in email addresses to prevent lists of addresses from being entered, but also rejecting things like these in full names and in the personal message:

www.foo.com (no periods following two or more word characters)
1 800 BUY JUNK (no consecutive digits, at all)
1-8-0-0-B-U-Y-J-U-N-K (clever punctuation doesn't slide through either)

... And a few other pieces of tough love.
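For the curious, here is a rough sketch of what checks like these might look like in Perl. It is not the code running on the site. The sub names are made up for illustration, and the third pattern (digits separated only by punctuation or spaces) is a guess at how the "clever punctuation" case could be caught; the other two follow the rules as stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of reasons a name or message is
# rejected. An empty list means the text passed.
sub free_text_problems {
    my ($text) = @_;
    my @problems;
    # "www.foo.com": a period following two or more word characters
    push @problems, "period after two or more word characters"
        if $text =~ /\w{2,}\./;
    # "1 800 BUY JUNK": any two digits in a row
    push @problems, "consecutive digits"
        if $text =~ /\d\d/;
    # "1-8-0-0-...": ASSUMPTION, digits separated only by punctuation/spaces
    push @problems, "digits separated by punctuation"
        if $text =~ /\d[\W_]+\d/;
    return @problems;
}

# Hypothetical helper: checks applied to an email address field.
sub email_problems {
    my ($address) = @_;
    my @problems;
    # Commas would let someone sneak in a list of addresses
    push @problems, "comma in address" if $address =~ /,/;
    return @problems;
}
```

Rules this blunt have casualties: "Dr. Smith" trips the first pattern just as "www.foo.com" does. That is the tough love part.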

With these rules in place, it's difficult to send a really effective spam through the "tell a friend" feature. Unfortunately it's not impossible. I have to allow domain names in the From: email address, obviously. So spammers could send their messages "from" sexypants@naughtywebness.com to attract attention. But failing to allow a return address would just make messages from "tell a friend" look suspicious, and probably be regarded as spam.

So I added the same "type in the four-letter secret code you see to the right" feature that many sites are relying on now for anything that absolutely, positively must be usable by humans only. Surprisingly it wasn't that much of a pain this time, probably because I've implemented it twice already and figured out the easy way.

Generate two random codes. One is the four-letter code that the user will actually see and type in; keep it short to avoid ticking off users. The other is of eight letters. Pass the eight-letter code as a hidden form field; display an image containing the four-letter code, drawing at least one somewhat random line across the image to confuse OCR software.
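Generating the two codes might look something like the sketch below. The sub name, the lowercase-only alphabet, and the hidden field name "key" are assumptions made for illustration. Perl's built-in rand is not a cryptographic generator, which is probably acceptable for a low-stakes form like this one but worth knowing.

```perl
use strict;
use warnings;

# Hypothetical helper: a string of random lowercase letters.
sub random_letters {
    my ($length) = @_;
    my @alphabet = ('a' .. 'z');
    return join '', map { $alphabet[int rand @alphabet] } 1 .. $length;
}

my $visible_code = random_letters(4);   # drawn into the image, typed by the user
my $file_key     = random_letters(8);   # travels in the hidden form field

# The field name "key" is an assumption; use whatever your form expects.
print qq{<input type="hidden" name="key" value="$file_key">\n};
```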

Create a file using the eight-letter code as the filename, and write the four-letter code there. When the user submits the form, first make damn sure the hidden form field really contains eight letters and not something dangerous, then check for a file by that name and compare the contents to the code the user has typed in. When you get a match, delete the file.

Of course many codes will be generated and never used, so set up a scheduled task to delete any such files that are more than an hour old, and the images too if you're not generating those on the fly. And there you are.

You can use the gd library to create the image. Here's the snippet of Perl that outputs the image:

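use GD;

# $code holds the four-letter code generated earlier.
# Palette image at working size; the first color allocated is the background.
my $im = GD::Image->new(100, 30, 0);
my $white = $im->colorAllocate(255, 255, 255);
my $black = $im->colorAllocate(0, 0, 0);
# A negative color index turns off antialiasing for the text.
$im->stringFT(-$black,
    "/usr/share/fonts/truetype/somefont.ttf",
    20, 0, 0, 20, $code);
my $lines = 1;
for (my $i = 0; $i < $lines; $i++) {
    # Endpoints produce a line crossing
    # most of the image, almost horizontally
    my $x1 = rand(20);
    my $y1 = rand(8);
    my $x2 = 80 + rand(20);
    my $y2 = rand(8) + 22;
    $im->line($x1, $y1, $x2, $y2, $black);
}
# Scale down into a truecolor image so the resampling can blend pixels.
my $im2 = GD::Image->new(60, 20, 1);
$im2->copyResampled($im, 0, 0, 0, 0, 60, 20, 100, 30);
binmode STDOUT;  # keep the PNG bytes intact on platforms that translate newlines
print "Content-type: image/png\r\n\r\n";
print $im2->png;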

You may notice that I specified -$black rather than $black for the color of the text. This tells stringFT not to antialias the text. I avoid that because I know that the line I will draw across the image will not be antialiased, and I don't want to make it easy for software (as opposed to people) to tell the letters and the line apart. Also, I know I'm going to draw the image larger than the final size (100x30, scaled down to 60x20), then use copyResampled to scale it down for a result that is both more attractive and a bit more of a pain to analyze maliciously.
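Stepping back from the image to the bookkeeping (the eight-letter filename that holds the four-letter code), here is a minimal sketch of the store, check and cleanup steps. The sub names are hypothetical, and the temporary directory is only there so the example runs anywhere; a real site would use a fixed private directory outside the web root.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# ASSUMPTION: a throwaway directory so this sketch is self-contained.
my $dir = tempdir(CLEANUP => 1);

# Write the four-letter code into a file named after the eight-letter key.
sub store_code {
    my ($key, $code) = @_;
    die "bad key\n" unless $key =~ /\A[a-z]{8}\z/;
    my $path = File::Spec->catfile($dir, $key);
    open(my $fh, '>', $path) or die "cannot write $path: $!";
    print {$fh} $code;
    close($fh) or die "cannot close $path: $!";
    return;
}

# Returns 1 on a match (and deletes the file), 0 otherwise.
sub check_code {
    my ($key, $typed) = @_;
    # Validate the hidden field before it gets anywhere near a filename.
    return 0 unless defined $key && $key =~ /\A[a-z]{8}\z/;
    return 0 unless defined $typed;
    my $path = File::Spec->catfile($dir, $key);
    open(my $fh, '<', $path) or return 0;
    my $stored = do { local $/; <$fh> };
    close($fh);
    return 0 unless defined $stored && $typed eq $stored;
    unlink $path;    # each code is good for one use
    return 1;
}

# The scheduled cleanup: remove code files older than the given age.
sub purge_old_codes {
    my ($max_age_seconds) = @_;
    my $removed = 0;
    opendir(my $dh, $dir) or die "cannot read $dir: $!";
    for my $name (readdir $dh) {
        next unless $name =~ /\A[a-z]{8}\z/;
        my $path  = File::Spec->catfile($dir, $name);
        my $mtime = (stat $path)[9];
        next unless defined $mtime;
        $removed++ if time() - $mtime > $max_age_seconds && unlink $path;
    }
    closedir $dh;
    return $removed;
}
```

A cron job would call something like purge_old_codes(3600) to match the one-hour limit. Note that, as described, the file is deleted only on a match, so a wrong guess leaves the code in place until the cleanup runs; deleting on any attempt would be stricter, at the cost of making users start over after a typo.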

The result is not perfect; there are definitely ways to attack this, if an unfriendly programmer is willing to spend time picking pixels apart. But thwarting the abuse of things like this is all about achieving a cost-benefit ratio that causes a potential attacker to say "no thanks!" and move on. Such an attacker knows their hack wouldn't work for long before I blocked their IP addresses... which takes a lot less effort than developing the attack in the first place. And even if they succeed, what do they get? Lame spams with a site name in the "from" address and fewer than 64 characters to hype themselves in. Not worth the trouble.

Still, I plan to start rotating each letter a small, random amount off the baseline-- just to be completely obnoxious. Which is one of my strong suits, of course.
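One way the rotation might go, purely as a sketch: draw each letter with its own stringFT call and a small random angle. GD's stringFT takes its angle in radians. The sub names, the 10-degree limit and the 22-pixel spacing are all assumptions picked to fit a 100-pixel-wide image with 20-point text, not anything I've tested on the real thing.

```perl
use strict;
use warnings;

# Hypothetical helper: one placement (letter, x position, angle) per letter.
sub letter_placements {
    my ($code, $max_degrees, $advance) = @_;
    my $pi = 4 * atan2(1, 1);
    my @placements;
    my $x = 0;
    for my $char (split //, $code) {
        # Random tilt between -$max_degrees and +$max_degrees
        my $degrees = (rand(2) - 1) * $max_degrees;
        push @placements,
            { char => $char, x => $x, angle => $degrees * $pi / 180 };
        $x += $advance;
    }
    return @placements;
}

# Draws onto an existing GD::Image object. Nothing here calls it, so the
# sketch runs without GD installed.
sub draw_rotated_code {
    my ($im, $color, $font, $code) = @_;
    for my $p (letter_placements($code, 10, 22)) {
        $im->stringFT($color, $font, 20, $p->{angle},
                      $p->{x}, 20, $p->{char});
    }
    return;
}
```

In the image snippet above, a call like draw_rotated_code($im, -$black, $font, $code) would stand in for the single stringFT call, with $font set to the same TrueType path.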