MrBishop
By Kevin Poulsen
02:00 AM Oct. 20, 2006
Six months ago, Wired News launched an investigation of MySpace with the goal of comparing the company's 120 million user profiles against public sex offender registries to see how many matches we could find.
The project began when Wired News contributor Jenn Shreve found a handful of matches based on a random search. How many would you find with a software script that systematically went through those records and compared them all?
We decided to find out. I wrote a series of Perl scripts and began sifting the data.
The technique was crude, like searching for a needle in a haystack. When I began checking ostensible matches by hand, false positives registered in the thousands.
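To see why a systematic comparison like this throws off so many false positives, here is a minimal sketch in Python. It is not the Perl code Wired News released, and the article does not say which fields were compared; the `name` and `zip` fields, the function names and the sample records are all assumptions made up for illustration.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase a name and collapse whitespace so trivial differences don't block a match."""
    return " ".join(name.lower().split())


def build_index(registry):
    """Map each normalized name to the registry records that carry it."""
    index = defaultdict(list)
    for record in registry:
        index[normalize(record["name"])].append(record)
    return index


def candidate_matches(profiles, registry, extra_field=None):
    """Yield (profile, record) pairs whose normalized names agree.

    If extra_field is given (a hypothetical second attribute such as "zip"),
    both sides must also agree on it, which trims false positives.
    """
    index = build_index(registry)
    for profile in profiles:
        for record in index.get(normalize(profile["name"]), []):
            if extra_field is None or profile.get(extra_field) == record.get(extra_field):
                yield profile, record


# Synthetic, made-up records for illustration only.
registry = [
    {"name": "John Q Example", "zip": "00001"},
    {"name": "Pat Sample", "zip": "00002"},
]
profiles = [
    {"name": "john q  example", "zip": "00001"},
    {"name": "John Q Example", "zip": "99999"},  # same name, different place
    {"name": "Someone Else", "zip": "00002"},
]

by_name = list(candidate_matches(profiles, registry))
by_name_and_zip = list(candidate_matches(profiles, registry, extra_field="zip"))
print(len(by_name), len(by_name_and_zip))
```

Matching on name alone flags both "John Q Example" profiles; requiring a second field to agree leaves one. Even then, each pair is only a candidate. Confirming that two records describe the same person still has to be done by hand.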
Nevertheless, after several weeks of part-time work on the project, I was led to one suspect whose behavior was so disturbing I contacted New York's Suffolk County Police Department for comment. The suspect, Andrew Lubrano, was arrested earlier this month on attempted child endangerment charges.
Some 700 other matches were also confirmed, though none of those individuals could be linked by public MySpace posts to actual evidence of wrongdoing.
Today, Wired News is releasing the code used in this investigation (click here to download the gzip file). Anyone is free to take the software, look at it, validate (or invalidate) the methodology, discuss it, tinker with it and improve it.
We're releasing this code completely and utterly unsupported, under a BSD license. We'll happily link to open-source development efforts that pick it up for adoption, if notified.
Warning: These scripts were developed for a one-off project, and admittedly all of them could use a thorough scrubbing.
It's also worth noting what this code is not.
It's a long one...and even includes the code