
Confederacy of Dunces USA

Welcome to the confederacy of dunces usa. This blog is inspired by the effects of Hurricane Katrina on New Orleans and the Gulf Coast USA and named after the novel A Confederacy of Dunces by New Orleans native John Kennedy Toole. Certainly the disaster response efforts have been led by the dunces....

Friday, May 12, 2006

Data Mining

What the government wants to do with your phone records is called data mining. It is related to statistics and predictive modeling but not exactly the same. It is also often closely related to decision rules and association rules.

Big Brother arrived a long time ago. There are already entities that track your every move.

1. Credit Bureaus. Do you have a credit card, a mortgage, a car loan, or a student loan? Have you ever had a judgment filed against you for not paying a hospital bill or a traffic ticket? All of this is tracked by the credit bureaus, and the information is available for sale to any of the companies you do business with.

2. How about the IRS? They have your income details, SSN, investment information, name of employer, name of dependents, all sorts of information about you.

3. Warranty cards, magazine subscriptions, charitable contributions, survey responses can all get you on a special interest list available for sale to anyone.

4. Your phone company knows what numbers you have called, when, and, in the case of cell phones, where. This information is routinely used in criminal investigations.

5. Ever searched on the internet? Google, Yahoo, or whatever search engine you used has your computer's IP address associated with your search terms and may have provided it to the government for a pornography study.

6. East Coast residents, don't forget about E-ZPass. This information can easily be analyzed to determine how often you violate the speed limit and how often you are late to work.

I am sure there are others.

What is the difference between #1 (credit bureaus), #2 (IRS), #3 (vertical lists), and the rest of the cases? The first three have clear, enforced laws and regulations that guide the usage of this data. The credit bureau regulations are enforced by the FDIC and guided by the Fair Credit Reporting Act and Sarbanes-Oxley. The IRS will share your information with state and local authorities under some circumstances, but it is highly regulated. There are various regulations that cover vertical lists and direct mail solicitations.

Now how about #4, #5, and #6? I have to admit I am not that familiar with telco regulations, so I cannot speak to them. But Qwest is familiar, and they seem to think that the government's request was illegal. It is clear that there is an expectation of privacy except when information is requested by court order. #5 and #6 are uncharted territory where the laws have not kept up with the available technology.

Can the government analyze your phone records and really catch terrorists? Probably not. In order to do so they will need to set up decision rules and compare terrorist phone records to patterns of the overall population. This is very, very difficult and probably a futile effort.
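
To put rough numbers on "very, very difficult," here is a back-of-the-envelope sketch in Python. Every figure in it is made up for illustration: the population size, the number of actual terrorists, and the rule's hit and false-alarm rates are all assumptions, not numbers from any real program. It simply applies Bayes' rule to a screen for a very rare event.

```python
# Hypothetical numbers only: none of these come from real data.

def flagged_precision(population, true_targets, hit_rate, false_alarm_rate):
    """Share of flagged phone numbers that actually belong to a target."""
    innocents = population - true_targets
    true_flags = true_targets * hit_rate          # targets the rule catches
    false_flags = innocents * false_alarm_rate    # innocents the rule flags anyway
    return true_flags / (true_flags + false_flags)


precision = flagged_precision(
    population=200_000_000,   # assumed number of phone numbers screened
    true_targets=1_000,       # assumed number of real targets among them
    hit_rate=0.99,            # assumed: rule catches 99% of real targets
    false_alarm_rate=0.01,    # assumed: rule wrongly flags 1% of everyone else
)
print(f"{precision:.4%} of flagged numbers are real targets")
```

Even with a rule this generous, only about one flagged number in two thousand would be a real target under these assumptions. The rest are ordinary people who happened to match the pattern.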

I may not know telco too well, but I do know data mining. Very well. Here are some non-useful decision/association rules generated from hypothetical phone records:

1. If receiving # = Dominos Pizza, mother of caller, sister of caller, best friend of caller, probability of receiving call is significantly greater than random phone #.

2. If sending # is located in Podunk County and # of out of state calls is greater than average for Podunk County and residents at sending # have no known relatives or business interests outside of Podunk County then probability that sending # is a terrorist is STILL no greater than average.

Here are some useful decision rules:
1. If receiving # is a known terrorist, probability that sending # is related to a terrorist sympathizer is greater than random.

2. If receiving # is located in Terrorist Junction, Saudi Arabia, probability that sending # is a terrorist sympathizer is greater than random.
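
One way to tell a useful rule from a non-useful one is its lift: how much more common the thing you are looking for is among numbers that match the rule than among all numbers. The sketch below is only an illustration. The phone numbers, the `known_bad_receivers` and `confirmed_sympathizers` sets, and the `rule_lift` helper are all hypothetical names and toy data, not anything from a real system.

```python
from collections import namedtuple

Call = namedtuple("Call", ["sender", "receiver"])

# Toy, invented call records. 555-0150 stands in for the pizza place,
# 555-0199 for a number already known to belong to a terrorist.
calls = [
    Call("555-0101", "555-0199"),
    Call("555-0101", "555-0150"),
    Call("555-0102", "555-0150"),
    Call("555-0103", "555-0150"),
    Call("555-0104", "555-0199"),
    Call("555-0104", "555-0150"),
]
known_bad_receivers = {"555-0199"}        # assumed watch list
confirmed_sympathizers = {"555-0101"}     # assumed ground truth for the build


def rule_lift(calls, rule, is_target):
    """Target rate among senders matching the rule, divided by the overall rate."""
    senders = {c.sender for c in calls}
    matched = {c.sender for c in calls if rule(c)}
    if not senders or not matched:
        return 0.0
    base_rate = sum(is_target(s) for s in senders) / len(senders)
    if base_rate == 0:
        return 0.0
    matched_rate = sum(is_target(s) for s in matched) / len(matched)
    return matched_rate / base_rate


def is_target(sender):
    return sender in confirmed_sympathizers


pizza_lift = rule_lift(calls, lambda c: c.receiver == "555-0150", is_target)
watch_list_lift = rule_lift(calls, lambda c: c.receiver in known_bad_receivers, is_target)
print(pizza_lift, watch_list_lift)
```

In this toy data everybody calls the pizza place, so that rule has a lift of 1.0 and tells you nothing. The watch-list rule has a lift of 2.0, which is what "greater than random" means in the useful rules above.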

You can see how much information would need to be appended to a list of phone numbers in order to construct the decision rules above. You can also see that, to get a useful decision rule for detecting terrorists, no information is needed from non-terrorists. All you need is phone records for actual terrorists, an overlay of relevant terrorist tracking information, and construction of decision rules based upon the number of terrorist calls and distinct terrorists matching a particular pattern.

Trying to compare patterns of terrorists vs. non-terrorists is just going to give you a bunch of completely useless information. There may be some usefulness in extracting the phone numbers of calls that meet one of the useful decision rule patterns and turning over that information to the government.

Data mining has two steps: build and score. You don't need everyone's phone calls to do the build step, and it is an invasion of privacy to use them for it. You do need them to do the score step. I think that most Americans would be comfortable with knowing that phone numbers that engage in suspicious activities would be investigated, and that their number would be turned over if certain triggers are hit. But maybe not. That is why we need to have specific guidelines laid out, just like we do for the credit bureaus, so that everyone will know what they are signing up for when they pick up the phone. Will this let the terrorists know what we are doing? Not really. The actual decision rules remain proprietary, but the general concepts are known. Putting it out on the table that we are watching will also restrict terrorists' ability to communicate, for fear that our decision rules will trap them, providing a good deterrent as a bonus.
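
Here is a minimal sketch of the build/score split in Python. It is an illustration under stated assumptions, not how any agency or phone company actually does it. The `build` and `score` functions, the thresholds, and the toy records are all hypothetical. The point is that `build` only ever sees calls made by known terrorists, while `score` runs over everyone's calls but hands back only the numbers that hit a trigger.

```python
from collections import defaultdict


def build(known_target_calls, min_distinct_callers=2):
    """Build step: derive flagged receiving numbers from known-terrorist calls only."""
    callers_by_receiver = defaultdict(set)
    for sender, receiver in known_target_calls:
        callers_by_receiver[receiver].add(sender)
    # Keep receivers called by at least `min_distinct_callers` distinct known terrorists.
    return {
        receiver
        for receiver, callers in callers_by_receiver.items()
        if len(callers) >= min_distinct_callers
    }


def score(all_calls, flagged_receivers, min_hits=2):
    """Score step: return only senders whose calls hit the trigger often enough."""
    hits = defaultdict(int)
    for sender, receiver in all_calls:
        if receiver in flagged_receivers:
            hits[sender] += 1
    return {sender for sender, count in hits.items() if count >= min_hits}


# Invented records: T1 and T2 are known terrorists; A, B, C are the general public.
known_target_calls = [("T1", "X"), ("T2", "X"), ("T1", "Y")]
all_calls = [("A", "X"), ("A", "X"), ("B", "X"), ("B", "Pizza"), ("C", "Pizza")]

flagged_receivers = build(known_target_calls)
flagged_senders = score(all_calls, flagged_receivers)
print(flagged_receivers, flagged_senders)
```

With these toy records, only X is flagged in the build step, since Y was called by just one known terrorist. Only A is turned over in the score step. B called X once, which is below the assumed trigger of two hits, and C never shows up at all.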

We need a National Privacy Act to cover internet and telephone communications given current technology advances, similar to the Fair Credit Reporting Act, in order to protect our rights and allow our data miners to do their job without violating the law.

1 Comment:

At 5/22/2006 10:01 PM, Liz said...

By the way, you may also hear people talking about social network analysis. Yes, this can also be done, but it is highly unlikely to be fruitful except in tracking down associates of known terrorists. It's not going to detect a terrorist you didn't already know about who has no ties to known terrorists in your database. You don't need general phone records to complete this. Any general network analysis is a total waste of time.

 
