An interesting paper was brought to my attention recently by this blog post: The Base Rate Fallacy and its implications for the difficulty of Intrusion Detection. The central question of the paper is: if we have a flow of N packets per day and our network IDS has a false-positive rate of X, what is the probability that we are experiencing a real attack, given that the IDS says that we are? The paper uses Bayes’ theorem (of which you can find a nice explanation here) to plug in some numbers, gets horrifying results (a flood of false alerts), and concludes that such an FP rate seriously undermines the credibility of the system.
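To make the base-rate effect concrete, here is a minimal sketch of the Bayes calculation. The numbers (one real attack packet in 100 000, a detector that never misses, a 0.1% false-positive rate) are illustrative assumptions of mine, not figures from the paper:

    # Minimal sketch of the base-rate calculation (illustrative numbers only).
    def p_attack_given_alarm(p_attack, p_detect, p_false_alarm):
        """Bayes' theorem: P(real attack | the IDS raises an alarm)."""
        p_alarm = p_detect * p_attack + p_false_alarm * (1 - p_attack)
        return p_detect * p_attack / p_alarm

    # One real attack packet in 100 000, a detector that catches every attack,
    # and a false-positive rate of 0.1%.
    print(p_attack_given_alarm(p_attack=1e-5, p_detect=1.0, p_false_alarm=1e-3))
    # ~0.0099: roughly 99 out of 100 alarms are false, despite the "tiny" FP rate.

Even a detector that never misses ends up raising alarms that are almost always false once the base rate of real attacks is small enough.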
The issue of false positives is also a concern in the anti-malware industry. And while I rant quite a bit about the AV industry, you have to give this one to them: the number of false positives is really low. For example, in the AV-Comparatives test 20 false positives is considered many, even though the collection contains over 1 500 000 samples (so the acceptable FP rate is around 0.0013%!). Update: David Harley was kind enough to correct me, because I was comparing apples (the number of malware samples) to oranges (the number of clean files falsely detected). So here is an updated calculation: the Bit9 Global File Registry has more than 6 billion files indexed (they index clean files). Consider whatever fraction of that is used by AV-Comparatives for FP testing (as David correctly pointed out, the clean-set size of AV-Comparatives is not public information – although I would be surprised if it were less than 1 TB). Some back-of-the-napkin calculations: let’s say that AV-Comparatives has only one hundredth of one percent of the 6 billion files, which would result in 600 000 files. Even so, 20 false positives out of 600 000 files is still only about 0.003%.
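The same napkin arithmetic, written out; the 600 000-file clean set is my own assumption from above, not a figure published by AV-Comparatives:

    malware_samples = 1_500_000   # AV-Comparatives detection-test collection
    clean_files = 600_000         # assumed clean set: 0.01% of the ~6 billion Bit9-indexed files
    false_positives = 20

    print(f"{false_positives / malware_samples:.4%}")  # ~0.0013%, the apples-to-oranges version
    print(f"{false_positives / clean_files:.4%}")      # ~0.0033% of the assumed clean set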
Now there were (and will be) a couple of big f***-ups by different companies (like detecting legitimate Windows files), but still, consumers have a very good reason to trust them. Compare this with more “chatty” solutions like software firewalls or – why not – the UAC. Any good security solution needs an FP rate at least this low, combined with much better detection. AV companies with low FP rates – we salute you!
PS. There might be an argument to be made that different false positives should be weighted differently (for example, depending on the popularity of the file) to emphasize the big problems (when out-of-control heuristics start detecting Windows components, for example). That is a valid argument which can be analyzed, but the fact remains that the FP rates of AV solutions are very low!
Picture taken from wadem’s photostream with permission.
One response to “The importance of false positives”
This was the best explanation I encountered so far: http://yudkowsky.net/rational/bayes