Via the ESET blog: the guidelines for testing anti-malware products have been published by AMTSO (the Anti-Malware Testing Standards Organization). Go and read them if you are so inclined (each of them is only 5 pages long – you have to give them props for brevity – although maybe they just wanted to avoid being too specific so there is less of a chance of being wrong ;-)).
In general I feel that testers (no offense) don’t have the necessary technical skills to evaluate objectively how relevant their tests are. Sorry, but someone who has never put their hands on an ASM-level debugger (like Olly) or a disassembler (IDA), who has never participated in a crackme contest (with at least some success), who has never analyzed shellcode, who has never unpacked a malware sample – just doesn’t cut it.
Also, I find some conflicting statements in the two papers. First, they sidestep the question of what constitutes “creating new malware” (this is interesting in the context of the Consumer Reports situation – BTW, my personal opinion on the matter is that CR was justified in creating variants).
Second, they say that “test results should be statistically valid”. First of all, the expression is “statistically relevant”. Statistics (as I found out) is not a black-and-white game. Usually, cut-off criteria are selected somewhat arbitrarily (using “well accepted” values is common – however, they are more a psychological factor than a mathematical one). Example: what is an acceptable error margin? 5%? 10%? 50%? There is no magic formula that can answer that; it is largely determined by how you feel about risk.
Now this principle goes against the dynamic testing paper, which acknowledges that (given the complexity of the situation) as few as 50 (!) samples might be used for a particular test. Given that each month more than 100,000 new (undetected) samples appear (and this is a conservative number), such a sample set is utterly insignificant.
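To make the error-margin point a little more concrete, here is a minimal sketch (my own illustration, not something taken from the AMTSO papers): it computes a Wilson score confidence interval for a detection rate measured on a given number of samples. The sample counts (45 out of 50, 4,500 out of 5,000) are made up purely for illustration.

```python
import math

def wilson_interval(detected: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a detection rate measured
    on `total` samples (z = 1.96 is the conventional 95% level)."""
    p = detected / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2))
    return centre - half_width, centre + half_width

# 45 detections out of 50 samples: the headline score is 90%,
# but the 95% interval is roughly 79%..96% -- a very wide band.
print(wilson_interval(45, 50))

# The same 90% score measured on 5,000 samples narrows to roughly 89%..91%.
print(wilson_interval(4500, 5000))
```

In other words, a 90% score on 50 samples only really tells you “somewhere between roughly 79% and 96%”, while the same score on 5,000 samples pins it down to about 89%–91% – which is exactly why the sample size matters so much.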
One response to “Anti Malware Testing Guidelines”
Well, there’s nothing wrong with not wanting to be wrong. 😉 There was certainly an element in the review process of not wanting to be too specific or prescriptive: the idea is to improve the general understanding and standard of testing rather than to homogenize it.
I share your concerns about testers who don’t have analytical skills (and other relevant skills), and have been trying to raise interest in some form of realistic certification for testers for some time (Andrew Lee and I presented at Virus Bulletin on that topic this year).
There is further documentation in preparation about malware creation for testing purposes. That’s a complex issue, and I think the industry hasn’t done itself any favours by concentrating on the ethical and safety issues rather than highlighting the technical problems.
Statistical relevance is certainly an issue, but so is statistical validity. There are many instances where incorrect conclusions have been drawn from the data. I’m hoping someone will pick up the gauntlet on documenting stats and detection testing, sooner or later.
Unfortunately, dynamic testing is always likely to attract smaller sample sets, because of the complex and resource-intensive nature of the methodology. It was probably not a good idea to give the impression that a 50-sample test set is generally sufficient, but what constitutes a valid test set is very context-specific. If you were testing Windows CE virus detection, for example, that would be too large a sample set… (Yes, that’s an extreme example!) By the way, your figure for monthly samples is way, way too low, though exact numbers depend on how you measure.
Anyway, you raise some very interesting points, and I’ll probably look at them in more detail on the ESET blog in the near future.