The WildList
The WildList was created in 1993, when computer viruses were starting to become a problem. Back then, viruses were relatively simple things and comparatively easy to contain.
The WildList is a compilation of virus samples submitted by security professionals from around the world. It is published each month to a select group of subscribers. Any security professional can contribute, but a sample must be reported by at least two respected sources before it is included in the list.
As you might imagine, not everyone has the capacity to harvest and identify malware, so the majority of samples on the list naturally come from anti-virus vendors. It is undoubtedly a good thing that these vendors participate; they see far more new threats than anyone else.
Within the industry, the timing of submissions to the WildList causes heated discussion, because many people believe that samples may be withheld from the list until the submitting vendor has a solution in place. Submitting a sample only after a fix has been prepared gives that vendor a competitive advantage.
My point, however, relates specifically to malware testing and the broader impact of this delay on testing practices.
Because the samples are typically about a month old by the time they are published, using the WildList as the basis for real-time or real-world scenario testing is flawed. The list is effectively a month out of date compared with the real world, and two or more of the participating vendors may already have fixes in place for the viruses it contains.
An article by Trend Micro states that new threats are now emerging at the rate of one every 1.5 seconds, roughly 57,600 a day, and at that pace testing methodologies need to change to keep up.
I’m not suggesting that the WildList should be done away with. Many highly respected companies use it and contribute to it constructively, and it is an effective industry tool, but I believe that for testing purposes it would be better used as a regression tool rather than a front-line tool.
Quality over Quantity
One widely used method of malware testing is to select a repository of 50,000 samples and run it against the product under test. The results may make for good marketing: think “This product detected 49,995 out of 50,000 samples.” But the real question that should be asked is: of those 50,000 samples, how many are specific to the target platform?
If I am running a test on a 64-bit Windows 7 system, are there samples in my list designed specifically to exploit flaws in Windows 2000? If so, what benefit does that test offer in my scenario? One hundred samples known to target Windows 7 would give the results greater credibility than 40,000 random samples.
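To make that concrete, here is a minimal sketch of the difference between a raw detection score and a platform-relevant one. The sample records, field names and figures are invented for illustration; the point is simply that a headline number dominated by irrelevant samples says little about protection on the machine actually under test.

```python
# Compare a raw detection rate against one computed only over samples
# relevant to the platform under test. All data here is hypothetical.
from dataclasses import dataclass

@dataclass
class Sample:
    name: str
    target_os: str   # e.g. "windows7-x64", "windows2000"
    detected: bool   # result reported by the product under test

def detection_rate(samples):
    if not samples:
        return 0.0
    return sum(s.detected for s in samples) / len(samples)

def report(samples, platform):
    relevant = [s for s in samples if s.target_os == platform]
    print(f"Raw rate over {len(samples)} samples: {detection_rate(samples):.2%}")
    print(f"Rate over {len(relevant)} {platform} samples: {detection_rate(relevant):.2%}")

# A corpus dominated by legacy threats that cannot even run on the test
# machine inflates the headline figure while hiding platform-specific misses.
corpus = (
    [Sample(f"legacy-{i}", "windows2000", True) for i in range(40_000)]
    + [Sample(f"win7-{i}", "windows7-x64", i % 5 != 0) for i in range(100)]
)
report(corpus, "windows7-x64")
```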
The Wild
As the Trend Micro article (and many others) shows, the rate at which new threats appear keeps increasing. If new threats really do emerge at one every 1.5 seconds, the industry needs to look at how it can protect the consumer in the shortest possible time. Consumers need to know that anti-malware vendors are providing protection against the threats circulating right now, not just the threats found a month ago.
Refreshingly, there has been a shift in emphasis from some vendors: they have started to look at a threat’s behaviour instead of its signature. This is a great step forward, because a Trojan (for example) will always be a Trojan and will display certain characteristics as it tries to execute on the system, even if the vendor doesn’t have that particular sample on file.
(I accept that this raises questions about some of the latest worms, which can change themselves to hide and avoid detection, but for this discussion I’m generalising about the majority of threats, not selected exceptions.)
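For readers who prefer code to prose, here is a toy sketch of the distinction. It assumes a hash-based signature database and a small set of behaviour rules; the hashes, rule names and threshold are all made up and stand in for whatever a real engine actually uses.

```python
import hashlib

# Known-bad hashes: signature detection only catches samples already on file.
SIGNATURE_DB = {"5d41402abc4b2a76b9719d911017c592"}

# Invented behaviour rules: actions that, in combination, look like malware.
SUSPICIOUS_ACTIONS = {
    "writes_to_autorun_key",
    "injects_into_other_process",
    "disables_security_service",
}

def signature_match(file_bytes):
    # Misses any sample the vendor has not yet seen.
    return hashlib.md5(file_bytes).hexdigest() in SIGNATURE_DB

def behaviour_match(observed_actions, threshold=2):
    # Can flag a never-before-seen sample if it acts like known malware
    # when it tries to execute.
    return len(set(observed_actions) & SUSPICIOUS_ACTIONS) >= threshold

# A brand-new Trojan: its hash is unknown, but its behaviour gives it away.
new_sample = b"completely new binary"
actions = {"writes_to_autorun_key", "injects_into_other_process"}
print(signature_match(new_sample))   # False: no signature yet
print(behaviour_match(actions))      # True: behaves like a Trojan
```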
Malware Testing
Regardless of the method used, any malware test can only be considered a snapshot in time. A product only passes a specific test at a specific point in time, and by the time the test report is generated, hundreds of new threats have found their way into the wild.
The best way of truly gauging how a product copes in the wild is to keep it running. Continuous testing over a sustained period gives a much better indication of the product’s capabilities. No one product is going to come out on top every day; different products have different strengths, and these will depend on the threats targeting that particular machine at that particular time.
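As a rough sketch of what that could look like in practice, the snippet below records a daily detection rate for each product against that day’s new threats and summarises the whole period rather than a single snapshot. The products and figures are made up purely to illustrate the idea.

```python
from statistics import mean

# Hypothetical daily detection rates over a sustained test period.
daily_results = {
    "Product A": [0.98, 0.91, 0.99, 0.95, 0.97],
    "Product B": [0.96, 0.97, 0.94, 0.98, 0.96],
}

for product, rates in daily_results.items():
    print(f"{product}: best day {max(rates):.0%}, worst day {min(rates):.0%}, "
          f"average over period {mean(rates):.1%}")
```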
This is just one option from a host of possible methodologies. No single test can be definitive for all scenarios, but I do feel that, with the new breed of threats on the horizon, we need to move away from using the WildList as the only testing benchmark.
What should the new benchmark be? Answers on a postcard please…