What are rootkits and why are they dangerous? You can read the detailed explanation at Wikipedia, which I won’t reproduce, but the basic idea is that they alter the operating system (using either documented or undocumented methods) so that certain objects (processes, directories, files) become invisible. They are very dangerous because they breathe new life into old malware. Imagine that you have a security product (antivirus, anti-spyware, etc.) which detects 1000 kinds of malware. Now use a rootkit to hide these malware instances and it will be as if you had used new variants of the malware, because the product won’t detect anything. (This is not 100% true: the product can detect the rootkit and try to work around it, and then it will again detect the malware with one update, the one that added detection for the rootkit, rather than 1000. However, that one update was probably harder and thus slower to create than a thousand others.)
While rootkits are dangerous, the subject has pretty much been beaten to death. Even the Gartner Group has written about it, so just about everybody has heard about it. The problem (from the rootkit’s point of view) is that if they rely on undocumented methods (in-memory instruction patching, for example), those can be detected generically, and if they use documented methods, there are documented methods to detect the usage of those documented methods.
What I would like to introduce here is what I call hidden code. This is not a new concept and there are several pieces of malware out there which use similar techniques; I just want to raise awareness. I would define it as code which uses fully documented functions to execute and does not actively try to hide itself, yet whose execution is very difficult to detect using current tools. One example would be system threads in kernel mode (these are special threads which can only be created and executed in kernel mode). However, this requires special code to be written and thus does not present the same problem as rootkits (it does not double the number of existing malware, not even in the short term).
Another problem which I include in this category is a special mode of loading executables. Warning: the following content will be fairly technical. I’ll try to explain the concepts as I go, but for non-programmers this will probably still be unintelligible. Sorry. The main idea is: if you can load DLLs into different programs (into the address space of different programs), why can’t you load executables into the address space of other executables? They both use the same file format (PE, Portable Executable). The great thing (from the point of view of malware writers) is that Windows (and probably other operating systems too) uses the concept of address space = one executable + many shared libraries. Not only do the included and third-party tools use this equation as the basis of their operation, but the internal structures are based on it as well.
With a little creative coding (however using 100% documented APIs) I’ve managed to transform this into address space = several executables + many shared libraries. What I’ve done is to rewrite from scratch a small part of the Windows loader (the part of the OS which loads an executable and creates a process from it). The (simplified) loading process looks like the following:
- Create an address space
- Create the segments for the executable
- Load / map the segments from the file in memory
- Resolve the imports
- Launch the execution
Now let’s suppose that we want to load an arbitrary executable into the address space of another executable. We would like to do this because in the current design of the OS every important object (file handles, sockets, etc.) is associated with a process, so the trust relation is based on processes (for example, a personal firewall makes rules based on processes; it can’t offer finer granularity). This makes sense, since in the current architecture if you are inside the address space of a process, you can do almost anything to it. So if some code manages to execute in the address space of a trusted executable, you can kiss your protection goodbye (this is what many firewall leak tests do: they inject code into a trusted executable and check whether the firewall detects this or lets the communication take place based on its trust in the application). This again is fairly old, and you can find several articles on the net about it.
As you know (or if you don’t, read the linked articles :-)), the most widely used method for achieving this is the RemoteLoadLibrary technique (I won’t go into detail about it here; the linked articles do a good job of explaining it). What its variants have in common is that they require a DLL on the disk, so if your on-access AV product knows the file, it will block it before it gets a chance to load. What I’ve implemented is a direct memory-to-memory injection. That is, let’s suppose that a given program contains the code of a piece of malware in encrypted form. With this technique it can decrypt the code in memory and, without writing it to disk, inject it into the address space of another process, so the on-access AV product never gets a chance to catch it.
This is done in several steps. The first is to find another process. Something like Internet Explorer or Explorer is best, because it is most probably trusted by the personal firewall, and even if not, users would probably let it communicate. One problem, you might say, is that you need SeDebugPrivilege to open the address space of an arbitrary process and write to it. This privilege is easy to get if your code runs in the context of an administrative user, but impossible to get as a less privileged user. However, observe the word arbitrary in the previous sentence. A little-known fact is that if you launch your own process (with the CreateProcess API), you have full privileges on the child process (even under a Guest account). So the first step would be: launch an innocent-looking process with CreateProcess (possibly with SW_HIDE, so that its windows are invisible).
The second step is to decrypt the file and write it into the target process. This can be done by examining the PE headers of the executable and allocating a large enough memory region in the other process (with VirtualAllocEx) where all the sections fit (remember to mark it MEM_COMMIT to actually allocate memory for it and PAGE_EXECUTE_READWRITE to avoid having problems). One important detail is to write the part of the file which starts at offset 0 (the MZ header) and runs until the end of the PE headers (that is, the PE header, the optional header and the section headers) at the start of the allocated region. This matters because some APIs (like the ones that handle resources, for example) traverse the headers directly to find certain parts of the file.
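To make the header layout mentioned above more concrete, here is a minimal Python sketch that only reads a PE file that is already in a byte buffer and reports where the headers end and how large the mapped image would be. The function name `pe_header_layout` and the returned dictionary keys are my own, hypothetical choices; the field offsets follow the published PE/COFF layout. It is an offline inspection aid, not a loader.

```python
import struct

def pe_header_layout(data: bytes) -> dict:
    """Report the header span and section table of a PE image (read-only)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C in the MZ header) points at the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4                      # 20-byte COFF file header
    (n_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)
    opt = coff + 20                          # optional header
    # SizeOfImage and SizeOfHeaders sit at the same offsets in PE32 and PE32+.
    size_of_image, size_of_headers = struct.unpack_from("<II", data, opt + 56)
    table = opt + opt_size                   # first section header
    sections = []
    for i in range(n_sections):
        off = table + 40 * i                 # each section header is 40 bytes
        name = data[off:off + 8].rstrip(b"\0").decode("ascii", "replace")
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<IIII", data, off + 8)
        sections.append({"name": name, "virtual_address": vaddr,
                         "virtual_size": vsize, "raw_offset": raw_ptr,
                         "raw_size": raw_size})
    return {"headers_end": table + 40 * n_sections,   # end of section table
            "size_of_headers": size_of_headers,
            "size_of_image": size_of_image,
            "sections": sections}
```

The `headers_end` value marks the end of the span "MZ header up to the last section header" that the paragraph above refers to, while `size_of_image` is the total size the sections occupy once mapped.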
Now comes a very important step: because you rarely get to load the executable at its preferred base address (given that there is already an executable in the given address space), you must apply the relocation data to the executable. If there is no relocation data (which you can tell by examining the header) and the preferred base address is already occupied by the loaded executable, you’re out of luck. Visual C (versions 6 and 7) doesn’t generate relocation info by default for executables, while other compilers (like Delphi) do. This may change in the future, as the Vista address space randomization comes into play (relocation data becomes more important when you can no longer be sure where you’ll be loaded), and I’ve heard that the new Visual Studio will turn on relocation data generation for executables by default.
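Whether an executable carries relocation data can be read straight from its headers. The sketch below is a minimal, read-only check in Python, assuming the file is already in a byte buffer; `relocation_summary` and its result keys are hypothetical names, while the flag value and directory index are taken from the published PE/COFF format.

```python
import struct

IMAGE_FILE_RELOCS_STRIPPED = 0x0001   # COFF Characteristics flag
BASERELOC_DIRECTORY_INDEX = 5         # data directory slot of the relocation table

def relocation_summary(data: bytes) -> dict:
    """Report the preferred base and whether relocation data is present."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4
    (characteristics,) = struct.unpack_from("<H", data, coff + 18)
    opt = coff + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:                       # PE32: 32-bit ImageBase
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
        count_off, dirs_off = opt + 92, opt + 96
    elif magic == 0x20B:                     # PE32+: 64-bit ImageBase
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
        count_off, dirs_off = opt + 108, opt + 112
    else:
        raise ValueError("unknown optional header magic")
    (n_dirs,) = struct.unpack_from("<I", data, count_off)
    reloc_rva = reloc_size = 0
    if n_dirs > BASERELOC_DIRECTORY_INDEX:
        # Each data directory entry is an (RVA, size) pair of 32-bit values.
        reloc_rva, reloc_size = struct.unpack_from(
            "<II", data, dirs_off + 8 * BASERELOC_DIRECTORY_INDEX)
    stripped = bool(characteristics & IMAGE_FILE_RELOCS_STRIPPED)
    return {"image_base": image_base,
            "relocs_stripped": stripped,
            "reloc_rva": reloc_rva,
            "reloc_size": reloc_size,
            "relocatable": not stripped and reloc_size > 0}
```

If `relocatable` comes back false, the image can only run at `image_base`, which is the "out of luck" case described above.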
Finally you must resolve the imports. I’ve implemented this by writing a small piece of relocatable code and injecting it into the target process, where it calls LoadLibrary and GetProcAddress to resolve the imports. When all this is done, you can launch the code with the CreateRemoteThread API. The code is written in Visual C++ 6.0 and works; however, I won’t post it because I don’t want to help the bad guys. I only discuss concepts and ideas and don’t provide working code. If you are interested in the code for testing purposes (like testing different security products or other scenarios), contact me and I will send it to you, provided that you publish your results and post a comment on this blog with a link to them.
Some possible countermeasures: This method can be detected generically in a way similar to the one SVV (System Virginity Verifier by Joanna Rutkowska) uses for detecting hooks: one would examine each process and its allocated memory regions. From the list of regions one would eliminate those which can be accounted for by enumerating the modules of the process and removing the sections listed in each module’s PE header (because they were created by the loader). In the remaining regions (which will be stacks, heaps and other memory manually allocated with VirtualAlloc), one would search for executables by looking for MZ / PE headers. As I’ve described earlier, the pseudo-loader must write this part into the target process if it wants to execute programs which contain resources, for example. Then check whether the hit is an actual mapped executable or just an executable file loaded into memory (by a resource editor, for example). If it is mapped, mark the process as suspicious.
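The header search at the heart of this countermeasure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the contents of an unaccounted-for region are assumed to be already available as a byte buffer (how they are read from the process is out of scope here), and `find_pe_headers` and its `max_lfanew` sanity bound are hypothetical names and values of my own.

```python
import struct

def find_pe_headers(region: bytes, max_lfanew: int = 0x1000) -> list:
    """Return offsets in `region` where an MZ header points at a PE signature."""
    hits = []
    pos = region.find(b"MZ")
    while pos != -1:
        # The MZ header is 0x40 bytes long; e_lfanew is its last field.
        if pos + 0x40 <= len(region):
            (e_lfanew,) = struct.unpack_from("<I", region, pos + 0x3C)
            pe = pos + e_lfanew
            # Require a plausible e_lfanew and the PE signature where it points.
            if 0x40 <= e_lfanew <= max_lfanew and region[pe:pe + 4] == b"PE\0\0":
                hits.append(pos)
        pos = region.find(b"MZ", pos + 1)
    return hits
```

A hit only says that something that looks like a PE header sits in the region. Deciding whether it is a mapped image or just a file held in memory (the last step described above) still needs a further check, for example comparing the layout of the data after the header against the section table.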
Some known limitations of the method are:
- It does not create a separate PEB (because it can’t) for the new process, so it will share the PEB with the current process, leading to the fact that both the command line and the environment variables will be shared.
- If the new thread calls ExitProcess, the whole process will terminate (not only the injected code). This could be resolved by patching the import table while resolving imports so that the ExitProcess entry points to a function which calls TerminateThread on the current thread.
- Probably mixing of console / non-console applications is not a good idea and I don’t know if it will work at all.
- TLS (thread local storage) and the associated initialization code are not processed.
- Files which contain data in their overlays (which are not loaded into memory) will not work, because (a) no file is written to disk and (b) functions like GetModuleFileName will probably return the file name of the originally loaded exe.
Advantages (from the malware writer’s point of view), or why this is important:
- Uses 100% documented techniques, so it is guaranteed to work on earlier and future versions of Windows (tested with WinXP and Vista RC1).
- Gives new life to (some) existing code.
- Written entirely in user mode, so it is easier to understand and easier to modify.
- Works with limited users (users with no privileges).
- Currently not handled by many security products and certainly not by rootkit detectors (since this isn’t a rootkit in the classical sense of the word). Probably handled by some products which try to prevent injection in general.