After listening to an other great CyberSpeak podcast, I decided to line up the pros and cons of executable packing for programmers.
First of all, what is executable packing? In short it is similar to self-extracting archives, where as a result of the process an executable is generated which contains some unpacking code and the original code in a compressed form. A free (as in speech) one is UPX, which you can check out over at sourceforge and play around with it.
Why would you use such a thing? First of all it makes your program smaller without any noticeable effect for the end user (that is s/he doesn’t have to extract the executable from an archive before running it, s/he can just double click on it and run as if nothing has changed). Second of all it protects your code from the very beginner reverse engineers (to see all the options available to a more experienced one, just listen to the podcast mentioned above). A third argument used is that the load time of your executable is smaller (it loads faster) because a smaller amount of data must be transferred from disk. This may or may not be true depending on the usage pattern, but as I describe later it definitely has an effect on the memory usage (in the negative way).
Now why shouldn’t you pack your executable? First of all it will look suspicious. One of the best task-manager replacements out there even marks it with a different color. Some AV products will flag your executable as suspicious if you use certain packers to compress it (like Morphine) because these packers are known to primarily be used in malware.
Now for the biggest reason to avoid packing programs and libraries (DLLs): it consumes more memory. Under Windows (and probably under Linux too), portions of the memory can be marked as being a
mapping of a portion of a given file. When such a mapping is created, just the portions of the file which are actually accessed are loaded in memory. The rest is just marked by the memory manager as being available when needed from the file. What this means that during the loading of a normal, non-packed executable, it is mapped to memory, but only the parts are actually loaded from the disk which contain code that needs executing. Now in the case of a packed file the whole file needs to be loaded in memory, because the unpacker needs to go through every bit of packed data to unpack it before it can run the original code. Even worse, when the system is running low on memory, if you have an upacked executable, it can just throw out the memory pages it occupies (if the process is idling), because it knows that it has the exact same data on the disk. In the case of a packed executable it first has to write out the memory pages to the swap file, because they are different from the contents of the file on the disk. Now for the last point (but the most important one in the case of DLLs): the Windows memory manager is able to share memory pages across processes if their content hasn’t changed since their loading. Specifically in the case of DLLs, you will have the same code loaded in each process which uses them. So the memory region in which the DLL is loaded is shared between all the processes using it (there are some exception, but for sake if simplicity we’ll ignore those). How does Windows define unchanged memory pages? Those pages which map to the same area of the same file and hasn’t been written to. This means that as soon as you have a packed DLL, it will have a separate copy for each process using it, because what is in memory doesn’t mach what is on disk and thus Windows can’t share the given memory pages between processes. This means that each process that uses your DLL will have a separate copy of it. This effectively multiplies the memory needed by your DLL by the number of processes which use it! Oops…
Update: fixed typo.