Even though parasitic malware accounts for only a small part of all malware these days, file infector viruses seem to be making a comeback. File infectors modify existing files by injecting code into them.
When an infected file is started, the virus code takes control and may infect other files. As a rule, control is passed back to the original host program after the virus has done its dirty work, so the user doesn’t notice anything wrong. Sophisticated techniques such as encryption, obfuscation, polymorphism and stealth capabilities are often used by this kind of malware to make detection harder and to hide its presence.
While detection isn’t necessarily more difficult compared to other types of malware, removal is usually a less trivial task (which may be the motivation for malware authors). In many cases, files cannot simply be deleted as this would affect the stability or even basic functionality of the operating system and other software.
Instead, the infected host program must be disinfected by removing the virus code from it and by carefully restoring the original contents and file structure if possible. The threat posed by this type of infection seems greatly underestimated nowadays, as the frequency of trojan infections is much higher. However, one must keep in mind that an infection with a static trojan binary is usually limited to one or very few systems in a networked environment. For file infectors, which nowadays often come with worm-like spreading routines, this is not the case. A full network-wide infection, including network shares and sometimes operation-critical software, can prove to be a much bigger issue to deal with than a single trojanized workstation.
This means detection and removal are still an issue for antivirus software. As an example, this blog entry discusses the removal of the Almanahe virus, which appeared in 2007 in several variants. Almanahe is a polymorphic virus that infects Windows executable (PE) files on the local system and spreads via network shares. It also has rootkit capabilities to hide its presence on the infected system. The variant covered here is detected by AntiVir as W32/Alman.BB.
When infecting an executable file, the virus performs the following modifications to the host file:
- It overwrites parts of the original code section (about 1400 bytes) and redirects the entry point to the start of the injected virus code.
- The original code, which has been overwritten, is compressed using a run-length encoding algorithm (RLE) and is appended to the last section, along with the dropped component, which is also compressed (roughly 36 kB in size).
- It modifies the PE header to reflect the changes made to the file.
- Since most of the virus code is encrypted, it also sets the writable flag on the code section, so the virus can decrypt itself when it is started.
- To prevent multiple infection of the same file, the virus inserts an infection marker into the MZ header.

Layout of infected PE file
In order to disinfect a file infected by this virus, the following steps must be performed:
- First, the original code, which has been appended to the last section, must be located and decompressed.
- Then, the original code can be restored by overwriting the virus code in the code section.
- The entry point has to be redirected to its original location.
- The data appended to the file is cut from the file and the original size of the last section is restored.
- Last but not least, the header values need to be adjusted and the infection marker is removed.
Doing this is not as trivial as it may sound at first, because the data is encrypted and compressed, and the offsets and sizes are different for each file. So let’s have a closer look at how disinfection works:
The virus entry point code starts with about 200 bytes of randomly generated junk instructions in order to prevent detection by a simple signature. At the end of this non-encrypted block, there is a simple decryption loop that decrypts the remainder of the virus code injected into the code section upon execution of the file. The decryption scheme is a simple SUB, ADD, or XOR operation with a single byte key. So as a first step, we need to decrypt this code.
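The decryption step can be sketched as follows. This is only a minimal illustration in Python, not the engine's actual code: the function name `decrypt_body` and its parameters are hypothetical, and it assumes that the operation and the key have already been read from the decryption loop of the sample at hand, which is not shown here.

```python
def decrypt_body(data: bytes, op: str, key: int) -> bytes:
    """Apply a single-byte SUB, ADD, or XOR transform to every byte.

    `op` names the operation the virus's own decryption loop performs;
    `key` is its one-byte key. Both are assumed to be known already.
    """
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    if op == "xor":
        return bytes(b ^ key for b in data)
    if op == "add":
        return bytes((b + key) & 0xFF for b in data)  # wrap around at 256
    if op == "sub":
        return bytes((b - key) & 0xFF for b in data)  # wrap around at 256
    raise ValueError(f"unknown operation: {op}")


# Example: data that was encrypted by adding 0x5A is decrypted with SUB 0x5A.
encrypted = bytes((b + 0x5A) & 0xFF for b in b"virus body")
print(decrypt_body(encrypted, "sub", 0x5A))  # b'virus body'
```

Because the key is a single byte, all three variants are cheap to undo once the operation is known.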
The decrypted code contains the decompression routine for the data appended to the last section. In the next step, we locate the code where the decompression routine is called:

Call(s) of decompression routine
As we can see, the decompression routine is called twice. With the first call, only the first part of the data appended to the last section is extracted, which is the original code from the code section. In the second call, the dropped file is extracted (which is static). What we need to do is to locate the first call and extract the parameters for the decompression routine. These are the relative start offset (0xFA00) and the decoding length in bits (0x27AA).
As mentioned above, the (de)compressor is some kind of run-length encoder (RLE). If there are recurring byte sequences, only the position and the length of the (previous) byte sequence are stored in the encoded data. When decoding the data, a single bit signals whether the next byte is directly extracted from the encoded data or whether an already extracted byte sequence must be copied to the current position in the destination buffer. In the latter case, only the position and the length are encoded in the input stream. The following picture illustrates the decoding mechanism:

Decoding mechanism (output stream)
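To make the flag-bit mechanism more concrete, here is a small Python sketch of a decoder of this general kind. It is not the virus's actual format: the bit order (most significant bit first), the 8-bit literals, the 12-bit backward distance and the 4-bit copy length are all assumptions chosen for illustration, as are the names `BitReader` and `decode`. Only the overall idea comes from the description above: one flag bit selects between a literal byte and a copy of already decoded output, and decoding stops after a given number of input bits.

```python
DIST_BITS = 12  # assumed width of the backward distance field
LEN_BITS = 4    # assumed width of the copy length field


class BitReader:
    """Reads bits MSB-first from `data`, up to `bit_length` bits."""

    def __init__(self, data: bytes, bit_length: int):
        if bit_length > len(data) * 8:
            raise ValueError("bit_length exceeds input size")
        self.data = data
        self.bit_length = bit_length
        self.pos = 0

    def remaining(self) -> int:
        return self.bit_length - self.pos

    def read(self, count: int) -> int:
        if count > self.remaining():
            raise EOFError("input stream ended inside a token")
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def decode(data: bytes, bit_length: int) -> bytes:
    reader = BitReader(data, bit_length)
    out = bytearray()
    while reader.remaining() > 0:
        if reader.read(1) == 0:
            # Flag 0: the next 8 bits are a literal byte.
            out.append(reader.read(8))
        else:
            # Flag 1: copy `length` bytes starting `distance` bytes back.
            distance = reader.read(DIST_BITS)
            length = reader.read(LEN_BITS)
            if distance == 0 or distance > len(out):
                raise ValueError("back-reference outside decoded data")
            start = len(out) - distance
            for i in range(length):
                out.append(out[start + i])  # byte-wise, so overlaps work
    return bytes(out)


# Two literals ('a', 'b'), then "copy 4 bytes from 2 bytes back".
bits = "0" "01100001" "0" "01100010" "1" "000000000010" "0100"
packed = int(bits.ljust(40, "0"), 2).to_bytes(5, "big")
print(decode(packed, len(bits)))  # b'ababab'
```

A scheme that copies earlier output by position and length resembles LZ77-style back-referencing. Copying byte by byte lets a reference overlap the bytes it is producing, which is how a short pattern expands into a longer repeated run.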
Now that we have located the offset and size of the compressed original code, we can restore it by decompressing the first part of the data appended to the last section. We write it to its original location in the code section, and thus overwrite the virus code.
We still need to restore the original entry point RVA, which we can also extract from the decrypted virus code:

RVA of original entry point
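Writing the recovered value back into the header might look like the following Python sketch. The function name `set_entry_point` and the `original_rva` parameter are hypothetical, and the value itself is assumed to have been extracted from the decrypted virus code beforehand. The offsets come from the PE file format: the PE header offset is stored at 0x3C in the MZ header, and AddressOfEntryPoint sits 16 bytes into the optional header for both PE32 and PE32+.

```python
import struct


def set_entry_point(image: bytearray, original_rva: int) -> int:
    """Write `original_rva` into AddressOfEntryPoint; return the old value."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ header")
    # e_lfanew: file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # 4-byte signature + 20-byte COFF header + 16 bytes into optional header.
    field = pe_offset + 4 + 20 + 16
    (old_rva,) = struct.unpack_from("<I", image, field)
    struct.pack_into("<I", image, field, original_rva)
    return old_rva
```

Returning the previous value makes it easy to log which virus entry point was replaced.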
In the next step, we truncate the file to its original size by cutting off the appended data and adjusting the size of the last section. We also adjust the header values to the appropriate values where possible. Unfortunately, not all header values can be restored to the original values since some information is irretrievably lost. Therefore it is not guaranteed that all repaired executables will run again, although most will. This is particularly the case if the integrity of the binary is checked using a checksum or digital signatures. Finally, the infection marker in the MZ header is removed.
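The truncation step can be illustrated with a short Python sketch. It is a simplification under several assumptions: the function `shrink_last_section` and its parameters are hypothetical, the last entry in the section table is assumed to be the last section in the file, and the original raw and virtual sizes of that section are assumed to be known, which in practice is exactly where some information may be lost. The field offsets are those of the PE format.

```python
import struct

SECTION_HEADER_SIZE = 40


def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def shrink_last_section(image: bytearray, raw_size: int, virtual_size: int) -> None:
    """Restore the last section's sizes, fix SizeOfImage, truncate the file."""
    (pe,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe:pe + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (num_sections,) = struct.unpack_from("<H", image, pe + 6)
    (opt_size,) = struct.unpack_from("<H", image, pe + 20)
    opt = pe + 24  # start of the optional header
    (section_alignment,) = struct.unpack_from("<I", image, opt + 32)
    if num_sections == 0 or section_alignment == 0:
        raise ValueError("unexpected header values")

    # Header of the last section in the section table.
    last = opt + opt_size + (num_sections - 1) * SECTION_HEADER_SIZE
    (virtual_address,) = struct.unpack_from("<I", image, last + 12)
    (raw_pointer,) = struct.unpack_from("<I", image, last + 20)

    struct.pack_into("<I", image, last + 8, virtual_size)  # VirtualSize
    struct.pack_into("<I", image, last + 16, raw_size)     # SizeOfRawData
    # SizeOfImage: end of the last section, rounded up to SectionAlignment.
    size_of_image = align_up(virtual_address + virtual_size, section_alignment)
    struct.pack_into("<I", image, opt + 56, size_of_image)

    # Cut off everything behind the restored end of the last section.
    del image[raw_pointer + raw_size:]
```

Other fields, such as the header checksum, are not touched by this sketch.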
After performing these steps, all parts of the virus code are removed and the original contents of the file are restored, except for some of the header values mentioned above. Of course, the other modifications to the system caused by the virus, like dropped files or registry entries, must also be reverted. This, however, is beyond the scope of this blog entry.
Although antivirus software today is fairly sophisticated, it should be mentioned that it is not always possible to completely restore a system to its pre-infection state. In general, it is recommended to reinstall the system from scratch after a virus infection has been discovered.
Markus Hinderhofer
Engine Core R&D