Hacking with dex-oracle for Android Malware Deobfuscation

About a month or two ago, someone asked me to analyze some obfuscated Android malware. Recently, I finally had a chance to take a look. I ended up using dex-oracle along with some tricks to partially deobfuscate it. In this post, I’m going to explain the tricks and the overall process I used. This post will be useful if you deal with a lot of obfuscated Android apps.

The main problem was that dex-oracle didn’t work “out of the box”. It took some “hacking” to make it work. Specifically, I modified an existing deobfuscation plugin to create two new plugins, and I slightly modified the app. It’s really hard to make completely generalized deobfuscation tools, or any kind of advanced tool, so you’ll need to know how the tool works in order to modify it to suit your needs.

The Sample

Here’s the SHA256:

$ shasum -a 256 xjmurla.gqscntaej.bfdiays.apk
d3becbee846560d0ffa4f3cda708d69a98dff92785b7412d763f810c51c0b091 xjmurla.gqscntaej.bfdiays.apk

High-Level Analysis

I like to start with a decompilation just to get a high-level overview of the package structure. Here’s the class list:

class list

Some class names have been ProGuard’ed (a, b, c, etc.) but some haven’t (Ceacbcbf). These non-ProGuard names probably belong to Android components (activity, service, broadcast receiver, etc.), which must be declared in the manifest. Any tool that renamed them automatically would also have to rewrite the manifest, which is harder, so these names were probably changed by hand. That suggests the obfuscation is home-made and partially manual, which in turn suggests the app is malicious: a legit developer would probably pull a commercial obfuscator off the shelf and just use that. They wouldn’t waste time changing their class names to something indecipherable like Aeabffdccdac.

The code is obfuscated. Below is a class which shows the obfuscation:

obfuscated method decompilation

You can’t see any strings or class names, which is really annoying. This looks like something Simplify can handle, but, spoiler alert, it fails miserably. That’s fine. I have many tricks up my sleeve. Let’s take a look at the Smali and see if anything jumps out.

String and Class Obfuscation

The first type of obfuscation which jumped out at me was an “indexed string lookup” type obfuscation.

const v2, 0x320fb26f
invoke-static {v2}, Lxjmurla/gqscntaej/bfdiays/f;->a(I)Ljava/lang/String;
move-result-object v2

This pattern is found hundreds of times in the code. It takes a number, passes it to f.a(int), and gets a string back. This is some basic “level 1” style encryption. There’s probably a big method somewhere which builds an array of strings that the number indexes into.

A second type of obfuscation hides class constants using an identical technique:

const v1, 0x19189b07
invoke-static {v1}, Lxjmurla/gqscntaej/bfdiays/g;->c(I)Ljava/lang/Class;
move-result-object v1

This code passes a number to g.c(int) and gets back a Class object, the same value a const-class instruction would have produced.

You may be thinking you’ll have to reverse engineer the lookup methods, but you’d be wrong. It’s cool and all to deep dive into the complex code and completely master it by writing a decryption routine. But honestly, fuck that. Speed is the name of the game, and I really don’t have time to fuck around with this malware author’s bullshit, home-made, amateur-hour obfuscation. Instead of reversing everything, consider that both of these “lookup” methods are static and take nothing but a number. It should be possible to just execute them with the same inputs from the code to get back the decrypted output. For example, in the case of string decryption, I should be able to execute f.a(0x320fb26f) and get back the decrypted string.

The question is, of course, how do you execute just the target method code? It’s an APK. How can you execute just the method you want with the inputs you want? How do you harness the target methods? There are two paths you can go by:

  1. Convert target DEX to a JAR using dex2jar or enjarify. Then, import the JAR into a Java app and call the decryption code from your Java app.
  2. Create a stub / driver app which takes command line arguments and can reflect methods in a DEX file. Then, execute the driver app + target DEX on an emulator.

As it happens, I’ve already created dex-oracle, which does #2. I like #2 more than #1 because it doesn’t rely on DEX-to-JAR conversion, which often introduces subtle logic bugs. However, I’ve used #1 a few times in a pinch, so it’s worth mentioning. I added support for this type of obfuscation to dex-oracle; the plugins were added in “Add indexed string + class lookups”.
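To make option #1 concrete, here is a minimal Java sketch of the harness idea, assuming the DEX has already been converted to a JAR. Everything in it is hypothetical: the LookupHarness and StandIn names, and the "I:839889519" argument parsing, which only borrows the format visible in dex-oracle’s log further down. It is not dex-oracle’s driver. If the lookup class touches Android APIs, a plain JVM would also need android.jar on the classpath.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical harness for option #1: call a static lookup method from a converted JAR.
public class LookupHarness {

    // Map a "T:value" spec (e.g. "I:839889519") to its parameter type.
    static Class<?> typeOf(String spec) {
        String type = spec.substring(0, spec.indexOf(':'));
        if (type.equals("I")) return int.class;
        if (type.equals("J")) return long.class;
        throw new IllegalArgumentException("unsupported argument type: " + spec);
    }

    // Parse the value half of a spec into a boxed Integer or Long.
    static Object valueOf(String spec) {
        String value = spec.substring(spec.indexOf(':') + 1);
        if (typeOf(spec) == int.class) return Integer.parseInt(value);
        return Long.parseLong(value);
    }

    // Reflect a static method by name and argument types, then invoke it.
    static Object invokeStatic(Class<?> owner, String name, String... specs) {
        Class<?>[] types = new Class<?>[specs.length];
        Object[] values = new Object[specs.length];
        for (int i = 0; i < specs.length; i++) {
            types[i] = typeOf(specs[i]);
            values[i] = valueOf(specs[i]);
        }
        try {
            Method method = owner.getDeclaredMethod(name, types);
            method.setAccessible(true); // f.a(int) is package-private, not public
            return method.invoke(null, values);
        } catch (InvocationTargetException e) {
            // The target itself threw (for example a NullPointerException).
            throw new IllegalStateException("target threw", e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Load a class from a dex2jar/enjarify JAR; initialize=true runs its <clinit>.
    static Class<?> loadFromJar(Path jar, String className) throws Exception {
        // The loader is deliberately left open so the loaded class stays usable.
        URLClassLoader loader = new URLClassLoader(
                new URL[] {jar.toUri().toURL()}, LookupHarness.class.getClassLoader());
        return Class.forName(className, true, loader);
    }

    // Local stand-in with the same shape as f.a(int), so the sketch runs on its own.
    static class StandIn {
        static String a(int n) {
            return "index " + (n - 0x320fb1f0);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length >= 3) {
            // java LookupHarness <jar> <class> <method> [T:value ...]
            Class<?> owner = loadFromJar(Paths.get(args[0]), args[1]);
            System.out.println(invokeStatic(owner, args[2], Arrays.copyOfRange(args, 3, args.length)));
        } else {
            System.out.println(invokeStatic(StandIn.class, "a", "I:839889519")); // index 127
        }
    }
}
```

Against a real converted JAR, a call might look like java LookupHarness classes.jar xjmurla.gqscntaej.bfdiays.f a I:839889519, where classes.jar is a placeholder for whatever file the converter produced.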

The way dex-oracle works is pretty simple. It contains a collection of plugins which define regular expressions that pull out key bits of information: method calls and arguments. Then, it constructs real method calls from the extracted arguments and passes them to a driver which executes the original DEX file on an emulator. Finally, the plugin defines how the driver output should be used to modify the method.

For example, the regular expression could look for “a const number, a call to a static method which takes a number and returns a string, and a move of the result into a register”. Then, the driver executes that method with the number and returns the decrypted string. Finally, the original string lookup code is replaced with just the decrypted string. You can read more about how it works in the TetCon 2016 Android Deobfuscation Presentation.
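Here is a rough Java rendering of that plugin idea. It is not dex-oracle’s actual plugin code: the regular expression is an assumption that matches the three-line pattern from the first Smali snippet, and Oracle is a hypothetical callback standing in for the emulator round trip, which a real tool would batch and hand to its driver.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of an "indexed string lookup" rewrite over Smali text.
public class IndexedStringLookupSketch {

    // Hypothetical stand-in for "execute this static method in the original DEX".
    interface Oracle {
        String call(String classDescriptor, String methodName, int arg);
    }

    // const vA, <int> / invoke-static {vA}, LOwner;->name(I)Ljava/lang/String; / move-result-object vB
    static final Pattern LOOKUP = Pattern.compile(
            "const(?:/\\d+)? (v\\d+), (-?0x[0-9a-fA-F]+)\\s+"
            + "invoke-static \\{\\1\\}, (L[^;]+;)->(\\w+)\\(I\\)Ljava/lang/String;\\s+"
            + "move-result-object (v\\d+)");

    // Escape a Java string so it can be written back as a Smali string literal.
    static String smaliLiteral(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Replace every matched lookup with a const-string holding the oracle's answer.
    static String inline(String smali, Oracle oracle) {
        Matcher m = LOOKUP.matcher(smali);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int arg = Integer.decode(m.group(2)); // handles the 0x prefix and a leading minus
            String decrypted = oracle.call(m.group(3), m.group(4), arg);
            // If the oracle returns null, leave the original three lines untouched.
            String replacement = decrypted == null
                    ? m.group(0)
                    : "const-string " + m.group(5) + ", " + smaliLiteral(decrypted);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String before = "const v2, 0x320fb26f\n"
                + "invoke-static {v2}, Lxjmurla/gqscntaej/bfdiays/f;->a(I)Ljava/lang/String;\n"
                + "move-result-object v2\n";
        // Fake oracle for demonstration; a real one would run f.a(int) from the original DEX.
        String after = inline(before, (owner, method, arg) -> "decrypted #" + arg);
        System.out.print(after); // const-string v2, "decrypted #839889519"
    }
}
```

The null check is a design choice worth keeping in any real plugin: when the executed method returns nothing useful, leaving the original call in place is safer than guessing.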

dex-oracle Before Modification

Unfortunately, even with the new plugins, dex-oracle fails. To keep things simple, I disable all plugins except IndexedStringLookup and only process the d class from the decompilation screenshot above.

$ dex-oracle xjmurla.gqscntaej.bfdiays.apk --disable-plugins bitwiseantiskid,stringdecryptor,undexguard,unreflector,indexedclasslookup -i '/d'
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Optimizing 11 methods over 23 Smali files.
[WARN] 2017-10-28 12:28:45: Unsuccessful status: failure for Error executing 'static java.lang.String xjmurla.gqscntaej.bfdiays.f.a(int)' with 'I:839889519'
java.lang.reflect.InvocationTargetException
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:515)
at org.cf.oracle.Driver.invokeMethod(Driver.java:71)
at org.cf.oracle.Driver.main(Driver.java:131)
at com.android.internal.os.RuntimeInit.nativeFinishInit(Native Method)
at com.android.internal.os.RuntimeInit.main(RuntimeInit.java:243)
at dalvik.system.NativeStart.main(Native Method)
Caused by: java.lang.NullPointerException
at xjmurla.gqscntaej.bfdiays.f.a(SourceFile:528)
... 7 more
// ** SNIP MANY SIMILAR ERRORS **
Optimizations: string_lookups=13
Invalid date/time in zip entry
// ** SNIP DUMB WARNINGS **
Invalid date/time in zip entry
Time elapsed 1.954255 seconds

The Invalid date/time in zip entry stuff is just noise. Maybe they tried obfuscating the timestamp in the ZIP? I dunno.

What concerns me is the Unsuccessful status: failure for Error executing 'static java.lang.String xjmurla.gqscntaej.bfdiays.f.a(int)' with 'I:839889519'. The error tells me there’s a NullPointerException when it executes f.a(int). The argument, 839889519, is just 0x320fb26f in decimal, the same constant from the first Smali snippet. It looks like every call to that method failed. So, let’s look at f.a(int).

.method static a(I)Ljava/lang/String;
.registers 3
sget-object v0, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;
const v1, 0x320fb1f0
sub-int v1, p0, v1
aget-object v0, v0, v1
return-object v0
.end method

The entire method is pretty small. It just subtracts a big constant from the first argument and uses the result as an index into a string array, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;. Well, let’s look at how f;->k is initialized.
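Before chasing down the initializer, here is a hypothetical hand translation of f.a(int) into Java. It also accounts for the driver’s NullPointerException: 839889519 is 0x320fb26f, the index works out to 127, and indexing into a null array throws. FLookupSketch is an illustrative name; its k field mirrors f;->k.

```java
// Hand translation of f.a(int) from the Smali above; k mirrors f;->k.
public class FLookupSketch {

    static final int BASE = 0x320fb1f0; // const v1, 0x320fb1f0
    static String[] k;                  // stays null unless something initializes it

    static String a(int p0) {
        // sub-int v1, p0, v1  ->  index = p0 - BASE
        // aget-object on a null array throws NullPointerException
        return k[p0 - BASE];
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(839889519)); // 320fb26f, the constant from the first snippet
        System.out.println(839889519 - BASE);               // 127, the array index
        try {
            a(839889519);
        } catch (NullPointerException e) {
            System.out.println("NPE: k was never initialized");
        }
    }
}
```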

$ ag -Q 'Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;'
xjmurla/gqscntaej/bfdiays/Ceacabcbf.smali
169: sput-object v0, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;
245: sget-object v0, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;
256: sget-object v0, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;
xjmurla/gqscntaej/bfdiays/f.smali
72: sget-object v0, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;

There’s only one sput-object and it’s in xjmurla/gqscntaej/bfdiays/Ceacabcbf.smali. That line lives in the private method Ceacabcbf;->a()V. This is a big, long, complicated method which contains a HUGE string literal which is processed, chunked up, and stored in f;->k. Hmm, our NullPointerException is caused by this field never getting initialized, which means Ceacabcbf;->a()V is not getting called when dex-oracle executes the string decryption method. Well, when is it called?

$ ag -Q 'Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->a()V'
xjmurla/gqscntaej/bfdiays/Ceacabcbf.smali
1313: invoke-direct {p0}, Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->a()V

Ahh, it’s only called from one place, inside Ceacabcbf itself. Let’s look at the caller.

.method public onCreate()V
.registers 1
invoke-super {p0}, Landroid/app/Application;->onCreate()V
sput-object p0, Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->a:Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;
invoke-direct {p0}, Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->a()V
return-void
.end method

It’s called in Ceacabcbf;->onCreate()V. This class is a subclass of Application. Without looking at the manifest, I’m pretty sure that when the app starts, this component is created, onCreate()V is called, the decrypted string array is built, and most importantly f;->k is initialized. Hmm, how can I make it so that dex-oracle calls this method when decrypting strings?

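My first thought is to add a method call to Ceacabcbf;->a()V in f;-><clinit>. This ensures that when the string decryption class f is loaded, it initializes the decrypted string array. BUT, a()V is a direct method: it’s private and non-static, so calling it needs a Ceacabcbf instance and has to happen from inside Ceacabcbf. WHAT TO DO?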

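Well, this is kind of dumb, but it works sometimes. Just create a new public, static method called Ceacabcbf;->init_decrypt()V and copy the code from Ceacabcbf;->a()V into it. This only works if the copied code doesn’t use p0 (this), since a static method with no parameters has no p0 register. Then, add a line to call this method at the end of f;-><clinit>: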

.method static constructor <clinit>()V
.registers 1
const/4 v0, 0x0
sput v0, Lxjmurla/gqscntaej/bfdiays/f;->a:I
sput v0, Lxjmurla/gqscntaej/bfdiays/f;->d:I
sput v0, Lxjmurla/gqscntaej/bfdiays/f;->e:I
sput v0, Lxjmurla/gqscntaej/bfdiays/f;->f:I
const/4 v0, 0x4
new-array v0, v0, [Ljava/lang/String;
sput-object v0, Lxjmurla/gqscntaej/bfdiays/f;->h:[Ljava/lang/String;
const-string v0, ""
sput-object v0, Lxjmurla/gqscntaej/bfdiays/f;->i:Ljava/lang/Object;
# LOL MONEY, MONEY LOL
invoke-static {}, Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->init_decrypt()V
return-void
.end method
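In Java terms, the patch amounts to the model below. App stands in for Ceacabcbf and Lookup for f; the table contents are placeholders, because the real a()V decodes a huge literal whose algorithm isn’t covered here. The model only captures the control flow: Lookup’s static initializer (its <clinit>) fills k, so a lookup succeeds even though onCreate() never runs. All class names are illustrative.

```java
// Java model of the Smali patch: the lookup class's static initializer calls a
// public static copy of the Application's private init method.
public class ClinitFixSketch {

    // Stands in for Ceacabcbf, the Application subclass.
    static class App {
        // Original private instance method; only onCreate() reaches it.
        private void a() {
            Lookup.k = new String[] {"first", "second"}; // placeholder for the real decoding
        }

        public void onCreate() {
            a();
        }

        // Added: same body, but public and static, so no App instance is needed.
        public static void initDecrypt() {
            Lookup.k = new String[] {"first", "second"};
        }
    }

    // Stands in for f, the string lookup class.
    static class Lookup {
        static final int BASE = 0x320fb1f0;
        static String[] k;

        // Equivalent of the invoke-static appended to f;-><clinit>.
        static {
            App.initDecrypt();
        }

        static String a(int n) {
            return k[n - BASE];
        }
    }

    public static void main(String[] args) {
        // App.onCreate() is never called, mirroring the situation that caused the NPE.
        System.out.println(Lookup.a(Lookup.BASE + 1)); // second
    }
}
```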

dex-oracle After Modification

After making the changes, which hopefully work, I need to rebuild the DEX from the modified Smali and run dex-oracle on it again.

$ smali ass out -o xjmurla_mod1.dex
$ dex-oracle xjmurla_mod1.dex --disable-plugins bitwiseantiskid,stringdecryptor,undexguard,unreflector,indexedclasslookup -i '/d'
Optimizing 11 methods over 23 Smali files.
Optimizations: string_lookups=13
Time elapsed 2.034493 seconds

No errors. Let’s see the decompilation.

$ d2j-dex2jar.sh xjmurla_mod1_oracle.dex
dex2jar xjmurla_mod1_oracle.dex -> ./xjmurla_mod1_oracle-dex2jar.jar
$ jd xjmurla_mod1_oracle-dex2jar.jar

deobfuscated strings

Oh, hello there Mr. C&C domain! GET REKT BRO.

get rekt

Ok, but that still leaves the class deobfuscation. That’s still annoying, right? Well, to keep this post short: dex-oracle fails when deobfuscating classes for the same reason it originally failed for strings. The same Ceacabcbf;->a()V method needs to be called.

The same trick can be used – just call Ceacabcbf;->init_decrypt()V in g;-><clinit>. However, g doesn’t have a <clinit> so you’ll have to add one:

.method static constructor <clinit>()V
.registers 0
invoke-static {}, Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->init_decrypt()V
return-void
.end method

Now, rebuild and let dex-oracle do its thing:

$ smali ass out -o xjmurla_mod2.dex
$ dex-oracle xjmurla_mod2.dex -i '/d'
Optimizing 11 methods over 23 Smali files.
Optimizations: string_decrypts=0, class_lookups=13, string_lookups=13
Time elapsed 3.099335 seconds

Let’s see if the decompilation looks any different.

$ d2j-dex2jar.sh xjmurla_mod2_oracle.dex
dex2jar xjmurla_mod2_oracle.dex -> ./xjmurla_mod2_oracle-dex2jar.jar
$ jd xjmurla_mod2_oracle-dex2jar.jar

deobfuscated strings and classes

There’s not much difference for this method, but other methods have a lot more information, especially in the Smali where you can see lots of const-classes. There’s still one call to g.c(int) which isn’t deobfuscated. I found out that this is because the method call succeeds but returns null. Maybe that’s why it’s in a try-catch? Maybe it’s trying to load a class which doesn’t exist on every Android API version?
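The try-catch guess can be sketched in Java. This is speculation, not g.c(int)’s actual code: lookup returns null when Class.forName can’t find the class, and toConstClass shows that a deobfuscator can only inline a const-class for a non-null result. Both method names are hypothetical, and the descriptor conversion handles only ordinary reference classes, not arrays or primitives.

```java
// Speculative model of a class lookup that can legitimately return null,
// and of how a deobfuscator should treat that result.
public class OptionalClassLookupSketch {

    // Resolve a class by name; a class missing on this runtime yields null
    // instead of an exception (a guess at g.c's behavior, not its real code).
    static Class<?> lookup(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    // Only a non-null result can become a const-class instruction; null means
    // "leave the original call alone". Non-array reference classes only.
    static String toConstClass(String register, Class<?> result) {
        if (result == null) {
            return null;
        }
        return "const-class " + register + ", L" + result.getName().replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        System.out.println(toConstClass("v1", lookup("java.lang.String")));    // const-class v1, Ljava/lang/String;
        System.out.println(toConstClass("v1", lookup("com.example.Missing"))); // null
    }
}
```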

One final test: run it against the entire DEX file.

$ dex-oracle xjmurla_mod2.dex
Optimizing 125 methods over 23 Smali files.
Optimizations: string_decrypts=0, class_lookups=354, string_lookups=330
Time elapsed 3.306326 seconds

It worked. Cool. Now there are lots of strings! This should also make it a lot easier for Simplify to work because there’s less code to execute and fewer places to fail.

Summary

Hopefully after reading this you have a better idea of how to bend dex-oracle to suit your needs. It’s pretty flexible and great when you can isolate the code you need to run to a single method. Sometimes you need to make changes to an Android app to help dex-oracle, but Smali is relatively easy to modify and a lot of malware doesn’t bother doing anti-tampering checks.