Learning Doop, Part 1

Y4er, filed under the category Code Audit and the series Static Software Analysis

2022-09-08

Warning: this article was last updated on 2022-09-08, and its content may be out of date.

ByteCodeDL is essentially an offshoot of Doop, so I am taking a look at Doop as well.

# Installation

Install souffle first, then Doop. For Doop itself, just run git clone https://github.com/plast-lab/doop-mirror.

Run:

./doop --help all

# Basic usage

The doop repository is a Gradle project, and ./doop is just a bash script that invokes Gradle:

#!/usr/bin/env bash

ARGS=""
while [ "$1" != "" ]; do
    case "$1" in
        # Quote arguments with spaces.
        *\ * )
            ARGS="${ARGS} '$1'"
            ;;
        *)
            ARGS="${ARGS} $1"
            ;;
    esac
    shift
done

# Export number of terminal columns for help display.
if command -v 'tput' &> /dev/null
then
    export COLUMNS=`tput cols`
fi
eval "./gradlew :run -Pargs=\"$ARGS\""

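A ./doopOffline script is also provided, which runs Doop in offline mode. Normally, on every Doop invocation, the underlying build system checks all dependency libraries for newer versions. Sometimes you need to invoke Doop offline, and this alternative script exists for that purpose.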

#!/bin/bash
eval './gradlew :run -Pargs="'$@'" --offline'

In other words, it is the same Gradle invocation with --offline added.
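
To make the wrapper's behavior concrete, here is a minimal Python sketch of the command string the two scripts assemble. It is only an illustration and not part of Doop: the names quote_args and gradle_command are made up for this example, the leading space that the bash loop produces is dropped, and the space-quoting is applied to the offline variant as well, which the real doopOffline script does not do.

```python
def quote_args(args):
    """Mirror the wrapper's loop: wrap any argument containing a space in single quotes."""
    return " ".join(f"'{a}'" if " " in a else a for a in args)


def gradle_command(args, offline=False):
    """Build the command string that the wrapper script passes to eval."""
    command = f'./gradlew :run -Pargs="{quote_args(args)}"'
    if offline:
        # Assumption for this sketch: the offline script differs only by this flag.
        command += " --offline"
    return command


print(gradle_command(["-a", "context-insensitive", "-i", "/tmp/my app.jar"]))
# prints: ./gradlew :run -Pargs="-a context-insensitive -i '/tmp/my app.jar'"
```

Seeing the final string makes it easier to understand why arguments with spaces need the extra quoting: everything ends up inside a single -Pargs value.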

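Doop's execution flow can be roughly divided into three steps:

1. Use soot to generate Jimple files. With the --generate-jimple option, the Jimple files are written to the out/$(uuid)/database/jimple folder.
2. Convert the Jimple files into input facts (.facts) for the Datalog engine.
3. Run the selected analysis with the souffle engine and write the relations out as .csv files, which are the analysis results.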
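
As a small aid for finding your way around an output directory, the three steps above can be mapped onto file extensions. This is a rough sketch under assumptions: the .facts and .csv extensions come from the steps above, the .jimple extension is my assumption about how the Jimple files are named, and stage_of is a hypothetical helper name.

```python
import os

# Which pipeline step a file most likely belongs to, keyed by extension.
STAGE_BY_EXTENSION = {
    ".jimple": "step 1: Jimple generated by soot",  # extension assumed
    ".facts": "step 2: input facts for the Datalog engine",
    ".csv": "step 3: relation written by souffle",
}


def stage_of(filename):
    """Return a short description of the pipeline step that produced the file."""
    _, extension = os.path.splitext(filename)
    return STAGE_BY_EXTENSION.get(extension, "other")


for name in ["Var-DeclaringMethod.facts", "VarPointsTo.csv", "build.gradle"]:
    print(name, "->", stage_of(name))
```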

Take b4bycoffee from the Great Wall Cup (长城杯) CTF as an example. Unzip the Spring Boot project and package the class files into a jar:

unzip b4bycoffee.jar
cd BOOT-INF/classes
jar -cvf classes.jar *

Then run:

./doop -a context-insensitive --information-flow spring --fact-gen-cores 16 --souffle-jobs 16  -i /tmp/BOOT-INF/lib/classes.jar --stats none

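The parameters are:

* -a context-insensitive: use the context-insensitive analysis.
* --information-flow spring: run taint analysis with P/Taint, treating the input as a Spring project.
* --fact-gen-cores 16 --souffle-jobs 16: the number of parallel workers for fact generation and for souffle.
* -i /tmp/BOOT-INF/lib/classes.jar: the input file.
* --stats none: turn off statistics.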

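The first run is fairly slow because it pulls some jars from http://centauri.di.uoa.gr:8081/. Just wait; the second run is much faster.

Once the build finishes, you can see that the whole run consists of several parts:

1. Compile the project.
2. Run soot-facts-generator, which generates the facts files in the cache folder and copies them to the out/uuid/database folder.
3. In the analysis phase, invoke souffle to run the rules.
4. Write the results and symlink the latest results to the last-analysis directory.

The analysis output looks like this: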

ubuntu@ubuntu:~/doop$ cat  last-analysis/MockObject.csv
javax.servlet.http.HttpServletRequest::MockObject       javax.servlet.http.HttpServletRequest
javax.servlet.http.HttpServletResponse::MockObject      javax.servlet.http.HttpServletResponse
com.example.b4bycoffee.controller.indexController::MockObject   com.example.b4bycoffee.controller.indexController
com.example.b4bycoffee.controller.coffeeController::MockObject  com.example.b4bycoffee.controller.coffeeController
com.example.b4bycoffee.model.CoffeeRequest::MockObject  com.example.b4bycoffee.model.CoffeeRequest

As the output shows, Spring controllers are automatically included in the taint analysis.
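
To pull the application's own classes out of MockObject.csv, a few lines of Python are enough. This is a minimal sketch that assumes the file is tab-separated with two columns (mock object id, then type); parse_mock_objects and application_types are names invented for this example, and the sample rows are taken from the output above.

```python
def parse_mock_objects(text):
    """Return {mock object id: type} from tab-separated MockObject.csv content."""
    mocks = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        mock_id, declared_type = line.split("\t")
        mocks[mock_id] = declared_type
    return mocks


def application_types(mocks, package_prefix):
    """Keep only the mocked types that live under the given package prefix."""
    return sorted(t for t in mocks.values() if t.startswith(package_prefix))


sample = (
    "javax.servlet.http.HttpServletRequest::MockObject\t"
    "javax.servlet.http.HttpServletRequest\n"
    "com.example.b4bycoffee.controller.coffeeController::MockObject\t"
    "com.example.b4bycoffee.controller.coffeeController\n"
)
mocks = parse_mock_objects(sample)
print(application_types(mocks, "com.example.b4bycoffee."))
# prints: ['com.example.b4bycoffee.controller.coffeeController']
```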

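Besides spring, --information-flow also accepts other values, such as webapps for Java EE projects. I won't demonstrate them here.

# Adding your own rules

Doop's rules are base rules: they only give you the scaffolding. For real-world use we have to write some custom rules. For example, to get the call graph, save the following rules as my.dl: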

.decl CG(?caller:Method, ?callee:Method)

CG(?caller, ?callee) :-
  mainAnalysis.AnyCallGraphEdge(?invocation, ?callee),
  Instruction_Method(?invocation, ?caller).

.output CG

Then rebuild with the extra option --extra-logic my.dl and look at CG.csv under last-analysis.
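
Once CG.csv exists, you can check whether one method can reach another. The following is a rough sketch, assuming CG.csv is tab-separated with the caller in the first column and the callee in the second, matching the CG(?caller, ?callee) declaration. The helper names and the method signatures in the sample are hypothetical.

```python
from collections import defaultdict, deque


def load_call_graph(text):
    """Build {caller: set of callees} from tab-separated CG.csv content."""
    graph = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        caller, callee = line.split("\t")
        graph[caller].add(callee)
    return graph


def find_path(graph, source, target):
    """Breadth-first search; return the shortest call chain, or None if unreachable."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in sorted(graph.get(path[-1], ())):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


sample = (
    "<A: void a()>\t<B: void b()>\n"
    "<B: void b()>\t<C: void c()>\n"
)
graph = load_call_graph(sample)
print(find_path(graph, "<A: void a()>", "<C: void c()>"))
```

Splitting on the tab rather than on whitespace matters here, because method signatures contain spaces.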

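# Another way to add custom rules

Running with --extra-logic is still slow, because it goes through the fixed steps of updating dependencies, recompiling, generating facts, and so on. Is there a faster way?

While Doop runs the analysis, you can see that it uses gcc to compile a binary and performs the analysis with that.

This binary is the executable that souffle produces when it compiles the rules.

You can run this executable directly, and it will output the results again according to the previous rules. The rule file is gen_xxx.dl.

The my.dl we wrote was appended to the very end of it.

So to change a custom rule, edit this gen_xxx.dl directly and append to the end, then run it with souffle, since the facts files already exist: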

souffle -F database/ gen_1755044251944027223.dl -j32
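
The edit-and-rerun loop described above can be scripted. The sketch below is only an illustration: append_rule and souffle_command are hypothetical helpers, the program text and rule in the demo are placeholders, and the souffle options are limited to the ones used in the command above (-F for the facts directory, -j for the number of jobs).

```python
def append_rule(program_text, rule_text):
    """Return the generated program with a custom rule appended at the end."""
    return program_text.rstrip("\n") + "\n\n" + rule_text.strip("\n") + "\n"


def souffle_command(program, facts_dir="database/", jobs=32):
    """Build the souffle invocation as an argument list."""
    return ["souffle", "-F", facts_dir, program, f"-j{jobs}"]


# Placeholder content standing in for the real generated file.
generated = "// ... generated rules ...\n"
custom = ".decl Demo(x: symbol)\n.output Demo\n"
program = append_rule(generated, custom)

print(program)
print(" ".join(souffle_command("gen_1755044251944027223.dl")))
```

In practice you would read the generated .dl file, write the extended text back (ideally to a copy), and then run the printed command.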

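This is where ByteCodeDL comes in: it pulls Doop's stages apart, using soot-facts-generator to generate the facts and then, once the rules are written, querying directly with souffle.

Doop's advantage is that it ships with more built-in rules than ByteCodeDL.

Its drawback is obvious: it is slow, because every query has to download dependencies and regenerate the facts.

The official documentation also mentions that you can generate the facts and then run custom rules with souffle. I am copying that part here as is.

Put the following code in the file temp.dl: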

.decl Var_DeclaringMethod(v: symbol, m: symbol)
.input Var_DeclaringMethod(IO="file", filename="Var-DeclaringMethod.facts", delimiter="\t")

.decl VarPointsTo(c1: symbol, h: symbol, c2: symbol, v: symbol)
.input VarPointsTo(IO="file", filename="VarPointsTo.csv", delimiter="\t")

.decl Temp(v: symbol, h: symbol)
Temp(v, h) :-
  VarPointsTo(_, h, _, v),
  Var_DeclaringMethod(v, "<Example: void test(int)>").

.output Temp

Copy Var-DeclaringMethod.facts so that it sits in the same directory as the output relation VarPointsTo (replace $id with your analysis ID):

$ cp out/$id/facts/Var-DeclaringMethod.facts out/$id/database/

Run the query and view its results:

$ souffle -F out/$id/database/ temp.dl
$ cat Temp.csv
<Example: void test(int)>/@this <Example: void main(java.lang.String[])>/new Example/0
<Example: void test(int)>/l0#_0 <Example: void main(java.lang.String[])>/new Example/0
<Example: void test(int)>/l3#_32        <Example: void test(int)>/new Cat/1
<Example: void test(int)>/l4#_33        <Example: void test(int)>/new Cat/2
<Example: void test(int)>/$stack5       <Example: void test(int)>/new Dog/0
<Example: void test(int)>/$stack6       <Example: void test(int)>/new Cat/1
<Example: void test(int)>/$stack7       <Example: void test(int)>/new Cat/2
<Example: void test(int)>/$stack8       <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2#_26        <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2_$$A_1#_28  <Example: void test(int)>/new Dog/0
<Example: void test(int)>/l2_$$A_2#_29  <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2_$$A_2#_29  <Example: void test(int)>/new Dog/0

This approach requires you to write the rules that import the facts yourself, which is tedious. ByteCodeDL works this way.
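
Part of that tedium can be automated by generating the .decl and .input lines. This is a minimal sketch, assuming every column is typed as symbol as in the temp.dl example (real relations may also have number columns). input_declaration is a hypothetical helper name.

```python
def input_declaration(relation, columns, filename, delimiter="\\t"):
    """Return the .decl and .input lines for loading one relation from a file.

    Assumption: every column is a symbol, as in the temp.dl example.
    """
    params = ", ".join(f"{name}: symbol" for name in columns)
    return (
        f".decl {relation}({params})\n"
        f'.input {relation}(IO="file", filename="{filename}", delimiter="{delimiter}")'
    )


print(input_declaration("Var_DeclaringMethod", ["v", "m"], "Var-DeclaringMethod.facts"))
print(input_declaration("VarPointsTo", ["c1", "h", "c2", "v"], "VarPointsTo.csv"))
```

The first call reproduces the two Var_DeclaringMethod lines from temp.dl, so the output can be pasted at the top of a rule file.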

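# Errors you may run into

When soot generates the facts, it may fail with an out-of-memory (OOM) error because too little memory was allocated. Change these values in build.gradle: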

def factGenXmx='32G'
def factGenStack='16G'

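Raising them is enough.

# Closing

The real essence of Doop is probably its .dl rules. I will write another post once I understand them better.

It is said to be the most powerful pointer analysis framework, but in practice the incomplete documentation, the scarce material, and the baffling errors make the learning curve far too steep.

As for actual bug hunting, I need to learn more before writing about how to use it. Personally I lean toward splitting it up and adapting it the way ByteCodeDL does: add sink rules for dangerous functions, combine them with Doop's existing spring and webapp sources, and output an accurate path graph from the taint analysis.

There is no end to learning. Let's keep at it together.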

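My writing is poor, my wording flippant, the content shallow, and my technique unpracticed. Corrections and pointers from more experienced readers are welcome and much appreciated.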