深入理解计算机系统 bomblab(上)
datalab https://www.cnblogs.com/vv123/p/17154464.html
简介
CS:APP3e, Bryant and O'Hallaron (cmu.edu)
该实验有6个关卡,要求学生对一个 Linux/x86-64 binary bomb 进行逆向分析,输入正确的6个字符串,以拆除炸弹。
A "binary bomb" is a program provided to students as an object code file.
When run, it prompts the user to type in 6 different strings. If any of these is incorrect, the bomb "explodes," printing an error message and logging the event on a grading server.
Students must "defuse" their own unique bomb by disassembling and reverse engineering the program to determine what the 6 strings should be.
The lab teaches students to understand assembly language, and also forces them to learn how to use a debugger. It's also great fun.
将 bomb.tar
解压后,只有三个文件。

bomb
是一个可执行二进制文件, bomb.c
是它的源文件,但包含的头文件并未给出,因而无法直接编译。
执行 ./bomb
后,需要依次输入6个字符串,如果输入错误就会“引爆炸弹”。

前置知识:C语言,gdb,CSAPP前三章。
个人环境:wsl2(ubuntu 20.04), 用vscode ssh连接,直接看反汇编即可。
常用的东西:
GDB命令
x86_64寄存器表
Parse 1 字符串比较
让我们先来看看 bomb.c
,它非常短,几乎得不到什么有用的信息。
/***************************************************************************
* Dr. Evil's Insidious Bomb, Version 1.1
* Copyright 2011, Dr. Evil Incorporated. All rights reserved.
*
* LICENSE:
*
* Dr. Evil Incorporated (the PERPETRATOR) hereby grants you (the
* VICTIM) explicit permission to use this bomb (the BOMB). This is a
* time limited license, which expires on the death of the VICTIM.
* The PERPETRATOR takes no responsibility for damage, frustration,
* insanity, bug-eyes, carpal-tunnel syndrome, loss of sleep, or other
* harm to the VICTIM. Unless the PERPETRATOR wants to take credit,
* that is. The VICTIM may not distribute this bomb source code to
* any enemies of the PERPETRATOR. No VICTIM may debug,
* reverse-engineer, run "strings" on, decompile, decrypt, or use any
* other technique to gain knowledge of and defuse the BOMB. BOMB
* proof clothing may not be worn when handling this program. The
* PERPETRATOR will not apologize for the PERPETRATOR's poor sense of
* humor. This license is null and void where the BOMB is prohibited
* by law.
***************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include "support.h" //没有给
#include "phases.h" //没有给
/*
* Note to self: Remember to erase this file so my victims will have no
* idea what is going on, and so they will all blow up in a
* spectaculary fiendish explosion. -- Dr. Evil
*/
FILE *infile;
int main(int argc, char *argv[])
{
char *input;
/* Note to self: remember to port this bomb to Windows and put a
* fantastic GUI on it. */
/* When run with no arguments, the bomb reads its input lines
* from standard input. */
if (argc == 1) {
infile = stdin;
}
/* When run with one argument <file>, the bomb reads from <file>
* until EOF, and then switches to standard input. Thus, as you
* defuse each phase, you can add its defusing string to <file> and
* avoid having to retype it. */
else if (argc == 2) {
if (!(infile = fopen(argv[1], "r"))) {
printf("%s: Error: Couldn't open %s\n", argv[0], argv[1]);
exit(8);
}
}
/* You can't call the bomb with more than 1 command line argument. */
else {
printf("Usage: %s [<input_file>]\n", argv[0]);
exit(8);
}
/* Do all sorts of secret stuff that makes the bomb harder to defuse. */
initialize_bomb();
printf("Welcome to my fiendish little bomb. You have 6 phases with\n");
printf("which to blow yourself up. Have a nice day!\n");
/* Hmm... Six phases must be more secure than one phase! */
input = read_line(); /* Get input */
phase_1(input); /* Run the phase */
phase_defused(); /* Drat! They figured it out!
* Let me know how they did it. */
printf("Phase 1 defused. How about the next one?\n");
/* The second phase is harder. No one will ever figure out
* how to defuse this... */
input = read_line();
phase_2(input);
phase_defused();
printf("That's number 2. Keep going!\n");
/* I guess this is too easy so far. Some more complex code will
* confuse people. */
input = read_line();
phase_3(input);
phase_defused();
printf("Halfway there!\n");
/* Oh yeah? Well, how good is your math? Try on this saucy problem! */
input = read_line();
phase_4(input);
phase_defused();
printf("So you got that one. Try this one.\n");
/* Round and 'round in memory we go, where we stop, the bomb blows! */
input = read_line();
phase_5(input);
phase_defused();
printf("Good work! On to the next...\n");
/* This phase will never be used, since no one will get past the
* earlier ones. But just in case, make this one extra hard. */
input = read_line();
phase_6(input);
phase_defused();
/* Wow, they got it! But isn't something... missing? Perhaps
* something they overlooked? Mua ha ha ha ha! */
return 0;
}
首先,调用 input_line()
读取输入的字符串,然后执行 phase_1()
, 如果input
正确,就执行“拆弹”,否则程序中止。我们需要做的就是通过反汇编推测phase_1()
的内容,得到正确的 input
。
使用objdump进行反汇编
objdump -d ./bomb >> bomb.s
找到 phase_1
,位于346行。
0000000000400ee0 <phase_1>:
400ee0: 48 83 ec 08 sub $0x8,%rsp
400ee4: be 00 24 40 00 mov $0x402400,%esi
400ee9: e8 4a 04 00 00 callq 401338 <strings_not_equal>
400eee: 85 c0 test %eax,%eax
400ef0: 74 05 je 400ef7 <phase_1+0x17>
400ef2: e8 43 05 00 00 callq 40143a <explode_bomb>
400ef7: 48 83 c4 08 add $0x8,%rsp
400efb: c3 retq
我们发现将0x402400
存入%esi
,然后调用 strings_not_equal
,并测试返回值是否为0。如果不为0就执行 explode_bomb
。于是找到 strings_not_equal
:
0000000000401338 <strings_not_equal>:
401338: 41 54 push %r12
40133a: 55 push %rbp
40133b: 53 push %rbx
40133c: 48 89 fb mov %rdi,%rbx
40133f: 48 89 f5 mov %rsi,%rbp
401342: e8 d4 ff ff ff callq 40131b <string_length>
401347: 41 89 c4 mov %eax,%r12d
40134a: 48 89 ef mov %rbp,%rdi
40134d: e8 c9 ff ff ff callq 40131b <string_length>
401352: ba 01 00 00 00 mov $0x1,%edx
401357: 41 39 c4 cmp %eax,%r12d
40135a: 75 3f jne 40139b <strings_not_equal+0x63>
40135c: 0f b6 03 movzbl (%rbx),%eax
40135f: 84 c0 test %al,%al
401361: 74 25 je 401388 <strings_not_equal+0x50>
401363: 3a 45 00 cmp 0x0(%rbp),%al
401366: 74 0a je 401372 <strings_not_equal+0x3a>
401368: eb 25 jmp 40138f <strings_not_equal+0x57>
40136a: 3a 45 00 cmp 0x0(%rbp),%al
40136d: 0f 1f 00 nopl (%rax)
401370: 75 24 jne 401396 <strings_not_equal+0x5e>
401372: 48 83 c3 01 add $0x1,%rbx
401376: 48 83 c5 01 add $0x1,%rbp
40137a: 0f b6 03 movzbl (%rbx),%eax
40137d: 84 c0 test %al,%al
40137f: 75 e9 jne 40136a <strings_not_equal+0x32>
401381: ba 00 00 00 00 mov $0x0,%edx
401386: eb 13 jmp 40139b <strings_not_equal+0x63>
401388: ba 00 00 00 00 mov $0x0,%edx
40138d: eb 0c jmp 40139b <strings_not_equal+0x63>
40138f: ba 01 00 00 00 mov $0x1,%edx
401394: eb 05 jmp 40139b <strings_not_equal+0x63>
401396: ba 01 00 00 00 mov $0x1,%edx
40139b: 89 d0 mov %edx,%eax
40139d: 5b pop %rbx
40139e: 5d pop %rbp
40139f: 41 5c pop %r12
4013a1: c3 retq
这个函数略长,但通过133c、133f、1372、1376等关键行可以逐步分析可得,该函数的作用是比较内存%rdi
和%rsi
处的字符串是否相等(先比长度,后逐位比较),若相等则返回0,否则返回1。而%rsi
和前面出现的%esi
是同一个寄存器分别以四字和双字访问(实际上,%rsi是该函数的第二个参数,见P120),因此 strings_not_equal
实际上比较了0x402400
处的字符串与输入的字符串是否相等。
使用GDB查看0x402400
处的字符串,得到答案。phase1就做完了。
(gdb) x/s 0x402400
0x402400: "Border relations with Canada have never been better."
Phase 2 循环
查看phase_2的代码,进行分析:
0000000000400efc <phase_2>:
400efc: 55 push %rbp
400efd: 53 push %rbx
400efe: 48 83 ec 28 sub $0x28,%rsp #分配0x28字节的栈空间
400f02: 48 89 e6 mov %rsp,%rsi
400f05: e8 52 05 00 00 callq 40145c <read_six_numbers>
#下面比较(%rsp)与1的值,若不相等直接explode bomb,否则跳到400f30(其实我们不必关心explode以后的行为)
400f0a: 83 3c 24 01 cmpl $0x1,(%rsp)
400f0e: 74 20 je 400f30 <phase_2+0x34>
400f10: e8 25 05 00 00 callq 40143a <explode_bomb>
400f15: eb 19 jmp 400f30 <phase_2+0x34>
#-----------------------------这一部分看起来是个for循环--------------------------------
400f17: 8b 43 fc mov -0x4(%rbx),%eax # 上一个整数 -> %eax
400f1a: 01 c0 add %eax,%eax # %eax = %eax * 2
400f1c: 39 03 cmp %eax,(%rbx) # 比较上一个整数乘2 与 当前整数 的大小关系
400f1e: 74 05 je 400f25 <phase_2+0x29>
400f20: e8 15 05 00 00 callq 40143a <explode_bomb> # 如果不相等,explode
400f25: 48 83 c3 04 add $0x4,%rbx # 下一个整数的地址 -> %rbx
400f29: 48 39 eb cmp %rbp,%rbx # 比较%rbx与%rpb(end指针)
400f2c: 75 e9 jne 400f17 <phase_2+0x1b> # 如果下一个地址等于%rbp,说明已处理完6个整数,
400f2e: eb 0c jmp 400f3c <phase_2+0x40> # 跳出循环,否则跳回400f17
# 下面是for循环的init expr
400f30: 48 8d 5c 24 04 lea 0x4(%rsp),%rbx # 第二个整数的地址 -> %rbx
400f35: 48 8d 6c 24 18 lea 0x18(%rsp),%rbp # %rbp 指向“第7个”整数的位置
400f3a: eb db jmp 400f17 <phase_2+0x1b> # 无条件跳转(只执行一次)
#--------------------------------------------------------------------------
400f3c: 48 83 c4 28 add $0x28,%rsp
400f40: 5b pop %rbx
400f41: 5d pop %rbp
400f42: c3 retq
经过分析,这是一段for循环代码,其对应的C代码大致如下
if (a[1] != 1) explode_bomb();
for (int i = 2; i < 7; i++) {
if (a[i] != a[i - 1] * 2) {
explode_bomb();
}
}
显然,应该输入的整数序列为 1 2 4 8 16 32
这样就完成了Phase 2.
Phase 3 switch case
0000000000400f43 <phase_3>:
400f43: 48 83 ec 18 sub $0x18,%rsp
400f47: 48 8d 4c 24 0c lea 0xc(%rsp),%rcx
400f4c: 48 8d 54 24 08 lea 0x8(%rsp),%rdx
400f51: be cf 25 40 00 mov $0x4025cf,%esi #第二个参数?
400f56: b8 00 00 00 00 mov $0x0,%eax
#调用sscanf
#sscanf的函数原型是: int sscanf(const char *str, const char *format, ...)
#可见$0x4025cf是format的地址
#在gdb执行 print (char *) 0x4025cf
#得到输入格式为"%d %d"
400f5b: e8 90 fc ff ff callq 400bf0 <__isoc99_sscanf@plt>
400f60: 83 f8 01 cmp $0x1,%eax #如果返回值小于1,explode
400f63: 7f 05 jg 400f6a <phase_3+0x27>
400f65: e8 d0 04 00 00 callq 40143a <explode_bomb>
400f6a: 83 7c 24 08 07 cmpl $0x7,0x8(%rsp) #如果第一个数大于8,explode
400f6f: 77 3c ja 400fad <phase_3+0x6a>
#下面是一个switch case结构,
#根据输入的第一个数,计算跳转地址,将对应的数存入%eax,然后跳转到 400fbe
400f71: 8b 44 24 08 mov 0x8(%rsp),%eax
400f75: ff 24 c5 70 24 40 00 jmpq *0x402470(,%rax,8)
#这是一个间接跳转指令,表示取出0x402470的值,+8*%rax作为跳转地址
#用GDB看到,0x402470处的值就是0x400f7c,与猜测一致
400f7c: b8 cf 00 00 00 mov $0xcf,%eax
400f81: eb 3b jmp 400fbe <phase_3+0x7b>
400f83: b8 c3 02 00 00 mov $0x2c3,%eax
400f88: eb 34 jmp 400fbe <phase_3+0x7b>
400f8a: b8 00 01 00 00 mov $0x100,%eax
400f8f: eb 2d jmp 400fbe <phase_3+0x7b>
400f91: b8 85 01 00 00 mov $0x185,%eax
400f96: eb 26 jmp 400fbe <phase_3+0x7b>
400f98: b8 ce 00 00 00 mov $0xce,%eax
400f9d: eb 1f jmp 400fbe <phase_3+0x7b>
400f9f: b8 aa 02 00 00 mov $0x2aa,%eax
400fa4: eb 18 jmp 400fbe <phase_3+0x7b>
400fa6: b8 47 01 00 00 mov $0x147,%eax
400fab: eb 11 jmp 400fbe <phase_3+0x7b>
400fad: e8 88 04 00 00 callq 40143a <explode_bomb>
400fb2: b8 00 00 00 00 mov $0x0,%eax
400fb7: eb 05 jmp 400fbe <phase_3+0x7b>
400fb9: b8 37 01 00 00 mov $0x137,%eax
400fbe: 3b 44 24 0c cmp 0xc(%rsp),%eax #判断输入的第二个数是否等于eax中的数
400fc2: 74 05 je 400fc9 <phase_3+0x86>
400fc4: e8 71 04 00 00 callq 40143a <explode_bomb>
400fc9: 48 83 c4 18 add $0x18,%rsp
400fcd: c3 retq
Phase 3是一个switch case跳转表结构,根据输入的第一个数计算跳转地址,与然后与相对应的数作比较。
我们可以选择Switch的第一个分支,输入0和0xcf十进制下的表示207即可。
