深入理解计算机系统 bomblab(上)

_vv123 / 2023-08-26 / 原文

datalab https://www.cnblogs.com/vv123/p/17154464.html

简介

CS:APP3e, Bryant and O'Hallaron (cmu.edu)

该实验有6个关卡,要求学生对一个 Linux/x86-64 binary bomb 进行逆向分析,输入正确的6个字符串,以拆除炸弹。

A "binary bomb" is a program provided to students as an object code file.

When run, it prompts the user to type in 6 different strings. If any of these is incorrect, the bomb "explodes," printing an error message and logging the event on a grading server.

Students must "defuse" their own unique bomb by disassembling and reverse engineering the program to determine what the 6 strings should be.

The lab teaches students to understand assembly language, and also forces them to learn how to use a debugger. It's also great fun.

bomb.tar 解压后,只有三个文件。

image-20230810150917598

bomb 是一个可执行二进制文件, bomb.c 是它的源文件,但包含的头文件并未给出,因而无法直接编译。

执行 ./bomb 后,需要依次输入6个字符串,如果输入错误就会“引爆炸弹”。

image-20230825175137805

前置知识:C语言,gdb,CSAPP前三章。

个人环境:wsl2(ubuntu 20.04), 用vscode ssh连接,直接看反汇编即可。

常用的东西:

GDB命令

img

x86_64寄存器表

img

Parse 1 字符串比较

让我们先来看看 bomb.c ,它非常短,几乎得不到什么有用的信息。

/***************************************************************************
 * Dr. Evil's Insidious Bomb, Version 1.1
 * Copyright 2011, Dr. Evil Incorporated. All rights reserved.
 *
 * LICENSE:
 *
 * Dr. Evil Incorporated (the PERPETRATOR) hereby grants you (the
 * VICTIM) explicit permission to use this bomb (the BOMB).  This is a
 * time limited license, which expires on the death of the VICTIM.
 * The PERPETRATOR takes no responsibility for damage, frustration,
 * insanity, bug-eyes, carpal-tunnel syndrome, loss of sleep, or other
 * harm to the VICTIM.  Unless the PERPETRATOR wants to take credit,
 * that is.  The VICTIM may not distribute this bomb source code to
 * any enemies of the PERPETRATOR.  No VICTIM may debug,
 * reverse-engineer, run "strings" on, decompile, decrypt, or use any
 * other technique to gain knowledge of and defuse the BOMB.  BOMB
 * proof clothing may not be worn when handling this program.  The
 * PERPETRATOR will not apologize for the PERPETRATOR's poor sense of
 * humor.  This license is null and void where the BOMB is prohibited
 * by law.
 ***************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "support.h" 	//没有给
#include "phases.h"		//没有给

/* 
 * Note to self: Remember to erase this file so my victims will have no
 * idea what is going on, and so they will all blow up in a
 * spectaculary fiendish explosion. -- Dr. Evil 
 */

FILE *infile;

int main(int argc, char *argv[])
{
    char *input;

    /* Note to self: remember to port this bomb to Windows and put a 
     * fantastic GUI on it. */

    /* When run with no arguments, the bomb reads its input lines 
     * from standard input. */
    if (argc == 1) {  
	infile = stdin;
    } 

    /* When run with one argument <file>, the bomb reads from <file> 
     * until EOF, and then switches to standard input. Thus, as you 
     * defuse each phase, you can add its defusing string to <file> and
     * avoid having to retype it. */
    else if (argc == 2) {
	if (!(infile = fopen(argv[1], "r"))) {
	    printf("%s: Error: Couldn't open %s\n", argv[0], argv[1]);
	    exit(8);
	}
    }

    /* You can't call the bomb with more than 1 command line argument. */
    else {
	printf("Usage: %s [<input_file>]\n", argv[0]);
	exit(8);
    }

    /* Do all sorts of secret stuff that makes the bomb harder to defuse. */
    initialize_bomb();

    printf("Welcome to my fiendish little bomb. You have 6 phases with\n");
    printf("which to blow yourself up. Have a nice day!\n");

    /* Hmm...  Six phases must be more secure than one phase! */
    input = read_line();             /* Get input                   */
    phase_1(input);                  /* Run the phase               */
    phase_defused();                 /* Drat!  They figured it out!
				      * Let me know how they did it. */
    printf("Phase 1 defused. How about the next one?\n");

    /* The second phase is harder.  No one will ever figure out
     * how to defuse this... */
    input = read_line();
    phase_2(input);
    phase_defused();
    printf("That's number 2.  Keep going!\n");

    /* I guess this is too easy so far.  Some more complex code will
     * confuse people. */
    input = read_line();
    phase_3(input);
    phase_defused();
    printf("Halfway there!\n");

    /* Oh yeah?  Well, how good is your math?  Try on this saucy problem! */
    input = read_line();
    phase_4(input);
    phase_defused();
    printf("So you got that one.  Try this one.\n");
    
    /* Round and 'round in memory we go, where we stop, the bomb blows! */
    input = read_line();
    phase_5(input);
    phase_defused();
    printf("Good work!  On to the next...\n");

    /* This phase will never be used, since no one will get past the
     * earlier ones.  But just in case, make this one extra hard. */
    input = read_line();
    phase_6(input);
    phase_defused();

    /* Wow, they got it!  But isn't something... missing?  Perhaps
     * something they overlooked?  Mua ha ha ha ha! */
    
    return 0;
}

首先,调用 input_line() 读取输入的字符串,然后执行 phase_1() , 如果input 正确,就执行“拆弹”,否则程序中止。我们需要做的就是通过反汇编推测phase_1()的内容,得到正确的 input

使用objdump进行反汇编

objdump -d ./bomb >> bomb.s

找到 phase_1 ,位于346行。

0000000000400ee0 <phase_1>:
  400ee0:	48 83 ec 08          	sub    $0x8,%rsp
  400ee4:	be 00 24 40 00       	mov    $0x402400,%esi
  400ee9:	e8 4a 04 00 00       	callq  401338 <strings_not_equal>
  400eee:	85 c0                	test   %eax,%eax
  400ef0:	74 05                	je     400ef7 <phase_1+0x17>
  400ef2:	e8 43 05 00 00       	callq  40143a <explode_bomb>
  400ef7:	48 83 c4 08          	add    $0x8,%rsp
  400efb:	c3                   	retq   

我们发现将0x402400存入%esi,然后调用 strings_not_equal,并测试返回值是否为0。如果不为0就执行 explode_bomb。于是找到 strings_not_equal

0000000000401338 <strings_not_equal>:
  401338:	41 54                	push   %r12
  40133a:	55                   	push   %rbp
  40133b:	53                   	push   %rbx
  40133c:	48 89 fb             	mov    %rdi,%rbx
  40133f:	48 89 f5             	mov    %rsi,%rbp
  401342:	e8 d4 ff ff ff       	callq  40131b <string_length>
  401347:	41 89 c4             	mov    %eax,%r12d
  40134a:	48 89 ef             	mov    %rbp,%rdi
  40134d:	e8 c9 ff ff ff       	callq  40131b <string_length>
  401352:	ba 01 00 00 00       	mov    $0x1,%edx
  401357:	41 39 c4             	cmp    %eax,%r12d
  40135a:	75 3f                	jne    40139b <strings_not_equal+0x63>
  40135c:	0f b6 03             	movzbl (%rbx),%eax
  40135f:	84 c0                	test   %al,%al
  401361:	74 25                	je     401388 <strings_not_equal+0x50>
  401363:	3a 45 00             	cmp    0x0(%rbp),%al
  401366:	74 0a                	je     401372 <strings_not_equal+0x3a>
  401368:	eb 25                	jmp    40138f <strings_not_equal+0x57>
  40136a:	3a 45 00             	cmp    0x0(%rbp),%al
  40136d:	0f 1f 00             	nopl   (%rax)
  401370:	75 24                	jne    401396 <strings_not_equal+0x5e>
  401372:	48 83 c3 01          	add    $0x1,%rbx
  401376:	48 83 c5 01          	add    $0x1,%rbp
  40137a:	0f b6 03             	movzbl (%rbx),%eax
  40137d:	84 c0                	test   %al,%al
  40137f:	75 e9                	jne    40136a <strings_not_equal+0x32>
  401381:	ba 00 00 00 00       	mov    $0x0,%edx
  401386:	eb 13                	jmp    40139b <strings_not_equal+0x63>
  401388:	ba 00 00 00 00       	mov    $0x0,%edx
  40138d:	eb 0c                	jmp    40139b <strings_not_equal+0x63>
  40138f:	ba 01 00 00 00       	mov    $0x1,%edx
  401394:	eb 05                	jmp    40139b <strings_not_equal+0x63>
  401396:	ba 01 00 00 00       	mov    $0x1,%edx
  40139b:	89 d0                	mov    %edx,%eax
  40139d:	5b                   	pop    %rbx
  40139e:	5d                   	pop    %rbp
  40139f:	41 5c                	pop    %r12
  4013a1:	c3                   	retq   

这个函数略长,但通过133c、133f、1372、1376等关键行可以逐步分析可得,该函数的作用是比较内存%rdi%rsi处的字符串是否相等(先比长度,后逐位比较),若相等则返回0,否则返回1。而%rsi和前面出现的%esi是同一个寄存器分别以四字和双字访问(实际上,%rsi是该函数的第二个参数,见P120),因此 strings_not_equal实际上比较了0x402400处的字符串与输入的字符串是否相等。

使用GDB查看0x402400处的字符串,得到答案。phase1就做完了。

(gdb) x/s  0x402400
0x402400:       "Border relations with Canada have never been better."

image-20230825175319463

Phase 2 循环

查看phase_2的代码,进行分析:

0000000000400efc <phase_2>:
  400efc:	55                   	push   %rbp
  400efd:	53                   	push   %rbx
  400efe:	48 83 ec 28          	sub    $0x28,%rsp		#分配0x28字节的栈空间
  400f02:	48 89 e6             	mov    %rsp,%rsi
  400f05:	e8 52 05 00 00       	callq  40145c <read_six_numbers>
  #下面比较(%rsp)与1的值,若不相等直接explode bomb,否则跳到400f30(其实我们不必关心explode以后的行为)
  400f0a:	83 3c 24 01          	cmpl   $0x1,(%rsp)
  400f0e:	74 20                	je     400f30 <phase_2+0x34> 
  400f10:	e8 25 05 00 00       	callq  40143a <explode_bomb>
  400f15:	eb 19                	jmp    400f30 <phase_2+0x34>
  
  #-----------------------------这一部分看起来是个for循环--------------------------------
  400f17:	8b 43 fc             	mov    -0x4(%rbx),%eax		# 上一个整数 -> %eax
  400f1a:	01 c0                	add    %eax,%eax			# %eax = %eax * 2
  400f1c:	39 03                	cmp    %eax,(%rbx)			# 比较上一个整数乘2 与 当前整数 的大小关系
  400f1e:	74 05                	je     400f25 <phase_2+0x29>
  400f20:	e8 15 05 00 00       	callq  40143a <explode_bomb>	# 如果不相等,explode
  400f25:	48 83 c3 04          	add    $0x4,%rbx				# 下一个整数的地址 -> %rbx
  400f29:	48 39 eb             	cmp    %rbp,%rbx				# 比较%rbx与%rpb(end指针)
  400f2c:	75 e9                	jne    400f17 <phase_2+0x1b>	# 如果下一个地址等于%rbp,说明已处理完6个整数,
  400f2e:	eb 0c                	jmp    400f3c <phase_2+0x40>	# 跳出循环,否则跳回400f17
  # 下面是for循环的init expr
  400f30:	48 8d 5c 24 04       	lea    0x4(%rsp),%rbx		 # 第二个整数的地址 -> %rbx
  400f35:	48 8d 6c 24 18       	lea    0x18(%rsp),%rbp		 # %rbp 指向“第7个”整数的位置
  400f3a:	eb db                	jmp    400f17 <phase_2+0x1b> # 无条件跳转(只执行一次)
  #--------------------------------------------------------------------------
  400f3c:	48 83 c4 28          	add    $0x28,%rsp
  400f40:	5b                   	pop    %rbx
  400f41:	5d                   	pop    %rbp
  400f42:	c3                   	retq   

经过分析,这是一段for循环代码,其对应的C代码大致如下

if (a[1] != 1) explode_bomb();
for (int i = 2; i < 7; i++) {
    if (a[i] != a[i - 1] * 2) {
        explode_bomb();
    }
}

显然,应该输入的整数序列为 1 2 4 8 16 32

image-20230810221225972

这样就完成了Phase 2.

Phase 3 switch case

0000000000400f43 <phase_3>:
  400f43:	48 83 ec 18          	sub    $0x18,%rsp
  400f47:	48 8d 4c 24 0c       	lea    0xc(%rsp),%rcx
  400f4c:	48 8d 54 24 08       	lea    0x8(%rsp),%rdx
  400f51:	be cf 25 40 00       	mov    $0x4025cf,%esi #第二个参数?
  400f56:	b8 00 00 00 00       	mov    $0x0,%eax
#调用sscanf
#sscanf的函数原型是: int sscanf(const char *str, const char *format, ...)
#可见$0x4025cf是format的地址
#在gdb执行 print (char *) 0x4025cf
#得到输入格式为"%d %d"
  400f5b:	e8 90 fc ff ff       	callq  400bf0 <__isoc99_sscanf@plt> 
  400f60:	83 f8 01             	cmp    $0x1,%eax #如果返回值小于1,explode
  400f63:	7f 05                	jg     400f6a <phase_3+0x27>
  400f65:	e8 d0 04 00 00       	callq  40143a <explode_bomb>
  400f6a:	83 7c 24 08 07       	cmpl   $0x7,0x8(%rsp) #如果第一个数大于8,explode
  400f6f:	77 3c                	ja     400fad <phase_3+0x6a>

image-20230813160524350

image-20230813173005467

 #下面是一个switch case结构,
 #根据输入的第一个数,计算跳转地址,将对应的数存入%eax,然后跳转到 400fbe
  400f71:	8b 44 24 08          	mov    0x8(%rsp),%eax
  400f75:	ff 24 c5 70 24 40 00 	jmpq   *0x402470(,%rax,8)
 #这是一个间接跳转指令,表示取出0x402470的值,+8*%rax作为跳转地址
 #用GDB看到,0x402470处的值就是0x400f7c,与猜测一致
  400f7c:	b8 cf 00 00 00       	mov    $0xcf,%eax
  400f81:	eb 3b                	jmp    400fbe <phase_3+0x7b>
  400f83:	b8 c3 02 00 00       	mov    $0x2c3,%eax
  400f88:	eb 34                	jmp    400fbe <phase_3+0x7b>
  400f8a:	b8 00 01 00 00       	mov    $0x100,%eax
  400f8f:	eb 2d                	jmp    400fbe <phase_3+0x7b>
  400f91:	b8 85 01 00 00       	mov    $0x185,%eax
  400f96:	eb 26                	jmp    400fbe <phase_3+0x7b>
  400f98:	b8 ce 00 00 00       	mov    $0xce,%eax
  400f9d:	eb 1f                	jmp    400fbe <phase_3+0x7b>
  400f9f:	b8 aa 02 00 00       	mov    $0x2aa,%eax
  400fa4:	eb 18                	jmp    400fbe <phase_3+0x7b>
  400fa6:	b8 47 01 00 00       	mov    $0x147,%eax
  400fab:	eb 11                	jmp    400fbe <phase_3+0x7b>
  400fad:	e8 88 04 00 00       	callq  40143a <explode_bomb>
  400fb2:	b8 00 00 00 00       	mov    $0x0,%eax
  400fb7:	eb 05                	jmp    400fbe <phase_3+0x7b>
  400fb9:	b8 37 01 00 00       	mov    $0x137,%eax
  400fbe:	3b 44 24 0c          	cmp    0xc(%rsp),%eax #判断输入的第二个数是否等于eax中的数
  400fc2:	74 05                	je     400fc9 <phase_3+0x86>
  400fc4:	e8 71 04 00 00       	callq  40143a <explode_bomb>
  400fc9:	48 83 c4 18          	add    $0x18,%rsp
  400fcd:	c3                   	retq   

Phase 3是一个switch case跳转表结构,根据输入的第一个数计算跳转地址,与然后与相对应的数作比较。

我们可以选择Switch的第一个分支,输入0和0xcf十进制下的表示207即可。

image-20230813173429713