CS100 Study Notes - C++ Language Basics

Coinred 的手稿们 / 2024-04-24 / Original post

Notes on coding conventions and on language features I did not know before.

Lesson 11

What is C++?

Effective C++ Item 1 (by Scott Meyers): View C++ as a federation of languages.

The easiest way is to view C++ not as a single language but as a federation of related languages ... Fortunately, there are only four:

C.

Object-Oriented C++.

Template C++.

The STL.

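The C part of C++

The C++ standard library includes the facilities of the C standard library, but the two are not exactly the same.

For historical reasons (backward compatibility), C has a number of questionable designs: for example, strchr accepts a const char * but returns a char *, and some things that should be functions are implemented as macros.
C lacks C++ mechanisms such as function overloading, so some of its designs are cumbersome.
C++ is far more capable at compile-time computation than C: for example, many of the mathematical functions in <cmath> are constexpr since C++23 and can be evaluated at compile time.

C++ standard library headers have no file extension: <iostream> instead of <iostream.h>, <string> instead of <string.h>.
The C++ version of a C standard library header <xxx.h> is <cxxx>, and all of its names are also declared in namespace std.

More sensible designs:

bool, true and false are built in; no extra header is needed.
Logical and relational operators yield bool rather than int.
The type of "hello" is const char [6] rather than char [6].
The type of the character literal 'a' is char rather than int.
Many risky type conversions (for example, void * to int *, or dropping const from a pointer) are not allowed to happen implicitly: they are errors, not warnings. Narrowing arithmetic conversions such as double to int are still implicit, except in list-initialization.
maxn declared by const int maxn = 100; is a compile-time constant and can be used as an array size.
int fun() accepts no arguments, rather than accepting arbitrary arguments.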
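
Several of the points above can be checked by the compiler itself. The following is a minimal sketch; buffer and buffer_length are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Relational operators yield bool in C++ (they yield int in C).
static_assert(std::is_same<decltype(1 < 2), bool>::value, "1 < 2 is a bool");

// A character literal has type char in C++ (it has type int in C).
static_assert(std::is_same<decltype('a'), char>::value, "'a' is a char");

// A string literal is an lvalue of type const char[N] (N counts the '\0'),
// so decltype reports a reference to that array type.
static_assert(std::is_same<decltype("hello"), const char (&)[6]>::value,
              "\"hello\" is a const char[6]");

// A const int initialized from a constant expression is a compile-time
// constant, so it can be used as an array size.
const int maxn = 100;
int buffer[maxn]; // hypothetical array for illustration

// Hypothetical helper: number of elements in `buffer`.
std::size_t buffer_length() { return sizeof(buffer) / sizeof(buffer[0]); }
```

If any of the static_assert conditions were false, the file would fail to compile, which is exactly what happens when the same checks are attempted with the C rules in mind.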

IO stream

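std::cin and std::cout are two objects defined in <iostream>, representing the standard input stream and the standard output stream respectively.

cppreference: Input/output library | cppreference: Input/output manipulators

std::cin and std::cout are "objects", not "functions". Precise terminology should be stressed to students. As another example, the word "call" is generally used only for functions: #include <iostream> does not "call the standard library", and using the values of a and b does not "call a and b". Use terms correctly: do not invent your own terms, do not shy away from proper terms in favor of vague colloquialisms, and do not misuse terms.

namespace `std`

C++ has a very large standard library. To avoid name collisions, all of its names (functions, classes, type aliases, templates, global objects, etc.) live in a namespace called std.

You can write using std::cin; to introduce cin into the current scope, after which the std:: in std::cin can be omitted within that scope.
You can write using namespace std; to introduce every name in std into the current scope, but this defeats the purpose of the namespace and brings back the risk of name collisions. (I strongly advise against it, and I never write it myself.)

CS100 does not allow either form of using at global scope in a header file.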
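
One way to get the convenience of a using-declaration without affecting other code is to place it inside a function body. This is only a sketch; greet is a hypothetical function, not part of the course material.

```cpp
#include <string>

// Hypothetical helper for illustration.
std::string greet(const std::string &name) {
  using std::string; // visible only inside this function body
  string result = "Hello, ";
  result += name;
  return result;
}
```

Outside greet, the name string still has to be written as std::string, so nothing leaks into the surrounding scope.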

`std::string`

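The string class. #include <string>

Definition and initialization

std::string str = "Hello world";                        // copy-initialization, not assignment
// equivalent: std::string str("Hello world");          // direct-initialization
// equivalent: std::string str{"Hello world"}; (modern) // direct-list-initialization; selects the same constructor here
std::string s1(7, 'a'); // aaaaaaa                      // direct-initialization
std::string s2 = s1; // s2 is a copy of s1              // copy-initialization, not assignment
std::string s; // "" (empty string)                     // default-initialization

Memory of a std::string: managed automatically. It is allocated automatically, grows automatically when needed, and is released automatically.
When using std::string, focus on the content of the string rather than on its implementation details.
- You no longer need to think about how its memory is managed, or whether there is a '\0' at the end.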

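Operations & assignment

Strings can be concatenated with + and +=. + returns a new std::string, and at least one of its operands must be a std::string.
s1 = s1 + s2 first constructs a temporary object for s1 + s2, which necessarily copies the contents of s1.
s1 += s2, on the other hand, appends s2 directly to the end of s1.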

std::string hello{"hello"};
std::string s0 = hello + "world";
std::string s1 = "world" + hello;
s0 += "C++";
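std::string s3 = hello + "world" + "C++"; // OK, because + is left-associative:
                                          // equivalent to (hello + "world") + "C++"

Comparison (lexicographical): <, <=, >, >=, ==, !=. Assignment: =

std::getline(std::cin, s): reads one line starting from the current position. The newline character is consumed but not stored. If the previous input operation stopped right before a newline character, getline reads an empty string.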
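
The empty-line pitfall can be reproduced without a keyboard by reading from a std::istringstream instead of std::cin. The sketch below assumes a hypothetical helper line_after_int; discarding the rest of the line with ignore is one common workaround, not the only one.

```cpp
#include <ios>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: reads an integer from `input`, then the "next line".
// If `skip_rest_of_line` is true, the remainder of the integer's line
// (including its '\n') is discarded before calling std::getline.
std::string line_after_int(const std::string &input, bool skip_rest_of_line) {
  std::istringstream in(input); // stands in for std::cin
  int n = 0;
  in >> n; // stops right before the '\n' that follows the number
  if (skip_rest_of_line)
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
  std::string line;
  std::getline(in, line);
  return line;
}
```

With the input "42\nhello world\n", the call without skipping returns an empty string, while the call with skipping returns "hello world".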

Traversing a string: range-based `for` statement

Example: print all uppercase letters (std::isupper is declared in <cctype>)

for (char c : s)
  if (std::isupper(c))
    std::cout << c;
std::cout << std::endl;

An equivalent way uses subscripts, but it is less modern and more verbose.

for (std::size_t i = 0; i != s.size(); ++i)
  if (std::isupper(s[i]))
    std::cout << s[i];
std::cout << std::endl;

[Best practice] Use range-based for loops. They are modern, clear, simple, generic, and hence more recommended.

Your intent is to "traverse the string", not to "create an integer and make it run from 0 to s.size()".

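Conversions

For a value x of an arithmetic type, std::to_string(x) returns its string representation (overloads exist for int, long, long long, their unsigned counterparts, float, double and long double).
std::stoi(s), std::stol(s), ...: extract the arithmetic value represented by s.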
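
A short sketch of both directions follows. describe and parse_and_double are hypothetical helpers introduced for this example only.

```cpp
#include <string>

// Hypothetical helper: builds a message from a number.
std::string describe(int count) { return "count = " + std::to_string(count); }

// Hypothetical helper: parses the leading integer of `s` and doubles it.
// std::stoi skips leading whitespace and stops at the first character that
// cannot be part of the number; it throws std::invalid_argument if no
// conversion can be performed at all.
int parse_and_double(const std::string &s) { return std::stoi(s) * 2; }
```

For example, describe(42) yields "count = 42", and parse_and_double("  7 apples") yields 14 because parsing stops at the space after 7.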

Lesson 12

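Lvalues and rvalues

When an expression is used, sometimes we use the object it denotes, and sometimes we use only the value of that object.

In str[i] = ch, we use the object denoted by the expression str[i].
In ch = str[i], we use the value of the object denoted by the expression str[i].

Every expression has a property called its value category: it is either an lvalue or an rvalue.

Lvalue: it denotes an actual object.
Rvalue: it denotes only a value.

In C, an lvalue may appear on the left-hand side of an assignment and an rvalue may not. In C++, the distinction is far less simple.

Expressions that yield lvalues: *p, a[i]
Notably, in C++ the prefix increment/decrement operators yield lvalues, so ++i = 42 is legal.
An assignment expression yields an lvalue: the result of a = b is (a reference to) the object a.
- The assignment operator is right-associative: the expression a = b = c is equivalent to a = (b = c).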
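
The lvalue results of prefix increment and of assignment can be observed directly. The functions below are hypothetical demonstrations; code like ++i = 42 is legal but not something to write in practice.

```cpp
// ++i yields the object i itself, so it can be assigned to.
int prefix_increment_is_lvalue() {
  int i = 0;
  ++i = 42; // i is incremented, then overwritten with 42
  return i;
}

// a = b yields the object a, so the result can be assigned to again.
int assignment_is_lvalue() {
  int a = 1, b = 2;
  (a = b) = 10; // a becomes 2, then 10; b is unchanged
  return a + b;
}

// a = b = c groups as a = (b = c).
int chained_assignment() {
  int a = 1, b = 2, c = 3;
  a = b = c; // b becomes 3, then a becomes 3
  return a + b;
}
```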

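An rvalue denotes only a value, not an actual object. Common rvalues are literals and the temporary objects produced by evaluating expressions.

The temporary object produced by a function call fun() (for a function that returns by value) is an rvalue.

  std::string fun(); // a function that returns a std::string object
  std::string a = fun();

A notable exception: the string literal "hello" is an lvalue. It is an object that stays in memory for the whole duration of the program.
- By contrast, the integer literal 42 is an rvalue: it only denotes a value.
Temporary objects produced by type conversions are rvalues:

std::string &r1 = std::string("hello"); // Error
std::string &r2 = "hello"; // Error. This is equivalent to ↑

Functional-style cast expression: Type(args...) creates a temporary object of type Type.

For a class type, this calls a suitable constructor (or a conversion operator).

For example: std::string(10, 'c'), std::string("hello")

For a built-in type, it is just an ordinary copy or type conversion.

int(x) creates a temporary object of type int, initialized from x.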

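The real "value categories"

(Required knowledge for language lawyers.)

Strictly speaking, every C++ expression belongs to exactly one of the following three value categories:

Category	Chinese term	has identity?	can be moved from?
lvalue	左值	yes	no
xvalue (expiring value)	亡值	yes	yes
prvalue (pure rvalue)	纯右值	no	yes

lvalue + xvalue = glvalue (generalized lvalue); xvalue + prvalue = rvalue.

So the claim that only lvalues denote actual objects is not rigorous: an rvalue may also denote an actual object (an xvalue). We will meet a typical xvalue later, when move semantics is covered.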
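
The three categories can be told apart with decltype: for an expression e that is not a plain name, decltype((e)) is T& for an lvalue, T&& for an xvalue, and T for a prvalue. The sketch below relies on std::move, which these notes have not introduced yet, as a standard way to produce an xvalue; sample and make_string are hypothetical names.

```cpp
#include <string>
#include <type_traits>
#include <utility>

std::string make_string(); // hypothetical; only used inside decltype, never called
std::string sample;        // hypothetical variable for illustration

// The name of a variable, used as an expression, is an lvalue.
static_assert(std::is_same<decltype((sample)), std::string &>::value,
              "lvalue");

// std::move(sample) is an xvalue: it has identity and can be moved from.
static_assert(std::is_same<decltype((std::move(sample))), std::string &&>::value,
              "xvalue");

// A call to a function that returns by value is a prvalue.
static_assert(std::is_same<decltype((make_string())), std::string>::value,
              "prvalue");
```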

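References

A variable of reference type ReferredType & is bound to another object at initialization and acts as an alias of that object.
A reference must be initialized (that is, its definition must say what it is bound to), and the binding cannot be changed afterwards.

References must be bound to existing objects ("lvalues")!
An ordinary reference must be bound to an lvalue (an object such as an ordinary variable, an array or a pointer). It cannot be bound to a temporary object or a literal (which are not lvalues):

int &r1 = 42;    // Error: binding a reference to a literal
int &r2 = 2 + 3; // Error: binding a reference to a temporary object
int a = 10, b = 15;
int &r3 = a + b; // Error: binding a reference to a temporary object

(C++11 introduced so-called "rvalue references". Unless stated otherwise, "reference" means "lvalue reference".)

The name of a reference is an lvalue. A reference is not an object, so you cannot create a reference to a reference (likewise, a pointer cannot point to a reference, and the element type of an array cannot be a reference).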

int ival = 42;
int &ri = ival; // binding `ri` to `ival`.
//int & &rr = ri; // Error! No such thing!
int &ri2 = ri; // Same as `int &ri2 = ival;`.
//int & *pr = &ri; // No such thing!
int *pi = &ri; // Same as `int *pi = &ival;`.
int arr[10] = {};
int (&ar)[10] = arr; // A reference bound to an array.

As with pointers, the & belongs to a single declarator:

int& x = ival, y = ival, z = ival;
// Only `x` is a reference. `y` and `z` are of type `int`.

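Reference-to-`const`

Just as there are "pointers to const" (pointers with "low-level const"), there are "references bound to const".
A reference-to-const assumes that it is bound to a const object, so it cannot be used to modify the object it is bound to. Likewise, a reference without const cannot be bound to a const object (dropping low-level const is not allowed).

A pointer can carry top-level const (the pointer itself is constant) as well as low-level const (the pointed-to object is constant), but "top-level const" is not discussed for references.

That is, there are only "references bound to const". A reference is not an object itself, so it makes no sense to ask whether it is const.
From another angle, a reference always behaves as if it had "top-level const", because its binding cannot be changed.
When no ambiguity arises, the term "const reference" is commonly used to mean "reference-to-const".

Special rule: a reference-to-const can be bound to an rvalue:

When a reference-to-const is bound to an rvalue, it is in fact bound to a temporary object.
- This is reasonable: you cannot modify that object through a reference-to-const anyway.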
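
A small sketch of this special rule follows. When a local reference-to-const is bound directly to a temporary, the temporary's lifetime is extended to that of the reference, so the uses below are safe. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Binds a reference-to-const to the temporary produced by operator+.
std::size_t length_of_temporary() {
  const std::string &r = std::string("hello") + " world";
  return r.size(); // the temporary is still alive here
}

// Binds a reference-to-const to a temporary int initialized from 42.
int bound_to_literal() {
  const int &r = 42;
  return r;
}
```

Replacing either const reference with a plain one (std::string & or int &) turns the definition into a compile error.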

Example: Pass by reference-to-`const`

int count_lowercase(std::string &str)

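If the parameter were declared as a plain std::string, every call would copy the argument into the parameter (this is initialization, not assignment). That copy is unnecessary because the function does not modify the string, so the parameter is declared as a reference to avoid it.
This causes a problem, however: an rvalue expression or a string literal can no longer be passed. For example, int result = count_lowercase(s1 + s2); fails to compile.
(A string literal is an lvalue, but passing "Hello" to a std::string parameter involves an implicit conversion from const char [6] to std::string. That conversion produces an rvalue, which a std::string & cannot bind to.)

Since a reference-to-const can bind to both lvalues and rvalues, and the function does not need to modify its argument, the parameter can be declared like this:

int count_lowercase(const std::string &str)

[Best practice] Pass by reference-to-const if copy is not necessary and the parameter should not be modified.
Declaring the parameter as a reference-to-const avoids the copy, allows rvalues and const objects to be passed, and prevents you from modifying the argument by accident. (For built-in types such as int or pointers, a reference-to-const is unnecessary.)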
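
The notes only give the declaration of count_lowercase. One possible body is sketched below; the cast to unsigned char is an addition of this sketch, made because passing a negative char value to std::islower is undefined behavior.

```cpp
#include <cctype>
#include <string>

// Counts the lowercase letters in `str` without copying or modifying it.
int count_lowercase(const std::string &str) {
  int count = 0;
  for (char c : str)
    if (std::islower(static_cast<unsigned char>(c)))
      ++count;
  return count;
}
```

With this signature, lvalues, temporaries such as s1 + s2, and string literals are all accepted as arguments.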

Example: Use references in range-`for`

for (char c : str)
  // ...
// Equivalent to:
for (std::size_t i = 0; i != str.size(); ++i) {
  char c = str[i];
  // ...
}

As you can see, c is only a copy of str[i], so modifying c does not modify str[i]. Therefore:

//change all lowercase letters to their uppercase forms
for (char &c : str)
  c = std::toupper(c);
// Equivalent to:
for (std::size_t i = 0; i != str.size(); ++i) {
  char &c = str[i];
  c = std::toupper(c); // Same as `str[i] = std::toupper(str[i]);`.
}

引用与指针对比

A reference

is not itself an object. It is an alias of the object that it is bound to.

cannot be rebound to another object after initialization.

has no "default" or "zero" value. It must be bound to an object.

A pointer

is an object that stores the address of the object it points to.

can switch to point to another object at any time.

can be set to a null pointer value nullptr. (Do not use NULL in C++.)
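
The difference in rebinding can be seen in a few lines. This is a hypothetical demonstration function: assigning through a reference writes to the object it is bound to, while assigning to a pointer makes it point elsewhere.

```cpp
int reference_vs_pointer() {
  int a = 1, b = 2;

  int &r = a;
  r = b; // does NOT rebind r; it copies b's value into a, so a == 2

  int *p = &a;
  p = &b; // p now points to b; a is untouched
  *p = 5; // modifies b, so b == 5

  return a * 10 + b; // encodes both values: a == 2, b == 5
}
```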

`std::vector`

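std::vector is a class template; it becomes an actual type only after the template arguments are given. The process by which the compiler generates a class from a class template is called instantiation.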

std::vector v;               // Error: missing template argument.
std::vector<int> vi;         // An empty vector of `int`s.
std::vector<std::string> vs; // An empty vector of strings.
std::vector<double> vd;      // An empty vector of `double`s.
std::vector<std::vector<int>> vvi; // An empty vector of vector of `int`s.
                                   // "2-d" vector.

Initialization

std::vector<int> v{2, 3, 5, 7};     // A vector of `int`s,
                                    // whose elements are {2, 3, 5, 7}.
std::vector<int> v2 = {2, 3, 5, 7}; // Equivalent to ↑

std::vector<std::string> vs{"hello", "world"}; // A vector of strings,
                                    // whose elements are {"hello", "world"}.
std::vector<std::string> vs2 = {"hello", "world"}; // Equivalent to ↑

std::vector<int> v3(10);     // A vector of ten `int`s, all initialized to 0.
std::vector<int> v4(10, 42); // A vector of ten `int`s, all initialized to 42.

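Constructing with vector<T> v(n) value-initializes all n elements (similar to "empty initialization" in C) instead of leaving them with indeterminate values. (For class types, value-initialization is almost always a call to the default constructor.)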
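
Parentheses and braces mean different things for std::vector<int>, which is easy to overlook next to the examples above. A minimal sketch with hypothetical function names:

```cpp
#include <vector>

// Parentheses: ten elements, each value-initialized to 0.
std::vector<int> ten_zeros() { return std::vector<int>(10); }

// Braces: a list of elements, here a single element whose value is 10.
std::vector<int> single_ten() { return std::vector<int>{10}; }
```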

Creating a copy of another std::vector:

std::vector<int> v{2, 3, 5, 7};
std::vector<int> v2 = v; // `v2` is a copy of `v`
std::vector<int> v3(v);  // Equivalent
std::vector<int> v4{v};  // Equivalent

C++17 CTAD

Class Template Argument Deduction: as long as you provide enough information, the compiler can deduce the element type automatically.

std::vector v{2, 3, 5, 7};  // vector<int>
std::vector v2{3.14, 6.28}; // vector<double>
std::vector v3(10, 42);     // vector<int>
std::vector v4(10);         // Error: cannot deduce template argument type

What counts as "enough information"? Judge for yourself from the examples above. (The detailed rules are omitted.)

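Member functions:

v.size() and v.empty(): the number of elements in the std::vector, and whether it is empty
v.clear(): removes all elements of the std::vector (do not write the silly while (!v.empty()) v.pop_back();)
v.push_back(x): appends the element x to the end of v
v.pop_back(): removes the last element of the std::vector
v.back() and v.front(): return a reference to the last element and to the first element, respectively.
- v.back(), v.front() and v.pop_back() are undefined behavior when v is empty.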
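
Because these functions do not check for emptiness themselves, the caller has to. The sketch below shows one way to guard the calls; last_or and drop_last_if_any are hypothetical helpers.

```cpp
#include <vector>

// Returns the last element, or `fallback` when the vector is empty.
int last_or(const std::vector<int> &v, int fallback) {
  return v.empty() ? fallback : v.back();
}

// Removes the last element if there is one; returns whether it did so.
bool drop_last_if_any(std::vector<int> &v) {
  if (v.empty())
    return false; // calling pop_back() here would be undefined behavior
  v.pop_back();
  return true;
}
```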

Traversal

Range-based `for` statement:

std::vector<std::string> vs = some_strings();
for (const std::string &s : vs) // use reference-to-const to avoid copying
  std::cout << s << std::endl;

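Subscript access:

v[i] gives the i-th element (the valid range of i is $[0,N)$, where N = v.size()).

Out-of-range access is undefined behavior, and usually a serious runtime error.
The subscript operator v[i] of std::vector does not check bounds, for the sake of efficiency.
- In fact, most operations on standard library containers (such as front, back and pop_back above) perform no bounds checking, for the same reason.
A bounds-checked alternative is v.at(i), which throws a std::out_of_range exception when i is out of range.
- Try it yourself.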
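
One way to try it is sketched below. at_throws is a hypothetical helper that reports whether v.at(i) throws.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns true if v.at(i) throws std::out_of_range.
bool at_throws(const std::vector<int> &v, std::size_t i) {
  try {
    (void)v.at(i); // bounds-checked access; the value itself is not needed
    return false;
  } catch (const std::out_of_range &) {
    return true;
  }
}
```

For a vector with four elements, at_throws(v, 3) is false and at_throws(v, 4) is true, whereas v[4] would simply be undefined behavior.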

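The STL style

Basic and low-level operations are performed automatically:

Default initialization yields a well-defined state (for example, an empty string or an empty vector), not an indeterminate value.
Copying is done automatically (member-wise copy).
Memory management is done automatically.

The facilities of the C++ standard library also strive for uniform interfaces (member functions such as .at(), .front(), .back(), .push_back(x), .pop_back(), .clear()).

Growth strategy of `std::vector`

Suppose the vector currently owns a dynamically allocated block of memory of length i, and that block is full.
When the (i+1)-th element arrives, allocate a block of length 2*i, copy the existing i elements over, place the new element after them, and release the old block.
When elements i+2, i+3, ..., 2*i arrive, no new memory needs to be allocated and no object needs to be copied!

Assume $n=2^m$. The total number of copies is then $\sum_{i=0}^{m-1}2^i=2^m-1=O(n)$, so the average ("amortized") cost of one push_back is $O(1)$ (constant), which is acceptable. (The factor 2 is a model: the standard only requires amortized constant time, and implementations choose their own growth factor.)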
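
The count above can be checked with a small simulation of the doubling model. This is a toy model written for these notes, not how any particular standard library implements std::vector; count_copies is a hypothetical function and the starting capacity of 1 is an assumption.

```cpp
#include <cstddef>

// Simulates n push_back calls under the doubling model and returns the
// total number of element copies caused by reallocation.
std::size_t count_copies(std::size_t n) {
  std::size_t capacity = 1; // assumed initial block length
  std::size_t size = 0;
  std::size_t copies = 0;
  for (std::size_t k = 0; k != n; ++k) {
    if (size == capacity) {
      copies += size; // move every existing element to the new block
      capacity *= 2;  // the new block is twice as long
    }
    ++size; // place the new element
  }
  return copies;
}
```

For n = 8 the reallocations copy 1 + 2 + 4 = 7 elements, and for n = 1024 they copy 1023, matching $2^m-1$.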