Summary Functions and Maps (Learning pandas, Part 3)

hisun9 / 2024-10-08 / original post

Summary functions

reviews

Output:

country	description	designation	points	price	province	region_1	region_2	taster_name	taster_twitter_handle	title	variety	winery
0	Italy	Aromas include tropical fruit, broom, brimston...	Vulkà Bianco	87	NaN	Sicily & Sardinia	Etna	NaN	Kerin O’Keefe	@kerinokeefe	Nicosia 2013 Vulkà Bianco (Etna)	White Blend
1	Portugal	This is ripe and fruity, a wine that is smooth...	Avidagos	87	15.0	Douro	NaN	NaN	Roger Voss	@vossroger	Quinta dos Avidagos 2011 Avidagos Red (Douro)	Portuguese Red
...	...	...	...	...	...	...	...	...	...	...	...	...
129969	France	A dry style of Pinot Gris, this is crisp with ...	NaN	90	32.0	Alsace	Alsace	NaN	Roger Voss	@vossroger	Domaine Marcel Deiss 2012 Pinot Gris (Alsace)	Pinot Gris
129970	France	Big, rich and off-dry, this is powered by inte...	Lieu-dit Harth Cuvée Caroline	90	21.0	Alsace	Alsace	NaN	Roger Voss	@vossroger	Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...	Gewürztraminer

129971 rows × 13 columns

pandas provides many simple "summary functions" (not an official name) that restructure the data in some useful way. For example, consider the describe() method:
```
reviews.points.describe()
```
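For a numeric column such as points, the result is a Series holding count, mean, std, min, the 25%, 50% and 75% percentiles, and max.
This method generates a high-level summary of the attributes of the given column. It is type-aware, meaning that its output changes based on the data type of the input. The numeric summary only makes sense for numerical data; for string data, we call it the same way: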
```
reviews.taster_name.describe()
```
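For a string column the result instead reports count, unique, top (the most frequent value) and freq (how often that value occurs).
If you want a particular simple summary statistic about a column in a DataFrame or about a Series, there is usually a helpful pandas function for it.
- For instance, to see the mean of the points allotted (that is, how well an averagely rated wine does), we can use the mean() function: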
```
reviews.points.mean()
```
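  The result is a single float: about 88.447138 for this dataset, which matches the shift applied to the points column in the apply() output further down.
- To see a list of unique values, we can use the unique() function: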
```
reviews.taster_name.unique()
```
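  The result is a NumPy array of the distinct values in order of first appearance; missing values (NaN) are included.
- To see a list of unique values and how often each occurs in the dataset, we can use the value_counts() method: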
```
reviews.taster_name.value_counts()
```
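  The result is a Series indexed by the distinct values and sorted from most to least frequent; missing values (NaN) are excluded by default.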
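The summary functions above can be tried on a small hand-made frame. This is only a minimal sketch: the `toy` DataFrame, its values, and the variable names are invented for illustration and are not taken from the wine reviews dataset.

```python
import pandas as pd

# Hypothetical stand-in for the reviews data; all values are invented.
toy = pd.DataFrame({
    "points": [87, 87, 90, 92],
    "taster_name": ["Roger Voss", "Roger Voss", "Kerin O'Keefe", None],
})

numeric_summary = toy.points.describe()    # count, mean, std, min, quartiles, max
text_summary = toy.taster_name.describe()  # count, unique, top, freq

mean_points = toy.points.mean()            # (87 + 87 + 90 + 92) / 4
tasters = toy.taster_name.unique()         # distinct values, missing value included
counts = toy.taster_name.value_counts()    # missing value excluded by default

print(numeric_summary)
print(text_summary)
print(mean_points)
print(tasters)
print(counts)
```

Note how the missing taster shows up in unique() but not in value_counts(), so the two results can differ in length by one.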

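Maps

A map is a term, borrowed from mathematics, for a function that takes one set of values and "maps" them to another set of values. In data science we often need to create new representations from existing data, or to transform data from its current format into the format we want it in later. Maps are what handle this work, which makes them extremely important for getting your work done!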

There are two mapping methods that you will use often.
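One of the two is Series.map(), which the note further down pairs with apply(). A minimal sketch of it, re-centering scores around their mean, might look like the following. The `toy` frame and the name `toy_points_mean` are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical scores; invented for illustration.
toy = pd.DataFrame({"points": [85, 90, 95]})

toy_points_mean = toy.points.mean()

# map() calls the function once per value and returns a new Series.
centered = toy.points.map(lambda p: p - toy_points_mean)

print(centered.tolist())
```

The function passed to map() receives a single value from the Series and should return the transformed version of that value.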

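map() transforms a Series one value at a time. apply() is the equivalent method if we want to transform a whole DataFrame by calling a custom function on each row.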

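# Mean score, used below to re-center every row's points value.
review_points_mean = reviews.points.mean()

def remean_points(row):
    row.points = row.points - review_points_mean
    return row

# axis='columns' passes one row at a time to remean_points.
reviews.apply(remean_points, axis='columns')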

Output:

country	description	designation	points	price	province	region_1	region_2	taster_name	taster_twitter_handle	title	variety	winery
0	Italy	Aromas include tropical fruit, broom, brimston...	Vulkà Bianco	-1.447138	NaN	Sicily & Sardinia	Etna	NaN	Kerin O’Keefe	@kerinokeefe	Nicosia 2013 Vulkà Bianco (Etna)	White Blend
1	Portugal	This is ripe and fruity, a wine that is smooth...	Avidagos	-1.447138	15.0	Douro	NaN	NaN	Roger Voss	@vossroger	Quinta dos Avidagos 2011 Avidagos Red (Douro)	Portuguese Red
...	...	...	...	...	...	...	...	...	...	...	...	...
129969	France	A dry style of Pinot Gris, this is crisp with ...	NaN	1.552862	32.0	Alsace	Alsace	NaN	Roger Voss	@vossroger	Domaine Marcel Deiss 2012 Pinot Gris (Alsace)	Pinot Gris
129970	France	Big, rich and off-dry, this is powered by inte...	Lieu-dit Harth Cuvée Caroline	1.552862	21.0	Alsace	Alsace	NaN	Roger Voss	@vossroger	Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...	Gewürztraminer

129971 rows × 13 columns

If we had called reviews.apply() with axis='index', then instead of passing a function to transform each row, we would need to provide a function to transform each column.
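To make the column-wise case concrete, here is a minimal sketch under assumed data: the `toy` frame and the helper `center_column` are hypothetical names chosen for this example.

```python
import pandas as pd

# Hypothetical numeric columns; invented for illustration.
toy = pd.DataFrame({"points": [85, 90, 95], "price": [10.0, 20.0, 30.0]})

def center_column(col):
    # With axis='index' the function receives one whole column as a Series.
    return col - col.mean()

centered = toy.apply(center_column, axis="index")
print(centered)
```

Each column is centered around its own mean, because the function sees one column at a time rather than one row.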

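Note that map() and apply() return new, transformed Series and DataFrames, respectively. They don't modify the original data they're called on. If we look at the first row of reviews, we can see that it still has its original points value.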

reviews.head(1)

Output:

country	description	designation	points	price	province	region_1	region_2	taster_name	taster_twitter_handle	title	variety	winery
0	Italy	Aromas include tropical fruit, broom, brimston...	Vulkà Bianco	87	NaN	Sicily & Sardinia	Etna	NaN	Kerin O’Keefe	@kerinokeefe	Nicosia 2013 Vulkà Bianco (Etna)	White Blend
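The same behavior can be checked on a small frame. This is a sketch under assumptions: the `toy` data, the column name `points_centered`, and the explicit `row.copy()` are choices made for this example, not something the original post prescribes.

```python
import pandas as pd

# Hypothetical mixed-type rows; invented for illustration.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "France"],
    "points": [85, 90, 95],
})
toy_points_mean = toy.points.mean()

def remean_points(row):
    # Work on a copy so the function never touches the caller's data.
    row = row.copy()
    row["points"] = row["points"] - toy_points_mean
    return row

result = toy.apply(remean_points, axis="columns")

print(result["points"].tolist())  # transformed values live in the new frame
print(toy["points"].tolist())     # the original frame still has its scores

# To keep a transformed column, assign it explicitly.
toy["points_centered"] = toy.points.map(lambda p: p - toy_points_mean)
```

Assigning the result to a new column (or back to the original one) is what makes the change stick; calling map() or apply() on its own does not.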