ClickHouse 的 Parts 和 Partitions
在 ClickHouse 中,磁盘上存储表的部分数据的物理文件称为 "part"。"partition "是使用分区键创建的表数据的逻辑划分。分区可用于提高大型表的性能和可管理性,允许用户更有效地存储、查询和操作数据子集。
Parts
如前所述,"Parts"是磁盘上的物理文件。默认情况下,所有数据相关文件都位于"/var/lib/clickhouse"目录下。ClickHouse 中的每个合并树表都有唯一的存储目录路径来存储 Parts。你可以从 "system.parts "表中获取parts的实际位置、parts名称、分区信息(如果有的话)以及其他一些有价值的信息。
以下是从 system.parts 表查询结果的示例。
SELECT
substr(table, 1, 22),
partition AS prt,
name,
part_type,
path
FROM system.parts
WHERE database = 'mytest'
ORDER BY
table ASC,
partition ASC,
name ASC
Query id: e2f55694-fa55-48f3-99f7-74bca59aca48
┌─substring(table, 1, 22)─┬─prt──────┬─name─────────────────┬─part_type─┬─path──────────────────────────────────────────────────────────────────────────────────────┐
│ bucket_daily_stats_v2 │ tuple() │ all_1_1389_6_695 │ Wide │ /var/lib/clickhouse/store/7b0/7b0e6f30-114d-472d-8ff4-f4c9a4cf7cc1/all_1_1389_6_695/ │
│ corr_pair_value │ tuple() │ all_208611_209409_4 │ Wide │ /var/lib/clickhouse/store/5d7/5d732078-0f2e-4d04-9848-d216a1839b32/all_208611_209409_4/ │
│ corr_pair_value │ tuple() │ all_209410_209634_3 │ Wide │ /var/lib/clickhouse/store/5d7/5d732078-0f2e-4d04-9848-d216a1839b32/all_209410_209634_3/ │
│ corr_pair_value │ tuple() │ all_209635_210331_4 │ Wide │ /var/lib/clickhouse/store/5d7/5d732078-0f2e-4d04-9848-d216a1839b32/all_209635_210331_4/ │
│ mytest_of_ti │ 202408 │ 202408_4241_5001_5 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202408_4241_5001_5/ │
│ mytest_of_ti │ 202408 │ 202408_5002_5008_1 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202408_5002_5008_1/ │
│ mytest_of_ti │ 202408 │ 202408_5009_5045_2 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202408_5009_5045_2/ │
│ mytest_of_ti │ 202408 │ 202408_5046_5046_0 │ Compact │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202408_5046_5046_0/ │
│ mytest_of_ti │ 202409 │ 202409_5013_5624_5 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5013_5624_5/ │
│ mytest_of_ti │ 202409 │ 202409_5625_5661_2 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5625_5661_2/ │
│ mytest_of_ti │ 202409 │ 202409_5662_5699_2 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5662_5699_2/ │
│ mytest_of_ti │ 202409 │ 202409_5700_5708_1 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5700_5708_1/ │
│ mytest_of_ti │ 202409 │ 202409_5710_5793_4 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5710_5793_4/ │
│ mytest_of_ti │ 202409 │ 202409_5794_5794_0 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5794_5794_0/ │
│ mytest_of_ti │ 202409 │ 202409_5795_5795_0 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5795_5795_0/ │
│ mytest_of_ti │ 202410 │ 202410_5703_5810_7 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5703_5810_7/ │
│ mytest_of_ti │ 202410 │ 202410_5811_5849_2 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5811_5849_2/ │
│ mytest_of_ti │ 202410 │ 202410_5850_5882_2 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5850_5882_2/ │
│ mytest_of_ti │ 202410 │ 202410_5883_5888_1 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5883_5888_1/ │
│ mytest_of_ti │ 202410 │ 202410_5889_5894_1 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5889_5894_1/ │
│ mytest_of_ti │ 202410 │ 202410_5895_5901_1 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5895_5901_1/ │
│ mytest_of_ti │ 202410 │ 202410_5902_5902_0 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5902_5902_0/ │
│ mytest_of_ti │ 202410 │ 202410_5903_5903_0 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5903_5903_0/ │
│ mytest_of_ti │ 202410 │ 202410_5904_5904_0 │ Wide │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5904_5904_0/ │
└─────────────────────────┴──────────┴──────────────────────┴───────────┴───────────────────────────────────────────────────────────────────────────────────────────┘
在 "part_type" 列中,"Wide "表示每一列都存储在文件系统中的一个单独文件中,而 "Compact "表示所有列都存储在文件系统的一个文件中。此外,"partition "列中的 "tuple() "表示该表未分区。
也可以在目录“/var/lib/clickhouse/data/<DBNAME>/<TABLENAME>”中查看表的 parts,会发现该目录下存放都是符号链接,通过链接可以查看表包含的parts。比如,进入表 mytest_of_ti 所在目录查看:
# ls -la ...... drwxr-x--- 2 clickhouse clickhouse 4096 Oct 10 15:10 202410_5811_5849_2 drwxr-x--- 2 clickhouse clickhouse 4096 Oct 11 15:10 202410_5850_5882_2 drwxr-x--- 2 clickhouse clickhouse 4096 Oct 11 15:10 202410_5883_5888_1 drwxr-x--- 2 clickhouse clickhouse 4096 Oct 11 15:10 202410_5889_5894_1 drwxr-x--- 2 clickhouse clickhouse 4096 Oct 13 15:10 202410_5895_5901_1 drwxr-x--- 2 clickhouse clickhouse 4096 Oct 13 15:10 202410_5902_5902_0 drwxr-x--- 2 clickhouse clickhouse 4096 Oct 13 15:10 202410_5903_5903_0 drwxr-x--- 2 clickhouse clickhouse 4096 Oct 13 15:10 202410_5904_5904_0 drwxr-x--- 2 clickhouse clickhouse 10 Nov 11 2023 detached -rw-r----- 1 clickhouse clickhouse 1 Nov 11 2023 format_version.txt -rw-r----- 1 clickhouse clickhouse 123 Dec 11 2023 mutation_252.txt -rw-r----- 1 clickhouse clickhouse 96 Dec 11 2023 mutation_259.txt -rw-r----- 1 clickhouse clickhouse 230 Dec 11 2023 mutation_264.txt
可以统计出表mytest_of_ti一共有个part:
# ls -la |grep 20 |wc -l 133
以列出的202410_5850_5882_2 该part为例,其中202410表示分区id;5850表示part中最小的块号;5882表示最大的块号;2表示块的级别(chunk level)。如果没有做分区,part 的名称以all_开头。
也可以从系统表 system.parts 中查看该part 的信息:
SELECT
name,
partition_id,
min_block_number,
max_block_number,
level,
data_version
FROM system.parts
WHERE (database = 'factor_data') AND (table = 'factor_value_of_trading') AND (name = '202410_5850_5882_2')
Query id: e6678b2e-31e2-4a9d-9c9b-16ff8bf9ecf0
┌─name───────────────┬─partition_id─┬─min_block_number─┬─max_block_number─┬─level─┬─data_version─┐
│ 202410_5850_5882_2 │ 202410 │ 5850 │ 5882 │ 2 │ 5850 │
└────────────────────┴──────────────┴──────────────────┴──────────────────┴───────┴──────────────┘
1 row in set. Elapsed: 0.006 sec.
Partitions
与 parts 一样,也可以从 “system.parts ”表中访问合并树表的分区信息。不过,分区列现在不是用 “tuple() ”来表示了。要创建分区表,首先需要在创建表时使用 “PARTITION BY expr ”子句。例如,“PARTITION BY toYYYMMDD(start_time) ”子句创建了一个与 “start_time ”列相关的每日分区。在下面的示例中,可以看到Partitions名和parts名是不同的。分区类似于逻辑上的划分,但 parts是作为物理文件定位的。一个分区可以包含一个或多个 parts。
SELECT
partition,
name,
active
FROM system.parts
WHERE (table = 'mytest_of_ti') AND (database = 'test')
Query id: 9c8c15bc-df95-4713-b001-39447010a1ab
┌─partition─┬─name─────────────────┬─active─┐
│ 201411 │ 201411_1_1_1_264 │ 1 │
│ 201411 │ 201411_2_2_3_264 │ 1 │
│ 201412 │ 201412_3_3_3_264 │ 1 │
│ 201412 │ 201412_4_4_1_264 │ 1 │
│ 201412 │ 201412_5_5_1_264 │ 1 │
│ 202001 │ 202001_6_6_3_264 │ 1 │
│ 202002 │ 202002_7_7_3_264 │ 1 │
│ 202003 │ 202003_8_8_3_264 │ 1 │
│ 202004 │ 202004_9_9_1_264 │ 1 │
│ 202004 │ 202004_10_10_3_264 │ 1 │
...
│ 202410 │ 202410_5934_5934_0 │ 0 │
│ 202410 │ 202410_5935_5935_0 │ 0 │
│ 202410 │ 202410_5936_5936_0 │ 0 │
│ 202410 │ 202410_5937_5937_0 │ 1 │
│ 202410 │ 202410_5938_5938_0 │ 1 │
└───────────┴──────────────────────┴────────┘
167 rows in set. Elapsed: 0.005 sec.
通常,分区是用来提升查询性能。便于用户灵活地管理数据子集。可以直接查询分区、删除分区等。
可以通过指定 where 子句或者使用隐藏列"_partition_id" 来查看指定的分区。当然最好使用官方推荐的方式,在where子句中添加分区条件比较好。 有些特殊情况下,需要使用隐藏列"_partition_id"。
现在,让我们看看分区表的查询示例。假设我们的表(test.mytest_of_ti)对 “created_at ”列进行了分区。我们可以使用分区键列和隐藏的“_partition_id ”列访问特定分区。此外,我们还可以使用“_partition_id ”查询前 10 个分区。
SELECT count() FROM recoDB.mytest_of_ti WHERE toDate(created_at) = '2023-01-21' ┌─count()─┐ │ 9731 │ └─────────┘ ##################################### SELECT count() FROM test.mytest_of_ti WHERE _partition_id = '20230121' ┌─count()─┐ │ 9731 │ └─────────┘
SELECT
_partition_id,
count()
FROM test.mytest_of_ti
GROUP BY _partition_id
ORDER BY 2 DESC
LIMIT 10
卸载或附加Parts/Partitions
detach 操作可以将指定的 parts/partitions 移动到 detached 目录。在重新附加之前,用户无法访问这些数据。缺省情况下,detached 目录位于"/var/lib/clickhouse/data/<DATABASE_NAME>/<TABLE_NAME>"中。
而 attach 操作可以将被卸载的parts/partitions 附加进来。语法如下所示:
#DETACH PART/PARTITION ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DETACH PARTITION|PART <PARTITION_EXPRESSION> #ATTACH PART/PARTITION ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] ATTACH PARTITION|PART <PARTITION_EXPRESSION>
这里的 PARTITION_EXPRESSION 是分区名、分区id或表达式自身。
这里,分区表达式应该是分区名称、分区 id 或表达式本身。
假设我们要从 “test.mytest_of_ti ”表中卸载任何指定日期的分区。
首先,找到指定日期的分区和parts名称。如图所示,该日期位于分区 “20231013 ”中,该分区有两个part。
SELECT
partition,
name,
partition_id
FROM system.parts
WHERE (database = 'test') AND (table = 'mytest_of_ti') AND (partition_id IN (
SELECT DISTINCT _partition_id
FROM test.mytest_of_ti
WHERE toDate(created_at) = '2023-10-13'
))
┌─partition─┬─name─────────────┬─partition_id─┐
│ 20231013 │ 20231013_62_62_0 │ 20231013 │
│ 20231013 │ 20231013_78_78_0 │ 20231013 │
└───────────┴──────────────────┴──────────────┘
开始卸载分区"20231013",然后再附加进来。
-- Count related date before detached SELECT count() FROM test.mytest_of_ti WHERE toDate(created_at) = '2023-10-13' ┌─count()─┐ │ 78536 │ └─────────┘ -- detach partition ALTER TABLE test.mytest_of_ti DETACH PARTITION 20231013 Ok. -- Count related date after detached SELECT count() FROM test.mytest_of_ti WHERE toDate(created_at) = '2023-10-13' ┌─count()─┐ │ 0 │ └─────────┘
然后再重新附加进来:
--attach partition ALTER TABLE test.mytest_of_ti ATTACH PARTITION 20231013 Ok. --Count related date after attach partition SELECT count() FROM test.mytest_of_ti WHERE toDate(created_at) = '2023-10-13' ┌─count()─┐ │ 78536 │ └─────────┘
同样对单个part做如上的操作。
--find parts for given date
SELECT
partition,
name,
partition_id
FROM system.parts
WHERE (database = 'test') AND (table = 'mytest_of_ti') AND (partition_id IN (
SELECT DISTINCT _partition_id
FROM test.mytest_of_ti
WHERE toDate(created_at) = '2023-10-13'
))
┌─partition─┬─name───────────────┬─partition_id─┐
│ 20231013 │ 20231013_142_142_0 │ 20231013 │
│ 20231013 │ 20231013_143_143_0 │ 20231013 │
└───────────┴────────────────────┴──────────────┘
-- Count before detach part operation
SELECT count()
FROM test.mytest_of_ti
WHERE toDate(created_at) = '2023-10-13'
┌─count()─┐
│ 78536 │
└─────────┘
-- detach any of the parts
ALTER TABLE test.mytest_of_ti DETACH PART '20231013_142_142_0'
Ok.
-- Count after detach part operation
SELECT count()
FROM test.mytest_of_ti
WHERE toDate(created_at) = '2023-10-13'
┌─count()─┐
│ 78406 │
└─────────┘
-- Attach related part
ALTER TABLE test.mytest_of_ti ATTACH PART '20231013_142_142_0'
-- Count after attach the part
SELECT count()
FROM test.mytest_of_ti
WHERE toDate(created_at) = '2023-10-13'
┌─count()─┐
│ 78536 │
└─────────┘
也可以将源表的分区附加到目标表上,如下例所示,附加后数据不会从源表或目标表被删除:
ALTER TABLE <DESTINATION_TABLE> [ON CLUSTER <CLUSTER_NAME>] ATTACH PARTITION <PARTITION EXPRESSION> FROM <SOURCE_TABLE>
不过,该操作需要目标表和源表满足以下条件:
·结构相同
·分区键、排序键和主键相同
·存储策略相同
删除Parts/Partitions
删除的parts/Partitions 在系统表 system.parts 中被标记为不活跃的,且删除操作完成后,对应的parts/Partitions 会在 "/var/lib/clickhouse/data/<database_name>/<table_name>/" 中保留十分钟。
-- Drop from table itself ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DROP PARTITION|PART 'PART/PARTITON EXPRESSION' -- Remove specified part/partition from detached folder ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DROP DETACHED PARTITION|PART 'PART/PARTITON EXPRESSION'
移动Parts/Partitions
可以将partitions移动到 Clickhouse 中的另一个表。在这种情况下,源表和目标表都具有相同的结构、分区键、按键排序、主键、存储策略和引擎系列。
对于合并树引擎表,移动的另一个选项是将parts或partitions移动到另一个磁盘或卷上。
-- Usage: ALTER TABLE <DATABASE_NAME.SOURCE_TABLE> [ON CLUSTER <CLUSTER_NAME>] MOVE PARTITION <PARTITION EXPRESSION> TO TABLE <DATABASE_NAME.DESTINATION_TABLE>
-- 1. Create empty mytest_of_ti_new table same as the mytest_of_ti
-- 2. Move 20231013 partition from mytest_of_ti to mytest_of_ti_new
ALTER TABLE test.mytest_of_ti
MOVE PARTITION '20231013' TO TABLE test.mytest_of_ti_new
Ok.
--3. Take partition count for 20231013 in source table
SELECT count()
FROM test.mytest_of_ti
WHERE _partition_id = '20231013'
┌─count()─┐
│ 0 │
└─────────┘
--4 Take partition count for 20231013 in destination table
SELECT count()
FROM test.mytest_of_ti_new
WHERE _partition_id = '20231013'
Query id: 49319c73-c84f-4fbf-838d-ab787971eaad
┌─count()─┐
│ 78536 │
└─────────┘
移动 parts/partitions 到其它的磁盘或卷,需要设置一个存储策略,并使用创建的策略创建一个新表。这里不再讲存储策略。
-- Our table's(mytest_of_ti_with_storage_policy) policy is "vo1_to_vo2"
-- and this volume contain 2 disks(vo1 and vo2)
SELECT *
FROM system.storage_policies
WHERE policy_name = (
SELECT storage_policy
FROM system.tables
WHERE (database = 'test') AND (name = 'mytest_of_ti_with_storage_policy')
┌─policy_name─┬─volume_name─┬─volume_priority─┬─disks───┬─volume_type─┬─max_data_part_size─┬─move_factor─┬─prefer_not_to_merge─┐
│ vo1_to_vo2 │ vo1_volume │ 1 │ ['vo1'] │ JBOD │ 0 │ 0.1 │ 0 │
│ vo1_to_vo2 │ vo2_volume │ 2 │ ['vo2'] │ JBOD │ 0 │ 0.1 │ 0 │
└─────────────┴─────────────┴─────────────────┴─────────┴─────────────┴────────────────────┴─────────────┴─────────────────────┘
-- The 20231013 partition is stored under vo1 disk.
SELECT
partition,
name,
path
FROM system.parts
WHERE (database = 'test') AND (table = 'mytest_of_ti_with_storage_policy') AND (partition = '20231013')
┌─partition─┬─name───────────────┬─path────────────────────────────────────────────────────────────────────┐
│ 20231013 │ 20231013_107_107_0 │ /vo1/store/346/3465ekb6-231d-493d-94bc-518234d79b45/20231013_107_107_0/ │
└───────────┴────────────────────┴─────────────────────────────────────────────────────────────────────────┘
-- Let's move it to the vo2 disk
ALTER TABLE test.mytest_of_ti_with_storage_policy
MOVE PARTITION '20231013' TO DISK 'vo2'
-Check the new volume
SELECT
partition,
name,
path
FROM system.parts
WHERE (database = 'test') AND (table = 'mytest_of_ti_with_storage_policy') AND (partition = '20231013')
┌─partition─┬─name───────────────┬─path────────────────────────────────────────────────────────────────────┐
│ 20231013 │ 20231013_107_107_0 │ /vo2/store/346/3465ekb6-231d-493d-94bc-518234d79b45/20231013_107_107_0/ │
└───────────┴────────────────────┴─────────────────────────────────────────────────────────────────────────┘
-- You can move parts and also you can move volume instead of disk
