优化 PostgreSQL 中的一个 JSON 查询

admin

2025年9月20日 0:4 本文热度 1069

关系型数据库引入 JSON 支持后，出现了一些更高复杂度的 SQL 查询。

多年来，对象关系映射（ORM）工具已将查询生成流程标准化，让开发人员得以专注于优化少数真正需要调整的查询。但这些工具有时会生成过于复杂的 SQL，包含过多连接操作和业务逻辑，使得调优工作如同噩梦。开发者往往难以判断何时该将一个查询拆分为多个小型查询。而 JSON 数据类型的使用进一步加剧了这种复杂性。

ORM 生成的低效查询示例

以下是一个由 ORM 生成的复杂 SQL 查询的精简版本。由于其执行时间超过 90 分钟，不符合业务需求，我们需要对其进行调优。

WITH result_list AS (
    SELECT o.id,
           o.data AS order_data,
           ARRAY_AGG(DISTINCT cr.id) AS customer_refs_id,
           ARRAY_AGG(DISTINCT cr.data) AS customer_refs_data,
           ARRAY_AGG(DISTINCT cp.policy_id) AS customer_policy_id,
           ROW_NUMBER() OVER () AS rowid
    FROM orders o
    LEFT JOIN customer_refs cr ON cr.data->>'id' = o.data->'customer'->>'id'
    LEFT JOIN customer_policies cp ON cp.policy_data->>'id' = o.data->'customer'->>'id'
        AND cp.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND (
            (
                cp.policy_data->>'status' IS NOT NULL
                AND cp.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
            )
            AND (
                (
                    cp.policy_data->>'customer_type' IS NOT NULL
                    AND cp.policy_data->>'customer_type' = 'issuer'
                )
                OR (
                    cp.policy_data->>'customer_type' IS NOT NULL
                    AND cp.policy_data->>'customer_type' IN ('type1', 'type2', 'type3')
                )
                OR (
                    get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') IS NOT NULL
                    AND get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
                )
            )
        )
    WHERE EXISTS (
        SELECT 1
        FROM account_policies ap
        WHERE ap.policy_id = ARRAY_TO_STRING(ARRAY[
            o.data->'account'->'id'->>'system_ref',
            o.data->'account'->'id'->>'type_ref',
            o.data->'account'->'id'->>'region',
            o.data->'account'->'id'->>'id'], '-')
        AND ap.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND ap.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
        AND ap.policy_data->>'contract' != 'prospect'
        AND (
            ap.policy_data->>'account_type' IN ('typeX', 'typeY', 'typeZ')
            OR get_text_from_path(ap.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
        )
    )
    AND (
        (
            o.data->>'order_type' = 'MONEY_ORDER'
            AND (
                o.data->'close_date'->>'value' > '2024-09-10'
                OR o.data->'close_date'->>'value' IS NULL
                OR o.data->'close_date'->>'value' = ''
            )
        )
        OR (
            o.data->>'order_type' != 'MONEY_ORDER'
            AND COALESCE(
                NULLIF(o.data->'valuation'->'quantity'->>'value', '')::DECIMAL,
                0
            ) != 0
        )
    )
    AND (
        LOWER(o.data->>'display_name') LIKE '%current%'
        OR LOWER(o.data->'product'->'item'->'item_name'->>'abbreviation') LIKE '%current%'
        OR LOWER(o.data->'product'->'item'->'item_name'->>'full') LIKE '%current%'
        OR LOWER(o.data->'product'->'item'->>'identifier') LIKE '%current%'
        OR LOWER(o.data->'product'->'issuer'->>'display_name') LIKE '%current%'
        OR get_text_for_search_from_path(o.data->'product'->'underlying', '$[*].item.item_name.abbreviation', 'LOWER') LIKE '%current%'
    )
    GROUP BY o.id, o.data
    ORDER BY o.id
),
ordered_list AS (
    SELECT *
    FROM result_list
    ORDER BY rowid
    LIMIT 23
)
SELECT *
FROM ordered_list
ORDER BY rowid;

查询的核心特征：

• 多连接操作：查询包含多个LEFT JOIN，且ON条件涉及复杂的 JSON 字段操作。
• 嵌套 JSON 访问：频繁使用 JSON 运算符（->、->>）访问深层嵌套的字段。
• 自定义函数：使用get_text_from_path和get_text_for_search_from_path等函数，这类函数可能未优化性能。
• 复杂 WHERE 条件：WHERE子句包含多个嵌套条件和子查询，且常涉及 JSON 字段。

查询低效的原因

1. 在大型数据集中，使用运算符访问深层嵌套的 JSON 字段速度较慢；若未正确建立索引，会导致全表扫描，大幅增加执行时间。
2. 查询对 JSON 字段执行大量的LOWER()和LIKE操作，计算成本高，且会导致索引无法生效。
3. 由于条件的写法问题，JSON 字段上的索引可能无法被利用。
4. LEFT JOIN的条件中包含函数和 JSON 字段比较，导致执行计划效率低下。
5. EXISTS子查询进一步增加了执行复杂性，可能引发全表扫描。
6. ORM 通常会生成通用性 SQL，未考虑特定数据库的优化方式。
7. ORM 的抽象特性可能导致冗余或不必要的条件与连接操作。

糟糕的执行计划

Limit  (cost=292646835.98..292646835.99 rows=1 width=1813) (actual time=5504844.738..5504844.745 rows=23 loops=1)
  Buffers: shared hit=3622772239 read=7862262 dirtied=3639
  ->  Sort  (cost=292646835.98..292646835.99 rows=1 width=1813) (actual time=5502255.883..5502255.889 rows=23 loops=1)
        Sort Key: (row_number() OVER (?))
        Sort Method: quicksort  Memory: 114kB
        Buffers: shared hit=3622772239 read=7862262 dirtied=3639
        ->  WindowAgg  (cost=292646835.93..292646835.97 rows=1 width=1813) (actual time=5502254.506..5502255.810 rows=43 loops=1)
              Buffers: shared hit=3622772236 read=7862262 dirtied=3639
              ->  GroupAggregate  (cost=292646835.93..292646835.96 rows=1 width=1805) (actual time=5502253.088..5502253.249 rows=43 loops=1)
                    Group Key: o.id
                    Buffers: shared hit=3622772236 read=7862262 dirtied=3639
                    ->  Sort  (cost=292646835.93..292646835.93 rows=1 width=1930) (actual time=5502244.952..5502244.964 rows=43 loops=1)
                          Sort Key: o.id, cr.id
                          Sort Method: quicksort  Memory: 71kB
                          Buffers: shared hit=3622772236 read=7862262 dirtied=3639
                          ->  Nested Loop  (cost=0.98..292646835.92 rows=1 width=1930) (actual time=2015030.222..5502243.020 rows=43 loops=1)
                                Join Filter: (array_to_string(ARRAY[...]) = ap.policy_id)
                                Rows Removed by Join Filter: 246198062
                                Buffers: shared hit=3622772236 read=7862262 dirtied=3639
---
 
"Planning:"
"  Buffers: shared hit=649 read=2"
"Planning Time: 133.626 ms"
"JIT:"
"  Functions: 32"
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 226.277 ms, Inlining 631.748 ms, Optimization 801.960 ms, Emission 1156.811 ms, Total 2816.797 ms"
"Execution Time: 5505139.477 ms"

执行计划观察结果：

执行时间：约 90 分钟

缓冲区使用：

• 共享命中（shared hit）：约 36 亿次
• 读取（read）：约 780 万次
• 修改（dirtied）：约 3639 次

核心问题：

• 连接与条件中的函数：导致索引无法有效使用。
• 嵌套循环（Nested Loop）：逐行处理效率低下。
• 排序操作：因未使用索引，排序成本极高。
• 全表扫描：连接筛选器过滤掉大量行，说明存在大量全表扫描。

优化查询

让我们来对查询做如下修改，以帮助优化器使用索引：

1. 建立索引并创建生成列（generated column），避免在WHERE子句中对字段应用函数。
2. 在可能的情况下，将LEFT JOIN改为INNER JOIN，以便在查询执行早期缩小数据集范围。
3. 重写部分条件，避免对 JSON 字段使用函数。

ALTER TABLE orders ADD COLUMN customer_id TEXT GENERATED ALWAYS AS (data->'customer'->>'id') STORED;
ALTER TABLE orders ADD COLUMN order_type TEXT GENERATED ALWAYS AS (data->>'order_type') STORED;
ALTER TABLE orders ADD COLUMN display_name_lower TEXT GENERATED ALWAYS AS (LOWER(data->>'display_name')) STORED;
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE INDEX idx_orders_order_type ON orders (order_type);
CREATE INDEX idx_orders_display_name_lower ON orders (display_name_lower);

WITH result_list AS (
    SELECT o.id,
           o.data AS order_data,
           ARRAY_AGG(DISTINCT cr.id) AS customer_refs_id,
           ARRAY_AGG(DISTINCT cr.data) AS customer_refs_data,
           ARRAY_AGG(DISTINCT cp.policy_id) AS customer_policy_id,
           ROW_NUMBER() OVER () AS rowid
    FROM orders o
    INNER JOIN customer_refs cr ON cr.data->>'id' = o.customer_id
    INNER JOIN customer_policies cp ON cp.policy_data->>'id' = o.customer_id
        AND cp.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND cp.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
        AND (
            cp.policy_data->>'customer_type' = 'issuer'
            OR cp.policy_data->>'customer_type' IN ('type1', 'type2', 'type3')
            OR get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
        )
    WHERE EXISTS (
        SELECT 1
        FROM account_policies ap
        WHERE ap.policy_id = ARRAY_TO_STRING(ARRAY[
            o.data->'account'->'id'->>'system_ref',
            o.data->'account'->'id'->>'type_ref',
            o.data->'account'->'id'->>'region',
            o.data->'account'->'id'->>'id'], '-')
        AND ap.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND ap.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
        AND ap.policy_data->>'contract' != 'prospect'
        AND (
            ap.policy_data->>'account_type' IN ('typeX', 'typeY', 'typeZ')
            OR get_text_from_path(ap.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
        )
    )
    AND o.order_type = 'MONEY_ORDER'
    AND (o.data->'close_date'->>'value' > '2024-09-10' OR o.data->'close_date'->>'value' IS NULL OR o.data->'close_date'->>'value' = '')
    AND o.display_name_lower LIKE '%current%'
    GROUP BY o.id, o.data
    ORDER BY o.id
)
SELECT *
FROM result_list
ORDER BY rowid
LIMIT 23;

更新后的执行计划分析

Limit  (cost=9013972.46..9013972.46 rows=1 width=1789) (actual time=175757.446..175757.449 rows=0 loops=1)
  Buffers: shared hit=20716139 read=836484 dirtied=1676
  ->  Sort  (cost=9013972.46..9013972.46 rows=1 width=1789) (actual time=174722.049..174722.052 rows=0 loops=1)
        Sort Key: (row_number() OVER (?))
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=20716139 read=836484 dirtied=1676
        ->  WindowAgg  (cost=9013972.40..9013972.45 rows=1 width=1789) (actual time=174722.010..174722.013 rows=0 loops=1)
              Buffers: shared hit=20716136 read=836484 dirtied=1676
              ->  GroupAggregate  (cost=9013972.40..9013972.44 rows=1 width=1781) (actual time=174722.008..174722.011 rows=0 loops=1)
                    Group Key: positions.id
                    Buffers: shared hit=20716136 read=836484 dirtied=1676
                    ->  Sort  (cost=9013972.40..9013972.41 rows=1 width=1906) (actual time=174722.007..174722.009 rows=0 loops=1)
                          Sort Key: positions.id, client_refs.id
                          Sort Method: quicksort  Memory: 25kB
                          Buffers: shared hit=20716136 read=836484 dirtied=1676
                          ->  Nested Loop  (cost=1.41..9013972.39 rows=1 width=1906) (actual time=174722.001..174722.004 rows=0 loops=1)
                                Buffers: shared hit=20716136 read=836484 dirtied=1676
                                ->  Nested Loop Left Join  (cost=0.99..9013961.75 rows=1 width=1933) (actual time=184.358..170945.266 rows=8554 loops=1)
                                      Buffers: shared hit=20683249 read=835147 dirtied=1672
                                      ->  Nested Loop Left Join  (cost=0.56..9013950.91 rows=1 width=1931) (actual time=138.461..146595.622 rows=8554 loops=1)
                                            Join Filter: (client_refs.id = pv.client_id)
                                            Rows Removed by Join Filter: 328,884,138
                                            Buffers: shared hit=20655467 read=828553 dirtied=1153
                                            ->  Nested Loop  (cost=0.56..9008615.75 rows=1 width=1736) (actual time=33.568..75336.287 rows=8554 loops=1)
                                                  Buffers: shared hit=27,870 read=824,707 dirtied=851
                                                  ->  Seq Scan on positions_view pv  (cost=0.00..9008612.97 rows=1 width=78) (actual time=16.384..12504.255 rows=8554 loops=1)
                                                        Filter: ...
                                                        Rows Removed by Filter: 32,144,940
                                                        Buffers: shared hit=7 read=809,465
                                                  ->  Index Scan using positions_pkey on positions positions  (cost=0.56..2.78 rows=1 width=1685) (actual time=7.327..7.327 rows=1 loops=8554)
                                                        Index Cond: (id = pv.id)
                                                        Buffers: shared hit=27,550 read=15,242 dirtied=602
                                            ->  Seq Scan on client_refs  (cost=0.00..4513.96 rows=65,696 width=195) (actual time=0.004..5.734 rows=38,449 loops=8554)
                                                  Buffers: shared hit=20,627,597 read=3,846 dirtied=302
                                      ->  Index Scan using clients_policy_pkey on clients_policy  (cost=0.43..10.84 rows=1 width=26) (actual time=2.841..2.841 rows=0 loops=8554)
                                            Index Cond: (policy_id = pv.client_id)
                                            Filter: ...
                                            Rows Removed by Filter: 1
                                            Buffers: shared hit=27,782 read=6,594 dirtied=519
                                ->  Index Scan using idx_btree_portfolios_policy_id on portfolios_policy  (cost=0.42..10.64 rows=1 width=28) (actual time=0.439..0.439 rows=0 loops=8554)
                                      Index Cond: (policy_id = pv.portfolio_id)
                                      Filter: ...
                                      Rows Removed by Filter: 1
                                      Buffers: shared hit=32,887 read=1,337 dirtied=4
                                       
---

"Planning:"
"  Buffers: shared hit=954 read=78 dirtied=9"
"Planning Time: 130.627 ms"
"JIT:"
"  Functions: 33"
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 5.483 ms, Inlining 81.013 ms, Optimization 530.635 ms, Emission 423.850 ms, Total 1040.982 ms"
"Execution Time: 175799.036 ms"

执行时间：不到 3 分钟

缓冲区使用：

• 共享命中：约 2070 万次
• 读取：约 836,484 次
• 修改：约 1,676 次

优化点：

• 高效连接：使用INNER JOIN在早期缩小数据集范围。
• 索引列：通过索引实现快速数据检索，避免全表扫描。
• 简化条件：移除WHERE子句中的部分函数操作，让索引得以生效。
• 减少缓冲区操作：大幅降低 I/O 开销。

与之前执行计划的对比：尽管仍需处理大量行，但优化后的查询执行速度显著提升。

阅读原文：原文链接

该文章在 2025/9/20 10:55:05 编辑过

关键字查询

一个

SQL

优化

Postgre

正在查询...

点晴ERP是一款针对中小制造业的专业生产管理软件系统,系统成熟度和易用性得到了国内大量中小企业的青睐。

点晴PMS码头管理系统主要针对港口码头集装箱与散货日常运作、调度、堆场、车队、财务费用、相关报表等业务管理，结合码头的业务特点，围绕调度、堆场作业而开发的。集技术的先进性、管理的有效性于一体，是物流码头及其他港口类企业的高效ERP管理信息系统。

点晴WMS仓储管理系统提供了货物产品管理,销售管理,采购管理,仓储管理,仓库管理,保质期管理,货位管理,库位管理,生产管理,WMS管理系统,标签打印,条形码,二维码管理,批号管理软件。

点晴免费OA是一款软件和通用服务都免费，不限功能、不限时间、不限用户的免费OA协同办公管理系统。