有三台24G的doris服务器,跑一张400M大的表,为什么会一直报内存不足

报错信息:
SQL 错误 [1105] [HY000]: errCode = 2, detailMessage = Memory limit exceeded:<consuming tracker:<Load#Id=192df8629fe4315-b638fec7749a563d>, failed alloc size 128.00 MB, exceeded tracker:<Load#Id=192df8629fe4315-b638fec7749a563d>, limit 8.00 GB, peak used 7.93 GB, current used 7.93 GB>, executing msg:<execute:<ExecNode:VHASH_JOIN_NODE (id=4)>>. backend 172.x.x.x process memory used 7.77 GB, limit 24.21 GB. If query tracker exceed, set exec_mem_limit=8G to change limit, details see be.INFO.

be也会宕机:
执行dmesg -T命令后查看到确实是OOM了
Out of memory: Kill process 13706 (doris_be) score 526 or sacrifice child

你好,看到你的提问信息不太全面,可以参考:提问模版 - 帮助你更快的得到有效帮助 来补充。这样可以方便你的问题更快速的被定位,从而提升被解答的几率哦~

【Doris 使用环境】测试

【Doris 版本】selectdb_doris-1.2.1.1-x86_64-avx2

【问题描述】

  • 报错信息:执行了SQL,对一张数据量为400M的表进行了聚合和join操作。
    报错信息:
    SQL 错误 [1105] [HY000]: errCode = 2, detailMessage = Process has no memory available, cancel top memory usage query: query memory tracker <Query#Id=ea36d749b99b4295-acbfe4037216622c> consumption 9.21 GB, backend 172.x.x.x process memory used 10.23 GB exceed limit 24.21 GB or sys mem available 4.56 GB less than low water mark 1.60 GB. Execute again after enough memory, details see be.INFO.
    后来将数据插入到另一张表当中,最后报错:
    SQL 错误 [1105] [HY000]: errCode = 2, detailMessage = Memory limit exceeded:<consuming tracker:<Load#Id=192df8629fe4315-b638fec7749a563d>, failed alloc size 128.00 MB, exceeded tracker:<Load#Id=192df8629fe4315-b638fec7749a563d>, limit 8.00 GB, peak used 7.93 GB, current used 7.93 GB>, executing msg:<execute:<ExecNode:VHASH_JOIN_NODE (id=4)>>. backend 172.x.x.x process memory used 7.77 GB, limit 24.21 GB. If query tracker exceed, set exec_mem_limit=8G to change limit, details see be.INFO.
  • 具体表现:有时候会导致be宕机:
    执行dmesg -T命令后查看到确实是OOM了
    Out of memory: Kill process 13706 (doris_be) score 526 or sacrifice child

【操作系统】centOS 7.9

【机器配置】包括:CPU核数:8、内存:32、磁盘:200GB

【复现路径】报错后在网上查了一下资料,修改过的参数:max_sys_mem_available_low_water_mark_bytes
disable_auto_compaction
load_process_max_memory_limit_percent
发现都没有作用后,就注释了
【附件】可附加 截图/监控/日志/相关 issue 等进行辅助说明

1 个赞

是否方便给下查询 SQL 语句呢

想要取出一段时间内的第一条数据和最后一条数据:
SELECT t1.time,
t1.n,
t1.minute_time,
d2.v as end_v from (
select max(time) time,
n,
minute_time from dwd_data_2
WHERE DataCalcType=5
group by n,minute_time) as t1
join dwd_data_2 d2 where t1.time = d2.time and t1.n=d2.n
UNION
SELECT t1.time,
t1.n,
t1.minute_time,
d2.v as end_v from (
select min(time) time,
n,
minute_time from dwd_data_2
WHERE DataCalcType=4
group by n,minute_time) as t1
join dwd_data_2 d2 where t1.time = d2.time and t1.n=d2.n;

一般不是join on么,为什么用join where呢 :man_shrugging: