✅ Single-node FE + BE deployed with docker-compose: node unavailable after restarting BE

【Doris environment】Test

【Doris version】1.2.2

【Problem description】
A single-node FE and BE are deployed with docker-compose. After restarting BE, the node becomes unavailable.

  • Error messages:
    be.WARNING.log:
    failed to report TASK|host=172.50.80.2|port=9020|error=Internal error: Fail to get master client from cache
    fe.warn.log:
    get bad heartbeat response: type: BACKEND, status: BAD, msg: java.net.ConnectException: Connection refused (Connection refused), beId: 10003, beHost: 172.50.80.3, bePort: 0, httpPort: 0, brpcPort: 0
    2023-04-24 02:22:14,591 WARN (heartbeat-mgr-pool-3|213) [HeartbeatMgr$BackendHeartbeatHandler.call():268] backend heartbeat got exception
    org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:226) ~[libthrift-0.13.0.jar:0.13.0]
    at org.apache.doris.common.GenericPool$ThriftClientFactory.create(GenericPool.java:143) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.common.GenericPool$ThriftClientFactory.create(GenericPool.java:126) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactory.java:62) ~[commons-pool2-2.2.jar:2.2]
    at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1012) ~[commons-pool2-2.2.jar:2.2]
    at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:356) ~[commons-pool2-2.2.jar:2.2]
    at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:277) ~[commons-pool2-2.2.jar:2.2]
    at org.apache.doris.common.GenericPool.borrowObject(GenericPool.java:95) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.system.HeartbeatMgr$BackendHeartbeatHandler.call(HeartbeatMgr.java:225) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.system.HeartbeatMgr$BackendHeartbeatHandler.call(HeartbeatMgr.java:203) ~[doris-fe.jar:1.2-SNAPSHOT]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_342]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_342]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_342]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_342]
    Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_342]
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_342]
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_342]
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_342]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_342]
    at java.net.Socket.connect(Socket.java:607) ~[?:1.8.0_342]
    at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ~[libthrift-0.13.0.jar:0.13.0]
    … 13 more

  • Symptom:
    Failed to execute sql: java.sql.SQLException: (conn=4936) errCode = 2, detailMessage = 11051 have no queryable replicas. err: 11052’s backend 10003 does not exist or not alive
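    【Operating system】
    CentOS Linux release 8.4.2105
    【Machine configuration】CPU cores, memory, disk
    CPU = 1 core, memory = 8 GB, disk = 40 GB
    【Steps to reproduce】
    It happens every time BE is restarted (restarted via docker).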
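The FE heartbeat trace above ends in "Connection refused" against 172.50.80.3, meaning nothing was accepting connections on the BE port FE tried to reach. A minimal Python sketch for checking this from the FE container, assuming the default Doris BE ports from be.conf (heartbeat_service_port 9050, be_port 9060, webserver_port 8040, brpc_port 8060); adjust them if your be.conf overrides these. The helper name probe_ports is hypothetical.

```python
import socket
from typing import Dict

# Assumed default BE ports (be.conf defaults); change them if your deployment overrides them.
DEFAULT_BE_PORTS: Dict[str, int] = {
    "heartbeat_service_port": 9050,
    "be_port": 9060,
    "webserver_port": 8040,
    "brpc_port": 8060,
}


def probe_ports(host: str, ports: Dict[str, int], timeout: float = 2.0) -> Dict[str, str]:
    """Try a plain TCP connect to each port; report 'open' or the error text."""
    results: Dict[str, str] = {}
    for name, port in ports.items():
        try:
            # create_connection raises OSError (ConnectionRefusedError, timeout, ...) on failure
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = "open"
        except OSError as exc:
            results[name] = f"unreachable: {exc}"
    return results
```

For example, probe_ports("172.50.80.3", DEFAULT_BE_PORTS) run inside the FE container should report "open" for heartbeat_service_port once the BE process is actually up; "Connection refused" there matches the FE log.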

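【Attachments】Screenshots, monitoring, logs, related issues, etc. may be attached to help explain the problem
New users are not allowed to upload attachments.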

How exactly did you restart it: stop & start, or docker rm & docker run?

Please also check the output of docker logs be.

docker restart

docker logs be output:
[Note] [Entrypoint]: BE is not register. retry.

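It looks like BE did not register. Did you run sysctl -w vm.max_map_count=2000000 on the FE host? You have to set it again after a restart: sysctl -w only changes the running kernel, so the setting is lost when the machine is shut down or rebooted.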
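To make the setting survive a reboot, the value is usually also written to /etc/sysctl.conf or a file under /etc/sysctl.d/. Because containers share the host kernel, the value the BE container sees is the Docker host's value. A small Python sketch, assuming Linux procfs, that reads the current vm.max_map_count and compares it with the 2000000 used in the reply; the helper names are hypothetical.

```python
from pathlib import Path

# Threshold taken from the reply (sysctl -w vm.max_map_count=2000000).
REQUIRED_MAX_MAP_COUNT = 2_000_000
# Linux procfs entry; inside a container it reflects the host kernel's value.
PROC_PATH = Path("/proc/sys/vm/max_map_count")


def read_max_map_count(path: Path = PROC_PATH) -> int:
    """Return the kernel's current vm.max_map_count as an int."""
    return int(path.read_text().strip())


def check_max_map_count(current: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """True if the current value is at least the required value."""
    return current >= required


def persistent_sysctl_line(value: int = REQUIRED_MAX_MAP_COUNT) -> str:
    """Line for /etc/sysctl.conf (or /etc/sysctl.d/*.conf) so the value survives reboots."""
    return f"vm.max_map_count = {value}"
```

After adding that line on the host, sysctl -p (for /etc/sysctl.conf) or sysctl --system (for /etc/sysctl.d/) applies it without a reboot.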
