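Preface

While testing in the demo environment, I found that queries against Elasticsearch (ES) occasionally failed with errors, so I investigated the cause and tuned the configuration accordingly.

Troubleshooting

1. Locating the problem

The ES error log shows that a request pushed memory usage over the limit and tripped the circuit breaker.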
    [2021-03-16T21:05:10,338][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [java-d-service-es-200-56-client-1] failed to execute on node [hsF4JzeAQ6mflJRGnJIKzQ]
    org.elasticsearch.transport.RemoteTransportException: [data-es-group-online-200-67-2][10.110.200.67:9301][cluster:monitor/nodes/info[n]]
    Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [33093117638/30.8gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33093114144/30.8gb], new bytes reserved: [3494/3.4kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=3494/3.4kb, accounting=104564949/99.7mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:342) ~[elasticsearch-7.3.2.jar:7.3.2]
        at ......
I pulled the ES source and located the code that raises the error, in org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService:
    public void checkParentLimit(long newBytesReserved, String label) throws CircuitBreakingException {
        final MemoryUsage memoryUsed = memoryUsed(newBytesReserved);
        long parentLimit = this.parentSettings.getLimit();
        if (memoryUsed.totalUsage > parentLimit) {
            this.parentTripCount.incrementAndGet();
            final StringBuilder message = new StringBuilder("[parent] Data too large, data for [" + label + "]" +
                " would be [" + memoryUsed.totalUsage + "/" + new ByteSizeValue(memoryUsed.totalUsage) + "]" +
                ", which is larger than the limit of [" +
                parentLimit + "/" + new ByteSizeValue(parentLimit) + "]");
            if (this.trackRealMemoryUsage) {
                final long realUsage = memoryUsed.baseUsage;
                message.append(", real usage: [");
                message.append(realUsage);
                message.append("/");
                message.append(new ByteSizeValue(realUsage));
                message.append("], new bytes reserved: [");
                message.append(newBytesReserved);
                message.append("/");
                message.append(new ByteSizeValue(newBytesReserved));
                message.append("]");
            } else {
                message.append(", usages [");
                message.append(String.join(", ",
                    this.breakers.entrySet().stream().map(e -> {
                        final CircuitBreaker breaker = e.getValue();
                        final long breakerUsed = (long)(breaker.getUsed() * breaker.getOverhead());
                        return e.getKey() + "=" + breakerUsed + "/" + new ByteSizeValue(breakerUsed);
                    })
                    .collect(Collectors.toList())));
                message.append("]");
            }
            CircuitBreaker.Durability durability = memoryUsed.transientChildUsage >= memoryUsed.permanentChildUsage ?
                CircuitBreaker.Durability.TRANSIENT : CircuitBreaker.Durability.PERMANENT;
            throw new CircuitBreakingException(message.toString(), memoryUsed.totalUsage, parentLimit, durability);
        }
    }
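As the code shows, the breaker trips only when memoryUsed.totalUsage > parentLimit. The value of parentLimit comes from the setting indices.breaker.total.limit, whose default is either 95% or 70% of the JVM heap. Which default applies depends on the setting indices.breaker.total.use_real_memory (default true), as the following code shows: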
    public static final Setting<Boolean> USE_REAL_MEMORY_USAGE_SETTING =
        Setting.boolSetting("indices.breaker.total.use_real_memory", true, Property.NodeScope);

    public static final Setting<ByteSizeValue> TOTAL_CIRCUIT_BREAKER_LIMIT_SETTING =
        Setting.memorySizeSetting("indices.breaker.total.limit", settings -> {
            if (USE_REAL_MEMORY_USAGE_SETTING.get(settings)) {
                return "95%";
            } else {
                return "70%";
            }
        }, Property.Dynamic, Property.NodeScope);
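To make the two defaults concrete, here is a minimal sketch of how a percentage limit turns into a byte limit. The class name ParentLimitSketch and its helper methods are hypothetical, and the 31 GB heap size is an assumption inferred from the log (it is not stated anywhere in the log itself); this is not the ES implementation.

```java
public class ParentLimitSketch {
    // Mirrors the default shown above: 95% with real-memory tracking, otherwise 70%.
    static double defaultLimitPercent(boolean useRealMemory) {
        return useRealMemory ? 95.0 : 70.0;
    }

    // Converts a percentage of the heap into a byte limit, truncating the fraction.
    static long limitBytes(long heapMaxBytes, double percent) {
        return (long) ((percent / 100.0) * heapMaxBytes);
    }

    public static void main(String[] args) {
        // Assumption: a 31 GB heap, chosen because 95% of it matches the limit in the log.
        long heapMax = 31L * 1024 * 1024 * 1024;
        System.out.println(limitBytes(heapMax, defaultLimitPercent(true)));  // 31621696716
        System.out.println(limitBytes(heapMax, defaultLimitPercent(false))); // 23300197580
    }
}
```

Under that assumption the first value equals the limit of [31621696716/29.4gb] reported in the log, which suggests the node was running with the default 95% limit.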
Next, look at memoryUsed.totalUsage. It is computed by a method of the same class:
    private MemoryUsage memoryUsed(long newBytesReserved) {
        long transientUsage = 0;
        long permanentUsage = 0;
        for (CircuitBreaker breaker : this.breakers.values()) {
            long breakerUsed = (long)(breaker.getUsed() * breaker.getOverhead());
            if (breaker.getDurability() == CircuitBreaker.Durability.TRANSIENT) {
                transientUsage += breakerUsed;
            } else if (breaker.getDurability() == CircuitBreaker.Durability.PERMANENT) {
                permanentUsage += breakerUsed;
            }
        }
        if (this.trackRealMemoryUsage) {
            final long current = currentMemoryUsage();
            return new MemoryUsage(current, current + newBytesReserved, transientUsage, permanentUsage);
        } else {
            long parentEstimated = transientUsage + permanentUsage;
            return new MemoryUsage(parentEstimated, parentEstimated, transientUsage, permanentUsage);
        }
    }
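The value of trackRealMemoryUsage (taken from the setting indices.breaker.total.use_real_memory) determines whether the parent breaker judges against the actual memory in use or against the sum of the memory reserved by the child circuit breakers. The official documentation explains it as follows: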
Static setting determining whether the parent breaker should take real memory usage into account (true) or only consider the amount that is reserved by child circuit breakers (false). Defaults to true.
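The difference between the two modes can be illustrated with the numbers from the log. The sketch below is a simplified stand-in for the memoryUsed and checkParentLimit logic, not the ES code: the class ParentBreakerSketch and its methods are hypothetical, the child usages are taken from the log as already including overhead, and the 70% limit of 23300197580 bytes assumes a 31 GB heap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParentBreakerSketch {
    // Usage the parent breaker compares against its limit.
    static long totalUsage(boolean trackRealMemory, long heapUsed, long newBytes,
                           Map<String, Long> childUsages) {
        if (trackRealMemory) {
            // Real-memory mode: current heap usage plus the bytes about to be reserved.
            return heapUsed + newBytes;
        }
        // Child-only mode: sum of what the child breakers have reserved.
        long sum = 0;
        for (long used : childUsages.values()) {
            sum += used;
        }
        return sum;
    }

    static boolean trips(long totalUsage, long parentLimit) {
        return totalUsage > parentLimit;
    }

    public static void main(String[] args) {
        // Child breaker usages copied from the "usages [...]" part of the log.
        Map<String, Long> children = new LinkedHashMap<>();
        children.put("request", 0L);
        children.put("fielddata", 0L);
        children.put("in_flight_requests", 3494L);
        children.put("accounting", 104564949L);

        long heapUsed = 33093114144L; // "real usage" from the log
        long newBytes = 3494L;        // "new bytes reserved" from the log

        long real = totalUsage(true, heapUsed, newBytes, children);
        long childOnly = totalUsage(false, heapUsed, newBytes, children);

        // 95% limit from the log; 70% limit assumes a 31 GB heap.
        System.out.println(real + " trips=" + trips(real, 31621696716L));
        System.out.println(childOnly + " trips=" + trips(childOnly, 23300197580L));
    }
}
```

With real-memory tracking the total is 33093117638 bytes, which exceeds the limit and trips the breaker, exactly as in the log. Counting only the child breakers gives 104568443 bytes (about 99.7 MB), far below either limit. In other words, most of the heap in use here was not accounted for by any child breaker.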
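2. Solution

The problem can be resolved by changing the ES node configuration. Add the following setting to the ES configuration file elasticsearch.yml and restart the node: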
    indices.breaker.total.use_real_memory: false
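After the restart, one way to confirm that the change took effect is to read the breaker statistics from the nodes stats API (GET /_nodes/stats/breaker) and check the parent breaker's limit and trip count. The following is a minimal sketch using the JDK HTTP client (Java 11 or later). The class name BreakerStatsCheck, the address http://localhost:9200, and the absence of authentication are assumptions to adapt to your cluster.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BreakerStatsCheck {
    // Builds the URI of the nodes stats endpoint restricted to breaker metrics.
    static URI statsUri(String baseUrl) {
        return URI.create(baseUrl + "/_nodes/stats/breaker");
    }

    public static void main(String[] args) throws Exception {
        // Assumed address; pass the real one as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(statsUri(baseUrl)).GET().build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());

        // Prints the raw JSON; inspect the "parent" entry under each node's "breakers".
        System.out.println(response.body());
    }
}
```

If the setting was picked up, the parent breaker's reported limit should drop to 70% of the heap, unless indices.breaker.total.limit is set explicitly.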
If that still does not resolve the problem, try increasing the JVM heap available to ES by editing jvm.options: