
Child channel count was caught attempting to be increased over max concurrency #1803

Closed
lbourdages opened this issue Apr 30, 2020 · 15 comments
Labels
bug This issue is a bug. needs-discussion This issue/PR requires more discussion with community. p2 This is a standard priority issue

Comments

@lbourdages

I found a log at WARN level that told me to contact the Java SDK team, so here's the bug report.

Describe the bug

We run a Kinesis Client Library application. During what appears to be normal operation, a log message was printed telling me to contact the Java SDK team.

Expected Behavior

N/A

Current Behavior

Here's the log in its entirety:

Child channel count was caught attempting to be increased over max concurrency. Please report this issue to the AWS SDK for Java team.

The logger name is "software.amazon.awssdk.http.nio.netty.internal.http2.MultiplexedChannelRecord"

Steps to Reproduce

It happened out of nowhere; I don't know how to reproduce it.

Possible Solution

N/A

Context

KCL github: https://github.com/awslabs/amazon-kinesis-client

Your Environment

  • AWS Java SDK version used: 2.11.6
  • JDK version used: 1.8_212
  • Operating System and version: Alpine Linux 3.9.4
  • KCL version: 2.2.10
@lbourdages lbourdages added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Apr 30, 2020
@debora-ito
Member

Hi @lbourdages, thank you for being diligent and reporting this to us. I understand the issue happened out of nowhere, but can you answer some questions to help us isolate the cause so we can try to reproduce it on our side?

  • Can you provide any logs of the error around the time it happened?
  • How long had the service been running before you saw the error?
  • Were there any exceptions or interesting behaviors you observed in the hours leading up to or after seeing the error?

@debora-ito debora-ito added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed needs-triage This issue or PR still needs to be triaged. labels May 5, 2020
@lbourdages
Author

Hi!

Here are a few minutes of logs. Note that I've scrubbed out logs from our business logic, but those should not have any impact on the operation of the Netty internals anyway.

logs.txt

The service had been running for about 25 hours at the time the log happened.

Several hours before, we had two instances of this:
Apr 30, 2020 @ 01:43:18.302 WARN software.amazon.awssdk.http.nio.netty.internal.ResponseHandler Subscriber cancelled before all events were published
Apr 29, 2020 @ 22:47:53.531 WARN software.amazon.awssdk.http.nio.netty.internal.ResponseHandler Subscriber cancelled before all events were published

That's all I can find.

@debora-ito debora-ito removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label May 7, 2020
@Bas83

Bas83 commented Jan 26, 2021

We just experienced this too, on version 2.3.3 of the amazon-kinesis-client, after upgrading from 2.2.5.
We're also seeing fairly frequent logs saying "An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception." with the following stack trace:

software.amazon.awssdk.http.nio.netty.internal.http2.Http2ConnectionTerminatingException: An I/O error occurred on an associated Http2 stream [id: 0x16f5249b, L:/10.0.2.231:44260 ! R:kinesis.eu-west-1.amazonaws.com/99.80.34.137:443](H2 - 8233)
	at software.amazon.awssdk.http.nio.netty.internal.http2.Http2StreamExceptionHandler.exceptionCaught(Http2StreamExceptionHandler.java:50)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at software.amazon.awssdk.http.nio.netty.internal.UnusedChannelExceptionHandler.exceptionCaught(UnusedChannelExceptionHandler.java:52)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
	at io.netty.handler.codec.http2.AbstractHttp2StreamChannel.windowUpdateFrameWriteComplete(AbstractHttp2StreamChannel.java:118)
	at io.netty.handler.codec.http2.AbstractHttp2StreamChannel.access$000(AbstractHttp2StreamChannel.java:57)
	at io.netty.handler.codec.http2.AbstractHttp2StreamChannel$Http2ChannelUnsafe.updateLocalWindowIfNeeded(AbstractHttp2StreamChannel.java:850)
	at io.netty.handler.codec.http2.AbstractHttp2StreamChannel$Http2ChannelUnsafe.beginRead(AbstractHttp2StreamChannel.java:780)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.read(DefaultChannelPipeline.java:1362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeRead(AbstractChannelHandlerContext.java:686)
	at io.netty.channel.AbstractChannelHandlerContext.read(AbstractChannelHandlerContext.java:671)
	at software.amazon.awssdk.http.nio.netty.internal.http2.FlushOnReadHandler.read(FlushOnReadHandler.java:40)
	at io.netty.channel.AbstractChannelHandlerContext.invokeRead(AbstractChannelHandlerContext.java:686)
	at io.netty.channel.AbstractChannelHandlerContext.read(AbstractChannelHandlerContext.java:671)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.requestDemand(HandlerPublisher.java:130)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler$1.requestDemand(HttpStreamsHandler.java:191)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.flushBuffer(HandlerPublisher.java:345)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.receivedDemand(HandlerPublisher.java:291)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.access$200(HandlerPublisher.java:61)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher$ChannelSubscription$1.run(HandlerPublisher.java:495)
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
	at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.base/java.lang.Thread.run(Thread.java:832)

This happened 4 times in the last day. They don't seem to occur at the same time as the other error, though.

We're going to roll back to the old version now. Would appreciate any update on this.

@debora-ito
Member

@Bas83 thank you for reporting.

Regarding the Http2ConnectionTerminatingException, there's an open issue in the amazon-kinesis-client repository; I suggest you upvote it there too: awslabs/amazon-kinesis-client#755

@mosche

mosche commented Feb 11, 2022

I'm facing the same issue when writing a stream of records (batch API) to Kinesis using KinesisAsyncClient (v 2.17.106/127).
The scenario is as follows: the stream is processed by x threads in parallel (Apache Spark tasks), and each thread triggers up to y concurrent requests. All threads share one instance of the async client. Currently I'm still testing locally, not even on a cluster.

Setting x in particular, but also y, too high will repeatedly fail the stream within 1 minute. Even setting numRetries unreasonably high does not prevent the issue.
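
Roughly, the load pattern is the following minimal sketch (the stream name, record payloads, and the concrete values of x and y are placeholders, not taken from my actual job):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordsRequest;
import software.amazon.awssdk.services.kinesis.model.PutRecordsRequestEntry;

public class KinesisLoadSketch {
    // x: parallel writer "tasks", y: in-flight requests per task -- hypothetical knob values.
    static final int X_THREADS = 8;
    static final int Y_CONCURRENT = 4;

    public static void main(String[] args) {
        // One async client shared by all threads, as described above.
        KinesisAsyncClient kinesis = KinesisAsyncClient.create();

        ExecutorService tasks = Executors.newFixedThreadPool(X_THREADS);
        for (int t = 0; t < X_THREADS; t++) {
            tasks.submit(() -> {
                // Each task keeps up to Y_CONCURRENT PutRecords calls in flight.
                Semaphore inFlight = new Semaphore(Y_CONCURRENT);
                for (int batch = 0; batch < 1_000; batch++) {
                    inFlight.acquireUninterruptibly();
                    List<PutRecordsRequestEntry> entries = IntStream.range(0, 100)
                            .mapToObj(i -> PutRecordsRequestEntry.builder()
                                    .partitionKey("pk-" + i)
                                    .data(SdkBytes.fromUtf8String("payload-" + i))
                                    .build())
                            .collect(Collectors.toList());
                    CompletableFuture<?> pending = kinesis.putRecords(PutRecordsRequest.builder()
                            .streamName("test-stream")   // placeholder stream name
                            .records(entries)
                            .build());
                    // Release the permit once the batch completes, successfully or not.
                    pending.whenComplete((response, error) -> inFlight.release());
                }
            });
        }
        tasks.shutdown();
    }
}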

Initially I'm seeing this periodically:

22/02/11 18:28:48 WARN software.amazon.awssdk.http.nio.netty.internal.http2.MultiplexedChannelRecord: [Channel: 11fa7015] Child channel count was caught attempting to be increased over max concurrency. Please report this issue to the AWS SDK for Java team.

soon after followed by:

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: The connection was closed during the request. The request will usually succeed on a retry, but if it does not: consider disabling any proxies you have configured, enabling debug logging, or performing a TCP dump to identify the root cause. If this is a streaming operation, validate that data is being read or written in a timely manner. Channel Information: ChannelDiagnostics(channel=[id: 0x551a2dfd, L:/192.168.178.20:50484 - R:kinesis.eu-central-1.amazonaws.com/3.123.12.193:443](H2 - 7), channelAge=PT0.001S, requestCount=1) Parent Channel Information: ChannelDiagnostics(channel=[id: 0x551a2dfd, L:/192.168.178.20:50484 - R:kinesis.eu-central-1.amazonaws.com/3.123.12.193:443], channelAge=PT10.166S, requestCount=0)
	at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:62)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)

An interesting observation is that for x<=2 even a high y works well.
A higher x, on the other hand, quickly leads to the issue above if y>=2.

@mosche

mosche commented Feb 16, 2022

@debora-ito I tried looking into the issue. The strange thing is, when using a custom build of netty-nio-client (tag 2.17.127, openjdk 1.8.0_322) everything works fine. As soon as I use the jar published to Maven, I reliably run into the issue above. But I don't see any difference except for some build metadata.

@debora-ito debora-ito self-assigned this Feb 16, 2022
@debora-ito
Member

@mosche was the custom build of netty-nio-client built by you/your team? How are you checking the differences?

@mosche

mosche commented Feb 21, 2022

@debora-ito All I did was a mvn clean install -pl :netty-nio-client -P quick --am on the original code, though it turns out I didn't pay close enough attention to transitive dependencies.

Digging further, this turns out to be a classpath issue with different transitive versions of netty. In my case, Spark 2.4.7 uses io.netty:netty-all:4.1.47.Final, which is a fat jar containing the netty classes of all submodules, whereas netty-all:4.1.72.Final has become an uber jar that only declares dependencies. It looks like this causes incompatible classes to be loaded at runtime, which triggers the issue.

In my case, forcing any consistent version of netty solves the issue. The diff below shows the change in resolved netty dependencies; a quick runtime check for mixed netty versions is sketched after the list.

 io.netty:netty:3.9.9.Final
-io.netty:netty-all:4.1.47.Final
+io.netty:netty-all:4.1.72.Final
 io.netty:netty-buffer:4.1.72.Final
 io.netty:netty-codec:4.1.72.Final
+io.netty:netty-codec-dns:4.1.72.Final
+io.netty:netty-codec-haproxy:4.1.72.Final
 io.netty:netty-codec-http:4.1.72.Final
 io.netty:netty-codec-http2:4.1.72.Final
+io.netty:netty-codec-memcache:4.1.72.Final
+io.netty:netty-codec-mqtt:4.1.72.Final
+io.netty:netty-codec-redis:4.1.72.Final
+io.netty:netty-codec-smtp:4.1.72.Final
+io.netty:netty-codec-socks:4.1.72.Final
+io.netty:netty-codec-stomp:4.1.72.Final
+io.netty:netty-codec-xml:4.1.72.Final
 io.netty:netty-common:4.1.72.Final
 io.netty:netty-handler:4.1.72.Final
+io.netty:netty-handler-proxy:4.1.72.Final
 io.netty:netty-resolver:4.1.72.Final
+io.netty:netty-resolver-dns:4.1.72.Final
+io.netty:netty-resolver-dns-classes-macos:4.1.72.Final
+io.netty:netty-resolver-dns-native-macos:4.1.72.Final
 io.netty:netty-tcnative-classes:2.0.46.Final
 io.netty:netty-transport:4.1.72.Final
 io.netty:netty-transport-classes-epoll:4.1.72.Final
+io.netty:netty-transport-classes-kqueue:4.1.72.Final
+io.netty:netty-transport-native-epoll:4.1.72.Final
+io.netty:netty-transport-native-kqueue:4.1.72.Final
 io.netty:netty-transport-native-unix-common:4.1.72.Final
+io.netty:netty-transport-rxtx:4.1.72.Final
+io.netty:netty-transport-sctp:4.1.72.Final
+io.netty:netty-transport-udt:4.1.72.Final
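
As a way to verify this kind of mismatch, netty can report which artifact versions are actually loaded at runtime via io.netty.util.Version.identify(). The following is a minimal sketch, not from the thread; the class name and the mixed-version warning are mine:

import java.util.Map;

import io.netty.util.Version;

public class NettyVersionCheck {
    public static void main(String[] args) {
        // Version.identify() reads the version metadata each netty artifact ships
        // and returns one entry per netty module found on the classpath.
        Map<String, Version> versions = Version.identify();
        versions.forEach((artifact, version) ->
                System.out.println(artifact + " -> " + version.artifactVersion()));

        // If more than one distinct version shows up, the classpath is mixed,
        // which matches the situation described above.
        long distinctVersions = versions.values().stream()
                .map(Version::artifactVersion)
                .distinct()
                .count();
        if (distinctVersions > 1) {
            System.err.println("Warning: mixed netty versions detected on the classpath");
        }
    }
}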

@mosche

mosche commented Feb 21, 2022

Looks like 4.1.69.Final is the first netty-all uber jar that correctly does not contain the classes itself ...

@mosche

mosche commented Feb 21, 2022

@debora-ito Also wondering, is com.typesafe.netty:netty-reactive-streams-http an intentional dependency of netty-nio-client?

@debora-ito
Member

Yes.

<dependency>
    <groupId>com.typesafe.netty</groupId>
    <artifactId>netty-reactive-streams-http</artifactId>
</dependency>

@debora-ito
Member

@lbourdages @Bas83 we want to investigate this error further, but we need sample code so we can reproduce it on our side.

Do you have self-contained repro code that reliably shows the error, using the latest version of the SDK?

@Bas83

Bas83 commented Oct 6, 2022

> @lbourdages @Bas83 we want to investigate this error further, but we need sample code so we can reproduce it on our side.
>
> Do you have self-contained repro code that reliably shows the error, using the latest version of the SDK?

I'm afraid we don't; it's been a while, and we've been running on version 2.4.3 for a while now. The only problem we have with that version (or rather with the 2.4.1 before it, but we haven't tried again) is that updating to Java 17 (from 16) miraculously makes Kinesis skip part of the records while consuming.

@yasminetalby yasminetalby added p2 This is a standard priority issue needs-discussion This issue/PR requires more discussion with community. labels Nov 28, 2022
@debora-ito
Member

Not a lot of movement around this issue lately, so I'm closing this.

If anyone has repro code that reproduces the error reliably, please send it to us in a fresh new GitHub issue.

@debora-ito debora-ito closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 24, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
