
Child channel count was caught attempting to be increased over max concurrency #1803

Closed
lbourdages opened this issue Apr 30, 2020 · 15 comments
Labels
bug This issue is a bug. needs-discussion This issue/PR requires more discussion with community. p2 This is a standard priority issue

Comments

@lbourdages

I found a log at WARN level that told me to contact the Java SDK team, so here's the bug report.

Describe the bug

We run a Kinesis Client Library application. During what appears to be normal operation, a log message was printed telling me to contact the Java SDK team.

Expected Behavior

N/A

Current Behavior

Here's the log in its entirety:

Child channel count was caught attempting to be increased over max concurrency. Please report this issue to the AWS SDK for Java team.

The logger name is "software.amazon.awssdk.http.nio.netty.internal.http2.MultiplexedChannelRecord"

Steps to Reproduce

It happened out of nowhere; I don't know how to reproduce it.

Possible Solution

N/A

Context

KCL github: https://github.com/awslabs/amazon-kinesis-client

Your Environment

  • AWS Java SDK version used: 2.11.6
  • JDK version used: 1.8_212
  • Operating System and version: Alpine Linux 3.9.4
  • KCL version: 2.2.10
@lbourdages lbourdages added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Apr 30, 2020
@debora-ito
Member

Hi @lbourdages, thank you for being diligent and reporting this to us. I understand the issue happened out of nowhere, but can you answer some questions to help us isolate the cause so we can try to reproduce it on our side?

  • Can you provide any logs of the error around the time it happened?
  • How long had the service been running before you saw the error?
  • Were there any exceptions or interesting behaviors you observed in the hours leading up to or after seeing the error?

@debora-ito debora-ito added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed needs-triage This issue or PR still needs to be triaged. labels May 5, 2020
@lbourdages
Author

Hi!

Here are a few minutes of logs. Note that I've scrubbed out logs from our business logic, but those should not have any impact on the operation of the Netty internals anyway.

logs.txt

The service had been running for about 25 hours at the time the log happened.

Several hours before, we had two instances of this:
Apr 30, 2020 @ 01:43:18.302 WARN software.amazon.awssdk.http.nio.netty.internal.ResponseHandler Subscriber cancelled before all events were published
Apr 29, 2020 @ 22:47:53.531 WARN software.amazon.awssdk.http.nio.netty.internal.ResponseHandler Subscriber cancelled before all events were published

That's all I can find.

@debora-ito debora-ito removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label May 7, 2020
@Bas83

Bas83 commented Jan 26, 2021

We just experienced this too, on version 2.3.3 of the amazon-kinesis-client, after upgrading from 2.2.5.
We're also seeing fairly frequent logs saying "An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception." with the following stack trace:

software.amazon.awssdk.http.nio.netty.internal.http2.Http2ConnectionTerminatingException: An I/O error occurred on an associated Http2 stream [id: 0x16f5249b, L:/10.0.2.231:44260 ! R:kinesis.eu-west-1.amazonaws.com/99.80.34.137:443](H2 - 8233)
	at software.amazon.awssdk.http.nio.netty.internal.http2.Http2StreamExceptionHandler.exceptionCaught(Http2StreamExceptionHandler.java:50)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at software.amazon.awssdk.http.nio.netty.internal.UnusedChannelExceptionHandler.exceptionCaught(UnusedChannelExceptionHandler.java:52)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
	at io.netty.handler.codec.http2.AbstractHttp2StreamChannel.windowUpdateFrameWriteComplete(AbstractHttp2StreamChannel.java:118)
	at io.netty.handler.codec.http2.AbstractHttp2StreamChannel.access$000(AbstractHttp2StreamChannel.java:57)
	at io.netty.handler.codec.http2.AbstractHttp2StreamChannel$Http2ChannelUnsafe.updateLocalWindowIfNeeded(AbstractHttp2StreamChannel.java:850)
	at io.netty.handler.codec.http2.AbstractHttp2StreamChannel$Http2ChannelUnsafe.beginRead(AbstractHttp2StreamChannel.java:780)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.read(DefaultChannelPipeline.java:1362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeRead(AbstractChannelHandlerContext.java:686)
	at io.netty.channel.AbstractChannelHandlerContext.read(AbstractChannelHandlerContext.java:671)
	at software.amazon.awssdk.http.nio.netty.internal.http2.FlushOnReadHandler.read(FlushOnReadHandler.java:40)
	at io.netty.channel.AbstractChannelHandlerContext.invokeRead(AbstractChannelHandlerContext.java:686)
	at io.netty.channel.AbstractChannelHandlerContext.read(AbstractChannelHandlerContext.java:671)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.requestDemand(HandlerPublisher.java:130)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler$1.requestDemand(HttpStreamsHandler.java:191)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.flushBuffer(HandlerPublisher.java:345)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.receivedDemand(HandlerPublisher.java:291)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.access$200(HandlerPublisher.java:61)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher$ChannelSubscription$1.run(HandlerPublisher.java:495)
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
	at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.base/java.lang.Thread.run(Thread.java:832)

This happened 4 times in the last day. They don't seem to occur at the same time as the other error, though.

We're going to roll back to the old version now. Would appreciate any update on this.

@debora-ito
Member

@Bas83 thank you for reporting.

Regarding the Http2ConnectionTerminatingException, there's an open issue in the amazon-kinesis-client repository; I suggest you upvote it there too: awslabs/amazon-kinesis-client#755

@mosche

mosche commented Feb 11, 2022

I'm facing the same issue when writing a stream of records (batch API) to Kinesis using KinesisAsyncClient (v 2.17.106/127).
The scenario is as follows: the stream is processed by x threads in parallel (Apache Spark tasks), and each thread triggers up to y concurrent requests. All threads share one instance of the async client. Currently I'm still testing locally, not even on a cluster.

Setting x in particular, but also y, too high will repeatedly fail the stream within 1 minute. Even setting numRetries unreasonably high does not prevent the issue.
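
Roughly, the load pattern is the following minimal sketch (the stream name, record payloads, and the concrete values of x and y are placeholders, not taken from my actual job):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordsRequest;
import software.amazon.awssdk.services.kinesis.model.PutRecordsRequestEntry;

public class KinesisLoadSketch {
    // x: parallel writer "tasks", y: in-flight requests per task -- hypothetical knob values.
    static final int X_THREADS = 8;
    static final int Y_CONCURRENT = 4;

    public static void main(String[] args) {
        // One async client shared by all threads, as described above.
        KinesisAsyncClient kinesis = KinesisAsyncClient.create();

        ExecutorService tasks = Executors.newFixedThreadPool(X_THREADS);
        for (int t = 0; t < X_THREADS; t++) {
            tasks.submit(() -> {
                // Each task keeps up to Y_CONCURRENT PutRecords calls in flight.
                Semaphore inFlight = new Semaphore(Y_CONCURRENT);
                for (int batch = 0; batch < 1_000; batch++) {
                    inFlight.acquireUninterruptibly();
                    List<PutRecordsRequestEntry> entries = IntStream.range(0, 100)
                            .mapToObj(i -> PutRecordsRequestEntry.builder()
                                    .partitionKey("pk-" + i)
                                    .data(SdkBytes.fromUtf8String("payload-" + i))
                                    .build())
                            .collect(Collectors.toList());
                    CompletableFuture<?> pending = kinesis.putRecords(PutRecordsRequest.builder()
                            .streamName("test-stream")   // placeholder stream name
                            .records(entries)
                            .build());
                    // Release the permit once the batch completes, successfully or not.
                    pending.whenComplete((response, error) -> inFlight.release());
                }
            });
        }
        tasks.shutdown();
    }
}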

Initially I'm seeing this periodically:

22/02/11 18:28:48 WARN software.amazon.awssdk.http.nio.netty.internal.http2.MultiplexedChannelRecord: [Channel: 11fa7015] Child channel count was caught attempting to be increased over max concurrency. Please report this issue to the AWS SDK for Java team.

soon after followed by:

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: The connection was closed during the request. The request will usually succeed on a retry, but if it does not: consider disabling any proxies you have configured, enabling debug logging, or performing a TCP dump to identify the root cause. If this is a streaming operation, validate that data is being read or written in a timely manner. Channel Information: ChannelDiagnostics(channel=[id: 0x551a2dfd, L:/192.168.178.20:50484 - R:kinesis.eu-central-1.amazonaws.com/3.123.12.193:443](H2 - 7), channelAge=PT0.001S, requestCount=1) Parent Channel Information: ChannelDiagnostics(channel=[id: 0x551a2dfd, L:/192.168.178.20:50484 - R:kinesis.eu-central-1.amazonaws.com/3.123.12.193:443], channelAge=PT10.166S, requestCount=0)
	at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:62)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)

An interesting observation is that for x<=2 even a high y works well.
A higher x, on the other hand, quickly leads to the issue above if y>=2.

@mosche

mosche commented Feb 16, 2022

@debora-ito I tried looking into the issue. The strange thing is, when using a custom build of netty-nio-client (tag 2.17.127, openjdk 1.8.0_322) everything works fine. As soon as I use the jar published to Maven, I reliably run into the issue above. But I don't see any difference except for some build metadata.

@debora-ito debora-ito self-assigned this Feb 16, 2022
@debora-ito
Member

@mosche was the custom build of netty-nio-client built by you/your team? How are you checking the differences?

@mosche

mosche commented Feb 21, 2022

@debora-ito All I did was a mvn clean install -pl :netty-nio-client -P quick --am on the original code, though it turns out I didn't pay close enough attention to transitive dependencies.

Digging further, this turns out to be a classpath issue with different transitive versions of netty. In my case, Spark 2.4.7 uses io.netty:netty-all:4.1.47.Final, which is a fat jar containing the netty classes of all submodules, whereas netty-all:4.1.72.Final has become an uber jar that only declares dependencies. It looks like this causes incompatible classes to be loaded at runtime, which triggers the issue.

In my case, forcing any consistent version of netty solves the issue. The diff below shows the change in resolved netty dependencies; a quick runtime check for mixed netty versions is sketched after the list.

 io.netty:netty:3.9.9.Final
-io.netty:netty-all:4.1.47.Final
+io.netty:netty-all:4.1.72.Final
 io.netty:netty-buffer:4.1.72.Final
 io.netty:netty-codec:4.1.72.Final
+io.netty:netty-codec-dns:4.1.72.Final
+io.netty:netty-codec-haproxy:4.1.72.Final
 io.netty:netty-codec-http:4.1.72.Final
 io.netty:netty-codec-http2:4.1.72.Final
+io.netty:netty-codec-memcache:4.1.72.Final
+io.netty:netty-codec-mqtt:4.1.72.Final
+io.netty:netty-codec-redis:4.1.72.Final
+io.netty:netty-codec-smtp:4.1.72.Final
+io.netty:netty-codec-socks:4.1.72.Final
+io.netty:netty-codec-stomp:4.1.72.Final
+io.netty:netty-codec-xml:4.1.72.Final
 io.netty:netty-common:4.1.72.Final
 io.netty:netty-handler:4.1.72.Final
+io.netty:netty-handler-proxy:4.1.72.Final
 io.netty:netty-resolver:4.1.72.Final
+io.netty:netty-resolver-dns:4.1.72.Final
+io.netty:netty-resolver-dns-classes-macos:4.1.72.Final
+io.netty:netty-resolver-dns-native-macos:4.1.72.Final
 io.netty:netty-tcnative-classes:2.0.46.Final
 io.netty:netty-transport:4.1.72.Final
 io.netty:netty-transport-classes-epoll:4.1.72.Final
+io.netty:netty-transport-classes-kqueue:4.1.72.Final
+io.netty:netty-transport-native-epoll:4.1.72.Final
+io.netty:netty-transport-native-kqueue:4.1.72.Final
 io.netty:netty-transport-native-unix-common:4.1.72.Final
+io.netty:netty-transport-rxtx:4.1.72.Final
+io.netty:netty-transport-sctp:4.1.72.Final
+io.netty:netty-transport-udt:4.1.72.Final
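
As a way to verify this kind of mismatch, netty can report which artifact versions are actually loaded at runtime via io.netty.util.Version.identify(). The following is a minimal sketch, not from the thread; the class name and the mixed-version warning are mine:

import java.util.Map;

import io.netty.util.Version;

public class NettyVersionCheck {
    public static void main(String[] args) {
        // Version.identify() reads the version metadata each netty artifact ships
        // and returns one entry per netty module found on the classpath.
        Map<String, Version> versions = Version.identify();
        versions.forEach((artifact, version) ->
                System.out.println(artifact + " -> " + version.artifactVersion()));

        // If more than one distinct version shows up, the classpath is mixed,
        // which matches the situation described above.
        long distinctVersions = versions.values().stream()
                .map(Version::artifactVersion)
                .distinct()
                .count();
        if (distinctVersions > 1) {
            System.err.println("Warning: mixed netty versions detected on the classpath");
        }
    }
}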

@mosche

mosche commented Feb 21, 2022

Looks like 4.1.69.Final is the first netty-all uber jar that correctly does not contain the classes itself ...

@mosche

mosche commented Feb 21, 2022

@debora-ito Also wondering, is com.typesafe.netty:netty-reactive-streams-http an intentional dependency of netty-nio-client?

@debora-ito
Member

Yes.

<dependency>
    <groupId>com.typesafe.netty</groupId>
    <artifactId>netty-reactive-streams-http</artifactId>
</dependency>

@debora-ito
Member

@lbourdages @Bas83 we want to investigate this error further, but we need sample code so we can reproduce it on our side.

Do you have self-contained repro code that reliably shows the error, using the latest version of the SDK?

@Bas83

Bas83 commented Oct 6, 2022

> @lbourdages @Bas83 we want to investigate this error further, but we need sample code so we can reproduce it on our side.
>
> Do you have self-contained repro code that reliably shows the error, using the latest version of the SDK?

I'm afraid we don't; it's been a while, and we've been running on version 2.4.3 for a while now. The only problem we have with that version (or rather with the 2.4.1 before it, but we haven't tried again) is that updating to Java 17 (from 16) miraculously makes Kinesis skip part of the records while consuming.

@yasminetalby yasminetalby added p2 This is a standard priority issue needs-discussion This issue/PR requires more discussion with community. labels Nov 28, 2022
@debora-ito
Member

Not a lot of movement around this issue lately, so I'm closing this.

If anyone has repro code that reproduces the error reliably, please send it to us in a fresh new GitHub issue.

@debora-ito debora-ito closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 24, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
