Read lines with default encoding from os #12942

breedx-splk · 2024-12-20T22:19:26Z

Repeat of #12419 which fixes #12418

@laurit had previously asked if a file could be provided to reproduce this. Unfortunately, that's problematic because it depends on the OS encoding. The original problem was reported from z/OS, which I have no reasonable way of reproducing with tests. It's odd that z/OS wouldn't support or use UTF-8, but here we are.

This change should at least handle those edge cases.
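A minimal sketch of the idea behind this change. It assumes the file is read through `java.nio.file.Files`. The class `OsEncodingLines` and the helper `readLinesWithOsEncoding` are hypothetical names, not the PR's actual code. `Files.readAllLines(Path)` with no charset argument always decodes as UTF-8 and throws `MalformedInputException` on bytes that are not valid UTF-8. Passing `Charset.defaultCharset()` decodes with the JVM's default encoding instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class OsEncodingLines {

  // Hypothetical helper: decode with the JVM default charset rather than
  // the UTF-8 that Files.readAllLines(Path) uses when no charset is given.
  static List<String> readLinesWithOsEncoding(Path path) {
    try {
      return Files.readAllLines(path, Charset.defaultCharset());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    // Write a small mountinfo-style sample in the default charset, then read it back.
    Path tmp = Files.createTempFile("mountinfo-sample", ".txt");
    try {
      Files.write(tmp, List.of("36 35 98:0 /mnt1 /mnt2 rw", "37 35 98:0 /x /y rw"),
          Charset.defaultCharset());
      List<String> lines = readLinesWithOsEncoding(tmp);
      System.out.println(lines.size() + " lines read with " + Charset.defaultCharset());
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```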

laurit · 2024-12-21T11:51:30Z

To me, using Charset.defaultCharset() doesn't seem like the best option. Since JDK 18 the default charset is UTF-8 unless the user explicitly configures a different encoding; see https://openjdk.org/jeps/400. On older JDKs, users could pass -Dfile.encoding=UTF-8 even if the system encoding is something different.
Looking at https://github.com/moby/sys/blob/mountinfo/v0.7.2/mountinfo/mountinfo_linux.go#L23, I think they are also reading it as UTF-8. I could be wrong, though; I don't really know any Go. Without a sample that fails to decode as UTF-8, it is hard to guess why it would fail. It could be that z/OS does something weird. It could also be that the assumption that this file contains UTF-8 is wrong and only works because everybody uses UTF-8. Aren't paths just opaque byte sequences from the Linux kernel's perspective? Instead of Charset.defaultCharset(), I'd try StandardCharsets.ISO_8859_1. From what I understand, we don't really care about the paths or anything else that could be in UTF-8. The container id is a hex string, and the other symbols we use for parsing are all present in ISO-8859-1.
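To illustrate the ISO-8859-1 suggestion, here is a sketch. The class name, the regex, and the sample mountinfo-style line are hypothetical and are not the project's actual parser. A strict UTF-8 decoder rejects a byte such as 0xFF. ISO-8859-1 maps each of the 256 byte values to exactly one char, so decoding cannot fail, and the ASCII hex container id and separators come through unchanged.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MountinfoDecoding {

  // Hypothetical pattern: a 64-character lowercase hex id following "/containers/".
  // The real extractor's parsing rules may differ.
  private static final Pattern CONTAINER_ID =
      Pattern.compile("/containers/([0-9a-f]{64})/");

  // Strict UTF-8 decode that reports malformed input instead of replacing it.
  static boolean decodesAsUtf8(byte[] bytes) {
    try {
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  // ISO-8859-1 maps every byte 0x00-0xFF to one char, so this never throws.
  static String decodeLatin1(byte[] bytes) {
    return new String(bytes, StandardCharsets.ISO_8859_1);
  }

  // Pull the hex container id out of a decoded line, if present.
  static Optional<String> extractContainerId(String line) {
    Matcher m = CONTAINER_ID.matcher(line);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  public static void main(String[] args) {
    String id = "a".repeat(64);
    // Hypothetical mountinfo-style line; '?' marks where a non-UTF-8 byte is injected.
    String template = "1 2 0:1 /var/lib/docker/containers/" + id + "/hostname /etc/host?name rw";
    byte[] bytes = template.getBytes(StandardCharsets.US_ASCII);
    bytes[template.indexOf('?')] = (byte) 0xFF; // 0xFF never occurs in valid UTF-8
    System.out.println(decodesAsUtf8(bytes)); // prints false
    System.out.println(extractContainerId(decodeLatin1(bytes)).orElse("<none>"));
  }
}
```

With ISO-8859-1, a stray non-UTF-8 byte in a path only turns into an unusual character in a field the parser ignores. It does not abort the whole read.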

read lines with default encoding from os

1789deb

breedx-splk requested a review from a team as a code owner December 20, 2024 22:19

trask approved these changes Dec 20, 2024


jaydeluca approved these changes Dec 21, 2024
