Refactor helper binaries to save 161MB of disk space when the agent is installed and reduce RPM by 48MB #1454

Open · wants to merge 5 commits into base: main

@chadpatel (Contributor) commented Dec 3, 2024

Description of the issue

The cloudwatch-agent has 3 helper binaries that use an excessive amount of disk space:

  • amazon-cloudwatch-agent-config-wizard (14MB)
  • config-downloader (37MB)
  • config-translator (117MB)

The reason is that each of them directly or indirectly pulls in dependencies from cloudwatch-agent. To solve this problem, I turned config-downloader, config-translator, and the wizard into shims that simply redirect to the amazon-cloudwatch-agent binary.

This works fine. The main risk is that these binaries are no longer "portable": they depend on finding the path to amazon-cloudwatch-agent at runtime. I am using the same method as start-amazon-cloudwatch-agent to find that path.

Description of changes

At a high level, the approach keeps the same argument interface for the existing 3 commands and seamlessly moves their logic into the main binary. To do this, the old commands keep their original args, but when they call CWA (amazon-cloudwatch-agent) each arg is prefixed with the command name so that there are no duplicate args.
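A hypothetical sketch of the shim side of this prefixing: turning parsed flag values into the argument list for amazon-cloudwatch-agent (compare `-config-downloader -config-downloader-mode ec2 ...` in the logs in this PR). `prefixedArgs` is an illustrative name, not the PR's code. Two assumed design choices: keys are sorted, because ranging over a Go map has no fixed order (the two config-downloader runs in the logs pass their args in different orders); and values use the `-flag=value` form, because Go's `flag` package only accepts `-flag=false`, not `-flag false`, for boolean flags.

```go
package main

import (
	"fmt"
	"sort"
)

// prefixedArgs builds the agent's argument list: a mode flag named after the
// command, then every flag renamed to "<command>-<name>".
func prefixedArgs(command string, values map[string]string) []string {
	names := make([]string, 0, len(values))
	for name := range values {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order despite map iteration

	args := []string{"-" + command}
	for _, name := range names {
		// "-flag=value" works for both string and boolean flags.
		args = append(args, fmt.Sprintf("-%s-%s=%s", command, name, values[name]))
	}
	return args
}

func main() {
	args := prefixedArgs("config-downloader", map[string]string{
		"mode":         "ec2",
		"multi-config": "default",
	})
	fmt.Println(args)
	// prints "[-config-downloader -config-downloader-mode=ec2 -config-downloader-multi-config=default]"
}
```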

The general approach I took was to create a new cmdwrapper package that offers two functions:

  1. AddFlag: takes a map of flags and a prefix. The prefix is blank for the binary being replaced, and it is the command name when the flags are registered by amazon-cloudwatch-agent. It hooks directly into the flag API to pull command-line args. I considered using subcommands, but things got too complicated and ugly; prefixing was simpler.
  2. ExecuteAgentCommand: takes a set of flags, finds amazon-cloudwatch-agent, and calls it with the new flags. It remaps stdin/stdout/stderr so the redirection is seamless; commands like the wizard, which rely on stdin, still work.

I moved the flags into their own separate flags packages, which have NO dependencies (keeping the binaries small). The old binaries and amazon-cloudwatch-agent then pull in the flags and the commands and link everything together. For the wizard I also had to move a few other common constants, like the ones in linuxMigration.go and windows_migration.go.
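To make the linking step concrete, a hypothetical sketch of the agent side: one flag set holds a boolean mode switch per helper command (like the `config-translator` bool in the diff) plus that command's prefixed flags, and parsing reports which helper, if any, was requested. The flag names come from the logs in this PR; `commandFlags` and `parseAgentArgs` are illustrative and not the PR's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// commandFlags lists, per helper command, the flag names its shim accepts.
// A dependency-free flags package could hold a table like this.
var commandFlags = map[string][]string{
	"config-downloader": {"mode", "download-source", "output-dir", "config", "multi-config"},
	"config-translator": {"mode", "input", "input-dir", "output", "config", "multi-config"},
}

// parseAgentArgs returns the requested helper command ("" for a normal agent
// start) and its unprefixed flag values. If several mode switches are set,
// one is picked arbitrarily (map iteration order).
func parseAgentArgs(args []string) (string, map[string]string, error) {
	fs := flag.NewFlagSet("amazon-cloudwatch-agent", flag.ContinueOnError)
	modes := map[string]*bool{}
	values := map[string]map[string]*string{}
	for cmd, names := range commandFlags {
		modes[cmd] = fs.Bool(cmd, false, "run in "+cmd+" mode")
		values[cmd] = map[string]*string{}
		for _, name := range names {
			values[cmd][name] = fs.String(cmd+"-"+name, "", "")
		}
	}
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}
	for cmd, on := range modes {
		if *on {
			out := map[string]string{}
			for name, v := range values[cmd] {
				out[name] = *v
			}
			return cmd, out, nil
		}
	}
	return "", nil, nil
}

func main() {
	cmd, vals, err := parseAgentArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println(cmd, vals)
}
```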

I added and updated unit tests wherever possible. The old translator_test.go was moved into translatorutil_test.go, since that is primarily what those tests were testing.

Note: the inline diff is VERY hard to follow because of all the moved code. I recommend the split diff view, or we can do a code walk-through.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Manual testing to confirm that the CloudWatch agent still starts and loads.

There were no prior tests that actually tested the config-translator binary; the old tests basically just tested cmdutil, so I moved them into the cmdutil tests. We could write more tests for the shim and for translate.go, but they are tricky to write due to the OS coupling.

Before:

```
-rwxr-xr-x 1 patchad amazon 129M Dec  2 22:49 amazon-cloudwatch-agent
-rwxr-xr-x 1 patchad amazon  14M Dec  2 22:49 amazon-cloudwatch-agent-config-wizard
-rwxr-xr-x 1 patchad amazon  37M Dec  2 22:44 config-downloader
-rwxr-xr-x 1 patchad amazon 117M Dec  2 22:47 config-translator
-rwxr-xr-x 1 patchad amazon 2.1M Dec  2 22:49 start-amazon-cloudwatch-agent
```

```
➜  amazon-cloudwatch-agent git:(main) ✗ ls -lh ~/Downloads/amazon-cloudwatch-agent.rpm
Permissions Size User    Date Modified Name
.rw-r--r--@ 113M patchad  2 Dec 15:36  /Users/patchad/Downloads/amazon-cloudwatch-agent.rpm
```

After:

```
total 136M
-rwxr-xr-x 1 patchad amazon 129M Dec 16 20:34 amazon-cloudwatch-agent
-rwxr-xr-x 1 patchad amazon 1.7M Dec 16 20:34 amazon-cloudwatch-agent-config-wizard
-rwxr-xr-x 1 patchad amazon 1.7M Dec 16 20:34 config-downloader
-rwxr-xr-x 1 patchad amazon 1.7M Dec 16 20:34 config-translator
-rwxr-xr-x 1 patchad amazon 2.1M Dec 16 20:34 start-amazon-cloudwatch-agent
```

```
-rw-r--r-- 1 patchad amazon 65M Dec 16 20:35 /local/home/patchad/workplace/cwa/amazon-cloudwatch-agent/build/bin/linux/amd64/amazon-cloudwatch-agent.rpm
```
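A quick check of the headline numbers, using the before/after `ls -lh` listings. Those sizes are rounded by `ls -lh`, so the on-disk saving only comes out approximately (near 163MB, in the same range as the 161MB in the title); the RPM difference matches the title's 48MB.

```latex
\begin{aligned}
\text{helpers before} &= 14 + 37 + 117 = 168\ \text{MB} \\
\text{helpers after}  &= 3 \times 1.7 = 5.1\ \text{MB} \\
\text{disk saved}     &\approx 168 - 5.1 \approx 163\ \text{MB} \\
\text{RPM saved}      &= 113 - 65 = 48\ \text{MB}
\end{aligned}
```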


Wizard still works:

```
2024/12/17 21:54:39 Starting config-wizard, this will map back to a call to amazon-cloudwatch-agent
2024/12/17 21:54:39 Executing /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent with arguments: [-config-wizard -config-wizard-is-non-interactive-windows-migration false -config-wizard-use-parameter-store false -config-wizard-is-non-interactive-linux-migration false -config-wizard-traces-only false -config-wizard-non-interactive-xray-migration false]

= Welcome to the Amazon CloudWatch Agent Configuration Manager =
= =
= CloudWatch Agent allows you to collect metrics and logs from =
= your host and send them to CloudWatch. Additional CloudWatch =
= charges may apply. =

On which OS are you planning to use the agent?
1. linux
2. windows
3. darwin
default choice: [1]:
1
Trying to fetch the default region based on ec2 metadata...
I! imds retry client will retry 1 timesD! should retry false for imds error : RequestCanceled: EC2 IMDS access disabled via AWS_EC2_METADATA_DISABLED env varW! could not get region from ec2 metadata... EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestCanceled: EC2 IMDS access disabled via AWS_EC2_METADATA_DISABLED env varAre you using EC2 or On-Premises hosts?
1. EC2
2. On-Premises
default choice: [2]:
```

Downloader/Translator still work:

```
➜ amazon-cloudwatch-agent git:(patchad-config-translate-refactor) ✗ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json

****** processing amazon-cloudwatch-agent ******
2024/12/17 21:56:21 Starting config-downloader, this will map back to a call to amazon-cloudwatch-agent
2024/12/17 21:56:21 Executing /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent with arguments: [-config-downloader -config-downloader-config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml -config-downloader-multi-config default -config-downloader-mode ec2 -config-downloader-download-source file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -config-downloader-output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d]
I! Trying to detect region from ec2 D! [EC2] Found active network interface I! imds retry client will retry 1 times
Start configuration validation...
2024/12/17 21:56:21 Starting config-translator, this will map back to a call to amazon-cloudwatch-agent
2024/12/17 21:56:21 Executing /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent with arguments: [-config-translator -config-translator-multi-config default -config-translator-input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -config-translator-input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d -config-translator-output /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml -config-translator-mode ec2 -config-translator-config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml]
2024-12-17T21:56:21Z I! Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json.tmp ...
2024-12-17T21:56:21Z I! Valid Json input schema.
2024-12-17T21:56:21Z I! Configuration validation first phase succeeded
I! Trying to detect region from ec2
D! [EC2] Found active network interface
I! imds retry client will retry 1 times
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
```


### Rough Edges

This is the flow when a helper command fails (in this example, config-downloader cannot find the input JSON file). Because I unified the code, the behavior is slightly different from before. We could change this if needed.
**new**:

```
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json

****** processing amazon-cloudwatch-agent ******
2024/12/17 21:57:13 Starting config-downloader, this will map back to a call to amazon-cloudwatch-agent
2024/12/17 21:57:13 Executing /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent with arguments: [-config-downloader -config-downloader-mode ec2 -config-downloader-download-source file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -config-downloader-output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d -config-downloader-config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml -config-downloader-multi-config default]
2024-12-17T21:57:13Z E! Failed to initialize config downloader: fail to fetch/remove json config: open /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json: no such file or directory
2024/12/17 21:57:13 E! Translation process exited with non-zero status: 1, err: exit status 1
panic: E! Translation process exited with non-zero status: 1, err: exit status 1

goroutine 1 [running]:
log.Panicf({0x4e8cc3?, 0xc0000c8020?}, {0xc0000c3de0?, 0xc000098038?, 0x3?})
	log/log.go:439 +0x65
github.com/aws/amazon-cloudwatch-agent/tool/cmdwrapper.ExecuteAgentCommand({0x4e165b, 0x11}, 0xc0000a4120)
	github.com/aws/amazon-cloudwatch-agent/tool/cmdwrapper/cmdwrapper.go:59 +0x5df
main.main()
	github.com/aws/amazon-cloudwatch-agent/cmd/config-downloader/downloader.go:20 +0xc8
```

**old**:

```
➜ amazon-cloudwatch-agent git:(patchad-config-translate-refactor) ✗ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json

****** processing amazon-cloudwatch-agent ******
2024/12/17 21:58:07 E! Fail to fetch/remove json config: open /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json: no such file or directory
```


# Requirements
_Before committing the code, please do the following steps._
1. Run `make fmt` and `make fmt-sh`
2. Run `make lint`




chadpatel requested a review from a team as a code owner on December 3, 2024 19:25
```
@@ -1,137 +1,61 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: MIT
```
chadpatel (Contributor Author):

I can bring this back; I lost it when moving code around.

```
@@ -95,6 +96,16 @@ var fRunAsConsole = flag.Bool("console", false, "run as console application (win
var fSetEnv = flag.String("setenv", "", "set an env in the configuration file in the format of KEY=VALUE")
var fStartUpErrorFile = flag.String("startup-error-file", "", "file to touch if agent can't start")

// config-translator
var fConfigTranslator = flag.Bool("config-translator", false, "run in config-translator mode")
```
chadpatel (Contributor Author):

I spent some time trying to reduce the copy/paste. Some of these constants are defined twice. There are ways to clean this up, but they all end up less readable and add a lot of indirection. Open to ideas, though.

If I had to make one change, I would put the descriptions in constants.

```
@@ -38,3 +40,195 @@ func TestTranslateJsonMapToEnvConfigFile(t *testing.T) {
assert.Equal(t, expectedJson[envconfig.CWAGENT_LOG_LEVEL], actualJson[envconfig.CWAGENT_LOG_LEVEL])
assert.Equal(t, expectedJson[envconfig.AWS_SDK_LOG_LEVEL], actualJson[envconfig.AWS_SDK_LOG_LEVEL])
}

func TestAgentConfig(t *testing.T) {
```
chadpatel (Contributor Author):

This code was moved from translator_test.go.

```go
ctx *context.Context
}

func NewConfigTranslator(inputOs, inputJsonFile, inputJsonDir, inputTomlFile, inputMode, inputConfig, multiConfig string) (*ConfigTranslator, error) {
```
chadpatel (Contributor Author):

This code was moved from the old config-translator/translator.go.

Contributor

This PR was marked stale due to lack of activity.

github-actions bot added the Stale label Dec 11, 2024
chadpatel changed the title Config Translate Refactor to save >100MB of disk space when the agent is installed to Config Translate Refactor to save 161MB of disk space when the agent is installed and reduce RPM by 48MB Dec 16, 2024
chadpatel changed the title Config Translate Refactor to save 161MB of disk space when the agent is installed and reduce RPM by 48MB to Refactor helper binaries to save 161MB of disk space when the agent is installed and reduce RPM by 48MB Dec 16, 2024
github-actions bot removed the Stale label Dec 17, 2024
Contributor

This PR was marked stale due to lack of activity.

github-actions bot added the Stale label Dec 25, 2024