Tunnel closes too fast #10

Open
thecadams opened this issue Mar 7, 2023 · 6 comments

Comments

thecadams commented Mar 7, 2023

Hi @AndrewChubatiuk,
Thanks for this module, hoping to make it work over here!

Looks like the tunnel is closed from the Terraform side, about 1-3 seconds after being opened.

Logs: https://gist.github.com/thecadams/e3dc630cadadc9018946fef98aea26ca
Of particular interest in the tf log is this line:

data.ssh_tunnel.bastion_ssh_tunnel: Read complete after 1s [id=localhost:26127]

I have a config similar to this:

terraform {
  required_providers {
    ...
    grafana = {
      source  = "grafana/grafana"
      version = "~> 1.35.0"
    }
    ssh = {
      source = "AndrewChubatiuk/ssh"
    }
    ...
  }
  required_version = ">= 1.2.6"
}
data "ssh_tunnel" "bastion_ssh_tunnel" {
  user = "terraform"
  auth {
    private_key {
      content = var.bastion_ssh_private_key
    }
  }
  server {
    host = "bastion-test.revenuecat.com"
    port = 222
  }
  remote {
    host = "grafana.test.internal"
    port = 3000
  }
}

provider "grafana" {
  auth = var.grafana_auth
  url  = "http://${data.ssh_tunnel.bastion_ssh_tunnel.local.0.host}:${data.ssh_tunnel.bastion_ssh_tunnel.local.0.port}"
}

module "rc_prometheus_test" {
  source = "../../modules/rc_prometheus"
  ...
  dashboards = {"uid1": some_dashboard_json_1, "uid2": some_dashboard_json_2}
  ...
  providers = {
    grafana = grafana
  }
}

The rc_prometheus module manages one Grafana folder and several dashboards in that folder:

(in ../../modules/rc_prometheus/main.tf):
...
resource "grafana_folder" "dashboards" {
  title = "Generated: DO NOT EDIT"
}

resource "grafana_dashboard" "dashboards" {
  for_each    = var.dashboards
  folder      = grafana_folder.dashboards.id
  config_json = each.value
  overwrite   = true
}

Unfortunately, despite the Grafana provider receiving the correct host and port, I get connection refused errors because the tunnel shuts down too quickly. I also tried using time_sleep resources and provisioners in various places, but nothing worked.

Expected Behavior

There should be a way to control when the tunnel closes.

Actual Behavior

Tunnel closes within 1-3 seconds, causing connection refused errors in the module.

Steps to Reproduce

Something like the config above should repro this.

Important Factoids

It looks like recent changes in this fork removed the "close connection" provider; maybe that should be reinstated to support this use case?

You'll also notice entries in the logs like the one below; they are unrelated, and appear because I moved the SSH tunnel out of the module since the previous apply:

2023-03-07T01:34:58.915Z [DEBUG] module.rc_prometheus_test.module.bastion_ssh_tunnel is no longer in configuration

References

thecadams commented Mar 7, 2023

Maybe the tunnel is torn down after one usage? Just saw this on the remote side from sshd:

Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: input drain -> closed
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: rcvd adjust 9127
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: receive packet: type 97
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: rcvd close
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: output open -> drain
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: channel 0: will not send data after close
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: obuf empty
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: close_write
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: output drain -> closed
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: send close
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: send packet: type 97
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: is dead
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: garbage collecting
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug1: channel 0: free: direct-tcpip, nchannels 8
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: channel 0: status: The following connections are open:\r\n  #0 direct-tcpip (t4 r0 i3/0 o3/0 fd 8/8 cc -1)\r\n  #1 direct-tcpip (t4 r1 i0/0 o0/0 fd 9/9 cc -1)\r\n  #2 direct-tcpip (t4 r2 i0/0 o0/0 fd 10/10 cc -1)\r\n  #3 direct-tcpip (t4 r3 i0/0 o0/0 fd 11/11 cc -1)\r\n  #4 direct-tcpip (t4 r4 i0/0 o0/0 fd 1
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: Connection closed by 50.17.68.142 port 39023

Blefish commented Mar 7, 2023

I found that if I removed the redirectStd commands, which redirect the child process's stdout/stderr back to the provider, the child process outlives the provider process. I think that outcome is what the module intends, but for some reason it does not work.
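
For illustration, here is a rough Go sketch of the kind of pattern I mean. The names (redirectStd, the child binary) are placeholders rather than the provider's actual code; the point is that the parent drains the child's stdout/stderr through scanner goroutines, so the child ends up depending on the parent staying alive:

package main

import (
	"bufio"
	"io"
	"log"
	"os/exec"
)

// redirectStd-style helper: copy one of the child's output streams into the
// parent's log. Once the parent is gone, nothing drains these pipes anymore.
func redirectStd(name string, r io.Reader) {
	go func() {
		scanner := bufio.NewScanner(r)
		for scanner.Scan() {
			log.Printf("[child %s] %s", name, scanner.Text())
		}
	}()
}

func main() {
	cmd := exec.Command("ssh-tunnel-child") // placeholder for the tunnel child process
	stdout, _ := cmd.StdoutPipe()
	stderr, _ := cmd.StderrPipe()

	redirectStd("stdout", stdout)
	redirectStd("stderr", stderr)

	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	// The provider process eventually exits; the child is supposed to keep
	// the tunnel open, but it is still writing into pipes held by the
	// (now dead) parent.
}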

thecadams commented Mar 8, 2023

@Blefish based on what you mentioned, plus the bufio Scanner.Scan() docs, I have a hypothesis:

  1. The parent process is experiencing a panic in one of the goroutines due to no input from the child after a while, per this from the docs:

Scan panics if the split function returns too many empty tokens without advancing the input. This is a common error mode for scanners.

Pretty sure it's talking about this panic.

  2. The child process, writing to stdout/stderr which is now closed on the read end, either blocks once the pipe fills up or crashes (I can't investigate which one in my setup, as it's TF Cloud).

If this is the case, the parent's stderr must not be captured in the logs, otherwise we'd see the panic. It's also reasonable for the child to die without leaving anything in the TF logs, since the parent died first.
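
As a sanity check on point 2, here's a tiny Go sketch (nothing to do with the provider's real code) of what happens to a writer once the read end of its pipe has gone away; a child writing to an inherited stdout/stderr would normally receive SIGPIPE instead and die:

package main

import (
	"fmt"
	"os"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}

	// Simulate the parent going away: close the read end of the pipe.
	r.Close()

	// The next write to the orphaned pipe fails (typically "broken pipe").
	// A child writing to an inherited stdout/stderr would instead get
	// SIGPIPE, whose default action is to kill the process.
	_, werr := w.Write([]byte("tunnel is still up\n"))
	fmt.Println("write error:", werr)
}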

Thoughts on this?

@thecadams

@Blefish you were right: ignoring the child process's stdout and stderr seems to prevent the child process from crashing. My fork has the change you described, and it fixes the issue for me. Thanks for the suggestion!
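
For reference, this is roughly what the change amounts to (hypothetical names again, not the provider's exact code): start the child without wiring its stdout/stderr back to the provider, so os/exec points them at the null device and the child no longer needs the parent alive to drain its output.

package main

import (
	"log"
	"os/exec"
)

func main() {
	cmd := exec.Command("ssh-tunnel-child") // placeholder for the tunnel child process

	// Leaving Stdout/Stderr as nil makes os/exec connect them to os.DevNull,
	// so the child can keep writing after the provider process has exited.
	cmd.Stdout = nil
	cmd.Stderr = nil

	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	// Intentionally no cmd.Wait(): the child stays running to keep the
	// tunnel open once the provider has finished.
}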

@mvgijssel

@thecadams thanks for putting up the fork! I've managed to get it working when executing Terraform locally, but unfortunately remote execution on Terraform Cloud does not work. Terraform Cloud shows the same behaviour you describe even with your fork installed: the SSH tunnel stops 2 or 3 seconds after it's started.

AndrewChubatiuk commented May 4, 2023

You can try release v0.2.3
