Tunnel closes too fast #10

Open
thecadams opened this issue Mar 7, 2023 · 6 comments

Comments

thecadams commented Mar 7, 2023

Hi @AndrewChubatiuk,
Thanks for this module, hoping to make it work over here!

Looks like the tunnel is closed from the Terraform side, about 1-3 seconds after being opened.

Logs: https://gist.github.com/thecadams/e3dc630cadadc9018946fef98aea26ca
Of particular interest in the tf log is this line:

data.ssh_tunnel.bastion_ssh_tunnel: Read complete after 1s [id=localhost:26127]

I have a config similar to this:

terraform {
  required_providers {
    ...
    grafana = {
      source  = "grafana/grafana"
      version = "~> 1.35.0"
    }
    ssh = {
      source = "AndrewChubatiuk/ssh"
    }
    ...
  }
  required_version = ">= 1.2.6"
}
data "ssh_tunnel" "bastion_ssh_tunnel" {
  user = "terraform"
  auth {
    private_key {
      content = var.bastion_ssh_private_key
    }
  }
  server {
    host = "bastion-test.revenuecat.com"
    port = 222
  }
  remote {
    host = "grafana.test.internal"
    port = 3000
  }
}

provider "grafana" {
  auth = var.grafana_auth
  url  = "http://${data.ssh_tunnel.bastion_ssh_tunnel.local.0.host}:${data.ssh_tunnel.bastion_ssh_tunnel.local.0.port}"
}

module "rc_prometheus_test" {
  source = "../../modules/rc_prometheus"
  ...
  dashboards = {"uid1": some_dashboard_json_1, "uid2": some_dashboard_json_2}
  ...
  providers = {
    grafana = grafana
  }
}

The rc_prometheus module manages one Grafana folder and several dashboards in that folder:

(in ../../modules/rc_prometheus/main.tf):
...
resource "grafana_folder" "dashboards" {
  title = "Generated: DO NOT EDIT"
}

resource "grafana_dashboard" "dashboards" {
  for_each    = var.dashboards
  folder      = grafana_folder.dashboards.id
  config_json = each.value
  overwrite   = true
}

Unfortunately, despite the Grafana provider receiving the correct host and port, I get connection refused errors because the tunnel shuts down too quickly. I also tried using time_sleep resources and provisioners in various places, but nothing worked.

Expected Behavior

There should be a way to control when the tunnel closes.

Actual Behavior

Tunnel closes within 1-3 seconds, causing connection refused errors in the module.

Steps to Reproduce

Something like the config above should repro this.

Important Factoids

It looks like recent changes in this fork removed the "close connection" provider; maybe that should be reinstated to support this use case?

You'll also notice entries in the logs like the one below; they are unrelated, and appear because I moved the SSH tunnel out of the module since the previous apply:

2023-03-07T01:34:58.915Z [DEBUG] module.rc_prometheus_test.module.bastion_ssh_tunnel is no longer in configuration

References

thecadams commented Mar 7, 2023

Maybe the tunnel is torn down after one usage? Just saw this on the remote side from sshd:

Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: input drain -> closed
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: rcvd adjust 9127
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: receive packet: type 97
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: rcvd close
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: output open -> drain
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: channel 0: will not send data after close
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: obuf empty
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: close_write
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: output drain -> closed
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: send close
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: send packet: type 97
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: is dead
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: garbage collecting
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug1: channel 0: free: direct-tcpip, nchannels 8
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: channel 0: status: The following connections are open:\r\n  #0 direct-tcpip (t4 r0 i3/0 o3/0 fd 8/8 cc -1)\r\n  #1 direct-tcpip (t4 r1 i0/0 o0/0 fd 9/9 cc -1)\r\n  #2 direct-tcpip (t4 r2 i0/0 o0/0 fd 10/10 cc -1)\r\n  #3 direct-tcpip (t4 r3 i0/0 o0/0 fd 11/11 cc -1)\r\n  #4 direct-tcpip (t4 r4 i0/0 o0/0 fd 1
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: Connection closed by 50.17.68.142 port 39023

Blefish commented Mar 7, 2023

I found that if I removed the redirectStd commands, which redirect the child process's stdout/stderr back to the provider, the child process outlives the provider process. I think that outcome is what the module intends, but for some reason it does not work.
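
For illustration, here is a rough Go sketch of the kind of pattern I mean. The names (redirectStd, the child binary) are placeholders rather than the provider's actual code; the point is that the parent drains the child's stdout/stderr through scanner goroutines, so the child ends up depending on the parent staying alive:

package main

import (
	"bufio"
	"io"
	"log"
	"os/exec"
)

// redirectStd-style helper: copy one of the child's output streams into the
// parent's log. Once the parent is gone, nothing drains these pipes anymore.
func redirectStd(name string, r io.Reader) {
	go func() {
		scanner := bufio.NewScanner(r)
		for scanner.Scan() {
			log.Printf("[child %s] %s", name, scanner.Text())
		}
	}()
}

func main() {
	cmd := exec.Command("ssh-tunnel-child") // placeholder for the tunnel child process
	stdout, _ := cmd.StdoutPipe()
	stderr, _ := cmd.StderrPipe()

	redirectStd("stdout", stdout)
	redirectStd("stderr", stderr)

	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	// The provider process eventually exits; the child is supposed to keep
	// the tunnel open, but it is still writing into pipes held by the
	// (now dead) parent.
}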

thecadams commented Mar 8, 2023

@Blefish based on what you mentioned, plus the bufio Scanner.Scan() docs, I have a hypothesis:

  1. The parent process is experiencing a panic in one of the goroutines due to no input from the child after a while, per this from the docs:

Scan panics if the split function returns too many empty tokens without advancing the input. This is a common error mode for scanners.

Pretty sure it's talking about this panic.

  2. The child process, writing to stdout/stderr which is now closed on the read end, either blocks once the pipe fills up or crashes (I can't investigate which one in my setup, as it's TF Cloud).

If this is the case, the parent's stderr must not be captured in the logs, otherwise we'd see the panic. It's also reasonable for the child to die without leaving anything in the TF logs, since the parent died first.
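
As a sanity check on point 2, here's a tiny Go sketch (nothing to do with the provider's real code) of what happens to a writer once the read end of its pipe has gone away; a child writing to an inherited stdout/stderr would normally receive SIGPIPE instead and die:

package main

import (
	"fmt"
	"os"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}

	// Simulate the parent going away: close the read end of the pipe.
	r.Close()

	// The next write to the orphaned pipe fails (typically "broken pipe").
	// A child writing to an inherited stdout/stderr would instead get
	// SIGPIPE, whose default action is to kill the process.
	_, werr := w.Write([]byte("tunnel is still up\n"))
	fmt.Println("write error:", werr)
}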

Thoughts on this?

@thecadams

@Blefish you were right: ignoring the child process's stdout and stderr seems to prevent the child process from crashing. My fork has the change you described, and it fixes the issue for me. Thanks for the suggestion!
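
For reference, this is roughly what the change amounts to (hypothetical names again, not the provider's exact code): start the child without wiring its stdout/stderr back to the provider, so os/exec points them at the null device and the child no longer needs the parent alive to drain its output.

package main

import (
	"log"
	"os/exec"
)

func main() {
	cmd := exec.Command("ssh-tunnel-child") // placeholder for the tunnel child process

	// Leaving Stdout/Stderr as nil makes os/exec connect them to os.DevNull,
	// so the child can keep writing after the provider process has exited.
	cmd.Stdout = nil
	cmd.Stderr = nil

	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	// Intentionally no cmd.Wait(): the child stays running to keep the
	// tunnel open once the provider has finished.
}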

@mvgijssel

@thecadams thanks for putting up the fork! I've managed to get it working when executing Terraform locally, but unfortunately remote execution on Terraform Cloud does not work. Terraform Cloud shows the same behaviour you describe even with your fork installed: the SSH tunnel stops 2 or 3 seconds after it's started.

AndrewChubatiuk commented May 4, 2023

You can try release v0.2.3
