Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require certificate registered within cluster before choosing CQL SSL #3699

Merged
merged 1 commit into from
Jan 31, 2024

Conversation

zimnx
Copy link
Contributor

@zimnx zimnx commented Jan 25, 2024

Previously, SSL was preferred when client_encryption_options.enabled
coming from ScyllaDB configuration was true and SSL port is open,
even when Scylla Manager did not have any client certificate registered
for particular cluster.

This caused issues when ScyllaDB cluster was exposing both CQL and CQL
SSL with mTLS, because even when Manager was not registered with
certificates, it still insisted to establish sessions using SSL port.
CQL healthchecks was also affected.

It can be further improved by implementing #3679 which will add a switch deciding whether user wants to use either SSL or non SSL port.

Fixes #3698

@zimnx
Copy link
Contributor Author

zimnx commented Jan 25, 2024

Seems like failures of CI tests, are also present on master. Is this a known issue?

@karol-kokoszka
Copy link
Collaborator

karol-kokoszka commented Jan 25, 2024

@zimnx restore schema in 5.4 is a known issue.
Ill review the PR tomorrow.

If it’s no restore schema related then it’s a regression

@Michal-Leszczynski
Copy link
Collaborator

I'm not an expert on this topic, but the following fragment of code:

 scyllaCluster.SslOpts = &gocql.SslOptions{
 	Config: &tls.Config{
 		InsecureSkipVerify: true,
 	},
 }

Makes it so that SM never validates Scylla certs. In general, is this desired or should be changed in the future? (I'm aware that it was always there)

@zimnx
Copy link
Contributor Author

zimnx commented Jan 26, 2024

Makes it so that SM never validates Scylla certs. In general, is this desired or should be changed in the future? (I'm aware that it was always there)

It should be changed. The thing is, Manager currently lacks Trusted CA parameter during cluster registration. Without it, it's not possible to trust certificates not signed by publicly trusted CAs, which is rarely the case.

@karol-kokoszka
Copy link
Collaborator

@Michal-Leszczynski that's a good point. But, on the other hand, manager desire is not to save some sensitive information to Scylla cluster. It just need connection to check if the node lives. We can ignore the certs.

Copy link
Collaborator

@karol-kokoszka karol-kokoszka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@zimnx
Copy link
Contributor Author

zimnx commented Jan 26, 2024

It just need connection to check if the node lives.

Certificate validation is also disabled for CQL session, which is serious as Manager might connect to different cluster when there's a MITM.

@Michal-Leszczynski
Copy link
Collaborator

Michal-Leszczynski commented Jan 26, 2024

What about a situation when:

  • SM doesn't have TLS certs
  • ClientEncryptionEnabled = true
  • ClientEncryptionRequireAuth = false

This implementation suggests to use regular CQL port, but is there anything stopping SM from using CQL SSL port without specifying certs? Wouldn't it be better/safer?

@Michal-Leszczynski
Copy link
Collaborator

This PR also removes the following part of CQL SSL port fallback:

// Scylla API always returns non-empty NativeTransportPortSSL even when
// value is explicitly disabled in configuration file.
// This makes impossible to determine which port is being used for CQL
// frontend. To workaround it, we try to dial SSL port when
// client encryption is enabled. If any error happens, assume this port
// is not used.
// Ref: https://github.com/scylladb/scylla/issues/7206

Which goes in pair with the following fragment of scylla.yaml:

# Enabling native transport encryption in client_encryption_options allows you to either use
# encryption for the standard port or to use a dedicated, additional port along with the unencrypted
# standard native_transport_port.
# Enabling client encryption and keeping native_transport_port_ssl disabled will use encryption
# for native_transport_port. Setting native_transport_port_ssl to a different value
# from native_transport_port will use encryption for native_transport_port_ssl while
# keeping native_transport_port unencrypted.
#native_transport_port_ssl: 9142

My understanding is that in some configurations NativeTransportPortSsl might be disabled even when ClientEncryptionEnabled is enabled (possibly with ClientEncryptionRequireAuth enabled as well).
Mentioned issue is still open, so is it safe to remove this fallback behavior?

…ing CQL SSL

Previously, SSL was preferred when client_encryption_options.enabled
coming from ScyllaDB configuration was true and SSL port is open,
even when Scylla Manager did not have any client certificate registered
for particular cluster.

This caused issues when ScyllaDB cluster was exposing both CQL and CQL
SSL with mTLS, because even when Manager was not registered with
certificates, it still insisted to establish sessions using SSL port.
CQL healthchecks was also affected.

Fixes scylladb#3698
@zimnx
Copy link
Contributor Author

zimnx commented Jan 26, 2024

What about a situation when:

* SM doesn't have TLS certs

* ClientEncryptionEnabled = true

* ClientEncryptionRequireAuth = false

This implementation suggests to use regular CQL port, but is there anything stopping SM from using CQL SSL port without specifying certs? Wouldn't it be better/safer?

Yep, that would be better. Added support for it.

@zimnx
Copy link
Contributor Author

zimnx commented Jan 26, 2024

This PR also removes the following part of CQL SSL port fallback:
My understanding is that in some configurations NativeTransportPortSsl might be disabled even when ClientEncryptionEnabled is enabled (possibly with ClientEncryptionRequireAuth enabled as well).
Mentioned scylladb/scylladb#7206 is still open, so is it safe to remove this fallback behavior?

This fallback is error prone and gives false negatives. Implemented probe had just 1s timeout and was sent from Manager Server which for case when given node is in different DC may return different results between runs.

Because referenced issue is not solved by Scylla, there's no reliable way to tell whether given port is open or not.
Instead on relying on unreliable probes, Manager will add switch where user will decide whether his cluster should use TLS or not. This will be solved as part of #3679.

@karol-kokoszka
Copy link
Collaborator

karol-kokoszka commented Jan 30, 2024

@Michal-Leszczynski there is unresolved conversation under your comment. Please let us know if it's ok, so that @zimnx can merge the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Manager uses TLS port even when it cannot establish a handshake
3 participants