NFS example needs updates #44

Closed
kingdonb opened this issue Jul 29, 2017 · 16 comments

Comments

@kingdonb

Is this now the best place to raise kubernetes/kubernetes#48161?

The NFS example is not on the list of supported examples, but if you're looking for maintainer volunteers, I'm interested. I'm pretty sure I'm going to need this NFS example to work for my own clusters, and I plan to keep up with Kubernetes upgrades across several dev/prod environments, so I might be a good fit as a supporting maintainer.

@ahmetb (Member) commented Aug 3, 2017

@kingdonb Thanks for showing interest. Our goal with examples like NFS that currently live in the staging/ directory is to find maintainers for them and, ideally, move them to another repository. If you think you have cycles to maintain this, I recommend suggesting a kubernetes-incubator/kubernetes-nfs repository by discussing it with [email protected]. Alternatively, you can create a personal repository until that happens and then move things over.

Also, please consider creating a Helm chart for easy installation of NFS in multiple environments, if applicable. I see many issues regarding NFS, and a community-maintained Helm chart could help keep everyone trying to run NFS on Kubernetes from going through the same pains. Would this be something you're interested in?

@kingdonb (Author) commented Aug 3, 2017

Yes, I am interested in that.

I'll review your notes on the linked PR and start working on a Helm chart tonight!

@msau42 (Member) commented Aug 4, 2017

/sig storage

@ahmetb (Member) commented Aug 5, 2017

@msau42 I'm afraid we don't have SIG or area labels in this repository yet.

@kingdonb (Author)

I'm working my way back around to this from deis/workflow#856 and deis/workflow#857. These are things I need personally, so I'll have to keep them working anyway, which should make me a good maintainer for these examples.

@ahmetb (Member) commented Sep 13, 2017

Any updates here?

@kingdonb (Author)

Not yet. Sorry, I will make a point to look at it this week.

The goal is still to convert the example into a helm chart, with some parameters. (I'm learning the ins and outs of building helm charts by porting Deis to OpenShift, which has been fun!)

@kingdonb (Author) commented Oct 3, 2017

I still owe a helm chart for this story.

I've actually had a number of inquiries from people who need a PV that several running pods can mount and write to concurrently. I'm trying to avoid writing a helm chart that only solves my own problem; the "engineering rule of three" says you can't make a general solution until you have at least three distinct customers with overlapping problems.

Not abandoned, but I'm actively seeking that third customer. If anyone needs this, please reach out to me.
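For anyone unfamiliar with the terminology, "a volume many pods can write to concurrently" is the ReadWriteMany access mode. A minimal sketch of a claim for such a volume, assuming a statically provisioned NFS-backed PV like the one in this example (the name and size are illustrative):

```yaml
# Illustrative only: a claim asking for a volume that multiple pods can
# mount read-write at the same time, which NFS can satisfy.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
    - ReadWriteMany      # many pods, all read-write
  storageClassName: ""   # bind to a pre-created (static) PV, no dynamic provisioning
  resources:
    requests:
      storage: 1Mi
```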

@kingdonb (Author) commented Oct 8, 2017

OK, I have composed PR #108 in order to avoid "perfect getting in the way of good."

It simply converts the existing structure into a helm chart and does not adjust any documentation yet.

You can `helm install -n nfs examples/staging/volumes/nfs/chart/` and you get three RCs as in the original example: nfs-server, nfs-busybox, and nfs-web.

The NFS server binds itself to a 200GB PV, then exposes itself to the cluster as a service using the service.clusterIP from examples/staging/volumes/nfs/chart/values.yaml. This is the only parameter the user is expected to set. I have not been able to find a way to assign it dynamically, but that is not a regression from the previous non-helm example.
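For reference, a sketch of what that single parameter could look like (the address is an assumption; it has to be a free IP inside your cluster's service CIDR):

```yaml
# examples/staging/volumes/nfs/chart/values.yaml (sketch)
# The NFS PV points at a fixed server address, so the server's Service is
# pinned to a clusterIP chosen up front. 10.0.0.77 is only an example.
service:
  clusterIP: 10.0.0.77
```

Then install with `helm install -n nfs examples/staging/volumes/nfs/chart/` as above.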

I personally think this is a bug in the NFS PV driver, but I don't know how any other similar PV driver works, so maybe it's working as designed.

@kingdonb (Author) commented Oct 8, 2017

The new example from #108 works exactly once per cluster boot (using Minikube). Something does not terminate on `helm delete --purge nfs`, and on the second attempt to `helm install` the chart you get:

Unable to mount volumes for pod "nfs-busybox-3227049434-5dch7_default(9ffcb0cc-ac55-11e7-aaaa-080027f5463e)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-3227049434-5dch7". list of unattached/unmounted volumes=[nfs]
Error syncing pod

Whatever hangs around also prevents a `minikube stop` from completing cleanly. If I terminate the VirtualBox instance and start it again, the service comes up and the busybox/web workers all connect to the nfs-server, successfully binding their PVCs to the ReadWriteMany volume.

There may be many obstacles to overcome to make this example viable. Restarting minikube is akin to doing a down/up on the entire cluster, so I'm going to assume this type of failure would be "very bad" for anyone using this example on a live cluster.

@kingdonb (Author) commented Oct 8, 2017

There's another proposed change in kingdonb#2.

Is there any reason to keep these workloads as ReplicationControllers, and have I done the conversion to Deployment resources correctly? The changes work for me (the example pods still come up and do their job).
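Roughly, the conversion has this shape (a sketch only, using the nfs-server workload; the container details are abbreviated and should be checked against the chart, and the API group for Deployments has since settled on apps/v1):

```yaml
# Sketch of the ReplicationController -> Deployment conversion for nfs-server.
# A Deployment needs an explicit spec.selector that matches the pod template
# labels; the pod template itself carries over from the RC unchanged.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-server
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
        - name: nfs-server
          image: gcr.io/google_containers/volume-nfs:0.8   # as in the example; verify against the chart
          securityContext:
            privileged: true
          ports:
            - name: nfs
              containerPort: 2049   # mountd/rpcbind ports omitted for brevity
```

The practical difference is that Deployments manage pods through ReplicaSets and support rolling updates, whereas RCs are effectively legacy, which is the main argument for the conversion.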

@msau42 (Member) commented Oct 9, 2017

  1. Regarding service name resolution, @jingxu97 should have fixed the issue on GCE/GKE COS images with kubernetes#51645 ("Set up DNS server in containerized mounter path"). The underlying issue is that mounts are done at the host level, so you need to configure the host's resolv.conf to also include the kube-dns server (see the sketch after this list).

  2. Regarding termination, are you terminating everything in the right order? All the NFS clients must be terminated first, and the NFS server last. If you terminate the server first, the client Pods will fail to unmount during shutdown.
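A rough illustration of the host-level part of point 1, assuming a minikube-style node (the kube-dns ClusterIP is an assumption; check your own cluster):

```sh
# The NFS mount is performed by the kubelet on the node, not inside the pod,
# so the node's resolver must know about kube-dns before a name like
# nfs-server.default.svc.cluster.local will resolve for the mount.
kubectl -n kube-system get svc kube-dns                       # note the ClusterIP, commonly 10.96.0.10
echo "nameserver 10.96.0.10" | sudo tee -a /etc/resolv.conf   # assumed IP; substitute your own
```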

@kingdonb (Author) commented Oct 9, 2017

@msau42 That helps, thank you! I will try terminating the clients first next time.

I'm not on GKE COS, though, so I'll seek out a solution for my local OS; that seems like a reasonable adjustment to make on the host. I also have other components that require off-cluster clients to resolve the service the same way on-cluster consumers do. (I'm running a CAS server for federated HTTP authentication.)

Thanks very much!

@kingdonb (Author)

I think the alternative to terminating the clients first is to set the soft mount option: that way the client produces an I/O error when it loses contact with the server instead of hanging forever.

You might want your production deployments to wait for the server to come back rather than erroring out, so this could be one of the parameters to add to values.yaml.
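Something like this on the PV template, assuming the chart exposes the option as a value (the nfs.mountOption value name is hypothetical):

```yaml
# Sketch: surfacing NFS mount behavior as a chart parameter.
# "soft" returns an I/O error to clients when the server is unreachable;
# the default "hard" blocks and retries until the server comes back.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 1Mi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - "{{ .Values.nfs.mountOption }}"        # hypothetical value: "soft" or "hard"
  nfs:
    server: "{{ .Values.service.clusterIP }}"   # the clusterIP set in values.yaml
    path: "/"
```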

@kingdonb (Author) commented Oct 22, 2017

There is helm/charts#2559 now, which can be a new home for this concern. I also note that examples/staging/nfs was updated and now works fine for me without helm. Thanks!
