Train / evaluate multiple TF models in parallel #17

Open

bentsherman opened this issue May 29, 2019 · 3 comments

@bentsherman
The 3-layer MLP that we use generally does not utilize the entire GPU's bandwidth, which means that we might be able to run multiple models on the same GPU in parallel and get some speedup. I'm not sure if this is feasible with TensorFlow and its Graphs / Sessions, but I'm guessing that each MLP instance would probably need its own TF Graph and TF Session.
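A minimal sketch of what "each instance gets its own Graph and Session" could look like (TF 1.x style; the class, layer sizes, and method names here are hypothetical, not our actual MLP code). Because the session is bound to a private graph, two instances can coexist in one process without touching the global default graph:

```python
import tensorflow as tf

class IsolatedMLP:
    """Toy 3-layer MLP where each instance owns a private Graph and Session."""

    def __init__(self, n_inputs, n_hidden, n_classes):
        self.graph = tf.Graph()
        with self.graph.as_default():
            self.x = tf.placeholder(tf.float32, [None, n_inputs])
            self.y = tf.placeholder(tf.int64, [None])
            hidden = tf.layers.dense(self.x, n_hidden, activation=tf.nn.relu)
            logits = tf.layers.dense(hidden, n_classes)
            self.loss = tf.losses.sparse_softmax_cross_entropy(labels=self.y, logits=logits)
            self.train_op = tf.train.AdamOptimizer().minimize(self.loss)
            init = tf.global_variables_initializer()
        # The session is tied to this instance's graph, not the default graph.
        self.session = tf.Session(graph=self.graph)
        self.session.run(init)

    def partial_fit(self, x_batch, y_batch):
        # All ops resolve inside this instance's private graph/session.
        _, loss = self.session.run([self.train_op, self.loss],
                                   feed_dict={self.x: x_batch, self.y: y_batch})
        return loss
```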

Assuming that all works, in phase 1 we can add parallelism easily with the n_jobs parameter of cross_val_score(), and for phase 2 we'd probably have to do it ourselves with multiprocess.
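For phase 1 that would amount to something like the snippet below; `clf` stands in for a scikit-learn-compatible wrapper around our MLP, and `X`, `y` for the phase-1 data, so these names are placeholders:

```python
from sklearn.model_selection import cross_val_score

# Evaluate 5 folds, with up to 2 folds trained in parallel.
scores = cross_val_score(clf, X, y, cv=5, n_jobs=2)
print(scores.mean(), scores.std())
```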

@bentsherman

phase1-evaluate.py can now use multiple parallel jobs for cross validation. However, I don't think our MLP class is entirely thread-safe yet: I can only use up to 2 jobs, and with more than that I get errors. Even so, being able to run two MLPs in parallel will be a big improvement. On top of that, all of the sklearn classifiers can now use all CPU cores.

Since I'm not a TensorFlow expert, there's probably something wrong with my TensorFlow code. If we can't fix that, we might be able to use Keras, which hides all of the details about graphs and sessions. Alternatively, we might be able to specify a different parallel backend for cross_validate which uses multiprocess instead of multithreading.
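Switching the backend would look roughly like this (a sketch only; `clf`, `X`, `y` are placeholders as above, and joblib's built-in `'multiprocessing'` / `'loky'` backends both run workers in separate processes):

```python
from joblib import parallel_backend
from sklearn.model_selection import cross_validate

# Force a process-based backend instead of threads for the CV workers.
with parallel_backend('multiprocessing', n_jobs=4):
    results = cross_validate(clf, X, y, cv=5)
```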

@bentsherman

Using tf.keras didn't change anything; I get the same errors. For this feature to work we'll need to use multiprocess.

@bentsherman

As it turns out, the default parallel backend was already using process-based parallelism. So perhaps the issue is that the additional processes fail because they can't allocate GPU memory, since TensorFlow allocates the entire GPU by default.

So maybe we should actually use a multi-threading backend, and try to make all threads use the same context but different graphs? I don't know. Further investigation required.
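One thing worth trying before switching to a threading backend (a sketch, TF 1.x, and only an assumption about the cause): keep the process-based backend but stop each worker process from claiming the whole GPU, by enabling memory growth on its session:

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing the whole device up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, hard-cap each process's share of the device:
# config.gpu_options.per_process_gpu_memory_fraction = 0.25
session = tf.Session(config=config)
```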
