
[Bug] optimize_acqf erroring out with SingleTaskMultiFidelityGP #2402

Open
esantorella opened this issue Jun 29, 2024 · 1 comment

Labels
bug Something isn't working

Comments

@esantorella
Member

Thanks to ToennisStef for raising this in #2393.

🐛 Bug

I'm looking at an example with a SingleTaskMultiFidelityGP, evaluating acquisition values where both the x and the objective are at fidelities other than the highest fidelity. This produces NaN acquisition values and causes optimize_acqf to error out. While optimizing for a fidelity other than the highest may not make sense, this also happens when optimizing qMultiFidelityKnowledgeGradient for the highest fidelity. I'm seeing the following behavior:

  • The posterior variance is sometimes initially computed as negative and then clipped to 1e-10. When computing it, I get gpytorch/distributions/multivariate_normal.py:319: NumericalWarning: Negative variance values detected. This is likely due to numerical instabilities. Rounding negative variances up to 1e-10.
  • qLogEI returns NaN at the same locations where the posterior variance can be negative (see the diagnostic sketch after this list).
  • optimize_acqf with a FixedFeatureAcquisitionFunction, fixing the fidelity to 0 and using qLogEI, errors out.
  • Following the setup of the multi-fidelity tutorial, optimizing qMultiFidelityKnowledgeGradient for the highest fidelity errors out as well.
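
A minimal sketch of how the first two points can be checked, reusing model, train_x, and train_y from the gist; the evaluation grid and the qLogEI construction here are illustrative assumptions, not the exact gist code:

import torch
from botorch.acquisition.logei import qLogExpectedImprovement

# Grid over the design variable at fidelity 0 (column 1 is the fidelity).
x_grid = torch.linspace(0, 1, 101, dtype=torch.float64)
X = torch.stack([x_grid, torch.zeros_like(x_grid)], dim=-1)

with torch.no_grad():
    # Negative variances are clamped to 1e-10, emitting the NumericalWarning.
    variance = model.posterior(X).variance.squeeze(-1)

acqf = qLogExpectedImprovement(
    model=model, best_f=train_y[train_x[:, 1] == 3].max()
)
with torch.no_grad():
    # Shape (101, 1, 2): 101 separate q=1 candidate batches.
    acq_vals = acqf(X.unsqueeze(-2))

print("min posterior variance:", variance.min().item())
print("any NaN acqf values:", torch.isnan(acq_vals).any().item())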

What the posterior looks like:
[attached image: posterior plot]

Acqf values if we were to just work with fidelity=0:
[attached image: acquisition values at fidelity 0]

To reproduce

See gist for full code. It ends with

candidates, _ = optimize_acqf_mixed(
    acq_function=mfkg_acqf,
    bounds=bounds_x,
    fixed_features_list=[{1: 0}],
    q=1,
    num_restarts=5,
    raw_samples=128,
    # batch_initial_conditions=X_init,
    options={"batch_limit": 5, "maxiter": 200},
)

Alternatively, skipping the cost function setup, the same error can be produced more simply with

import torch
from botorch.acquisition.fixed_feature import FixedFeatureAcquisitionFunction
from botorch.acquisition.logei import qLogExpectedImprovement
from botorch.optim import optimize_acqf

# model, train_x, and train_y are defined in the gist.
acq_func = FixedFeatureAcquisitionFunction(
    acq_function=qLogExpectedImprovement(
        model=model, best_f=train_y[train_x[:, 1] == 3].max()
    ),
    d=1 + 1,
    columns=[1],
    values=[0],
)

candidates, _ = optimize_acqf(
    acq_function=acq_func,
    bounds=torch.tensor([[0.0], [1.0]], dtype=torch.float64),
    q=1,
    num_restarts=20,
    raw_samples=512,
)

Stack trace/error message

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[12], line 1
----> 1 candidates, _ = optimize_acqf_mixed(
      2     acq_function=mfkg_acqf,
      3     bounds=bounds_x,
      4     # fixed_features_list=[{1: i} for i in range(3)],
      5     fixed_features_list=[{1: 0}],
      6     q=1,
      7     num_restarts=5,
      8     raw_samples=128,
      9     # batch_initial_conditions=X_init,
     10     options={"batch_limit": 5, "maxiter": 200},
     11 )

File ~/botorch/botorch/optim/optimize.py:926, in optimize_acqf_mixed(acq_function, bounds, q, num_restarts, fixed_features_list, raw_samples, options, inequality_constraints, equality_constraints, nonlinear_inequality_constraints, post_processing_func, batch_initial_conditions, ic_generator, ic_gen_kwargs)
    924 ff_candidate_list, ff_acq_value_list = [], []
    925 for fixed_features in fixed_features_list:
--> 926     candidate, acq_value = optimize_acqf(
    927         acq_function=acq_function,
    928         bounds=bounds,
    929         q=q,
    930         num_restarts=num_restarts,
    931         raw_samples=raw_samples,
    932         options=options or {},
    933         inequality_constraints=inequality_constraints,
    934         equality_constraints=equality_constraints,
    935         nonlinear_inequality_constraints=nonlinear_inequality_constraints,
    936         fixed_features=fixed_features,
    937         post_processing_func=post_processing_func,
    938         batch_initial_conditions=batch_initial_conditions,
    939         ic_generator=ic_generator,
    940         return_best_only=True,
    941         **ic_gen_kwargs,
    942     )
    943     ff_candidate_list.append(candidate)
    944     ff_acq_value_list.append(acq_value)

File ~/botorch/botorch/optim/optimize.py:543, in optimize_acqf(acq_function, bounds, q, num_restarts, raw_samples, options, inequality_constraints, equality_constraints, nonlinear_inequality_constraints, fixed_features, post_processing_func, batch_initial_conditions, return_best_only, gen_candidates, sequential, ic_generator, timeout_sec, return_full_tree, retry_on_optimization_warning, **ic_gen_kwargs)
    520     gen_candidates = gen_candidates_scipy
    521 opt_acqf_inputs = OptimizeAcqfInputs(
    522     acq_function=acq_function,
    523     bounds=bounds,
   (...)
    541     ic_gen_kwargs=ic_gen_kwargs,
    542 )
--> 543 return _optimize_acqf(opt_acqf_inputs)

File ~/botorch/botorch/optim/optimize.py:564, in _optimize_acqf(opt_inputs)
    561     return _optimize_acqf_sequential_q(opt_inputs=opt_inputs)
    563 # Batch optimization (including the case q=1)
--> 564 return _optimize_acqf_batch(opt_inputs=opt_inputs)

File ~/botorch/botorch/optim/optimize.py:255, in _optimize_acqf_batch(opt_inputs)
    252     batch_initial_conditions = opt_inputs.batch_initial_conditions
    253 else:
    254     # pyre-ignore[28]: Unexpected keyword argument `acq_function` to anonymous call.
--> 255     batch_initial_conditions = opt_inputs.get_ic_generator()(
    256         acq_function=opt_inputs.acq_function,
    257         bounds=opt_inputs.bounds,
    258         q=opt_inputs.q,
    259         num_restarts=opt_inputs.num_restarts,
    260         raw_samples=opt_inputs.raw_samples,
    261         fixed_features=opt_inputs.fixed_features,
    262         options=options,
    263         inequality_constraints=opt_inputs.inequality_constraints,
    264         equality_constraints=opt_inputs.equality_constraints,
    265         **opt_inputs.ic_gen_kwargs,
    266     )
    268 batch_limit: int = options.get(
    269     "batch_limit",
    270     (
   (...)
    274     ),
    275 )
    277 def _optimize_batch_candidates() -> Tuple[Tensor, Tensor, List[Warning]]:

File ~/botorch/botorch/optim/initializers.py:515, in gen_one_shot_kg_initial_conditions(acq_function, bounds, q, num_restarts, raw_samples, fixed_features, options, inequality_constraints, equality_constraints)
    512 q_aug = acq_function.get_augmented_q_batch_size(q=q)
    514 # TODO: Avoid unnecessary computation by not generating all candidates
--> 515 ics = gen_batch_initial_conditions(
    516     acq_function=acq_function,
    517     bounds=bounds,
    518     q=q_aug,
    519     num_restarts=num_restarts,
    520     raw_samples=raw_samples,
    521     fixed_features=fixed_features,
    522     options=options,
    523     inequality_constraints=inequality_constraints,
    524     equality_constraints=equality_constraints,
    525 )
    527 # compute maximizer of the value function
    528 value_function = _get_value_function(
    529     model=acq_function.model,
    530     objective=acq_function.objective,
   (...)
    533     project=getattr(acq_function, "project", None),
    534 )

File ~/botorch/botorch/optim/initializers.py:424, in gen_batch_initial_conditions(acq_function, bounds, q, num_restarts, raw_samples, fixed_features, options, inequality_constraints, equality_constraints, generator, fixed_X_fantasies)
    422         start_idx += batch_limit
    423     Y_rnd = torch.cat(Y_rnd_list)
--> 424 batch_initial_conditions = init_func(
    425     X=X_rnd, Y=Y_rnd, n=num_restarts, **init_kwargs
    426 ).to(device=device)
    427 if not any(issubclass(w.category, BadInitialCandidatesWarning) for w in ws):
    428     return batch_initial_conditions

File ~/botorch/botorch/optim/initializers.py:952, in initialize_q_batch(X, Y, n, eta)
    950     weights = torch.exp(etaZ)
    951 if batch_shape == torch.Size():
--> 952     idcs = torch.multinomial(weights, n)
    953 else:
    954     idcs = batched_multinomial(
    955         weights=weights.permute(*range(1, len(batch_shape) + 1), 0), num_samples=n
    956     ).permute(-1, *range(len(batch_shape)))

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
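
For context on the final frame: initialize_q_batch draws restart candidates via torch.multinomial over exponentiated (softmax-style) weights of the raw acquisition values, so any NaN acquisition value propagates into the weights and trips this check. A standalone illustration of the failure mode:

import torch

# A NaN acquisition value survives the exp and poisons the weights.
weights = torch.exp(torch.tensor([0.1, float("nan"), 0.3]))
torch.multinomial(weights, 1)
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0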

Expected Behavior

Numerical inaccuracy is not uncommon in optimization; however, this typically should not lead to exceptions, since multi-restart optimization may allow for finding an optimum nonetheless. In this case, it is clear there is an optimum, so optimize_acqf should find it.

System information

Please complete the following information:

  • BoTorch Version --> 0.11.2.dev4+g80ac43eda.d20240614
  • GPyTorch Version --> 1.11
  • PyTorch Version --> 2.2.1
  • OS --> OS X
@Balandat
Contributor

cc @SebastianAment re qLogEI having a "hole". The model actually seems fine here (thanks @esantorella for the great diagnostics), so this is probably just because the incumbent is so high (8.8638 in this case, if I got that right from the other issue, by far the largest observed value).

As a first step I would recommend using qLogNoisyExpectedImprovement here, which usually has better numerical behavior.
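
A minimal sketch of that suggestion, reusing model and train_x from the reproduction above; qLogNEI takes the observed inputs as X_baseline instead of an explicit best_f, which tends to be more numerically robust:

import torch
from botorch.acquisition.fixed_feature import FixedFeatureAcquisitionFunction
from botorch.acquisition.logei import qLogNoisyExpectedImprovement
from botorch.optim import optimize_acqf

# Same fixed-fidelity setup as before, but with qLogNEI over the
# observed inputs rather than qLogEI with an explicit incumbent.
acq_func = FixedFeatureAcquisitionFunction(
    acq_function=qLogNoisyExpectedImprovement(model=model, X_baseline=train_x),
    d=1 + 1,
    columns=[1],
    values=[0],
)

candidates, _ = optimize_acqf(
    acq_function=acq_func,
    bounds=torch.tensor([[0.0], [1.0]], dtype=torch.float64),
    q=1,
    num_restarts=20,
    raw_samples=512,
)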
