
[FEATURE REQUEST]: Issues with Customizing Data Classes #2759

Open
FriendLey opened this issue Sep 11, 2024 · 3 comments
Labels
enhancement New feature or request


@FriendLey

Motivation

In our current business scenario, the data type df contains not only the columns specified by COLUMN_DATA_TYPES in the BaseData class, but also many columns relevant to specific business scenarios. In the current implementation of ax, although a custom_data_class function is provided for users to define their own data classes, the Experiment class in ax.core.experiment and the Metric class in ax.core.metric do not support fetching data for custom data types.

As a result, in the following use case: if the user customizes the data type, they need to rewrite the fetch_data function in Experiment and various related functions, solely to support returning some custom data columns.

Describe the solution you'd like to see implemented in Ax.

Is there currently a plan for refactoring this? If not, do you think it's necessary? I'm interested in working on this implementation.

Describe any alternatives you've considered to the above solution.

No response

Is this related to an existing issue in Ax or another repository? If so please include links to those Issues here.

No response

Code of Conduct

  • I agree to follow Ax's Code of Conduct
@FriendLey FriendLey added the enhancement New feature or request label Sep 11, 2024
@danielcohenlive

Hi @FriendLey, thanks for the feedback. I believe this may be simpler than you think. The base class Metric doesn't fetch data; you have to extend it if you want to implement custom data fetching. See 8. Defining custom metrics. I don't think you need to make any changes to Experiment. You'd have to do something like

from typing import Any, Type

from ax import Data, Metric
from ax.core.base_trial import BaseTrial
from ax.core.data import custom_data_class
from ax.core.metric import MetricFetchResult
from ax.utils.common.result import Ok

MyData: Type[Data] = custom_data_class(
    column_data_types={"my_column": str}
)

class MyMetric(Metric):
    data_constructor: Type[Data] = MyData

    # you'll have to write one of these anyway
    def fetch_trial_data(
        self, trial: BaseTrial, **kwargs: Any
    ) -> MetricFetchResult:
        # construct a df `my_df` with "my_column"
        return Ok(
            value=MyData(df=my_df)
        )

I'm a little less sure about saving and loading an experiment with a custom data type. Is that a concern of yours? If so I can investigate.

There's also the issue that our models would not be using your custom column.

@danielcohenlive danielcohenlive self-assigned this Sep 11, 2024
@FriendLey
Author

It seems that the approach you suggested doesn't work. In the fetch_data pipeline, certain places are hard-coded to the Data class, which leads to the following error: ValueError: Columns ['p_value', 'power'] are not supported. The scenario can be reproduced as follows:

from ax import (
    ChoiceParameter,
    ComparisonOp,
    Experiment,
    FixedParameter,
    Metric,
    Objective,
    OptimizationConfig,
    OrderConstraint,
    OutcomeConstraint,
    ParameterType,
    RangeParameter,
    SearchSpace,
    SumConstraint,
)
from ax.modelbridge.registry import Models
from ax.utils.notebook.plotting import init_notebook_plotting, render

init_notebook_plotting()

import pandas as pd
import numpy as np

from typing import Type

from ax import Data
from ax.core.data import BaseData, custom_data_class
from ax.utils.common.result import Err, Ok

MyData: Type[Data] = custom_data_class(
    column_data_types={
        **BaseData.COLUMN_DATA_TYPES,
        "p_value": float,
        "power": float,
    }
)


class BoothMetric(Metric):
    def fetch_trial_data(self, trial):
        records = []
        for arm_name, arm in trial.arms_by_name.items():
            params = arm.parameters
            records.append(
                {
                    "arm_name": arm_name,
                    "metric_name": self.name,
                    "trial_index": trial.index,
                    # in practice, the mean and sem will be looked up based on trial metadata
                    # but for this tutorial we will calculate them
                    "mean": (params["x1"] + 2 * params["x2"] - 7) ** 2
                    + (2 * params["x1"] + params["x2"] - 5) ** 2,
                    "sem": 0.0,
                    "p_value": 0.01,
                    "power": 0.8,
                }
            )
        return Ok(value=MyData(df=pd.DataFrame.from_records(records)))

    def is_available_while_running(self) -> bool:
        return True

search_space = SearchSpace(
    parameters=[
        RangeParameter(
            name=f"x{i}", parameter_type=ParameterType.FLOAT, lower=0.0, upper=1.0
        )
        for i in range(1, 3)
    ]
)

param_names = [f"x{i}" for i in range(1, 3)]
optimization_config = OptimizationConfig(
    objective=Objective(
        metric=BoothMetric(name="BoothMetric", lower_is_better=True),
        minimize=True,
    ),
)

from ax import Runner

class MyRunner(Runner):
    def run(self, trial):
        trial_metadata = {"name": str(trial.index)}
        return trial_metadata

exp = Experiment(
    name="test_hartmann",
    search_space=search_space,
    optimization_config=optimization_config,
    runner=MyRunner(),
)

from ax.modelbridge.registry import Models

NUM_SOBOL_TRIALS = 5
NUM_BOTORCH_TRIALS = 2

print("Running Sobol initialization trials...")
sobol = Models.SOBOL(search_space=exp.search_space)

for i in range(NUM_SOBOL_TRIALS):
    # Produce a GeneratorRun from the model, which contains proposed arm(s) and other metadata
    generator_run = sobol.gen(n=1)
    # Add generator run to a trial to make it part of the experiment and evaluate arm(s) in it
    trial = exp.new_trial(generator_run=generator_run)
    # Start trial run to evaluate arm(s) in the trial
    trial.run()
    # Mark trial as completed to record when a trial run is completed
    # and enable fetching of data for metrics on the experiment
    # (by default, trials must be completed before metrics can fetch their data,
    # unless a metric is explicitly configured otherwise)
    trial.mark_completed()

for i in range(NUM_BOTORCH_TRIALS):
    print(
        f"Running BO trial {i + NUM_SOBOL_TRIALS + 1}/{NUM_SOBOL_TRIALS + NUM_BOTORCH_TRIALS}..."
    )
    # Reinitialize GP+EI model at each step with updated data.
    gpei = Models.BOTORCH_MODULAR(experiment=exp, data=exp.fetch_data())
    generator_run = gpei.gen(n=1)
    trial = exp.new_trial(generator_run=generator_run)
    trial.run()
    trial.mark_completed()

print("Done!")

error details:

[ERROR 09-13 11:35:35] ax.core.experiment: Encountered ValueError Columns ['p_value', 'power'] are not supported. while attaching results. Proceeding and returning Results fetched without attaching.
Running Sobol initialization trials...
Running BO trial 6/7...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[44], line 27
     23 print(
     24     f"Running BO trial {i + NUM_SOBOL_TRIALS + 1}/{NUM_SOBOL_TRIALS + NUM_BOTORCH_TRIALS}..."
     25 )
     26 # Reinitialize GP+EI model at each step with updated data.
---> 27 gpei = Models.BOTORCH_MODULAR(experiment=exp, data=exp.fetch_data())
     28 generator_run = gpei.gen(n=1)
     29 trial = exp.new_trial(generator_run=generator_run)

File /home/pengleizhao/miniconda3/envs/adaptive-py39/lib/python3.9/site-packages/ax/core/experiment.py:572, in Experiment.fetch_data(self, metrics, combine_with_last_data, overwrite_existing_data, **kwargs)
    560 results = self._lookup_or_fetch_trials_results(
    561     trials=list(self.trials.values()),
    562     metrics=metrics,
   (...)
    565     **kwargs,
    566 )
    568 base_metric_cls = (
    569     MapMetric if self.default_data_constructor == MapData else Metric
    570 )
--> 572 return base_metric_cls._unwrap_experiment_data_multi(
    573     results=results,
    574 )

File /home/pengleizhao/miniconda3/envs/adaptive-py39/lib/python3.9/site-packages/ax/core/metric.py:586, in Metric._unwrap_experiment_data_multi(cls, results)
    580     raise UnwrapError(errs) from (
    581         exceptions[0] if len(exceptions) == 1 else Exception(exceptions)
    582     )
    584 data = [ok.ok for ok in oks]
    585 return (
--> 586     cls.data_constructor.from_multiple_data(data=data)
    587     if len(data) > 0
    588     else cls.data_constructor()
    589 )

File /home/pengleizhao/miniconda3/envs/adaptive-py39/lib/python3.9/site-packages/ax/core/data.py:529, in Data.from_multiple_data(data, subset_metrics)
    516 @staticmethod
    517 def from_multiple_data(
    518     data: Iterable[Data], subset_metrics: Optional[Iterable[str]] = None
    519 ) -> Data:
    520     """Combines multiple objects into one (with the concatenated
    521     underlying dataframe).
    522 
   (...)
    527             in the underlying dataframe.
    528     """
--> 529     data_out = Data.from_multiple(data=data)
    530     if len(data_out.df.index) == 0:
    531         return data_out

File /home/pengleizhao/miniconda3/envs/adaptive-py39/lib/python3.9/site-packages/ax/core/data.py:284, in BaseData.from_multiple(cls, data)
    281 if len(dfs) == 0:
    282     return cls()
--> 284 return cls(df=pd.concat(dfs, axis=0, sort=True))

File /home/pengleizhao/miniconda3/envs/adaptive-py39/lib/python3.9/site-packages/ax/core/data.py:92, in BaseData.__init__(self, df, description)
     90 extra_columns = columns - self.supported_columns()
     91 if extra_columns:
---> 92     raise ValueError(f"Columns {list(extra_columns)} are not supported.")
     93 df = df.dropna(axis=0, how="all").reset_index(drop=True)
     94 df = self._safecast_df(df=df)

ValueError: Columns ['p_value', 'power'] are not supported.

@danielcohenlive

@FriendLey I see what you're saying. We encode the data type as an int on experiment (https://github.com/facebook/Ax/blob/main/ax/core/experiment.py#L124) so it's loadable. We try not to encode classes directly in the db. Then we use that enum to look up what data type to use https://github.com/facebook/Ax/blob/main/ax/core/experiment.py#L589. Also from_multiple_data() should be a class method and should use its own type.
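The enum-to-class lookup pattern described above can be sketched roughly like this (a simplified stand-in with hypothetical names, not Ax's actual code):

```python
from enum import IntEnum
from typing import Dict, Type

# Toy stand-ins for the real classes in ax.core.data.
class Data: ...
class MapData(Data): ...

# Hypothetical enum mirroring the int stored on Experiment so it stays loadable.
class DataType(IntEnum):
    DATA = 0
    MAP_DATA = 1

DATA_TYPE_LOOKUP: Dict[DataType, Type[Data]] = {
    DataType.DATA: Data,
    DataType.MAP_DATA: MapData,
}

def default_data_constructor(data_type: DataType) -> Type[Data]:
    # A user-defined Data subclass has no enum value, so it can never
    # come back from this lookup -- the crux of the issue.
    return DATA_TYPE_LOOKUP[data_type]
```

This is why a custom_data_class result is effectively invisible to the experiment: the lookup only knows about the enum's members.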

That would take a bit of a refactor. Alternatively, we might not need to raise if there are extra columns in data (https://github.com/facebook/Ax/blob/main/ax/core/data.py#L96).

> Is there currently a plan for refactoring this? If not, do you think it's necessary? I'm interested in working on this implementation.

If you wanted to implement this, I would recommend the path of just being more permissive with extra fields in Data and making sure they don't disappear when saved and reloaded.
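A minimal illustration of that "be more permissive" direction, using a toy stand-in for the column check in BaseData.__init__ (the class and names below are hypothetical, not the real ax.core.data code):

```python
import pandas as pd

# Toy version of BaseData's required columns.
SUPPORTED_COLUMNS = {"arm_name", "metric_name", "trial_index", "mean", "sem"}

class PermissiveData:
    """Toy Data-like container that keeps unknown columns instead of raising."""

    def __init__(self, df: pd.DataFrame) -> None:
        extra = set(df.columns) - SUPPORTED_COLUMNS
        # Instead of `raise ValueError(f"Columns {extra} are not supported.")`,
        # record the extras and keep them in the dataframe so they survive
        # concatenation and, ideally, save/load round trips.
        self.extra_columns = extra
        self.df = df.reset_index(drop=True)

df = pd.DataFrame.from_records(
    [{"arm_name": "0_0", "metric_name": "m", "trial_index": 0,
      "mean": 1.0, "sem": 0.0, "p_value": 0.01, "power": 0.8}]
)
data = PermissiveData(df)
```

With this relaxation, the repro above would no longer hit the ValueError during fetch_data, since the extra columns would simply be carried along.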
