Fix dtype of pd.Series or pd.DataFrame to a specific type

bertcoerver · February 13, 2024, 2:30pm

Hello,

Is there any way to fix an attribute of a parameter's value? For example, let's say I create a param.ClassSelector called x that holds an empty pd.Series with dtype = int. I know that x will always be a pd.Series (otherwise param raises an error), but I also want to be sure that the dtype of x is always an integer:

import pandas as pd
import param

class A(param.Parameterized):

    x = param.ClassSelector(default = pd.Series(dtype = int), class_ = pd.Series)

    def __init__(self, **params):
        super().__init__(**params)

    def change_dtype(self):
        print(self.x.dtype)
        self.x = pd.Series([1.2])
        print(self.x.dtype)

a = A()

a.change_dtype()

# Output:
# int64
# float64

So, how can I make sure that the dtype of x cannot change?

ahuang11 · February 13, 2024, 3:32pm

You'd probably have to subclass param.Series and write your own _validate_value method:
https://param.holoviz.org/_modules/param/parameters.html#Series

Here is an example from Panel:

https://github.com/holoviz/panel/blob/723768ef2c8966f9da437d8c4b07b79e67768e30/panel/_param.py#L18-L26

def _validate_value(self, val, allow_None, valid=('auto', 'start', 'center', 'end')):
    if ((val is None and allow_None) or val in valid or
        (isinstance(val, tuple) and len(val) == 2 and all(v in valid for v in val))):
        return
    raise ValueError(
        f"Align parameter {self.name!r} must be one of 'start', "
        "'center', 'end' or a two-tuple specifying the (vertical, "
        f"horizontal) values for the alignment, not {val!r}."
    )

bertcoerver · February 14, 2024, 9:46am

Thanks @ahuang11! With your example and the explanation in the docs, I managed to write something similar for param.DataFrame.

With this class, I can now pass columns a dictionary that maps column names to their expected dtypes, or pass such a dictionary to dtypes directly:

class SuperDataFrame(param.DataFrame):

    __slots__ = ['dtypes']

    _slot_defaults = param._utils._dict_update(
        param.DataFrame._slot_defaults, dtypes = None
    )

    def __init__(self, **params):

        self.dtypes = params.pop("dtypes", {})
        if isinstance(params.get("columns", None), dict):
            self.dtypes = params["columns"]
            params["columns"] = set(params["columns"].keys())

        super().__init__(**params)

    def _validate(self, val):

        super()._validate(val)

        # Compare each column's actual dtype with the expected one
        invalid_dtypes = [(k, val.dtypes[k], v) for k, v in self.dtypes.items() if val.dtypes[k] != v]
        if invalid_dtypes:
            # One message per mismatching column; fall back to the raw value for non-type dtypes such as "int64"
            errors = [f"column `{x[0]}` should be of type `{getattr(x[2], '__name__', x[2])}` but is `{x[1]}`." for x in invalid_dtypes]
            raise ValueError(
                f'{param._utils._validate_error_prefix(self)}: {" and ".join(errors)}'
            )

class A(param.Parameterized):

    df = SuperDataFrame(
            default = pd.DataFrame({"int_data": [1,2,3], "float_data": [1.2, 3.4, 5.6]}),
            columns = {"int_data": int, "float_data": float},
        )

    def __init__(self, **params):
        super().__init__(**params)

    def invalid_data_1(self):
        self.df = pd.DataFrame({"int_data": [1.1,2.1,3.1], "float_data": [1.2, 3.4, 5.6]})

    def invalid_data_2(self):
        self.df["int_data"] = pd.Series(["1","2","3"], dtype = str)
        # self.param.trigger("df")


a = A()

a.invalid_data_2()
a.invalid_data_1()

# ValueError: SuperDataFrame parameter 'A.df': column `int_data` should be of type `int` but is `float64`.

I did notice, however, that the _validate method is only called when I replace df entirely, NOT when I change a single column in place (as in .invalid_data_2()).

Is there a simple way to also trigger the check when a single column is changed, besides adding self.param.trigger("df") everywhere?

bertcoerver · February 14, 2024, 9:57am

I think my follow-up question has already been answered elsewhere, and self.param.trigger("df") is the way to go.