# Clean Room (Python)

> **Beta Access** — Clean Room is currently in beta and not yet generally available. To request early access, email <info@gofigr.io>.

> Working in R instead? See [Clean Room (R)](https://docs.gofigr.io/features/clean-room-r).

## Overview

Clean Room turns a Python function into a self-contained, reproducible, interactive application. You decorate a function with `@reproducible`, and GoFigr draws a clean boundary around it: the function can access only the arguments you pass in, the packages you declare, and Python builtins. When the function produces a figure, everything (code, data, parameters, environment) is captured and published automatically.

The workflow:

1. **Explore** — work however you normally work in Jupyter
2. **Distill** — pull the core logic into a `@reproducible` function
3. **Run** — call the function; GoFigr captures everything automatically
4. **Share** — send a link; stakeholders interact with the figure in the browser

## Quick Start

Load the GoFigr extension. Call `configure()` only if you need to override the defaults:

```python
%load_ext gofigr
# configure() # only needed if overriding defaults
```

The `%load_ext gofigr` magic injects `reproducible`, `SliderParam`, `DropdownParam`, and other names into your notebook namespace.

### Simplest Example

```python
import seaborn as sns
penguins = sns.load_dataset("penguins")

@reproducible
def flipper_histogram(data, bins: int = 20):
    sns.histplot(data=data, x='flipper_length_mm', bins=bins)

flipper_histogram(penguins)
```

That's it. When the function runs, GoFigr captures:

* **Source code** — the function body, extracted from the notebook cell
* **Parameters** — types, defaults, and values for every argument
* **Data** — DataFrames passed as arguments, serialized alongside the revision
* **Environment** — package names and versions, Python version
* **Output** — the figures produced by the run
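
To make the "Parameters" and "Environment" items above more concrete, here is a minimal standard-library sketch of recording a call's argument types, defaults, and values along with the Python version. It is not GoFigr's implementation: `describe_call` is a hypothetical helper, and the real capture also covers source code, data, and figures.

```python
import inspect
import platform


def describe_call(func, *args, **kwargs):
    """Hypothetical helper: record each argument's annotation, default, and value."""
    signature = inspect.signature(func)
    bound = signature.bind(*args, **kwargs)
    bound.apply_defaults()  # fill in defaults for arguments the caller omitted

    parameters = {}
    for name, param in signature.parameters.items():
        has_hint = param.annotation is not inspect.Parameter.empty
        has_default = param.default is not inspect.Parameter.empty
        parameters[name] = {
            "type": getattr(param.annotation, "__name__", str(param.annotation)) if has_hint else None,
            "default": param.default if has_default else None,
            "value": bound.arguments[name],
        }
    return {"parameters": parameters, "python_version": platform.python_version()}


# Stand-in for a decorated function; a plain list replaces the DataFrame.
def flipper_histogram(data, bins: int = 20):
    return len(data), bins


record = describe_call(flipper_histogram, [181, 186, 195])
print(record["parameters"]["bins"])
```

The sketch binds the call against the signature first, so arguments the caller omitted still appear with their default values.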

## Interactive Mode

Add `interactive=True` to render parameter widgets directly in Jupyter. Changing a widget re-runs the function immediately.

Requires the `anywidget` package:

```bash
pip install anywidget
```

**JupyterLab** — Restart JupyterLab (not just the kernel) and hard-reload the browser so the frontend extension registers.

**Classic Notebook (Jupyter Notebook <7)** — You must also enable the nbextension manually:

```bash
jupyter nbextension install --py anywidget --sys-prefix
jupyter nbextension enable --py anywidget --sys-prefix
```

Then restart the notebook server and refresh the browser.

If widgets still don't render in either environment, run the built-in health check:

```python
from gofigr.reproducible import check_anywidget_health
check_anywidget_health()
```

This checks the Python package, traitlets, and the Jupyter kernel, then prints guidance for fixing the frontend extension.

### Full Example

```python
from typing import Literal
import seaborn as sns

penguins = sns.load_dataset("penguins")

@reproducible(interactive=True)
def flipper_length_distribution(
    data,
    bins: int     = SliderParam(20, min=5, max=100, step=5),
    alpha: float  = SliderParam(0.7, min=0.1, max=1.0, step=0.05),
    show_kde: Literal["yes", "no", "auto"] = "yes",
    species: str  = DropdownParam("Adelie", choices=["Adelie", "Chinstrap", "Gentoo"]),
    show_grid: bool = True,
    title: str    = "Flipper Length Distribution"
):
    filtered = data[data['species'] == species]
    kde = True if show_kde == "yes" else (False if show_kde == "no" else None)

    ax = sns.histplot(
        data=filtered,
        x='flipper_length_mm',
        bins=bins,
        alpha=alpha,
        kde=kde,
    )
    ax.set_title(title)
    if show_grid:
        ax.grid(True, alpha=0.3)

flipper_length_distribution(penguins)
```

In Jupyter, this renders sliders, dropdowns, a checkbox, and a text input above the figure. Adjusting any control re-executes the function and updates the plot.

## Parameter Widgets

GoFigr maps Python types to interactive controls. You can rely on auto-inference or use explicit `Param` classes for more control.

### Auto-Inference

| Default value type       | Widget             | Example                         |
| ------------------------ | ------------------ | ------------------------------- |
| `int` or `float`         | Slider             | `bins: int = 20`                |
| `bool`                   | Checkbox           | `show_grid: bool = True`        |
| `str`                    | Text input         | `title: str = "My Chart"`       |
| `Literal[...]` type hint | Dropdown           | `mode: Literal["a", "b"] = "a"` |
| `pd.DataFrame`           | Static (read-only) | Passed at call time             |
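
The mapping in the table can be sketched with the standard library alone. The helper below, `infer_widget`, is hypothetical and only mirrors the rules listed above; GoFigr's actual inference logic may differ in detail.

```python
import inspect
from typing import Literal, get_origin


def infer_widget(param: inspect.Parameter) -> str:
    """Hypothetical helper mirroring the auto-inference table."""
    # A Literal[...] type hint selects a dropdown regardless of the default.
    if get_origin(param.annotation) is Literal:
        return "dropdown"
    default = param.default
    # bool is a subclass of int, so it must be checked before the numeric case.
    if isinstance(default, bool):
        return "checkbox"
    if isinstance(default, (int, float)):
        return "slider"
    if isinstance(default, str):
        return "text"
    return "static"  # e.g. a DataFrame passed at call time


def example(data, bins: int = 20, show_grid: bool = True,
            title: str = "My Chart", mode: Literal["a", "b"] = "a"):
    pass


widgets = {name: infer_widget(param)
           for name, param in inspect.signature(example).parameters.items()}
print(widgets)
```

Checking `bool` before `int` matters here: `isinstance(True, int)` is true in Python, so the reverse order would turn every checkbox into a slider.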

### SliderParam

Numeric slider with explicit bounds.

```python
bins: int = SliderParam(20, min=5, max=100, step=5)
alpha: float = SliderParam(0.7, min=0.1, max=1.0, step=0.05)
```

If you omit bounds, they are resolved automatically:

|          | `int`                 | `float`               |
| -------- | --------------------- | --------------------- |
| **min**  | 0                     | 0                     |
| **max**  | `max(value * 2, 100)` | `max(value * 2, 1.0)` |
| **step** | 1                     | 0.1                   |
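
As a worked example of the table, the following sketch fills in omitted bounds using the same rules. `resolve_slider_bounds` is a hypothetical helper, not part of GoFigr; its keyword names are `lo` and `hi` only to avoid shadowing Python's built-in `min` and `max`.

```python
def resolve_slider_bounds(value, lo=None, hi=None, step=None):
    """Hypothetical helper: fill in omitted slider bounds per the table above."""
    is_int = isinstance(value, int)
    if lo is None:
        lo = 0
    if hi is None:
        hi = max(value * 2, 100) if is_int else max(value * 2, 1.0)
    if step is None:
        step = 1 if is_int else 0.1
    return lo, hi, step


print(resolve_slider_bounds(20))    # (0, 100, 1)
print(resolve_slider_bounds(80))    # (0, 160, 1)
print(resolve_slider_bounds(0.7))   # (0, 1.4, 0.1)
print(resolve_slider_bounds(0.25))  # (0, 1.0, 0.1)
```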

### DropdownParam

Categorical dropdown with explicit choices.

```python
species: str = DropdownParam("Adelie", choices=["Adelie", "Chinstrap", "Gentoo"])
```

You can also use a `Literal` type hint to create a dropdown automatically without `DropdownParam`:

```python
from typing import Literal

show_kde: Literal["yes", "no", "auto"] = "yes"
```

### CheckboxParam

Boolean toggle. Auto-inferred for `bool` defaults—you rarely need this explicitly.

```python
show_grid: bool = True  # auto-inferred as checkbox
```

### TextParam

Free-form text input. Auto-inferred for `str` defaults.

```python
title: str = "Flipper Length Distribution"  # auto-inferred as text input
```

### StaticParam

For DataFrames and other complex objects. Read-only in interactive mode—no widget is rendered. The value is available in the Clean Room studio for inspection.

```python
data = penguins  # DataFrame passed at call time → StaticParam
```

## Custom Packages

By default, the clean room environment includes:

| Alias | Package             |
| ----- | ------------------- |
| `pd`  | `pandas`            |
| `np`  | `numpy`             |
| `plt` | `matplotlib.pyplot` |
| `sns` | `seaborn`           |

### Adding Packages

Use the `packages` argument to add more. By default, your packages are merged with the defaults:

```python
@reproducible(packages={"gg": "plotnine"})
def penguin_plot(data, bins: int = 20):
    from plotnine import ggplot, aes, geom_histogram
    plot = (ggplot(data, aes(x='flipper_length_mm'))
            + geom_histogram(bins=bins))
    display(plot)
```

### Replacing Default Packages

Set `merge_packages=False` to use only the packages you specify:

```python
@reproducible(packages={"pd": "pandas", "gg": "plotnine"}, merge_packages=False)
def my_plot(data):
    ...
```
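
The difference between merging and replacing can be pictured as a dictionary operation. This is only a sketch: `resolve_packages` is a hypothetical helper, and it assumes that a user-supplied alias overrides a default with the same alias, which this page does not state.

```python
# Built-in defaults as listed in the table at the top of this section.
DEFAULT_PACKAGES = {
    "pd": "pandas",
    "np": "numpy",
    "plt": "matplotlib.pyplot",
    "sns": "seaborn",
}


def resolve_packages(packages=None, merge_packages=True):
    """Hypothetical helper: combine user packages with the defaults."""
    packages = dict(packages or {})
    if not merge_packages:
        return packages  # only what the caller listed
    # Assumption: user entries win when an alias clashes with a default.
    return {**DEFAULT_PACKAGES, **packages}


merged = resolve_packages({"gg": "plotnine"})
replaced = resolve_packages({"pd": "pandas", "gg": "plotnine"}, merge_packages=False)
print(sorted(merged))    # ['gg', 'np', 'pd', 'plt', 'sns']
print(sorted(replaced))  # ['gg', 'pd']
```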

### Global Package Configuration

To change defaults for all `@reproducible` functions in a session:

```python
from gofigr.reproducible import set_default_packages, reset_default_packages

# Add plotnine to defaults
set_default_packages({"gg": "plotnine"})

# Replace defaults entirely
set_default_packages({"pd": "pandas", "gg": "plotnine"}, merge=False)

# Reset to built-in defaults
reset_default_packages()
```

## Publishing

### Automatic Capture (Jupyter)

With `configure(auto_publish=True)` (the default), figures are published automatically when a `@reproducible` function runs. No extra code needed.

### Explicit Publishing

For more control, use the `publisher` argument and call `publish()` inside the function:

```python
from gofigr.publisher import Publisher

pub = Publisher(workspace="Analytics", analysis="Penguins")

@reproducible(publisher=pub)
def flipper_histogram(data, bins: int = 20):
    sns.histplot(data=data, x='flipper_length_mm', bins=bins)
    publish(plt.gcf(), target="Flipper Histogram")

flipper_histogram(penguins)
```

The `publish()` function is injected into the clean room globals automatically. If a `publisher` is provided, it is used; otherwise the active GoFigr Jupyter extension's publisher is used. In scripts (no Jupyter extension), pass `publisher=` explicitly.

### What Gets Stored

Each published revision includes:

* **Source code** — the function body
* **Manifest** — JSON with parameter types, widget config, imports, and package versions
* **DataFrame parameters** — serialized as Parquet
* **Revision flag** — marks the revision as a Clean Room revision

## Nested Calls

Each `@reproducible` call gets its own context. If you nest `@reproducible` functions, the innermost context wins for `publish()`. The context is reset after the function returns or raises an exception.
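
The "innermost context wins, then resets" behavior can be illustrated with `contextvars` from the standard library. This is a sketch of the general pattern, not GoFigr's code; `clean_room_context` and `publish_target` are hypothetical names.

```python
import contextvars
from contextlib import contextmanager

_active_context = contextvars.ContextVar("active_context", default=None)


@contextmanager
def clean_room_context(name):
    """Hypothetical stand-in for the per-call context of a decorated function."""
    token = _active_context.set(name)
    try:
        yield
    finally:
        # Runs on normal return and on exceptions, restoring the outer context.
        _active_context.reset(token)


def publish_target():
    """Report which context a publish() call would currently resolve to."""
    return _active_context.get()


seen = []


def inner():
    with clean_room_context("inner"):
        seen.append(publish_target())


def outer():
    with clean_room_context("outer"):
        inner()
        seen.append(publish_target())


outer()
seen.append(publish_target())
print(seen)  # ['inner', 'outer', None]

try:
    with clean_room_context("failing"):
        raise ValueError("boom")
except ValueError:
    pass
print(publish_target())  # None
```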

## Edge Cases and Caveats

**Clean room isolation** — The function cannot access module-level variables from your notebook. Only declared packages, builtins, and function arguments are available. This is by design: it ensures the function is self-contained and reproducible.

**DataFrames are copied** — DataFrame arguments are round-tripped through Parquet serialization. The function receives a deserialized copy, not the original object. This ensures the clean room version matches what gets stored.

**100 MB limit** — Total DataFrame size (estimated via `memory_usage(deep=True)`) must be under 100 MB. If exceeded, a warning is issued and clean room is skipped—the function still runs normally but without isolation or capture.

**Unsupported parameter types** — Custom objects, numpy arrays, and lambdas cannot be serialized. If detected, a warning is issued and the function falls back to direct execution (no clean room).

**`interactive=True` outside Jupyter** — A warning is issued and the function runs non-interactively.

**`plt.show()` auto-called** — If matplotlib figures exist after execution, `plt.show()` is called automatically. You don't need to call it yourself.

**Source code extraction** — The function must be defined in importable source (a notebook cell or `.py` file). Dynamically created functions (e.g., via `exec`) are not supported.

**Return values** — In non-interactive mode, the function's return value is returned normally. In interactive mode, the return value is `None` (the output is rendered in the widget).
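
To make the clean room isolation caveat above concrete, the sketch below rebuilds a function with a restricted global namespace, so a reference to a module-level variable fails with `NameError`. It illustrates the idea only and is not how GoFigr is implemented; `isolate` is a hypothetical helper.

```python
import builtins
import types

notebook_variable = 42  # stands in for a module-level notebook variable


def leaky(x):
    return x + notebook_variable  # relies on a global that was never passed in


def isolate(func, allowed):
    """Hypothetical helper: rebuild func with only builtins and allowed names."""
    clean_globals = {"__builtins__": builtins, **allowed}
    return types.FunctionType(func.__code__, clean_globals, func.__name__,
                              func.__defaults__, func.__closure__)


print(leaky(1))  # 43: the original function still sees the module global

isolated = isolate(leaky, allowed={})
blocked = False
try:
    isolated(1)
except NameError as error:
    blocked = True
    print("blocked:", error)
```

The fix in practice is the one this page recommends: pass the value in as an argument, or declare the package it comes from.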

## Usage in Scripts

Clean Room works outside Jupyter with a few differences:

* **No interactive mode** — `interactive=True` is ignored (with a warning)
* **No automatic capture** — you must use the `publisher` argument explicitly
* **Explicit imports** — import from `gofigr.reproducible` instead of relying on the `%load_ext` magic

```python
from gofigr.publisher import Publisher
from gofigr.reproducible import reproducible, SliderParam
import seaborn as sns

penguins = sns.load_dataset("penguins")
pub = Publisher(workspace="Analytics", analysis="Penguins")

@reproducible(publisher=pub)
def flipper_histogram(data, bins: int = SliderParam(20, min=5, max=100, step=5)):
    # `sns` and `plt` are clean room default packages; `publish` is injected.
    sns.histplot(data=data, x='flipper_length_mm', bins=bins)
    publish(plt.gcf(), target="Flipper Histogram")

flipper_histogram(penguins)
```
