# Clean Room (Python)

> **Beta Access** — Clean Room is currently in beta and not yet generally available. To request early access, email <info@gofigr.io>.

> Working in R instead? See [Clean Room (R)](/features/clean-room-r.md).

## Overview

Clean Room turns a Python function into a self-contained, reproducible, interactive application. You decorate a function with `@reproducible`, and GoFigr draws a clean boundary around it: the function can access only the arguments you pass in, the packages you declare, and Python builtins. When the function produces a figure, everything (code, data, parameters, environment) is captured and published automatically.

The workflow:

1. **Explore** — work however you normally work in Jupyter
2. **Distill** — pull the core logic into a `@reproducible` function
3. **Run** — call the function; GoFigr captures everything automatically
4. **Share** — send a link; stakeholders interact with the figure in the browser

## Quick Start

Load the GoFigr extension. Calling `configure()` is only needed if you want to override the defaults:

```python
%load_ext gofigr
# configure() # only needed if overriding defaults
```

The `%load_ext gofigr` magic injects `reproducible`, `SliderParam`, `DropdownParam`, and other names into your notebook namespace.

### Simplest Example

```python
import seaborn as sns
penguins = sns.load_dataset("penguins")

@reproducible
def flipper_histogram(data, bins: int = 20):
    sns.histplot(data=data, x='flipper_length_mm', bins=bins)

flipper_histogram(penguins)
```

That's it. When the function runs, GoFigr captures:

* **Source code** — the function body, extracted from the notebook cell
* **Parameters** — types, defaults, and values for every argument
* **Data** — DataFrames passed as arguments, serialized alongside the revision
* **Environment** — package names and versions, Python version
* **Output** — the figures produced by the run

## Interactive Mode

Add `interactive=True` to render parameter widgets directly in Jupyter. Changing a widget re-runs the function immediately.

Requires the `anywidget` package:

```bash
pip install anywidget
```

**JupyterLab** — Restart JupyterLab (not just the kernel) and hard-reload the browser so the frontend extension registers.

**Classic Notebook (Jupyter Notebook <7)** — You must also enable the nbextension manually:

```bash
jupyter nbextension install --py anywidget --sys-prefix
jupyter nbextension enable --py anywidget --sys-prefix
```

Then restart the notebook server and refresh the browser.

If widgets still don't render in either environment, run the built-in health check:

```python
from gofigr.reproducible import check_anywidget_health
check_anywidget_health()
```

This checks the Python package, traitlets, and the Jupyter kernel, then prints guidance for fixing the frontend extension.

### Full Example

```python
from typing import Literal
import seaborn as sns

penguins = sns.load_dataset("penguins")

@reproducible(interactive=True)
def flipper_length_distribution(
    data,
    bins: int     = SliderParam(20, min=5, max=100, step=5),
    alpha: float  = SliderParam(0.7, min=0.1, max=1.0, step=0.05),
    show_kde: Literal["yes", "no", "auto"] = "yes",
    species: str  = DropdownParam("Adelie", choices=["Adelie", "Chinstrap", "Gentoo"]),
    show_grid: bool = True,
    title: str    = "Flipper Length Distribution"
):
    filtered = data[data['species'] == species]
    kde = True if show_kde == "yes" else (False if show_kde == "no" else None)

    ax = sns.histplot(
        data=filtered,
        x='flipper_length_mm',
        bins=bins,
        alpha=alpha,
        kde=kde,
    )
    ax.set_title(title)
    if show_grid:
        ax.grid(True, alpha=0.3)

flipper_length_distribution(penguins)
```

In Jupyter, this renders sliders, dropdowns, a checkbox, and a text input above the figure. Adjusting any control re-executes the function and updates the plot.

## Parameter Widgets

GoFigr maps Python types to interactive controls. You can rely on auto-inference or use explicit `Param` classes for more control.

### Auto-Inference

| Default value type       | Widget             | Example                         |
| ------------------------ | ------------------ | ------------------------------- |
| `int` or `float`         | Slider             | `bins: int = 20`                |
| `bool`                   | Checkbox           | `show_grid: bool = True`        |
| `str`                    | Text input         | `title: str = "My Chart"`       |
| `Literal[...]` type hint | Dropdown           | `mode: Literal["a", "b"] = "a"` |
| `pd.DataFrame`           | Static (read-only) | Passed at call time             |
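
The mapping in the table can be pictured with a small standalone sketch. This is an illustration only, not GoFigr's implementation: `infer_widget` and `infer_widgets` are hypothetical helpers, the widget names are plain strings, and checking the `Literal` hint before the default value is an assumption.

```python
import inspect
from typing import Literal, get_origin


def infer_widget(param: inspect.Parameter) -> str:
    """Return a widget kind for one parameter, following the table above."""
    # Assumption: a Literal[...] annotation takes priority over the default.
    if get_origin(param.annotation) is Literal:
        return "dropdown"
    default = param.default
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(default, bool):
        return "checkbox"
    if isinstance(default, (int, float)):
        return "slider"
    if isinstance(default, str):
        return "text"
    # Everything else (e.g. a DataFrame passed at call time) is read-only.
    return "static"


def infer_widgets(func) -> dict:
    """Map each parameter name of `func` to its inferred widget kind."""
    params = inspect.signature(func).parameters
    return {name: infer_widget(p) for name, p in params.items()}


def example(data, bins: int = 20, show_grid: bool = True,
            title: str = "My Chart", mode: Literal["a", "b"] = "a"):
    pass


widgets = infer_widgets(example)
print(widgets)
```

For the `example` signature this yields a static entry for `data`, a slider for `bins`, a checkbox for `show_grid`, a text input for `title`, and a dropdown for `mode`.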

### SliderParam

Numeric slider with explicit bounds.

```python
bins: int = SliderParam(20, min=5, max=100, step=5)
alpha: float = SliderParam(0.7, min=0.1, max=1.0, step=0.05)
```

If you omit bounds, they are resolved automatically:

|          | `int`                 | `float`               |
| -------- | --------------------- | --------------------- |
| **min**  | 0                     | 0                     |
| **max**  | `max(value * 2, 100)` | `max(value * 2, 1.0)` |
| **step** | 1                     | 0.1                   |
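
As a worked example of the table, the defaults can be written out in plain Python. `default_slider_bounds` is a hypothetical helper used only to make the arithmetic concrete; it does not come from the GoFigr API.

```python
def default_slider_bounds(value):
    """Return (min, max, step) for a slider whose bounds were omitted."""
    # bool is a subclass of int, but bool defaults map to a checkbox instead.
    if isinstance(value, bool):
        raise TypeError("bool values map to a checkbox, not a slider")
    if isinstance(value, int):
        return 0, max(value * 2, 100), 1
    if isinstance(value, float):
        return 0, max(value * 2, 1.0), 0.1
    raise TypeError(f"expected int or float, got {type(value).__name__}")


print(default_slider_bounds(20))    # (0, 100, 1)
print(default_slider_bounds(80))    # (0, 160, 1)
print(default_slider_bounds(0.25))  # (0, 1.0, 0.1)
```

Because the default minimum is 0, a negative default value would sit below the slider's range under these rules, so passing explicit bounds is probably the safer choice in that case.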

### DropdownParam

Categorical dropdown with explicit choices.

```python
species: str = DropdownParam("Adelie", choices=["Adelie", "Chinstrap", "Gentoo"])
```

You can also use a `Literal` type hint to create a dropdown automatically without `DropdownParam`:

```python
from typing import Literal

show_kde: Literal["yes", "no", "auto"] = "yes"
```

### CheckboxParam

Boolean toggle. Auto-inferred for `bool` defaults—you rarely need this explicitly.

```python
show_grid: bool = True  # auto-inferred as checkbox
```

### TextParam

Free-form text input. Auto-inferred for `str` defaults.

```python
title: str = "Flipper Length Distribution"  # auto-inferred as text input
```

### StaticParam

For DataFrames and other complex objects. Read-only in interactive mode—no widget is rendered. The value is available in the Clean Room studio for inspection.

```python
data = penguins  # DataFrame passed at call time → StaticParam
```

## Custom Packages

By default, the clean room environment includes:

| Alias | Package             |
| ----- | ------------------- |
| `pd`  | `pandas`            |
| `np`  | `numpy`             |
| `plt` | `matplotlib.pyplot` |
| `sns` | `seaborn`           |

### Adding Packages

Use the `packages` argument to add more. By default, your packages are merged with the defaults:

```python
@reproducible(packages={"gg": "plotnine"})
def penguin_plot(data, bins: int = 20):
    from plotnine import ggplot, aes, geom_histogram
    plot = (ggplot(data, aes(x='flipper_length_mm'))
            + geom_histogram(bins=bins))
    display(plot)
```

### Replacing Default Packages

Set `merge_packages=False` to use only the packages you specify:

```python
@reproducible(packages={"pd": "pandas", "gg": "plotnine"}, merge_packages=False)
def my_plot(data):
    ...
```
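
The difference between merging and replacing comes down to how two alias maps are combined. The sketch below shows that logic with plain dictionaries; `resolve_packages` is a hypothetical helper, and letting a user entry override a default with the same alias is an assumption rather than documented behavior.

```python
# Built-in defaults, copied from the table in this section.
BUILTIN_DEFAULTS = {
    "pd": "pandas",
    "np": "numpy",
    "plt": "matplotlib.pyplot",
    "sns": "seaborn",
}


def resolve_packages(packages=None, merge_packages=True):
    """Return the alias -> module mapping a function would see."""
    packages = dict(packages or {})
    if not merge_packages:
        return packages
    # Assumption: user entries come last, so they win on alias collisions.
    return {**BUILTIN_DEFAULTS, **packages}


merged = resolve_packages({"gg": "plotnine"})
replaced = resolve_packages({"pd": "pandas", "gg": "plotnine"},
                            merge_packages=False)
print(sorted(merged))    # ['gg', 'np', 'pd', 'plt', 'sns']
print(sorted(replaced))  # ['gg', 'pd']
```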

### Global Package Configuration

To change defaults for all `@reproducible` functions in a session:

```python
from gofigr.reproducible import set_default_packages, reset_default_packages

# Add plotnine to defaults
set_default_packages({"gg": "plotnine"})

# Replace defaults entirely
set_default_packages({"pd": "pandas", "gg": "plotnine"}, merge=False)

# Reset to built-in defaults
reset_default_packages()
```

## Publishing

### Automatic Capture (Jupyter)

With `configure(auto_publish=True)` (the default), figures are published automatically when a `@reproducible` function runs. No extra code needed.

### Explicit Publishing

For more control, use the `publisher` argument and call `publish()` inside the function:

```python
from gofigr.publisher import Publisher

pub = Publisher(workspace="Analytics", analysis="Penguins")

@reproducible(publisher=pub)
def flipper_histogram(data, bins: int = 20):
    sns.histplot(data=data, x='flipper_length_mm', bins=bins)
    publish(plt.gcf(), target="Flipper Histogram")

flipper_histogram(penguins)
```

The `publish()` function is injected into the clean room globals automatically. If a `publisher` is provided, it is used; otherwise the active GoFigr Jupyter extension's publisher is used. In scripts (no Jupyter extension), pass `publisher=` explicitly.

### What Gets Stored

Each published revision includes:

* **Source code** — the function body
* **Manifest** — JSON with parameter types, widget config, imports, and package versions
* **DataFrame parameters** — serialized as Parquet
* **Revision flag** — marks the revision as a Clean Room revision
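
To give a feel for what a manifest of this kind might hold, here is a rough sketch built with the standard library. The schema is invented for illustration: every field name, along with `build_manifest` and `package_version`, is hypothetical, and GoFigr's actual manifest format is not documented on this page.

```python
import inspect
import json
import platform
from importlib import metadata


def package_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def build_manifest(func, packages, **call_kwargs):
    """Describe one call as a JSON-serializable dict (hypothetical schema)."""
    bound = inspect.signature(func).bind(**call_kwargs)
    bound.apply_defaults()  # include parameters the caller left at defaults
    parameters = {
        name: {"type": type(value).__name__, "value": value}
        for name, value in bound.arguments.items()
    }
    return {
        "function": func.__name__,
        "parameters": parameters,
        "imports": packages,
        "python_version": platform.python_version(),
        # Simplification: treats the top-level module name as the
        # distribution name, which does not hold for every package.
        "package_versions": {
            module: package_version(module.split(".")[0])
            for module in packages.values()
        },
    }


def histogram(bins: int = 20, title: str = "My Chart"):
    pass


manifest = build_manifest(histogram, {"np": "numpy"}, bins=30)
print(json.dumps(manifest, indent=2))
```

The sketch only handles simple parameter values; DataFrames would need separate serialization, which the list above says is done with Parquet.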

## Nested Calls

Each `@reproducible` call gets its own context. If you nest `@reproducible` functions, the innermost context wins for `publish()`. The context is reset after the function returns or raises an exception.

## Edge Cases and Caveats

**Clean room isolation** — The function cannot access module-level variables from your notebook. Only declared packages, builtins, and function arguments are available. This is by design: it ensures the function is self-contained and reproducible.
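
One way to picture this boundary is to execute a function's source in a fresh namespace that holds only the declared packages. The sketch below uses `exec` and the standard `math` module to show the effect; it is a conceptual illustration under that assumption, not GoFigr's mechanism, and it is not a security sandbox.

```python
import math

SOURCE = '''
def circle_area(radius):
    return math.pi * radius ** 2

def leaky(radius):
    return radius * scale  # `scale` is neither an argument nor a package
'''

scale = 10  # module-level variable, as in a notebook

# Fresh namespace with only the declared alias; exec adds builtins itself.
clean_globals = {"math": math}
exec(SOURCE, clean_globals)

print(clean_globals["circle_area"](1.0))  # 3.141592653589793

try:
    clean_globals["leaky"](1.0)
except NameError as err:
    # The module-level `scale` is invisible inside the fresh namespace.
    print(f"blocked: {err}")
```

The practical fix for a function like `leaky` is to pass `scale` in as a parameter, which also makes it part of the captured inputs.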

**DataFrames are copied** — DataFrame arguments are round-tripped through Parquet serialization. The function receives a deserialized copy, not the original object. This ensures the clean room version matches what gets stored.

**100 MB limit** — Total DataFrame size (estimated via `memory_usage(deep=True)`) must be under 100 MB. If exceeded, a warning is issued and clean room is skipped—the function still runs normally but without isolation or capture.

**Unsupported parameter types** — Custom objects, numpy arrays, and lambdas cannot be serialized. If detected, a warning is issued and the function falls back to direct execution (no clean room).

**`interactive=True` outside Jupyter** — A warning is issued and the function runs non-interactively.

**`plt.show()` auto-called** — If matplotlib figures exist after execution, `plt.show()` is called automatically. You don't need to call it yourself.

**Source code extraction** — The function must be defined in importable source (a notebook cell or `.py` file). Dynamically created functions (e.g., via `exec`) are not supported.
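
The restriction is easy to reproduce with Python's own `inspect` module, which cannot find source text for a function compiled from a string. Whether GoFigr relies on `inspect` is an assumption here; `source_available` and the `<dynamic>` filename are hypothetical.

```python
import inspect


def source_available(func) -> bool:
    """Return True if inspect can locate source text for `func`."""
    try:
        inspect.getsource(func)
        return True
    except (OSError, TypeError):
        return False


# Build a function from a string. "<dynamic>" is a made-up filename with no
# backing file, so there is nothing for inspect to read.
namespace = {}
code = compile("def generated(x):\n    return x + 1\n", "<dynamic>", "exec")
exec(code, namespace)

print(namespace["generated"](1))                 # 2
print(source_available(namespace["generated"]))  # False
```

The generated function runs fine, but its source text cannot be recovered, which is the property that source capture depends on.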

**Return values** — In non-interactive mode, the function's return value is returned normally. In interactive mode, the return value is `None` (the output is rendered in the widget).

## Usage in Scripts

Clean Room works outside Jupyter with a few differences:

* **No interactive mode** — `interactive=True` is ignored (with a warning)
* **No automatic capture** — you must use the `publisher` argument explicitly
* **Explicit imports** — import from `gofigr.reproducible` instead of relying on the `%load_ext` magic

```python
from gofigr.publisher import Publisher
from gofigr.reproducible import reproducible, SliderParam
import seaborn as sns

penguins = sns.load_dataset("penguins")
pub = Publisher(workspace="Analytics", analysis="Penguins")

@reproducible(publisher=pub)
def flipper_histogram(data, bins: int = SliderParam(20, min=5, max=100, step=5)):
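    # sns and plt are default clean room packages, and publish() is injected
    # into the clean room globals, so none of them are imported here.
    sns.histplot(data=data, x='flipper_length_mm', bins=bins)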
    publish(plt.gcf(), target="Flipper Histogram")

flipper_histogram(penguins)
```

