Quickstart

This guide walks through the core MICE workflow with a series of examples of increasing complexity.

Minimal Example

Start with the simplest possible setup: minimizing the expected squared distance F(x) = E[(x - θ)²] / 2 between x and a random variable θ, whose per-sample gradient is x - θ:

import numpy as np
from mice import MICE
from mice.policy import DropRestartClipPolicy

def gradient(x, thetas):
    return x - thetas

def sampler(n):
    return np.random.randn(n, 1)

estimator = MICE(
    grad=gradient,
    sampler=sampler,
    eps=0.577,              # relative error tolerance (1/√3)
    min_batch=10,
    policy=DropRestartClipPolicy(
        drop_param=0.5,
        restart_param=0.0,
        max_hierarchy_size=100,
    ),
    max_cost=10_000,        # maximum gradient evaluations
    stop_crit_norm=1e-6,    # stopping criterion
)

x = np.array([10.0])
for iteration in range(100):
    g = estimator(x)
    if estimator.terminate:
        print(f"Terminated early: {estimator.terminate_reason}")
        break
    x = x - 0.1 * g
    print(f"Iteration {iteration}: x = {x[0]:.6f}")

Key points:

MICE is imported from mice (top-level re-export).
DropRestartClipPolicy controls the index-set operations: Add, Drop, Restart, and Clip.
eps controls the relative error tolerance. Smaller values mean tighter error control but more gradient evaluations.
max_cost bounds total gradient evaluations; the estimator sets terminate = True when exhausted.
stop_crit_norm triggers early termination when the estimated gradient norm drops below the square root of this threshold.
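
To make the role of eps concrete, the sketch below sizes a batch for a plain Monte Carlo gradient estimate so that its statistical error stays below eps times the gradient norm. This is not MICE's internal rule, which works on a hierarchy of iterates; the helper name batch_size_for_relative_error and the pilot-sample approach are assumptions made purely for illustration.

```python
import numpy as np

def batch_size_for_relative_error(grads, eps):
    """Hypothetical helper, not part of mice.

    Returns the smallest n with sqrt(V / n) <= eps * ||g||, where g is the
    mean of the pilot gradients and V is the sum of their per-coordinate
    sample variances. Assumes g is not exactly zero.
    """
    g = grads.mean(axis=0)
    v = grads.var(axis=0, ddof=1).sum()
    return max(1, int(np.ceil(v / (eps**2 * float(g @ g)))))

rng = np.random.default_rng(0)
thetas = rng.standard_normal((1000, 1))   # pilot samples, shaped like sampler(n)

for x in (10.0, 1.0, 0.1):
    grads = x - thetas                    # same per-sample gradient as the example
    n = batch_size_for_relative_error(grads, eps=0.577)
    print(f"x = {x:5.1f}: about {n} samples")
```

The required batch grows as x approaches the minimizer, because the gradient norm shrinks while the sampling variance stays fixed. The same effect explains why a smaller eps costs more gradient evaluations.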

Finite-Sum Problems

For finite datasets (e.g., empirical risk minimization on a fixed training set), pass the data array directly as the sampler argument instead of a callable:

import numpy as np
from mice import MICE

# Example: linear regression on a fixed dataset
rng = np.random.default_rng(42)
n_samples, n_features = 500, 5
X = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.1 * rng.normal(size=n_samples)
data = np.column_stack([y, X])

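def grad(x, thetas):
    """Per-sample gradients whose batch average is the gradient of ||y - Xw||^2 / (2n)."""
    y_batch = thetas[:, 0]
    X_batch = thetas[:, 1:]
    residuals = X_batch @ x - y_batch
    # One gradient row per sample; no extra 1/n, since averaging over the batch supplies it.
    return X_batch * residuals[:, None]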

estimator = MICE(
    grad=grad,
    sampler=data,  # pass array, not callable
    eps=0.577,
    min_batch=10,
    max_cost=50_000,
    stop_crit_norm=1e-6,
)

x = np.zeros(n_features)
for _ in range(200):
    g = estimator(x)
    if estimator.terminate:
        break
    x = x - 0.05 * g

When sampler is an array, MICE detects the finite-sum setting automatically. It then samples without replacement and uses sample-size formulas that include the finite population correction.
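
The finite population correction can be illustrated with the textbook formula for the variance of a sample mean drawn without replacement. The sketch below is a generic statistics example, not MICE's actual sample-size code; var_of_mean and samples_needed are hypothetical helper names.

```python
import math
import numpy as np

def var_of_mean(s2, n, N=None):
    """Variance of a sample mean of size n.

    s2 is the population variance with 1/(N-1) normalization. When N is
    given, sampling is without replacement and the factor (1 - n/N) applies.
    """
    fpc = 1.0 if N is None else 1.0 - n / N
    return s2 / n * fpc

def samples_needed(s2, tol, N=None):
    """Smallest n for which var_of_mean(s2, n, N) is at most tol."""
    if N is None:
        return math.ceil(s2 / tol)
    return math.ceil(s2 / (tol + s2 / N))

rng = np.random.default_rng(0)
population = rng.normal(size=500)
s2 = population.var(ddof=1)
N = population.size

# Empirical check: variance of means drawn without replacement.
n = 200
means = [rng.choice(population, size=n, replace=False).mean() for _ in range(4000)]
print("empirical:", np.var(means))
print("with FPC: ", var_of_mean(s2, n, N))
print("no FPC:   ", var_of_mean(s2, n))
```

With the correction, the required sample size never exceeds the dataset size, and the variance reaches zero once every data point has been used. The with-replacement formula can instead ask for more samples than the dataset contains.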

Policy Configuration

Fine-tune index-set management with DropRestartClipPolicy:

from mice.policy import DropRestartClipPolicy

policy = DropRestartClipPolicy(
    drop_param=0.5,           # threshold for dropping last iterate
    restart_param=0.0,        # threshold for restarting hierarchy
    max_hierarchy_size=100,   # maximum |L_k|
    clip_type="full",         # "full", "all", or None (disabled)
    aggr_cost=0.1,            # aggregation cost factor
)

estimator = MICE(grad=gradient, sampler=sampler, policy=policy)

Each parameter:

drop_param (float, default 0.5): Controls how aggressively the Drop operator removes near-redundant iterates. Higher values make dropping more likely.
restart_param (float, default 0.0): Controls Restart sensitivity. Non-zero values allow restarts even when the cost improvement is minor.
max_hierarchy_size (int, default 1000): Caps the number of retained iterates to bound memory and computation.
clip_type (str or None, default None): "full" clips when a level reaches the finite-data ceiling; "all" evaluates all possible clip points and picks the cheapest; None disables clipping.
aggr_cost (float, default 0.1): Penalty per level in cost computations, encouraging smaller hierarchies.

Resampling-Based Norm Estimation

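Resampling-based norm estimation is enabled by default and gives a more robust gradient-norm estimate for sample sizing and for the stopping rule. The parameters below tune it: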

estimator = MICE(
    grad=gradient,
    sampler=sampler,
    use_resampling=True,     # enabled by default
    re_part=5,               # number of jackknife partitions
    re_quantile=0.05,        # quantile for the error tolerance
    re_tot_cost=0.2,         # resampling cost budget fraction
    stop_crit_prob=0.05,     # probability threshold for stopping rule
)
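
As a rough picture of what partition-based resampling does, the sketch below computes a delete-one-group jackknife estimate of the gradient norm and takes a low quantile of the results. How MICE combines re_part and re_quantile internally is not specified here, so this is an assumption-laden illustration; jackknife_norm_quantile is a hypothetical helper, not part of mice.

```python
import numpy as np

def jackknife_norm_quantile(grads, n_parts=5, quantile=0.05):
    """Hypothetical helper: low quantile of leave-one-group-out gradient norms.

    grads has one per-sample gradient per row. The rows are split into
    n_parts groups; each replicate is the mean of all rows outside one group.
    """
    parts = np.array_split(grads, n_parts)            # split along rows
    sums = np.array([p.sum(axis=0) for p in parts])   # per-group sums
    counts = np.array([len(p) for p in parts])
    total = sums.sum(axis=0)
    loo_means = (total - sums) / (len(grads) - counts)[:, None]
    norms = np.linalg.norm(loo_means, axis=1)
    return float(np.quantile(norms, quantile))

rng = np.random.default_rng(0)
thetas = rng.standard_normal((200, 1))
grads = 0.5 - thetas                                  # per-sample gradients at x = 0.5

print("plain norm of the mean:", np.linalg.norm(grads.mean(axis=0)))
print("5% jackknife quantile: ", jackknife_norm_quantile(grads))
```

Using a low quantile rather than the plain norm of the mean is a conservative choice: a smaller norm estimate leads to a stricter absolute error target, which guards against underestimating the required sample size when the norm estimate is noisy.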