.. _quickstart:

Quickstart
==========

This guide walks through the core MICE workflow in four steps of increasing
complexity: a minimal example, finite-sum problems, policy configuration, and
resampling-based norm estimation.

Minimal Example
---------------

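Start with the simplest possible setup: minimizing
``0.5 * E[(x - theta)**2]``, half the expected squared distance from a
standard normal random variable ``theta``. The per-sample gradient is
``x - theta``: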

.. code-block:: python

   import numpy as np
   from mice import MICE
   from mice.policy import DropRestartClipPolicy

   def gradient(x, thetas):
       return x - thetas

   def sampler(n):
       return np.random.randn(n, 1)

   estimator = MICE(
       grad=gradient,
       sampler=sampler,
       eps=0.577,              # relative error tolerance (1/√3)
       min_batch=10,
       policy=DropRestartClipPolicy(
           drop_param=0.5,
           restart_param=0.0,
           max_hierarchy_size=100,
       ),
       max_cost=10_000,        # maximum gradient evaluations
       stop_crit_norm=1e-6,    # stopping criterion
   )

   x = np.array([10.0])
   for iteration in range(100):
       g = estimator(x)
       if estimator.terminate:
           print(f"Terminated early: {estimator.terminate_reason}")
           break
       x = x - 0.1 * g
       print(f"Iteration {iteration}: x = {x[0]:.6f}")

Key points:

- ``MICE`` is imported from ``mice`` (top-level re-export).
- ``DropRestartClipPolicy`` controls index-set operations: Add, Drop, Restart,
  and Clip behavior.
- ``eps`` controls the relative error tolerance. Smaller values mean tighter
  error control but more gradient evaluations.
- ``max_cost`` bounds total gradient evaluations; the estimator sets
  ``terminate = True`` when exhausted.
- ``stop_crit_norm`` triggers early termination when the estimated gradient
  norm drops below the square root of this threshold.
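
To build intuition for ``eps``, the following NumPy-only sketch sizes a plain
Monte Carlo batch so that the standard error of the mean gradient stays below
``eps`` times the gradient norm. It is not MICE's internal sizing rule, which
works on a hierarchy of iterates. The helper ``required_batch`` and its
pilot-sample approach are hypothetical and introduced here only for
illustration.

.. code-block:: python

   import numpy as np

   def required_batch(x, eps, sampler, pilot=1000, min_batch=10):
       """Estimate a batch size giving relative statistical error <= eps.

       Assumption (not taken from MICE): the relative error is
       sqrt(trace(Cov) / n) / ||mean gradient||.
       """
       thetas = sampler(pilot)
       grads = x - thetas                       # per-sample gradients, shape (pilot, d)
       mean_norm_sq = np.sum(grads.mean(axis=0) ** 2)
       total_var = np.sum(grads.var(axis=0, ddof=1))
       n = int(np.ceil(total_var / (eps**2 * mean_norm_sq)))
       return max(n, min_batch)

   rng = np.random.default_rng(0)

   def sampler(n):
       return rng.standard_normal((n, 1))

   far = required_batch(np.array([10.0]), eps=0.577, sampler=sampler)
   near = required_batch(np.array([0.5]), eps=0.577, sampler=sampler)
   tight = required_batch(np.array([0.5]), eps=0.1, sampler=sampler)
   print(far, near, tight)

Under these assumptions, the required batch grows as the iterate approaches
the optimum (the gradient norm shrinks while the variance does not) and as
``eps`` decreases.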

Finite-Sum Problems
-------------------

For finite datasets (e.g., empirical risk minimization on a fixed training
set), pass the data array directly as the ``sampler`` argument instead of a
callable:

.. code-block:: python

   import numpy as np
   from mice import MICE

   # Example: linear regression on a fixed dataset
   rng = np.random.default_rng(42)
   n_samples, n_features = 500, 5
   X = rng.normal(size=(n_samples, n_features))
   true_w = rng.normal(size=n_features)
   y = X @ true_w + 0.1 * rng.normal(size=n_samples)
   data = np.column_stack([y, X])

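   def grad(x, thetas):
       """Per-sample gradients of 0.5 * (X_i @ w - y_i)**2.

       Their mean over the full dataset is the gradient of
       ||y - Xw||^2 / (2n), so no extra division by n is needed,
       assuming MICE averages the rows as in the minimal example.
       """
       y_batch = thetas[:, 0]    # first column holds the targets
       X_batch = thetas[:, 1:]   # remaining columns hold the features
       residuals = X_batch @ x - y_batch
       return X_batch * residuals[:, None]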

   estimator = MICE(
       grad=grad,
       sampler=data,  # pass array, not callable
       eps=0.577,
       min_batch=10,
       max_cost=50_000,
       stop_crit_norm=1e-6,
   )

   x = np.zeros(n_features)
   for _ in range(200):
       g = estimator(x)
       if estimator.terminate:
           break
       x = x - 0.05 * g

When ``sampler`` is an array, MICE automatically detects the finite-sum
setting and uses without-replacement sampling with optimized sample-size
formulas that account for the finite population correction.
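
The effect of a finite population correction can be illustrated with the
textbook variance of a sample mean drawn without replacement,
``S**2 / n * (1 - n / N)``. The sketch below is generic NumPy and does not
reproduce MICE's own sample-size formulas. The helpers ``fpc_sample_size`` and
``iid_sample_size`` are hypothetical names used only for this comparison.

.. code-block:: python

   import numpy as np

   def fpc_sample_size(values, tol):
       """Smallest n with S^2 / n * (1 - n / N) <= tol^2 (without replacement)."""
       N = len(values)
       s2 = np.var(values, ddof=1)
       n = int(np.ceil(s2 / (tol**2 + s2 / N)))
       return min(max(n, 1), N)

   def iid_sample_size(values, tol):
       """Smallest n with S^2 / n <= tol^2 (sampling with replacement)."""
       s2 = np.var(values, ddof=1)
       return max(int(np.ceil(s2 / tol**2)), 1)

   rng = np.random.default_rng(42)
   population = rng.normal(size=500)

   n_fpc = fpc_sample_size(population, tol=0.05)
   n_iid = iid_sample_size(population, tol=0.05)
   print(f"without replacement: {n_fpc}, with replacement: {n_iid}")

With the correction, the required sample size never exceeds the dataset size:
once every data point has been evaluated, the mean is exact.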

Policy Configuration
--------------------

Fine-tune index-set management with ``DropRestartClipPolicy``:

.. code-block:: python

   from mice import MICE
   from mice.policy import DropRestartClipPolicy

   policy = DropRestartClipPolicy(
       drop_param=0.5,           # threshold for dropping last iterate
       restart_param=0.0,        # threshold for restarting hierarchy
       max_hierarchy_size=100,   # maximum |L_k|
       clip_type="full",         # "full", "all", or None (disabled)
       aggr_cost=0.1,            # aggregation cost factor
   )

   # `gradient` and `sampler` are the functions from the minimal example
   estimator = MICE(grad=gradient, sampler=sampler, policy=policy)

The parameters are:

- ``drop_param`` (float, default 0.5): Controls how aggressively the Drop
  operator removes near-redundant iterates. Higher values make dropping more
  likely.
- ``restart_param`` (float, default 0.0): Controls Restart sensitivity.
  Non-zero values allow restarts even when the cost improvement is minor.
- ``max_hierarchy_size`` (int, default 1000): Caps the number of retained
  iterates to bound memory and computation.
- ``clip_type`` (str or None, default None): ``"full"`` clips when a level
  reaches the finite-data ceiling; ``"all"`` evaluates all possible clip
  points and picks the cheapest; ``None`` disables clipping.
- ``aggr_cost`` (float, default 0.1): Penalty per level in cost computations,
  encouraging smaller hierarchies.

Resampling-Based Norm Estimation
--------------------------------

Resampling-based norm estimation, used for sample sizing and stopping, is
enabled by default. The related parameters can be set explicitly:

.. code-block:: python

   from mice import MICE

   # `gradient` and `sampler` are the functions from the minimal example
   estimator = MICE(
       grad=gradient,
       sampler=sampler,
       use_resampling=True,     # enabled by default
       re_part=5,               # number of jackknife partitions
       re_quantile=0.05,        # quantile for the error tolerance
       re_tot_cost=0.2,         # resampling cost budget fraction
       stop_crit_prob=0.05,     # probability threshold for stopping rule
   )
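
As a rough picture of what partition-based resampling does, the sketch below
computes a delete-one-partition jackknife of the gradient norm and takes a low
quantile as a conservative estimate. This is a generic illustration in plain
NumPy. How MICE combines ``re_part`` and ``re_quantile`` internally is not
shown here, and ``jackknife_norm_quantile`` is a hypothetical helper.

.. code-block:: python

   import numpy as np

   def jackknife_norm_quantile(grads, n_part=5, quantile=0.05, seed=0):
       """Low quantile of gradient norms from leave-one-partition-out means."""
       rng = np.random.default_rng(seed)
       idx = rng.permutation(len(grads))
       parts = np.array_split(idx, n_part)
       norms = []
       for held_out in parts:
           keep = np.setdiff1d(idx, held_out)   # all samples except one partition
           norms.append(np.linalg.norm(grads[keep].mean(axis=0)))
       return np.quantile(norms, quantile)

   rng = np.random.default_rng(1)
   x = np.array([0.5])
   # Per-sample gradients of the minimal example, x - theta
   grads = x - rng.standard_normal((200, 1))

   plain_norm = np.linalg.norm(grads.mean(axis=0))
   robust_norm = jackknife_norm_quantile(grads)
   print(f"plain: {plain_norm:.4f}, jackknife quantile: {robust_norm:.4f}")

Using a low quantile rather than the plain norm makes a norm-based stopping or
sizing decision less sensitive to a single lucky batch.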