Strategy

class lightning.fabric.strategies.Strategy(accelerator=None, checkpoint_io=None, precision=None)[source]

Bases: ABC

Base class for all strategies that change the behaviour of the training, validation, and test loops.

_configure_launcher()[source]

Attach the launcher appropriate for this strategy.

Return type

None

abstract all_gather(tensor, group=None, sync_grads=False)[source]

Perform an all_gather on all processes.

Parameters
  • tensor (Tensor) – the tensor to all_gather

  • group (Optional[Any]) – the process group to gather results from

  • sync_grads (bool) – flag that allows users to synchronize gradients for all_gather op

Return type

Tensor
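
The gather semantics can be illustrated with a single-process sketch (the `ToyGroup` class below is hypothetical and stands in for the real distributed backend): every rank contributes its local tensor and receives the tensors from all ranks, in rank order.

```python
# Hypothetical single-process stand-in for a distributed all_gather:
# every rank deposits its local value, and every rank reads back the full,
# rank-ordered collection. The real implementation communicates across
# processes; plain lists stand in for tensors here.
class ToyGroup:
    def __init__(self, world_size):
        self.buffers = [None] * world_size

    def all_gather(self, rank, value):
        self.buffers[rank] = value   # each rank contributes its value
        return list(self.buffers)    # every rank receives all contributions

group = ToyGroup(world_size=2)
group.all_gather(0, [1.0, 2.0])           # rank 0 contributes
result = group.all_gather(1, [3.0, 4.0])  # rank 1 contributes; all visible
print(result)  # [[1.0, 2.0], [3.0, 4.0]]
```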

abstract all_reduce(tensor, group=None, reduce_op='mean')[source]

Reduces the given tensor (e.g. across GPUs/processes).

Parameters
  • tensor (Union[Tensor, Any]) – the tensor to sync and reduce

  • group (Optional[Any]) – the process group to reduce

  • reduce_op (Union[ReduceOp, str, None]) – the reduction operation. Defaults to 'mean'. Can also be the string 'sum' or a ReduceOp instance.

Return type

Union[Tensor, Any]
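
The two documented reduction modes can be sketched in plain Python (`toy_all_reduce` is illustrative only; the real method reduces tensors across processes):

```python
# Illustrative reduction semantics for all_reduce (single-process sketch):
# combine the per-rank values elementwise, then optionally divide by the
# number of ranks for a mean.
def toy_all_reduce(tensors, reduce_op="mean"):
    summed = [sum(vals) for vals in zip(*tensors)]  # elementwise sum
    if reduce_op == "sum":
        return summed
    if reduce_op == "mean":
        return [s / len(tensors) for s in summed]
    raise ValueError(f"unsupported reduce_op: {reduce_op}")

per_rank = [[1.0, 2.0], [3.0, 4.0]]        # one list per rank
print(toy_all_reduce(per_rank, "sum"))     # [4.0, 6.0]
print(toy_all_reduce(per_rank, "mean"))    # [2.0, 3.0]
```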

backward(tensor, module, *args, **kwargs)[source]

Forwards backward calls to the precision plugin.

Return type

None

abstract barrier(name=None)[source]

Synchronizes all processes, blocking each process until the whole group enters this function.

Parameters

name (Optional[str]) – an optional name to pass into barrier.

Return type

None
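
The blocking guarantee is the same one the standard library's `threading.Barrier` gives across threads, which makes for a runnable analogy (threads stand in for processes here):

```python
# Standard-library illustration of barrier semantics: each worker blocks at
# the barrier until the whole group has arrived, so every "before" event is
# recorded before any "after" event. Strategy.barrier() provides the same
# guarantee across processes rather than threads.
import threading

order = []
barrier = threading.Barrier(2)

def worker(rank):
    order.append(("before", rank))
    barrier.wait()                  # block until both workers arrive
    order.append(("after", rank))

threads = [threading.Thread(target=worker, args=(r,)) for r in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# both "before" entries precede any "after" entry
```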

batch_to_device(batch, device=None)[source]

Moves the batch to the correct device.

The returned batch is of the same type as the input batch, just having all tensors on the correct device.

Parameters
  • batch (Any) – The batch of samples to move to the correct device

  • device (Optional[device]) – The target device

Return type

Any
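
Preserving the batch type while moving every tensor can be sketched with a recursive helper (`apply_to_collection` below is a hypothetical stand-in for the framework's data-moving utilities, with multiplication standing in for `tensor.to(device)`):

```python
# Sketch of moving a nested batch elementwise while preserving its
# container structure: dicts stay dicts, tuples stay tuples, and only the
# leaves are transformed. In the real method the leaf function would be
# something like `lambda t: t.to(device)`.
def apply_to_collection(batch, fn):
    if isinstance(batch, dict):
        return {k: apply_to_collection(v, fn) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(apply_to_collection(v, fn) for v in batch)
    return fn(batch)  # leaf value

batch = {"x": [1, 2], "y": (3,)}
moved = apply_to_collection(batch, lambda t: t * 10)  # stand-in for .to(device)
print(moved)  # {'x': [10, 20], 'y': (30,)}
```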

abstract broadcast(obj, src=0)[source]

Broadcasts an object to all processes.

Parameters
  • obj (TypeVar(TBroadcast)) – the object to broadcast

  • src (int) – source rank

Return type

TypeVar(TBroadcast)

clip_gradients_norm(module, optimizer, max_norm, norm_type=2.0, error_if_nonfinite=True)[source]

Clip gradients by norm.

Return type

Tensor
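
The standard clip-by-global-norm algorithm this method applies can be sketched in pure Python (the real method operates on the module's parameter gradients via the precision plugin; `toy_clip_grad_norm` is illustrative only):

```python
# Illustrative clip-by-norm: compute the global p-norm of all gradients,
# and if it exceeds max_norm, scale every gradient by max_norm / total so
# the clipped gradients have norm exactly max_norm. Returns the clipped
# gradients together with the pre-clipping total norm.
def toy_clip_grad_norm(grads, max_norm, norm_type=2.0):
    total = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads, total

# gradients [3.0, 4.0] have L2 norm 5.0; clipping to 1.0 scales by 0.2
clipped, norm = toy_clip_grad_norm([3.0, 4.0], max_norm=1.0)
```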

clip_gradients_value(module, optimizer, clip_val)[source]

Clip gradients by value.

Return type

None

get_module_state_dict(module)[source]

Returns model state.

Return type

Dict[str, Union[Any, Tensor]]

get_optimizer_state(optimizer)[source]

Returns state of an optimizer.

Allows for syncing/collating optimizer state from processes in custom plugins.

Return type

Dict[str, Tensor]

load_checkpoint(path, state=None, strict=True)[source]

Load the contents from a checkpoint and restore the state of the given objects.

Parameters
  • path (Union[str, Path]) – A path to where the file is located

  • state (Union[Module, Optimizer, Dict[str, Union[Module, Optimizer, Any]], None]) –

    Can be one of:

    • A dictionary of objects whose state will be restored in-place from the checkpoint path.

    • None or the empty dict: The loaded checkpoint will be returned in full.

    • A Module instance, if the checkpoint file contains a raw module state dict.

    • An Optimizer instance, if the checkpoint file contains a raw optimizer state.

  • strict (bool) – Whether to enforce that the keys in state match the keys in the checkpoint.

Return type

Dict[str, Any]

Returns

The remaining items that were not restored into the given state dictionary. If no state dictionary is given, the full checkpoint will be returned.
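
The restore-and-return-remainder contract can be sketched as follows (`toy_load_checkpoint` is a simplified stand-in; the real method also dispatches on Module/Optimizer instances and honours `strict`):

```python
# Toy sketch of the load_checkpoint contract: keys present in `state` are
# restored in place from the checkpoint, and whatever the checkpoint
# contains beyond those keys is returned to the caller. With state=None or
# an empty dict, the full checkpoint is returned.
def toy_load_checkpoint(checkpoint, state=None):
    if not state:
        return checkpoint                   # None / empty: return everything
    remainder = dict(checkpoint)
    for key in state:
        if key in remainder:
            state[key] = remainder.pop(key) # restore in place
    return remainder                        # items not consumed by `state`

ckpt = {"model": {"w": 1}, "optimizer": {"lr": 0.1}, "step": 42}
state = {"model": None, "optimizer": None}
extras = toy_load_checkpoint(ckpt, state)
print(extras)  # {'step': 42}
```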

load_module_state_dict(module, state_dict, strict=True)[source]

Loads the given state into the model.

Return type

None

module_init_context(empty_init=None)[source]

A context manager wrapping the model instantiation.

Here, the strategy can control how the parameters of the model get created (device, dtype) and or apply other patches to the model.

Parameters

empty_init (Optional[bool]) – Whether to initialize the model with empty weights (uninitialized memory). If None, the strategy will decide. Some strategies may not support all options.

Return type

ContextManager

abstract module_to_device(module)[source]

Moves the model to the correct device.

Return type

None

optimizer_step(optimizer, **kwargs)[source]

Performs the actual optimizer step.

Parameters
  • optimizer (Optimizable) – the optimizer performing the step

  • **kwargs (Any) – Any extra arguments to optimizer.step

Return type

Any

process_dataloader(dataloader)[source]

Wraps the dataloader if necessary.

Parameters

dataloader (DataLoader) – the iterable to wrap, ideally of type torch.utils.data.DataLoader

Return type

DataLoader

reduce_boolean_decision(decision, all=True)[source]

Reduce a boolean decision across all processes.

Return type

bool
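
The two modes can be sketched directly (illustrative only; the real method gathers the per-process booleans before combining them): with `all=True` every process must agree (logical AND), otherwise a single process deciding `True` suffices (logical OR).

```python
# Sketch of reduce_boolean_decision over a list of per-process decisions.
# `mode_all` mirrors the documented `all` parameter (renamed here to avoid
# shadowing the Python builtin).
def toy_reduce_boolean_decision(decisions, mode_all=True):
    return all(decisions) if mode_all else any(decisions)

decisions = [True, True, False]
print(toy_reduce_boolean_decision(decisions, mode_all=True))   # False
print(toy_reduce_boolean_decision(decisions, mode_all=False))  # True
```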

save_checkpoint(path, state, storage_options=None, filter=None)[source]

Save model, optimizer, and other state as a checkpoint file.

Parameters
  • path (Union[str, Path]) – A path to where the file(s) should be saved

  • state (Dict[str, Union[Module, Optimizer, Any]]) – A dictionary with contents to be saved. If the dict contains modules or optimizers, their state-dict will be retrieved and converted automatically.

  • storage_options (Optional[Any]) – Additional options for the CheckpointIO plugin

  • filter (Optional[Dict[str, Callable[[str, Any], bool]]]) – An optional dictionary containing filter callables that return a boolean indicating whether the given item should be saved (True) or filtered out (False). Each filter key should match a state key, where its filter will be applied to the state_dict generated.

Return type

None
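
The documented `filter` contract, where each filter key matches a state key and its callable prunes that key's state dict, can be sketched like this (`apply_filters` is a hypothetical helper, not the library's internal code):

```python
# Sketch of per-key filter callables applied before saving: a filter that
# returns True keeps an item, False drops it. State keys without a filter
# are saved unchanged.
def apply_filters(state_dicts, filters):
    out = {}
    for key, sd in state_dicts.items():
        keep = filters.get(key)
        if keep is None:
            out[key] = dict(sd)  # no filter registered: keep everything
        else:
            out[key] = {k: v for k, v in sd.items() if keep(k, v)}
    return out

state = {"model": {"layer.weight": [1.0], "layer.bias": [0.0]}}
filters = {"model": lambda name, value: name.endswith("weight")}
filtered = apply_filters(state, filters)
print(filtered)  # {'model': {'layer.weight': [1.0]}}
```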

setup_environment()[source]

Set up any processes or distributed connections.

This must be called by the framework at the beginning of every process, before any distributed communication takes place.

Return type

None

setup_module(module)[source]

Performs setup for the model, e.g., by wrapping it in another class.

Return type

Module

setup_module_and_optimizers(module, optimizers)[source]

Set up a model and multiple optimizers together.

The returned objects are expected to be in the same order they were passed in. The default implementation will call setup_module() and setup_optimizer() on the inputs.

Return type

Tuple[Module, List[Optimizer]]
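
The default implementation described above can be sketched as follows (`ToyStrategy` is illustrative; tuples stand in for the wrapper classes a real strategy might apply):

```python
# Sketch of the default setup_module_and_optimizers: delegate to
# setup_module() and setup_optimizer(), preserving the input order.
class ToyStrategy:
    def setup_module(self, module):
        return ("wrapped", module)      # e.g. wrap in a distributed wrapper

    def setup_optimizer(self, optimizer):
        return ("wrapped", optimizer)

    def setup_module_and_optimizers(self, module, optimizers):
        module = self.setup_module(module)
        optimizers = [self.setup_optimizer(opt) for opt in optimizers]
        return module, optimizers       # same order as passed in

strategy = ToyStrategy()
mod, opts = strategy.setup_module_and_optimizers("net", ["sgd", "adam"])
print(opts)  # [('wrapped', 'sgd'), ('wrapped', 'adam')]
```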

setup_optimizer(optimizer)[source]

Performs setup for the optimizer, e.g., by wrapping it in another class.

Return type

Optimizer

teardown()[source]

This method is called to tear down the training process.

It is the right place to release memory and free other resources.

Return type

None

tensor_init_context()[source]

Controls how tensors get created (device, dtype).

Return type

ContextManager

abstract property is_global_zero: bool

Whether the current process is the rank zero process not only on the local node, but for all nodes.

abstract property root_device: device

Returns the root device.