gluonnlp.utils

GluonNLP Toolkit provides utility functions for file handling, parameter and training helpers, serialization and deserialization, and seeding.

File Handling

glob

Return a list of paths matching a pathname pattern.

mkdir

Create a directory.

Parameter and Training

clip_grad_global_norm

Rescales gradients of parameters so that their global 2-norm is smaller than max_norm.

Serialization and Deserialization

load_parameters

Load parameters from file previously saved by save_parameters.

load_states

Loads trainer states (e.g. optimizer, momentum) from a file.

save_parameters

Save parameters to file.

save_states

Saves trainer states (e.g. optimizer, momentum) to a file.

Setting Seed

set_seed

Sets the seed for reproducibility.

API Reference

Module for utility functions.

class gluonnlp.utils.Parallelizable[source]

Base class for parallelizable unit of work, which can be invoked by Parallel. The subclass must implement the forward_backward method, and be used together with Parallel. For example:

import mxnet as mx
from mxnet import gluon
from gluonnlp.utils import Parallel, Parallelizable

class ParallelNet(Parallelizable):
    def __init__(self):
        self._net = Model()  # Model() stands in for any Gluon block
        self._loss = gluon.loss.SoftmaxCrossEntropyLoss()

    def forward_backward(self, x):
        data, label = x
        with mx.autograd.record():
            out = self._net(data)
            loss = self._loss(out, label)
        loss.backward()
        return loss

net = ParallelNet()
ctx = [mx.gpu(0), mx.gpu(1)]
parallel = Parallel(len(ctx), net)
# the Gluon block is initialized after forwarding the first batch

for batch in batches:
    for x in gluon.utils.split_and_load(batch, ctx):
        parallel.put(x)
    losses = [parallel.get() for _ in ctx]
    trainer.step()
forward_backward(x)[source]

Forward and backward computation.

class gluonnlp.utils.Parallel(num_workers, parallizable, serial_init=True)[source]

Class for parallel processing with Parallelizable. It invokes a Parallelizable with multiple Python threads. For example:

class ParallelNet(Parallelizable):
    def __init__(self):
        self._net = Model()
        self._loss = gluon.loss.SoftmaxCrossEntropyLoss()

    def forward_backward(self, x):
        data, label = x
        with mx.autograd.record():
            out = self._net(data)
            loss = self._loss(out, label)
        loss.backward()
        return loss

net = ParallelNet()
ctx = [mx.gpu(0), mx.gpu(1)]
parallel = Parallel(len(ctx), net)

for batch in batches:
    for x in gluon.utils.split_and_load(batch, ctx):
        parallel.put(x)
    losses = [parallel.get() for _ in ctx]
    trainer.step()
Parameters
  • num_workers (int) – Number of worker threads. If set to 0, the main thread is used as the worker for debugging purposes.

  • parallizable – Parallelizable net whose forward_backward method is invoked by the worker threads.

  • serial_init (bool, default True) – Execute the first num_workers inputs in the main thread, so that the Block used in parallizable is initialized serially. Initializing a Block from multiple threads may cause unexpected behavior.

get()[source]

Get an output of a previous parallizable.forward_backward call. This method blocks if none of the previous parallizable.forward_backward calls has returned a result yet.

put(x)[source]

Assign input x to an available worker and invoke parallizable.forward_backward with x.

gluonnlp.utils.grad_global_norm(parameters, max_norm=None)[source]

Calculate the 2-norm of gradients of parameters, and how much they should be scaled down such that their 2-norm does not exceed max_norm, if max_norm is provided.

If gradients exist on more than one context for a parameter, the user needs to explicitly call trainer.allreduce_grads so that the gradients are summed before the 2-norm is calculated.

Note

This function is only for use when update_on_kvstore is set to False in trainer.

Example:

trainer = Trainer(net.collect_params(), update_on_kvstore=False, ...)
for x, y in mx.gluon.utils.split_and_load(X, [mx.gpu(0), mx.gpu(1)]):
    with mx.autograd.record():
        out = net(x)
        loss = loss_fn(out, y)
    loss.backward()
trainer.allreduce_grads()
norm = grad_global_norm(net.collect_params().values())
...
Parameters
  • parameters (list of Parameters) –

  • max_norm (NDArray, optional) – The maximum L2 norm threshold. If provided, ratio and is_finite will be returned.

Returns

  • NDArray – Total norm. Shape is (1,)

  • NDArray – Ratio for rescaling gradients based on max_norm s.t. grad = grad / ratio. If total norm is NaN, ratio will be NaN, too. Returned if max_norm is provided. Shape is (1,)

  • NDArray – Whether the total norm is finite, returned if max_norm is provided. Shape is (1,)
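When max_norm is provided, the extra return values can be used to rescale gradients manually and to guard against non-finite norms. A minimal sketch continuing the training loop above (net, trainer, and batch_size are assumed to be defined; types follow the parameter list above, and the rescaling applies the grad = grad / ratio convention):

import mxnet as mx
from gluonnlp.utils import grad_global_norm

max_norm = mx.nd.array([1.0])  # illustrative threshold
total_norm, ratio, is_finite = grad_global_norm(
    net.collect_params().values(), max_norm)

if is_finite.asscalar():
    # Rescale each gradient in place so that the global norm stays below max_norm.
    ratio_value = ratio.asscalar()
    for param in net.collect_params().values():
        if param.grad_req != 'null':
            for grad in param.list_grad():
                grad[:] = grad / ratio_value
    trainer.update(batch_size)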

gluonnlp.utils.clip_grad_global_norm(parameters, max_norm, check_isfinite=True)[source]

Rescales gradients of parameters so that their global 2-norm is smaller than max_norm. If gradients exist on more than one context for a parameter, the user needs to explicitly call trainer.allreduce_grads so that the gradients are summed before the 2-norm is calculated.

Note

This function is only for use when update_on_kvstore is set to False in trainer. In cases where training happens on multiple contexts, this method should be used in conjunction with trainer.allreduce_grads() and trainer.update(), not trainer.step().

Example:

trainer = Trainer(net.collect_params(), update_on_kvstore=False, ...)
for x, y in mx.gluon.utils.split_and_load(X, [mx.gpu(0), mx.gpu(1)]):
    with mx.autograd.record():
        out = net(x)
        loss = loss_fn(out, y)
    loss.backward()
trainer.allreduce_grads()
nlp.utils.clip_grad_global_norm(net.collect_params().values(), max_norm)
trainer.update(batch_size)
...
Parameters
  • parameters (list of Parameters) –

  • max_norm (float) –

  • check_isfinite (bool, default True) – If True, check that the total_norm is finite (not nan or inf). This requires a blocking .asscalar() call.

Returns

Total norm. Return type is NDArray of shape (1,) if check_isfinite is False. Otherwise a float is returned.

Return type

NDArray or float

gluonnlp.utils.save_parameters(model, filename)[source]

Save parameters to file.

Saved parameters can only be loaded with Block.load_parameters. Note that this method only saves parameters, not model structure.

Both local file system path and S3 URI are supported. For example, ‘s3://mybucket/folder/net.params’, ‘./folder/net.params’.

Parameters
  • model (mx.gluon.Block) – The model to save.

  • filename (str) – Path to file.
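A minimal usage sketch (the model and path are illustrative; an S3 URI can be used in place of the local path):

import mxnet as mx
import gluonnlp as nlp

net = mx.gluon.nn.Dense(10, in_units=20)
net.initialize()

# Write the parameters to a local file; 's3://mybucket/folder/net.params'
# would work the same way when S3 access is configured.
nlp.utils.save_parameters(net, './net.params')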

gluonnlp.utils.save_states(trainer, fname)[source]

Saves trainer states (e.g. optimizer, momentum) to a file.

Both local file system path and S3 URI are supported. For example, ‘s3://mybucket/folder/net.states’, ‘./folder/net.states’.

Parameters
  • trainer (mxnet.gluon.Trainer) – The trainer whose states will be saved.

  • fname (str) – Path to output states file.

Note

optimizer.param_dict, which contains Parameter information (such as lr_mult and wd_mult) will not be saved.
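For example, trainer states are typically checkpointed next to the corresponding parameter file (net, trainer, and the paths below are illustrative, assuming an ongoing training run):

import gluonnlp as nlp

nlp.utils.mkdir('./checkpoint')
# Checkpoint the model parameters and the trainer states together so that
# optimization (e.g. momentum buffers) can later resume where it stopped.
nlp.utils.save_parameters(net, './checkpoint/net.params')
nlp.utils.save_states(trainer, './checkpoint/net.states')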

gluonnlp.utils.load_parameters(model, filename, ctx=None, allow_missing=False, ignore_extra=False, cast_dtype=None)[source]

Load parameters from file previously saved by save_parameters.

Both local file system path and S3 URI are supported. For example, ‘s3://mybucket/folder/net.params’, ‘./folder/net.params’.

Parameters
  • model (mx.gluon.Block) – The model whose parameters will be loaded.

  • filename (str) – Path to parameter file.

  • ctx (Context or list of Context, default cpu()) – Context(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
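A minimal sketch for restoring previously saved parameters onto a specific context (the model, path, and context are illustrative):

import mxnet as mx
import gluonnlp as nlp

net = mx.gluon.nn.Dense(10, in_units=20)
# Load weights written earlier by save_parameters, placing them on GPU 0.
nlp.utils.load_parameters(net, './net.params', ctx=mx.gpu(0))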

gluonnlp.utils.load_states(trainer, fname)[source]

Loads trainer states (e.g. optimizer, momentum) from a file.

Both local file system path and S3 URI are supported. For example, ‘s3://mybucket/folder/net.states’, ‘./folder/net.states’.

Parameters
  • trainer (mxnet.gluon.Trainer) – The trainer whose states will be loaded.

  • fname (str) – Path to input states file.

Note

optimizer.param_dict, which contains Parameter information (such as lr_mult and wd_mult) will not be loaded from the file, but rather set based on current Trainer’s parameters.
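To resume training, the states are loaded back after the parameters, mirroring the checkpointing sketch under save_states (net, trainer, and paths are illustrative):

import gluonnlp as nlp

# Restore the parameters first, then the matching trainer/optimizer states.
nlp.utils.load_parameters(net, './checkpoint/net.params')
nlp.utils.load_states(trainer, './checkpoint/net.states')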

gluonnlp.utils.mkdir(dirname)[source]

Create a directory.

Parameters

dirname (str) – The name of the target directory to create.

gluonnlp.utils.glob(url, separator=',')[source]

Return a list of paths matching a pathname pattern.

The pattern may contain simple shell-style wildcards. Input may also include multiple patterns, separated by separator.

Parameters
  • url (str) – The pathname pattern(s) of the files to match.

  • separator (str, default is ',') – The separator in url that allows multiple patterns in the input.
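A minimal sketch, assuming the default ',' separator (the paths are illustrative):

import gluonnlp as nlp

# A single shell-style pattern ...
txt_files = nlp.utils.glob('./data/*.txt')

# ... or several patterns joined by the separator in one string.
data_files = nlp.utils.glob('./data/*.txt,./extra/*.json')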

gluonnlp.utils.remove(filename)[source]

Remove a file.

Parameters

filename (str) – The name of the target file to remove.
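A trivial sketch combining mkdir and remove (the paths are illustrative):

import gluonnlp as nlp

nlp.utils.mkdir('./tmp')                     # create the target directory
with open('./tmp/scratch.txt', 'w') as f:    # write a throwaway file
    f.write('scratch')
nlp.utils.remove('./tmp/scratch.txt')        # delete it again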

gluonnlp.utils.check_version(min_version, warning_only=False, library=None)[source]

Check that the installed version of gluonnlp satisfies the provided minimum version. An exception is thrown if the check does not pass.

Parameters
  • min_version (str) – Minimum version.

  • warning_only (bool) – If True, print a warning instead of throwing an exception.

  • library (optional module, default None) – The target library for the version check. Checks gluonnlp by default.
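For example (the version numbers are illustrative):

import mxnet as mx
import gluonnlp as nlp

# Raise an exception if the installed gluonnlp is older than 0.8.0.
nlp.utils.check_version('0.8.0')

# Only warn, and check another library's version instead (here mxnet).
nlp.utils.check_version('1.5.0', warning_only=True, library=mx)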

gluonnlp.utils.set_seed(seed=0)[source]

Sets the seed for reproducibility.

Parameters

seed (int) – Value of the seed to set.
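For example (the seed value is arbitrary):

import gluonnlp as nlp

# Fix the seed once, before any model or data pipeline is constructed,
# so that repeated runs produce the same results.
nlp.utils.set_seed(42)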