Welcome to MXFusion’s documentation!¶
MXFusion is a library for integrating probabilistic modelling with deep learning.
MXFusion helps you rapidly build and test new methods at scale, by focusing on the modularity of probabilistic models and their integration with modern deep learning techniques.
Installation¶
Dependencies / Prerequisites¶
MXFusion’s primary dependencies are MXNet >= 1.2 and Networkx >= 2.1. See requirements.
Supported Architectures / Versions¶
MXFusion is tested on Python 3.5+ on MacOS and Amazon Linux.
pip¶
If you just want to use MXFusion and not modify the source, you can install through pip:
pip install mxfusion
From source¶
To install MXFusion from source, after cloning the repository run the following from the top-level directory:
pip install .
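After installing, a quick sanity check of the dependencies (a minimal sketch; it only assumes the packages expose the standard __version__ attribute):
import mxfusion
import mxnet
import networkx
print(mxnet.__version__, networkx.__version__)   # should report MXNet >= 1.2 and NetworkX >= 2.1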
Tutorials!¶
Below is a list of tutorial / example notebooks demonstrating MXFusion’s functionality.
Getting Started¶
Zhenwen Dai (2018.10.22)
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
# ==============================================================================
Introduction¶
MXFusion is a probabilistic programming language. It provides a convenient interface for designing probabilistic models and applying them to real world problems.
Probabilistic models describe the relationships in data through probabilistic distributions of random variables. Probabilistic modeling is typically done by stating your prior belief about the data in terms of a probabilistic model and performing inference with the observations of some of the random variables.
In [1]:
import warnings
warnings.filterwarnings('ignore')
import os
os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'
A Simple Example¶
Let’s start with a toy example about estimating the mean and variance of a set of data. For simplicity, we generate 100 data points with a given mean and variance following a normal distribution.
In [2]:
import numpy as np
np.random.seed(0)
mean_groundtruth = 3.
variance_groundtruth = 5.
N = 100
data = np.random.randn(N)*np.sqrt(variance_groundtruth) + mean_groundtruth
Let’s visualize our data by building a histogram.
In [3]:
%matplotlib inline
from pylab import *
_=hist(data, 10)

Now, let’s pretend that we do not know the mean and variance that are used to generate the above data.
We still believe that the data come from a normal distribution, which is our model. It is formulated as \(Y \sim \mathcal{N}(\mu, s)\), where \(Y = (y_1, \ldots, y_N)\) is the vector representing the data.
In MXFusion, the above model can be defined as follows:
In [4]:
from mxfusion import Variable, Model
from mxfusion.components.variables import PositiveTransformation
from mxfusion.components.distributions import Normal
from mxfusion.common import config
config.DEFAULT_DTYPE = 'float64'
m = Model()
m.mu = Variable()
m.s = Variable(transformation=PositiveTransformation())
m.Y = Normal.define_variable(mean=m.mu, variance=m.s, shape=(N,))
In the above definition, we start by defining a model instantiated from the class Model. The variables \(\mu\) and \(s\) are created from the class Variable. Both of them are assigned as members of the model instance m. This is how variables are organized in MXFusion. The variable s is created by passing a PositiveTransformation instance to the transformation argument. This constrains the value of the variable s to be positive through a “soft-plus” transformation. The variable Y is created from a normal distribution by specifying the mean, the variance and its shape.
Note that, in this example, the mean and variance variables are both scalars, with the shape (1,), while the random variable Y has the shape (100,). This indicates that the mean and variance variables are broadcast to the shape of the random variable, just like the broadcasting rule in numpy array operations. In this case, each individual entry of the random variable Y follows a scalar normal distribution with the same mean and variance.
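As an analogy, the same broadcasting behaviour can be reproduced with plain numpy (an illustration only, not MXFusion code):
import numpy as np
mu = np.array([3.])                            # shape (1,)
s = np.array([5.])                             # shape (1,)
y = np.random.randn(100) * np.sqrt(s) + mu     # mu and s are broadcast to shape (100,)
print(y.shape)                                 # (100,)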
To list the content that is defined in the model instance, just print the model instance as follows:
In [5]:
print(m)
Y ~ Normal(mean=mu, variance=s)
After defining the probabilistic model, we want to estimate the mean and variance of the normal distribution in our model conditioned on the data that we generated. In MXFusion, this is done by creating an inference algorithm and passing it into the creation of an Inference instance. An inference algorithm represents a specific algorithm for probabilistic inference. In this example, we perform a maximum likelihood estimate by using the MAP class. The Inference class takes care of the initialization of parameters and the execution of inference.
In the following code, we create a MAP inference algorithm by specifying the model and the set of observed variables. Then, we create a GradBasedInference instance from the instantiated MAP inference algorithm.
The execution of inference is done by calling the run method. It takes all observed data (specified when creating the inference algorithm) as keyword arguments, where the keys are the names of the member variables of the model and the values are the corresponding MXNet NDArrays. In this example, only the variable Y is observed, so we pass the generated data with the keyword Y. We also specify the configuration parameters for the gradient optimizer, such as the learning rate, the maximum number of iterations and whether to print the optimization progress. The default optimizer is adam.
In [6]:
from mxfusion.inference import GradBasedInference, MAP
import mxnet as mx
infr = GradBasedInference(inference_algorithm=MAP(model=m, observed=[m.Y]))
infr.run(Y=mx.nd.array(data, dtype='float64'), learning_rate=0.1, max_iter=2000, verbose=True)
Iteration 201 loss: 226.00307424753382
Iteration 401 loss: 223.62512496838366
Iteration 601 loss: 223.23108001422028
Iteration 801 loss: 223.16242835266598
Iteration 1001 loss: 223.15215875874755
Iteration 1201 loss: 223.15098355232075
Iteration 1401 loss: 223.15088846215527
Iteration 1601 loss: 223.15088333213185
Iteration 1801 loss: 223.15088315658113
Iteration 2000 loss: 223.15088315295884
After optimization, the estimated parameters are stored in an instance of the class InferenceParameters, which can be accessed from an Inference instance via infr.params.
We collect the estimated mean and variance and compare them with the true parameters.
In [7]:
mean_estimated = infr.params[m.mu].asnumpy()
variance_estimated = infr.params[m.s].asnumpy()
print('The estimated mean and variance: %f, %f.' % (mean_estimated, variance_estimated))
print('The true mean and variance: %f, %f.' % (mean_groundtruth, variance_groundtruth))
The estimated mean and variance: 3.133735, 5.079126.
The true mean and variance: 3.000000, 5.000000.
The estimated parameters are close to the true parameters, but still off by a small amount. This difference is due to the small size of the dataset, which makes the maximum likelihood estimate prone to over-fitting.
A Bayesian model¶
In the above example, we computed a maximum likelihood estimate from the observed data. Due to the limited number of data points, the estimated parameters are not the same as the true parameters. An interesting question is whether we can estimate how big the difference is. One approach to providing such an estimate is Bayesian inference.
Following the above example, we need to assume prior distributions for the mean and variance of the normal distribution. We assume the mean to follow a normal distribution with a relatively large variance, indicating that we do not have much prior knowledge about the parameter.
In [8]:
m = Model()
m.mu = Normal.define_variable(mean=mx.nd.array([0], dtype='float64'),
                              variance=mx.nd.array([100], dtype='float64'), shape=(1,))
Then, we need to specify a prior distribution for the variance. This is a bit more complicated, as the variance needs to be positive. In principle, one could use a distribution over positive values such as the Gamma distribution. To enable inference with the reparameterization trick, we instead assume a random variable \(\hat{s}\) with a normal distribution and define the variance \(s\) as a function of \(\hat{s}\), \(s = \log(1 + e^{\hat{s}})\), which transforms a real number into a positive number. By applying the transformation, we indirectly specify the prior distribution for the variance.
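As a quick numerical check of the “soft-plus” transformation (plain numpy, for illustration only):
import numpy as np
s_hat = np.array([-5., 0., 5.])
s = np.log1p(np.exp(s_hat))   # soft-plus: the result is always positive
print(s)                      # approximately [0.0067, 0.6931, 5.0067]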
To implement the above prior in MXFusion, we first create the variable s_hat with a normal distribution. Then, we define a function in the MXNet Gluon syntax (also called a Gluon block) for the “soft-plus” transformation. The MXNet function is brought into the MXFusion environment by a wrapper called MXFusionGluonFunction, in which we specify the number of outputs. We pass the variable s_hat as the input to the function and get the variable s as the return value.
In [9]:
from mxfusion.components.functions import MXFusionGluonFunction
m.s_hat = Normal.define_variable(mean=mx.nd.array([5], dtype='float64'),
                                 variance=mx.nd.array([100], dtype='float64'),
                                 shape=(1,), dtype='float64')
trans_mxnet = mx.gluon.nn.HybridLambda(lambda F, x: F.Activation(x, act_type='softrelu'))
m.trans = MXFusionGluonFunction(trans_mxnet, num_outputs=1, broadcastable=True)
m.s = m.trans(m.s_hat)
We define the variable Y following a normal distribution with the mean mu and the variance s.
In [10]:
m.Y = Normal.define_variable(mean=m.mu, variance=m.s, shape=(N,), dtype='float64')
print(m)
s_hat ~ Normal(mean=Variable(66591), variance=Variable(582f6))
s = GluonFunctionEvaluation(hybridlambda0_input_0=s_hat)
mu ~ Normal(mean=Variable(230f5), variance=Variable(e5c1f))
Y ~ Normal(mean=mu, variance=s)
Inference for the above model is more complex, as the exact inference is intractable. We use variational inference with a Gaussian mean field posterior.
We construct the variational posterior by calling the function create_Gaussian_meanfield, which defines a Gaussian distribution for both the mean and the variance as the variational posterior. The content in the generated posterior can be listed by printing the posterior.
In [11]:
from mxfusion.inference import create_Gaussian_meanfield
q = create_Gaussian_meanfield(model=m, observed=[m.Y])
print(q)
s_hat ~ Normal(mean=Variable(987b5), variance=Variable(bf47c))
mu ~ Normal(mean=Variable(9c88b), variance=Variable(3ce74))
Then, we create an instance of StochasticVariationalInference with both the model and the variational posterior. We also need to specify the number of samples used in inference, as it uses the Monte Carlo method to approximate the integral in the variational lower bound. The execution of inference follows the same interface.
In [12]:
from mxfusion.inference import StochasticVariationalInference
infr = GradBasedInference(inference_algorithm=StochasticVariationalInference(
    model=m, posterior=q, num_samples=10, observed=[m.Y]))
infr.run(Y=mx.nd.array(data, dtype='float64'), learning_rate=0.1, verbose=True)
Iteration 201 loss: 234.52798823187217
Iteration 401 loss: 231.30663180752714
Iteration 601 loss: 230.14185436095775
Iteration 801 loss: 230.02993763459781
Iteration 1001 loss: 229.6937452192292
Iteration 1201 loss: 229.72574317072662
Iteration 1401 loss: 229.65264175269712
Iteration 1601 loss: 229.67285188868293
Iteration 1801 loss: 229.52906307607037
Iteration 2000 loss: 229.64091981034755
Let’s check the resulting posterior distribution.
In [13]:
mu_mean = infr.params[q.mu.factor.mean].asscalar()
mu_std = np.sqrt(infr.params[q.mu.factor.variance].asscalar())
s_hat_mean = infr.params[q.s_hat.factor.mean].asscalar()
s_hat_std = np.sqrt(infr.params[q.s_hat.factor.variance].asscalar())
s_15 = np.log1p(np.exp(s_hat_mean - s_hat_std))
s_50 = np.log1p(np.exp(s_hat_mean))
s_85 = np.log1p(np.exp(s_hat_mean + s_hat_std))
print('The mean and standard deviation of the mean parameter is %f(%f). ' % (mu_mean, mu_std))
print('The 15th, 50th and 85th percentile of the variance parameter is %f, %f and %f.'%(s_15, s_50, s_85))
The mean and standard deviation of the mean parameter is 3.120117(0.221690).
The 15th, 50th and 85th percentile of the variance parameter is 4.604521, 5.309114 and 6.016289.
The true parameters sit within one standard deviation of the estimated posterior distribution for both the mean and the variance. This uncertainty estimate gives a good indication of how much we can trust the parameters that we estimate.
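We can verify this with the quantities computed above (a small sanity check, not part of the original notebook):
print(abs(mu_mean - mean_groundtruth) < mu_std)   # True: the true mean is within one standard deviation
print(s_15 < variance_groundtruth < s_85)         # True: the true variance is within the 15th-85th percentile range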
Probabilistic PCA Tutorial¶
This tutorial will demonstrate Probabilistic PCA, a factor analysis technique.
Maths and notation following Machine Learning: A Probabilistic Perspective.
Installation¶
Follow the installation instructions in the README file to get set up.
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
# ==============================================================================
Probabilistic Modeling Introduction¶
Probabilistic models can be categorized into directed graphical models (DGM, Bayes Net) and undirected graphical models (UGM). Most popular probabilistic models are DGMs, so MXFusion will only support the definition of DGMs unless there is a strong need for UGMs in the future.
A DGM can be fully defined using 3 basic components: deterministic functions, probabilistic distributions, and random variables. We show the interface for defining a model using each of the three components below.
First let's import the basic libraries we'll need to train our model and visualize some data.
In [1]:
import warnings
warnings.filterwarnings('ignore')
import mxfusion as mf
import mxnet as mx
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Data Generation¶
We'll take the components of the log spiral function as our function to learn because it's 2-dimensional and easy to visualize.
In [2]:
def log_spiral(a,b,t):
    x = a * np.exp(b*t) * np.cos(t)
    y = a * np.exp(b*t) * np.sin(t)
    return np.vstack([x,y]).T
We parameterize the function with 100 data points and plot the resulting function.
In [3]:
N = 100
D = 100
K = 2
a = 1
b = 0.1
t = np.linspace(0,6*np.pi,N)
r = log_spiral(a,b,t)
In [4]:
r.shape
Out[4]:
(100, 2)
In [5]:
plt.plot(r[:,0], r[:,1],'.')
Out[5]:
[<matplotlib.lines.Line2D at 0x1a20ebbc88>]

We now project our \(K\)-dimensional r into a high-dimensional \(D\) space using a random matrix of random weights \(W\). Now that r is embedded in a \(D\)-dimensional space, the goal of PPCA will be to recover r in its original low-dimensional \(K\) space.
In [6]:
w = np.random.randn(K,D)
x_train = np.dot(r,w) + np.random.randn(N,D) * 1e-3
In [7]:
# from sklearn.decomposition import PCA
# pca = PCA(n_components=2)
# new_r = pca.fit_transform(x_train)
# plt.plot(new_r[:,0], new_r[:,1],'.')
You can explore the higher-dimensional data manually by changing dim1 and dim2 in the following cell.
In [8]:
dim1 = 79
dim2 = 11
plt.scatter(x_train[:,dim1], x_train[:,dim2], color='blue', alpha=0.1)
plt.axis([-10, 10, -10, 10])
plt.title("Simulated data set")
plt.show()

MXFusion Model Definition¶
Import MXFusion and MXNet modelling components
In [9]:
from mxfusion.models import Model
import mxnet.gluon.nn as nn
from mxfusion.components import Variable
from mxfusion.components.variables import PositiveTransformation
The primary data structure in MXFusion is the Model. Models hold ModelComponents, such as Variables, Distributions, and Functions, which are what define a probabilistic model.
The model we’ll be defining for PPCA is:
\(p(z) \sim \mathcal{N}(\mathbf{\mu}, \mathbf{\Sigma})\)
\(p(x | z, \theta) \sim \mathcal{N}(\mathbf{W}z + \mu, \Psi)\)
where:
\(z \in \mathbb{R}^{N \times K}\), \(\mathbf{\mu} \in \mathbb{R}^{K}\), \(\mathbf{\Sigma} \in \mathbb{R}^{N \times K \times K}\), \(x \in \mathbb{R}^{N \times D}\)
\(\Psi \in \mathbb{R}^{N \times D \times D}\), \(\Psi = [\Psi_1, \dots, \Psi_N]\), \(\Psi_i = \sigma^2\mathbf{I}\)
\(z\) here is our latent variable of interest, \(x\) is the observed data, and all other variables are parameters or constants of the model.
First we create an MXFusion Model object to build our PPCA model on.
In [10]:
m = Model()
We attach Variable objects to our model to collect them in a centralized place. Internally, these are organized into a factor graph which is used during Inference.
In [11]:
m.w = Variable(shape=(K,D), initial_value=mx.nd.array(np.random.randn(K,D)))
Because the mean of \(x\)'s distribution is composed of the dot product of \(z\) and \(W\), we need to create a dot product function. First we create a dot product function in MXNet and then wrap it into MXFusion using the MXFusionGluonFunction class. m.dot can then be called like a normal Python function and will apply to the variables it is called on.
In [12]:
dot = nn.HybridLambda(function='dot')
m.dot = mf.functions.MXFusionGluonFunction(dot, num_outputs=1, broadcastable=False)
Now we define m.z, which has an identity matrix covariance cov and zero mean. m.z and sigma_2 are then used to define m.x. Note that both sigma_2 and cov will be added implicitly into the Model because they are inputs to m.x.
In [13]:
cov = mx.nd.broadcast_to(mx.nd.expand_dims(mx.nd.array(np.eye(K,K)), 0),shape=(N,K,K))
m.z = mf.distributions.MultivariateNormal.define_variable(mean=mx.nd.zeros(shape=(N,K)), covariance=cov, shape=(N,K))
sigma_2 = Variable(shape=(1,), transformation=PositiveTransformation())
m.x = mf.distributions.Normal.define_variable(mean=m.dot(m.z, m.w), variance=sigma_2, shape=(N,D))
Posterior Definition¶
Now that we have our model, we need to define a posterior with parameters for the inference algorithm to optimize. When constructing a Posterior, we pass in the Model it is defined over, and ModelComponents from the original Model are accessible and visible in the Posterior.
The covariance matrix must remain positive definite throughout the optimization process so that the Cholesky decomposition succeeds when drawing samples or computing the log pdf of q.z. To satisfy this, we pass the covariance matrix parameters through a Gluon function that forces them into a symmetric matrix, for which suitable initialization values should maintain positive definiteness throughout the optimization procedure.
In [14]:
from mxfusion.inference import BatchInferenceLoop, GradBasedInference, StochasticVariationalInference
class SymmetricMatrix(mx.gluon.HybridBlock):
    def hybrid_forward(self, F, x, *args, **kwargs):
        return F.sum((F.expand_dims(x, 3)*F.expand_dims(x, 2)), axis=-3)
While this model has an analytical solution, we will run Variational Inference to find the posterior to demonstrate inference in a setting where the answer is known.
We place a multivariate normal prior over \(z\) because that is \(z\)'s prior in the model and we don't need to approximate anything in this case. Because the form we're optimizing over is the true model, the optimization is convex and will converge to the same answer as classical PCA given enough iterations.
In [15]:
q = mf.models.Posterior(m)
sym = mf.components.functions.MXFusionGluonFunction(SymmetricMatrix(), num_outputs=1, broadcastable=False)
cov = Variable(shape=(N,K,K), initial_value=mx.nd.broadcast_to(mx.nd.expand_dims(mx.nd.array(np.eye(K,K) * 1e-2), 0),shape=(N,K,K)))
q.post_cov = sym(cov)
q.post_mean = Variable(shape=(N,K), initial_value=mx.nd.array(np.random.randn(N,K)))
q.z.set_prior(mf.distributions.MultivariateNormal(mean=q.post_mean, covariance=q.post_cov))
We now take our posterior and model, along with an observation pattern (in our case only m.x is observed), and create an inference algorithm. This inference algorithm is combined with a gradient loop to create the Inference method infr.
In [17]:
observed = [m.x]
alg = StochasticVariationalInference(num_samples=3, model=m, posterior=q, observed=observed)
infr = GradBasedInference(inference_algorithm=alg, grad_loop=BatchInferenceLoop())
The inference method is then initialized with our training data and we run the optimization for a while until convergence.
In [18]:
infr.initialize(x=mx.nd.array(x_train))
In [19]:
infr.run(max_iter=1000, learning_rate=1e-2, x=mx.nd.array(x_train))
Once training completes, we retrieve the posterior mean (our trained representation for \(\mathbf{Wz} + \mu\)) from the inference method and plot it. As shown, the plot recovers (up to rotation) the original 2D data quite well.
In [20]:
post_z_mean = infr.params[q.z.factor.mean].asnumpy()
In [22]:
plt.plot(post_z_mean[:,0], post_z_mean[:,1],'.')
Out[22]:
[<matplotlib.lines.Line2D at 0x1a21bb3208>]

Bayesian Neural Network (VI) for classification (under Development)¶
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
# ==============================================================================
In [1]:
import warnings
warnings.filterwarnings('ignore')
import mxfusion as mf
import mxnet as mx
import numpy as np
import mxnet.gluon.nn as nn
import mxfusion.components
import mxfusion.inference
Generate Synthetic Data¶
In [2]:
import GPy
%matplotlib inline
from pylab import *
np.random.seed(4)
k = GPy.kern.RBF(1, lengthscale=0.1)
x = np.random.rand(200,1)
y = np.random.multivariate_normal(mean=np.zeros((200,)), cov=k.K(x), size=(1,)).T>0.
plot(x[:,0], y[:,0], '.')
Out[2]:
[<matplotlib.lines.Line2D at 0x11cb73748>]

In [3]:
D = 10
net = nn.HybridSequential(prefix='nn_')
with net.name_scope():
    net.add(nn.Dense(D, activation="tanh", flatten=False, in_units=1))
    net.add(nn.Dense(D, activation="tanh", flatten=False, in_units=D))
    net.add(nn.Dense(2, flatten=False, in_units=D))
net.initialize(mx.init.Xavier(magnitude=1))
In [4]:
from mxfusion.components.variables.var_trans import PositiveTransformation
from mxfusion.inference import VariationalPosteriorForwardSampling
In [5]:
m = mf.Model()
m.N = mf.Variable()
m.f = mf.functions.MXFusionGluonFunction(net, num_outputs=1, broadcastable=False)
m.x = mf.Variable(shape=(m.N,1))
m.r = m.f(m.x)
for _,v in m.r.factor.parameters.items():
    v.set_prior(mf.components.distributions.Normal(mean=mx.nd.array([0]),variance=mx.nd.array([3.])))
m.y = mf.distributions.Categorical.define_variable(log_prob=m.r, shape=(m.N,1), num_classes=2)
print(m)
Variable(e06bd) ~ Normal(mean=Variable(78b11), variance=Variable(961fd))
Variable(5f8c9) ~ Normal(mean=Variable(34ce9), variance=Variable(a783a))
Variable(d2e00) ~ Normal(mean=Variable(6bda0), variance=Variable(7f89e))
Variable(b44a6) ~ Normal(mean=Variable(6b08c), variance=Variable(93b41))
Variable(d3cf6) ~ Normal(mean=Variable(ad287), variance=Variable(6700e))
Variable(7ab0d) ~ Normal(mean=Variable(160fc), variance=Variable(cbaf9))
r = GluonFunctionEvaluation(nn_input_0=x, nn_dense0_weight=Variable(7ab0d), nn_dense0_bias=Variable(d3cf6), nn_dense1_weight=Variable(b44a6), nn_dense1_bias=Variable(d2e00), nn_dense2_weight=Variable(5f8c9), nn_dense2_bias=Variable(e06bd))
y ~ Categorical(log_prob=r)
In [6]:
from mxfusion.inference import BatchInferenceLoop, create_Gaussian_meanfield, GradBasedInference, StochasticVariationalInference, MAP
In [7]:
observed = [m.y, m.x]
q = create_Gaussian_meanfield(model=m, observed=observed)
alg = StochasticVariationalInference(num_samples=5, model=m, posterior=q, observed=observed)
infr = GradBasedInference(inference_algorithm=alg, grad_loop=BatchInferenceLoop())
In [8]:
infr.initialize(y=mx.nd.array(y), x=mx.nd.array(x))
In [9]:
for v_name, v in m.r.factor.parameters.items():
    uuid = v.uuid
    loc_uuid = infr.inference_algorithm.posterior[uuid].factor.variance.uuid
    a = infr.params.param_dict[loc_uuid].data().asnumpy()
    a[:] = 1e-8
    infr.params[infr.inference_algorithm.posterior[uuid].factor.mean] = net.collect_params()[v_name].data()
    infr.params[infr.inference_algorithm.posterior[uuid].factor.variance] = mx.nd.array(a)
In [10]:
infr.run(max_iter=500, learning_rate=1e-1, y=mx.nd.array(y), x=mx.nd.array(x), verbose=True)
Iteration 51 loss: 1066.1125488281255
Iteration 101 loss: 675.2722167968758
Iteration 151 loss: 345.30307006835945
Iteration 201 loss: 196.68641662597656
Iteration 251 loss: 155.23381042480478
Iteration 301 loss: 149.42289733886724
Iteration 351 loss: 159.71490478515625
Iteration 401 loss: 140.58926391601562
Iteration 451 loss: 174.78173828125438
Iteration 500 loss: 127.64309692382812
In [11]:
for uuid, v in infr.inference_algorithm.posterior.variables.items():
    if uuid in infr.params.param_dict:
        print(v.name, infr.params[v])
None
[[-4.2562084 ]
[-2.1897657 ]
[-2.7514694 ]
[-1.8618754 ]
[ 0.05935706]
[ 2.3460457 ]
[-2.6491752 ]
[-1.2179427 ]
[-0.08034295]
[-0.5979197 ]]
<NDArray 10x1 @cpu(0)>
None
[[0.01767311]
[0.00603309]
[0.00848725]
[0.05524571]
[0.68462664]
[0.00412718]
[0.03144019]
[0.11763541]
[2.0121076 ]
[0.37226263]]
<NDArray 10x1 @cpu(0)>
None
[ 1.6983228 1.4742194 1.775172 0.6392376 0.31661415 -1.6325905
1.398058 0.7429083 0.04331838 0.3991743 ]
<NDArray 10 @cpu(0)>
None
[0.00463977 0.00697836 0.00277748 0.04407028 0.45794868 0.00267045
0.01026547 0.07324851 0.6038953 0.06482685]
<NDArray 10 @cpu(0)>
None
[[-5.0367075e-01 -3.6483032e-01 -3.4889570e-01 -3.7278756e-01
-5.8295298e-01 2.0773776e-01 -5.1646495e-01 -5.6319767e-01
1.2088771e-01 -3.6126822e-02]
[ 3.2355504e-03 -3.6068845e-01 -1.8626985e-01 -1.8437026e-01
6.3100457e-04 4.4206291e-01 -2.7084729e-02 -3.1543028e-01
-2.8092265e-01 -2.6803422e-01]
[-1.4353344e-01 2.7556152e+00 2.7566373e+00 4.7164506e-01
3.9942378e-01 -3.5447137e+00 1.2198279e+00 2.0113483e-01
7.4260637e-02 1.0011230e-01]
[ 3.5851079e-01 7.9171979e-01 6.1348730e-01 5.8377886e-01
5.4714572e-01 -1.0298078e+00 3.3680087e-01 1.1881048e-02
4.9028376e-01 -1.4387065e-01]
[ 7.1333803e-02 2.1075387e-01 2.7103132e-02 -6.6015087e-02
1.6656926e-01 -3.9778087e-01 1.8710904e-01 4.3254908e-02
-1.5939955e-01 -2.0810342e-01]
[-1.2169343e-01 9.4294645e-02 3.3085659e-01 -1.9831542e-02
2.8470194e-01 -3.8632959e-01 -7.6368101e-02 1.3375407e-01
5.9273201e-01 -4.5699142e-02]
[ 1.9255243e-01 2.9560938e-01 2.5773695e-01 -1.0506964e-01
-2.5529373e-01 -3.1061968e-01 3.3579066e-02 3.4898770e-01
6.0322829e-02 2.8761932e-01]
[-7.0459640e-01 -6.9609299e-02 -3.6901351e-02 -4.2581716e-01
3.1552029e-01 -1.8861942e-01 -6.2215298e-01 -4.0387815e-01
-7.6213044e-01 1.4895415e-01]
[ 1.5128514e+00 2.8877625e-01 4.8491848e-01 4.6291590e-01
6.4278495e-01 -3.2827693e-01 8.9393836e-01 3.9123634e-01
-2.0554188e-01 1.9961188e-02]
[ 3.8393707e+00 1.5004866e+00 2.1594408e+00 1.6014071e+00
-2.5904796e-01 -1.5982518e+00 2.0201933e+00 1.7498626e-01
2.3529473e-01 2.0874260e-01]]
<NDArray 10x10 @cpu(0)>
None
[[2.708519 2.7912157 3.0016038 2.6399708 2.1374285 3.3056285
3.338311 3.417747 3.0662622 2.665338 ]
[2.8390265 2.83211 3.2312827 2.9462285 3.0979178 3.0666673
2.9610286 2.8243313 3.183116 3.1657238 ]
[0.09525167 0.5299614 0.18995798 0.10806751 0.0597569 0.66866034
0.05129567 0.24625055 0.0597211 0.13327149]
[3.2492511 3.2614036 3.3345017 3.371056 3.053195 2.24815
2.216518 3.073331 2.4673629 2.2618537 ]
[2.2471485 2.8116536 2.6036153 2.3638754 3.2123742 2.5266416
3.2636497 3.4483907 2.9033678 2.3266923 ]
[3.0316195 2.9135869 2.8787353 2.9725506 3.2761152 3.0925238
2.5114353 2.7284532 2.269938 3.5903633 ]
[3.1938636 2.8134184 3.1856582 3.7374485 2.2276769 3.2222536
2.922795 2.6769052 2.965645 2.879921 ]
[2.2802768 2.479438 3.3902857 3.9532485 2.38113 3.2590485
2.944619 2.6350875 2.6051073 3.4462888 ]
[1.8125408 2.109326 2.3153167 2.4868069 2.156431 2.2759461
2.7648149 2.4953694 2.2933164 3.1460094 ]
[0.77330697 0.21263564 0.27486432 0.37804347 0.09345496 0.18871015
1.4391748 0.44938883 0.19104742 0.3376724 ]]
<NDArray 10x10 @cpu(0)>
None
[ 0.19256005 -0.00668432 2.0988328 0.03577445 0.11586528 0.46011198
-0.0560313 0.2638263 -0.9565386 -1.5131551 ]
<NDArray 10 @cpu(0)>
None
[2.5745904 1.9814985 0.03306409 3.0929534 3.9594617 2.3214269
2.8359015 3.0308347 1.3825488 0.07675996]
<NDArray 10 @cpu(0)>
None
[[-0.03301222 0.0079538 2.3766918 0.07191022 0.01081306 -0.08140734
-0.01184111 -0.20975497 -0.10157776 -2.483351 ]
[ 0.02959426 -0.00993267 -2.3763514 -0.09460182 -0.00604838 0.04880186
0.02897908 0.20790808 0.07653924 2.4683366 ]]
<NDArray 2x10 @cpu(0)>
None
[[0.21937881 0.20881172 0.14296758 0.19930215 0.19051579 0.15455456
0.09204628 0.10818221 0.12479458 0.16975753]
[0.11583243 0.0629681 0.0659797 0.09927363 0.09680786 0.15992676
0.08846851 0.09655253 0.10265005 0.08145336]]
<NDArray 2x10 @cpu(0)>
None
[-2.461291 2.4445803]
<NDArray 2 @cpu(0)>
None
[0.09449326 0.08444152]
<NDArray 2 @cpu(0)>
In [12]:
xt = np.linspace(0,1,100)[:,None]
In [13]:
infr2 = VariationalPosteriorForwardSampling(10, [m.x], infr, [m.r])
res = infr2.run(x=mx.nd.array(xt))
In [14]:
yt = res[m.r].asnumpy()
In [15]:
yt_mean = yt.mean(0)
yt_std = yt.std(0)
for i in range(yt.shape[0]):
    plot(xt[:,0],1./(1+np.exp(yt[i,:,0]-yt[i,:,1])),'k',alpha=0.2)
plot(x[:,0],y[:,0],'.')
Out[15]:
[<matplotlib.lines.Line2D at 0x11d47e5f8>]

Bayesian Neural Network (VI) for regression¶
Zhenwen Dai (2018-8-21)¶
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
# ==============================================================================
In [1]:
import warnings
warnings.filterwarnings('ignore')
import mxfusion as mf
import mxnet as mx
import numpy as np
import mxnet.gluon.nn as nn
import mxfusion.components
import mxfusion.inference
Generate Synthetic Data¶
In [2]:
import GPy
%matplotlib inline
from pylab import *
np.random.seed(0)
k = GPy.kern.RBF(1, lengthscale=0.1)
x = np.random.rand(1000,1)
y = np.random.multivariate_normal(mean=np.zeros((1000,)), cov=k.K(x), size=(1,)).T
plot(x[:,0], y[:,0], '.')
Out[2]:
[<matplotlib.lines.Line2D at 0x10b5d3710>]

Model definition¶
In [3]:
D = 50
net = nn.HybridSequential(prefix='nn_')
with net.name_scope():
    net.add(nn.Dense(D, activation="tanh"))
    net.add(nn.Dense(D, activation="tanh"))
    net.add(nn.Dense(1, flatten=True))
net.initialize(mx.init.Xavier(magnitude=3))
_=net(mx.nd.array(x))
In [4]:
from mxfusion.components.variables.var_trans import PositiveTransformation
from mxfusion.inference import VariationalPosteriorForwardSampling
In [5]:
m = mf.Model()
m.N = mf.Variable()
m.f = mf.functions.MXFusionGluonFunction(net, num_outputs=1,broadcastable=False)
m.x = mf.Variable(shape=(m.N,1))
m.v = mf.Variable(shape=(1,), transformation=PositiveTransformation(), initial_value=mx.nd.array([0.01]))
#m.prior_variance = mf.core.Variable(shape=(1,), transformation=PositiveTransformation())
m.r = m.f(m.x)
for v in m.r.factor.parameters.values():
    v.set_prior(mf.components.distributions.Normal(mean=mx.nd.array([0]),variance=mx.nd.array([1.])))
m.y = mf.distributions.Normal.define_variable(mean=m.r, variance=m.v, shape=(m.N,1))
print(m)
Variable(746e9) ~ Normal(mean=Variable(15143), variance=Variable(e7a3a))
Variable(adf9f) ~ Normal(mean=Variable(dfd4f), variance=Variable(65d0f))
Variable(f6028) ~ Normal(mean=Variable(7d834), variance=Variable(57cde))
Variable(0f787) ~ Normal(mean=Variable(5a470), variance=Variable(9ed1d))
Variable(707d8) ~ Normal(mean=Variable(0fb74), variance=Variable(b9249))
Variable(5868e) ~ Normal(mean=Variable(5f168), variance=Variable(9986c))
r = GluonFunctionEvaluation(nn_input_0=x, nn_dense0_weight=Variable(5868e), nn_dense0_bias=Variable(707d8), nn_dense1_weight=Variable(0f787), nn_dense1_bias=Variable(f6028), nn_dense2_weight=Variable(adf9f), nn_dense2_bias=Variable(746e9))
y ~ Normal(mean=r, variance=v)
Inference with Meanfield¶
In [6]:
from mxfusion.inference import BatchInferenceLoop, create_Gaussian_meanfield, GradBasedInference, StochasticVariationalInference
In [7]:
observed = [m.y, m.x]
q = create_Gaussian_meanfield(model=m, observed=observed)
alg = StochasticVariationalInference(num_samples=3, model=m, posterior=q, observed=observed)
infr = GradBasedInference(inference_algorithm=alg, grad_loop=BatchInferenceLoop())
In [8]:
infr.initialize(y=mx.nd.array(y), x=mx.nd.array(x))
In [9]:
for v_name, v in m.r.factor.parameters.items():
    uuid = v.uuid
    loc_uuid = infr.inference_algorithm.posterior[uuid].factor.variance.uuid
    a = infr.params.param_dict[loc_uuid].data().asnumpy()
    a[:] = 1e-6
    infr.params[infr.inference_algorithm.posterior[uuid].factor.mean] = net.collect_params()[v_name].data()
    infr.params[infr.inference_algorithm.posterior[uuid].factor.variance] = mx.nd.array(a)
In [10]:
infr.run(max_iter=2000, learning_rate=1e-2, y=mx.nd.array(y), x=mx.nd.array(x), verbose=True)
Iteration 201 loss: 15813.8652343755
Iteration 401 loss: 11816.2539062575
Iteration 601 loss: 8878.53613281255
Iteration 801 loss: 6882.62304687555
Iteration 1001 loss: 4587.88476562575
Iteration 1201 loss: 3141.454101562575
Iteration 1401 loss: 2384.041748046875
Iteration 1601 loss: 1506.3939208984375
Iteration 1801 loss: 1371.0895996093755
Iteration 2000 loss: 1076.2845458984375
Use prediction to visualize the resulting BNN¶
In [11]:
xt = np.linspace(0,1,100)[:,None]
In [12]:
infr2 = VariationalPosteriorForwardSampling(10, [m.x], infr, [m.r])
res = infr2.run(x=mx.nd.array(xt))
In [13]:
yt = res[m.r].asnumpy()
In [14]:
yt_mean = yt.mean(0)
yt_std = yt.std(0)
for i in range(yt.shape[0]):
    plot(xt[:,0],yt[i,:,0],'k',alpha=0.2)
plot(x[:,0],y[:,0],'.')
Out[14]:
[<matplotlib.lines.Line2D at 0x10c2593c8>]

Variational Auto-Encoder (VAE)¶
Zhenwen Dai (2018-8-21)¶
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
# ==============================================================================
In [1]:
import warnings
warnings.filterwarnings('ignore')
import mxfusion as mf
import mxnet as mx
import numpy as np
import mxnet.gluon.nn as nn
import mxfusion.components
import mxfusion.inference
%matplotlib inline
from pylab import *
Load a toy dataset¶
In [2]:
import GPy
data = GPy.util.datasets.oil_100()
Y = data['X']
label = data['Y'].argmax(1)
In [3]:
N, D = Y.shape
Model Definition¶
In [4]:
Q = 2
In [5]:
H = 50
encoder = nn.HybridSequential(prefix='encoder_')
with encoder.name_scope():
    encoder.add(nn.Dense(H, activation="tanh"))
    encoder.add(nn.Dense(H, activation="tanh"))
    encoder.add(nn.Dense(Q, flatten=True))
encoder.initialize(mx.init.Xavier(magnitude=3))
_=encoder(mx.nd.array(np.random.rand(5,D)))
In [6]:
H = 50
decoder = nn.HybridSequential(prefix='decoder_')
with decoder.name_scope():
    decoder.add(nn.Dense(H, activation="tanh"))
    decoder.add(nn.Dense(H, activation="tanh"))
    decoder.add(nn.Dense(D, flatten=True))
decoder.initialize(mx.init.Xavier(magnitude=3))
_=decoder(mx.nd.array(np.random.rand(5,Q)))
In [7]:
from mxfusion.components.variables.var_trans import PositiveTransformation
In [8]:
m = mf.models.Model()
m.N = mf.components.Variable()
m.decoder = mf.components.functions.MXFusionGluonFunction(decoder, num_outputs=1,broadcastable=False)
m.x = mf.components.distributions.Normal.define_variable(mean=mx.nd.array([0]), variance=mx.nd.array([1]), shape=(m.N, Q))
m.f = m.decoder(m.x)
m.noise_var = mf.components.Variable(shape=(1,), transformation=PositiveTransformation(), initial_value=mx.nd.array([0.01]))
m.y = mf.components.distributions.Normal.define_variable(mean=m.f, variance=m.noise_var, shape=(m.N, D))
print(m)
x ~ Normal(mean=Variable(e909c), variance=Variable(e90bf))
f = GluonFunctionEvaluation(decoder_input_0=x, decoder_dense0_weight=Variable(0f71b), decoder_dense0_bias=Variable(aee54), decoder_dense1_weight=Variable(8db61), decoder_dense1_bias=Variable(7c56e), decoder_dense2_weight=Variable(85b99), decoder_dense2_bias=Variable(21241))
y ~ Normal(mean=f, variance=noise_var)
In [9]:
q = mf.models.Posterior(m)
q.x_var = mf.components.Variable(shape=(1,), transformation=PositiveTransformation(), initial_value=mx.nd.array([1e-6]))
q.encoder = mf.components.functions.MXFusionGluonFunction(encoder, num_outputs=1, broadcastable=False)
q.x_mean = q.encoder(q.y)
q.x.set_prior(mf.components.distributions.Normal(mean=q.x_mean, variance=q.x_var))
print(q)
x_mean = GluonFunctionEvaluation(encoder_input_0=y, encoder_dense0_weight=Variable(9768f), encoder_dense0_bias=Variable(9f1a0), encoder_dense1_weight=Variable(18970), encoder_dense1_bias=Variable(bcff4), encoder_dense2_weight=Variable(3d2a8), encoder_dense2_bias=Variable(95031))
x ~ Normal(mean=x_mean, variance=x_var)
Variational Inference¶
In [10]:
from mxfusion.inference import BatchInferenceLoop, StochasticVariationalInference, GradBasedInference
In [11]:
observed = [m.y]
alg = StochasticVariationalInference(num_samples=3, model=m, posterior=q, observed=observed)
infr = GradBasedInference(inference_algorithm=alg, grad_loop=BatchInferenceLoop())
In [12]:
infr.run(max_iter=2000, learning_rate=1e-2, y=mx.nd.array(Y), verbose=True)
Iteration 201 loss: 1715.0395507812525
Iteration 401 loss: 599.87670898437525
Iteration 601 loss: 149.24291992187538
Iteration 801 loss: -44.793395996093755
Iteration 1001 loss: -202.39929199218755
Iteration 1201 loss: -314.48220825195315
Iteration 1401 loss: -301.41076660156255
Iteration 1601 loss: -585.94531250937585
Iteration 1801 loss: -702.51806640625525
Iteration 2000 loss: -775.11627197265625
Plot the training data in the latent space¶
In [13]:
q_x_mean = q.encoder.gluon_block(mx.nd.array(Y)).asnumpy()
In [14]:
for i in range(3):
    plot(q_x_mean[label==i,0], q_x_mean[label==i,1], '.')

Gaussian Process Regression¶
Zhenwen Dai (2018-11-2)
Introduction¶
A Gaussian process (GP) is a Bayesian non-parametric model used for various machine learning problems such as regression and classification. This notebook shows how to use a Gaussian process regression model in MXFusion.
In [1]:
import warnings
warnings.filterwarnings('ignore')
import os
os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'
Toy data¶
We generate some synthetic data for our regression example. The data set is generated from a sine function with some additive Gaussian noise.
In [2]:
import numpy as np
%matplotlib inline
from pylab import *
np.random.seed(0)
X = np.random.uniform(-3.,3.,(20,1))
Y = np.sin(X) + np.random.randn(20,1)*0.05
The generated data are visualized as follows:
In [3]:
plot(X, Y, 'rx', label='data points')
_=legend()

Gaussian process regression with Gaussian likelihood¶
Denote a set of input points \(X \in \mathbb{R}^{N \times Q}\). A Gaussian process is often formulated as a multi-variate normal distribution conditioned on the inputs: \(p(F|X) = \mathcal{N}(F; 0, K)\), where \(F\) denotes the noise-free output points of the Gaussian process and \(K\) is the covariance matrix computed on the set of inputs according to a chosen kernel function \(k(\cdot, \cdot)\).
For a regression problem, \(F\) is often referred to as the noise-free output and we usually assume an additional probability distribution as the observation noise. In this case, we assume the noise distribution to be Gaussian: \(p(Y|F) = \mathcal{N}(Y; F, \sigma^2 I)\), where \(\sigma^2\) is the variance of the Gaussian distribution.
The following code defines the above GP regression in MXFusion. First, we change the default data dtype to double precision to avoid any potential numerical issues.
In [4]:
from mxfusion.common import config
config.DEFAULT_DTYPE = 'float64'
In the code below, the variable Y is defined following the probabilistic module GPRegression. A probabilistic module in MXFusion is a pre-built probabilistic model with dedicated inference algorithms for computing the log-pdf and drawing samples. In this case, GPRegression defines the above GP regression model with a Gaussian likelihood. It understands that the log-likelihood after marginalizing \(F\) is closed-form and exploits this property when computing the log-pdf.
The model is defined by the input variable X with the shape (m.N, 1), where the value of m.N is discovered when data is given during inference. A positive noise variance variable m.noise_var is defined with the initial value 0.01. For the GP, we define an RBF kernel with input dimensionality one and the initial values of variance and lengthscale set to one. We define the variable m.Y following the GP regression distribution with the above specified kernel, input variable and noise variance.
In [5]:
from mxfusion import Model, Variable
from mxfusion.components.variables import PositiveTransformation
from mxfusion.components.distributions.gp.kernels import RBF
from mxfusion.modules.gp_modules import GPRegression
m = Model()
m.N = Variable()
m.X = Variable(shape=(m.N, 1))
m.noise_var = Variable(shape=(1,), transformation=PositiveTransformation(), initial_value=0.01)
m.kernel = RBF(input_dim=1, variance=1, lengthscale=1)
m.Y = GPRegression.define_variable(X=m.X, kernel=m.kernel, noise_var=m.noise_var, shape=(m.N, 1))
In the above model, we have not defined any prior distributions for the hyper-parameters. To use the model for regression, we typically do a maximum likelihood estimate of all the hyper-parameters conditioned on the input and output variables. In MXFusion, this is done by first creating an inference algorithm, MAP in this case, by specifying the observed variables. Then, we create an inference body for gradient-based inference methods, which is called GradBasedInference. The inference is triggered by calling the run method, in which all the observed data are given as keyword arguments and any necessary configuration parameters are specified.
In [6]:
import mxnet as mx
from mxfusion.inference import GradBasedInference, MAP
infr = GradBasedInference(inference_algorithm=MAP(model=m, observed=[m.X, m.Y]))
infr.run(X=mx.nd.array(X, dtype='float64'), Y=mx.nd.array(Y, dtype='float64'),
         max_iter=100, learning_rate=0.05, verbose=True)
Iteration 11 loss: -13.523289192527265
Iteration 21 loss: -16.077990179961076
Iteration 31 loss: -16.784414553096843
Iteration 41 loss: -16.820970924702017
Iteration 51 loss: -16.859865329532193
Iteration 61 loss: -16.895666914166453
Iteration 71 loss: -16.899409131167452
Iteration 81 loss: -16.901728290347176
Iteration 91 loss: -16.903122097339737
Iteration 100 loss: -16.903135093930537
All the inference outcomes are stored in the params attribute of the inference body. The inferred value of a parameter can be accessed by passing a reference to the queried parameter to the params attribute. For example, to get the value of m.noise_var, we call infr.params[m.noise_var]. The estimated parameters from the above experiment are as follows:
In [7]:
print('The estimated variance of the RBF kernel is %f.' % infr.params[m.kernel.variance].asscalar())
print('The estimated length scale of the RBF kernel is %f.' % infr.params[m.kernel.lengthscale].asscalar())
print('The estimated variance of the Gaussian likelihood is %f.' % infr.params[m.noise_var].asscalar())
The estimated variance of the RBF kernel is 0.616992.
The estimated length scale of the RBF kernel is 1.649073.
The estimated variance of the Gaussian likelihood is 0.002251.
We can compare the estimated values with the same model implemented in GPy. The estimated values from GPy are very close to the ones from MXFusion.
In [8]:
import GPy
m_gpy = GPy.models.GPRegression(X, Y, kernel=GPy.kern.RBF(1))
m_gpy.optimize()
print(m_gpy)
Name : GP regression
Objective : -16.903456670910902
Number of Parameters : 3
Number of Optimization Parameters : 3
Updates : True
Parameters:
GP_regression. | value | constraints | priors
rbf.variance | 0.6148038604494702 | +ve |
rbf.lengthscale | 1.6500299722611123 | +ve |
Gaussian_noise.variance | 0.002270049772204339 | +ve |
Prediction¶
The above section shows how to estimate the model hyper-parameters of a GP regression model, which is often referred to as training. After training, we are often interested in using the inferred model to predict on unseen inputs. The GP modules offer two types of predictions: predicting the mean and variance of the output variable, or drawing samples from the predictive posterior distribution.
Mean and variance of the posterior distribution¶
To estimate the mean and variance of the predictive posterior distribution, we use the inference algorithm ModulePredictionAlgorithm, which takes the model, the observed variables and the target variables of prediction as input arguments. We use TransferInference as the inference body, which allows us to reuse the outcome of the previous inference. This is done by passing the inference parameters infr.params into the infr_params argument.
In [9]:
from mxfusion.inference import TransferInference, ModulePredictionAlgorithm
infr_pred = TransferInference(ModulePredictionAlgorithm(model=m, observed=[m.X], target_variables=[m.Y]),
                              infr_params=infr.params)
To visualize the fitted model, we make predictions on 100 points evenly spanned from -5 to 5. We estimate the mean and variance of the noise-free output \(F\).
In [10]:
xt = np.linspace(-5,5,100)[:, None]
res = infr_pred.run(X=mx.nd.array(xt, dtype='float64'))[0]
f_mean, f_var = res[0].asnumpy()[0], res[1].asnumpy()[0]
The resulting figure is shown as follows:
In [11]:
plot(xt, f_mean[:,0], 'b-', label='mean')
plot(xt, f_mean[:,0]-2*np.sqrt(f_var), 'b--', label='2 x std')
plot(xt, f_mean[:,0]+2*np.sqrt(f_var), 'b--')
plot(X, Y, 'rx', label='data points')
ylabel('F')
xlabel('X')
_=legend()

Posterior samples of Gaussian process¶
Apart from getting the mean and variance at every location, we may need to draw samples from the posterior GP. As the output variables at different locations are correlated with each other, each sample gives us some idea of a potential function from the posterior GP distribution.
To draw samples from the posterior distribution, we need to change the prediction inference algorithm attached to the GP module. The default prediction algorithm estimates the mean and variance of the output variable as shown above. We can attach another inference algorithm as the prediction algorithm. In the following code, we attach the GPRegressionSamplingPrediction algorithm as the prediction algorithm. The targets and conditionals arguments specify the target variables and the conditional variables of the algorithm. After specifying a name in the alg_name argument, such as gp_predict, we can access this inference algorithm with the specified name, like gp.gp_predict. In the following code, we set the diagonal_variance attribute to False in order to draw samples from a full covariance matrix. To avoid numerical issues, we set a small jitter to help the matrix inversion. Then, we create the inference body in the same way as in the above example.
In [12]:
from mxfusion.inference import TransferInference, ModulePredictionAlgorithm
from mxfusion.modules.gp_modules.gp_regression import GPRegressionSamplingPrediction
gp = m.Y.factor
gp.attach_prediction_algorithms(targets=gp.output_names, conditionals=gp.input_names,
                                algorithm=GPRegressionSamplingPrediction(
                                    gp._module_graph, gp._extra_graphs[0], [gp._module_graph.X]),
                                alg_name='gp_predict')
alg_name='gp_predict')
gp.gp_predict.diagonal_variance = False
gp.gp_predict.jitter = 1e-8
infr_pred = TransferInference(ModulePredictionAlgorithm(model=m, observed=[m.X], target_variables=[m.Y], num_samples=5),
                              infr_params=infr.params)
We draw five samples on the 100 evenly spanned input locations.
In [13]:
xt = np.linspace(-5,5,100)[:, None]
y_samples = infr_pred.run(X=mx.nd.array(xt, dtype='float64'))[0].asnumpy()
We visualize the individual samples each with a different color.
In [14]:
for i in range(y_samples.shape[0]):
    plot(xt, y_samples[i,:,0])

Gaussian process with a mean function¶
TBA
Variational sparse Gaussian process regression¶
TBA
Developer Tutorials¶
Writing a new Distribution¶
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
# ==============================================================================
To write and use a new Distribution class in MXFusion, fill out the Distribution interface and either the Univariate or Multivariate interface, depending on the type of distribution you are creating.
There are 4 primary methods to fill out for a Distribution in MXFusion:
* __init__ - This is the constructor for the Distribution. It takes in any parameters the distribution needs. It also defines names for the input variable[s] that the distribution takes and the output variable[s] it produces.
* log_pdf - This method returns the logarithm of the probability density function for the distribution. This is called during Inference time as necessary to perform the Inference algorithm.
* draw_samples - This method returns samples drawn from the distribution. This is called during Inference time as necessary to perform the Inference algorithm.
* define_variable - This is used to generate random variables drawn from the Distribution during model definition.
log_pdf and draw_samples are implemented using MXNet functions to compute on the input variables, which at Inference time are MXNet arrays or MXNet symbolic variables.
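As an illustration of this NDArray/Symbol duality (a generic MXNet sketch, not MXFusion code), the same function body can run in both modes by dispatching through an F argument:
import mxnet as mx

def softplus(x, F=mx.nd):
    # F is either mxnet.ndarray or mxnet.symbol; the same expression works in both modes.
    return F.Activation(x, act_type='softrelu')

print(softplus(mx.nd.array([-1., 0., 1.])))      # imperative mode with NDArrays
sym = softplus(mx.sym.Variable('x'), F=mx.sym)   # symbolic mode with Symbols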
This notebook will take the Normal distribution as a reference.
File Structure¶
Code for distributions lives in the mxfusion/components/distributions directory.
If you’re implementing the FancyNew distribution then you should create a file called mxfusion/components/distributions/fancy_new.py for the class to live in.
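A minimal skeleton for such a class might look like the following. This is only a sketch: the import path of UnivariateDistribution and the base-class constructor signature are assumed to mirror the Normal example shown below, the input name location is hypothetical, and the method bodies are left unimplemented.
from mxfusion.components.distributions.univariate import UnivariateDistribution
from mxfusion.components import Variable

class FancyNew(UnivariateDistribution):
    def __init__(self, location, rand_gen=None, dtype=None, ctx=None):
        # Wrap raw constants into Variables, mirroring the Normal constructor below.
        if not isinstance(location, Variable):
            location = Variable(value=location)
        inputs = [('location', location)]
        super(FancyNew, self).__init__(inputs=inputs, outputs=None,
                                       input_names=['location'],
                                       output_names=['random_variable'],
                                       rand_gen=rand_gen, dtype=dtype, ctx=ctx)

    def log_pdf(self, location, random_variable, F=None):
        # Compute the log-density with MXNet operators (F is mxnet.ndarray or mxnet.symbol).
        raise NotImplementedError

    def draw_samples(self, location, rv_shape, num_samples=1, F=None):
        # Return samples with shape (num_samples,) + rv_shape.
        raise NotImplementedError

    @staticmethod
    def define_variable(location=0., shape=None, rand_gen=None, dtype=None, ctx=None):
        dist = FancyNew(location=location, rand_gen=rand_gen, dtype=dtype, ctx=ctx)
        dist._generate_outputs(shape=shape)
        return dist.random_variable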
Interface Implementation¶
Since this example is for a univariate Normal distribution, our class extends the UnivariateDistribution class.
The Normal distribution's constructor takes in objects for its mean and variance, specifications for data type and context, and a random number generator if not the default.
In addition, a distribution can take in additional parameters used for calculation that aren't inputs. We refer to these additional parameters as the Distribution's attributes. The difference between an input and an attribute is primarily that inputs are dynamic at inference time, while attributes are static throughout a given inference run.
In this case, minibatch_ratio is a static attribute, as it doesn't change for a given minibatch size during inference.
The mean and variance can be either Variables or MXNet arrays if they are constants.
As mentioned above, you define names for the input and output variable[s] for the distribution here. These names are used when printing and generally inspecting the model, so give meaningful names. We prefer names like mean and variance to ones like location and scale or Greek letters like mew and sigma.
In [ ]:
class Normal(UnivariateDistribution):
    """
    The one-dimensional normal distribution. The normal distribution can be defined over a scalar random variable or an array of random variables. In case of an array of random variables, the mean and variance are broadcasted to the shape of the output random variable (array).

    :param mean: Mean of the normal distribution.
    :type mean: Variable
    :param variance: Variance of the normal distribution.
    :type variance: Variable
    :param rand_gen: the random generator (default: MXNetRandomGenerator)
    :type rand_gen: RandomGenerator
    :param dtype: the data type for float point numbers
    :type dtype: numpy.float32 or numpy.float64
    :param ctx: the mxnet context (default: None/current context)
    :type ctx: None or mxnet.cpu or mxnet.gpu
    """
    def __init__(self, mean, variance, rand_gen=None, minibatch_ratio=1.,
                 dtype=None, ctx=None):
        self.minibatch_ratio = minibatch_ratio
        if not isinstance(mean, Variable):
            mean = Variable(value=mean)
        if not isinstance(variance, Variable):
            variance = Variable(value=variance)
        inputs = [('mean', mean), ('variance', variance)]
        input_names = ['mean', 'variance']
        output_names = ['random_variable']
        super(Normal, self).__init__(inputs=inputs, outputs=None,
                                     input_names=input_names,
                                     output_names=output_names,
                                     rand_gen=rand_gen, dtype=dtype, ctx=ctx)
If your distribution's __init__ function only takes in parameters that get passed on to its super constructor, you don't need to implement replicate_self. If it does take additional parameters (as the Normal distribution does for minibatch_ratio), those parameters need to be copied over to the replicant Distribution before returning, as below.
In [ ]:
def replicate_self(self, attribute_map=None):
    """
    Replicates this Factor, using new inputs, outputs, and a new uuid.
    Used during model replication to functionally replicate a factor into a new graph.

    :param inputs: new input variables of the factor
    :type inputs: a dict of {'name' : Variable} or None
    :param outputs: new output variables of the factor.
    :type outputs: a dict of {'name' : Variable} or None
    """
    replicant = super(Normal, self).replicate_self(attribute_map)
    replicant.minibatch_ratio = self.minibatch_ratio
    return replicant
log_pdf and draw_samples are relatively straightforward to implement. These are the meaningful parts of the Distribution you're implementing, putting the math into code using MXNet operators for the compute.
If it’s a distribution that isn’t well documented on Wikipedia, please add a link to a paper or other resource that explains what it’s doing and why.
In [ ]:
def log_pdf(self, mean, variance, random_variable, F=None):
    """
    Computes the logarithm of the probability density function (PDF) of the normal distribution.

    :param mean: the mean of the normal distribution
    :type mean: MXNet NDArray or MXNet Symbol
    :param variance: the variance of the normal distributions
    :type variance: MXNet NDArray or MXNet Symbol
    :param random_variable: the random variable of the normal distribution
    :type random_variable: MXNet NDArray or MXNet Symbol
    :param F: the MXNet computation mode (mxnet.symbol or mxnet.ndarray)
    :returns: log pdf of the distribution
    :rtypes: MXNet NDArray or MXNet Symbol
    """
    F = get_default_MXNet_mode() if F is None else F
    logvar = np.log(2 * np.pi) / -2 + F.log(variance) / -2
    logL = F.broadcast_add(logvar, F.broadcast_div(F.square(
        F.broadcast_minus(random_variable, mean)), -2 * variance))
    return logL
In [ ]:
def draw_samples(self, mean, variance, rv_shape, num_samples=1, F=None):
    """
    Draw samples from the normal distribution.

    :param mean: the mean of the normal distribution
    :type mean: MXNet NDArray or MXNet Symbol
    :param variance: the variance of the normal distributions
    :type variance: MXNet NDArray or MXNet Symbol
    :param rv_shape: the shape of each sample
    :type rv_shape: tuple
    :param num_samples: the number of drawn samples (default: one)
    :type num_samples: int
    :param F: the MXNet computation mode (mxnet.symbol or mxnet.ndarray)
    :returns: a set of samples of the normal distribution
    :rtypes: MXNet NDArray or MXNet Symbol
    """
    F = get_default_MXNet_mode() if F is None else F
    out_shape = (num_samples,) + rv_shape
    return F.broadcast_add(F.broadcast_mul(self._rand_gen.sample_normal(
        shape=out_shape, dtype=self.dtype, ctx=self.ctx),
        F.sqrt(variance)), mean)
define_variable is just a helper function for end users. All it does is take in parameters for the distribution, create a distribution based on those parameters, and then return the output variables of that distribution.
In [ ]:
@staticmethod
def define_variable(mean=0., variance=1., shape=None, rand_gen=None,
minibatch_ratio=1., dtype=None, ctx=None):
"""
Creates and returns a random variable drawn from a normal distribution.
:param mean: Mean of the distribution.
:param variance: Variance of the distribution.
:param shape: the shape of the random variable(s)
:type shape: tuple or [tuple]
:param rand_gen: the random generator (default: MXNetRandomGenerator)
:type rand_gen: RandomGenerator
    :param dtype: the data type for floating point numbers
:type dtype: numpy.float32 or numpy.float64
:param ctx: the mxnet context (default: None/current context)
:type ctx: None or mxnet.cpu or mxnet.gpu
    :returns: the random variable drawn from the normal distribution.
    :rtype: Variable
"""
normal = Normal(mean=mean, variance=variance, rand_gen=rand_gen,
dtype=dtype, ctx=ctx)
normal._generate_outputs(shape=shape)
return normal.random_variable
Using your new distribution¶
At this point, you should be ready to start testing your new Distribution’s functionality by importing it like any other MXFusion component.
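For example, once it is imported, an end user creates random variables from the new distribution exactly as with the built-in ones:
m = Model()
m.x = Normal.define_variable(mean=0., variance=1., shape=(10,))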
Testing¶
Before submitting your new code as a pull request, please write unit tests that verify it works as expected. This should include numerical checks against edge cases. See the existing test cases for the Normal or Categorical distributions for example tests.
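As a rough sketch of what such a numerical check might look like, comparing against scipy.stats.norm (this assumes scipy is available as a test dependency and that the log_pdf implementation above can be called directly with NDArray inputs; in the library the method may be wrapped to add a sample dimension, so shapes may need adjusting):
import numpy as np
import mxnet as mx
from scipy.stats import norm
from mxfusion.components.distributions import Normal

def test_normal_log_pdf_matches_scipy():
    mean, variance = 0.5, 2.0
    rv = np.random.rand(10)
    dist = Normal(mean=mean, variance=variance, dtype='float64')
    # Call the implementation directly with NDArray inputs (see the caveat above).
    log_pdf_mxf = dist.log_pdf(mean=mx.nd.array([mean], dtype='float64'),
                               variance=mx.nd.array([variance], dtype='float64'),
                               random_variable=mx.nd.array(rv, dtype='float64'),
                               F=mx.nd).asnumpy()
    # Reference values from scipy.
    log_pdf_ref = norm.logpdf(rv, loc=mean, scale=np.sqrt(variance))
    assert np.allclose(log_pdf_mxf, log_pdf_ref)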
Design Overview¶
Topical Guides¶
Working in MXFusion breaks up into two primary phases. Model definition involves defining the variables, distributions, and functions that make up your model. Inference then takes in real values and learns parameters for your model or gives predictions over the data.
Model Definition¶
MXFusion is a library for doing probabilistic modelling.
Probabilistic Models can be categorized into directed graphical models (DGM, Bayes Net) and undirected graphical models (UGM). Most popular probabilistic models are DGMs, so MXFusion currently only supports DGMs.
A DGM can be fully defined using 3 basic components: deterministic functions, probabilistic distributions, and random variables. As such, those are the primary ModelComponents in MXFusion.
Model¶
The primary data structure of MXFusion is the FactorGraph. FactorGraphs contain Variables and Factors. The FactorGraph exposes methods that Inference algorithms call such as drawing samples from the FactorGraph or computing the log pdf of a set of Variables contained in the FactorGraph.
When you want to start modelling, construct a Model object and start attaching ModelComponents to it. You can then see the Model’s components by printing its components attribute:
m = Model()
m.v = Variable()
print(m.components)
When a ModelComponent is attached to a Model, it is automatically updated in the Model’s internal data structures and will be included in any subsequent inference operations over the model.
Model Components¶
All ModelComponents in MXFusion are identified uniquely by a UUID.
Variables¶
In a model, there are typically four types of variables: a random variable following a probabilistic distribution, a variable which is the outcome of a deterministic function, a parameter (with no prior distribution), and a constant. The definitions of the first two types of variables will be discussed later. The latter two types can be defined with the following statement:
m.v = Variable(shape=(10,), constraint=PositiveTransformation())
At this stage, you do not need to specify whether v is a parameter or a constant: if it is a constant, its value will be provided during inference; otherwise it will be treated as a parameter.
A typical example of a constant that is specified at inference time is the size (shape) of an observed variable, which is known once data is provided. In the above example, we specify the name of the variable, the shape of the variable and the constraint that the variable has. This defines a 10-dimensional vector whose values are always positive (v > 0).
Factors¶
In a probabilistic model, random variables relate to each other through probabilistic distributions.
During model definition, the typical interface for generating a 2-dimensional random variable x from a zero-mean, unit-variance Gaussian distribution looks like:
m.x = Normal.define_variable(mean=0, variance=1, shape=(2,))
The two dimensions are independent of each other and both follow the same Gaussian distribution. The parameters or shape of a distribution can also be variables, for example:
m.mean = Variable(shape=(2,))
m.y_shape = Variable()
m.y = Normal.define_variable(mean=m.mean, variance=1, shape=m.y_shape)
MXFusion also allows users to specify a prior distribution over pre-existing variables. This is particularly handy for interfacing with neural networks in MXNet because it allows you to set priors over parameters in an existing Gluon Block, such as a neural network implementation. The API for specifying a prior distribution looks like:
m.x = Variable(shape=(2,))
m.x.set_prior(Normal(mean=0, variance=1))
The above code defines a variable x and sets the prior distribution of each dimension of x to be a scalar unit Gaussian distribution.
Because Models are FactorGraphs, it is common to want to know which ModelComponents come before or after a particular component in the graph. These are accessed through the ModelComponent properties successors and predecessors, for example:
m.mean = Variable()
m.var = Variable()
m.y = Normal.define_variable(mean=m.mean, variance=m.var)
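Continuing the snippet above, you can inspect a component’s graph neighbours directly (a sketch; the exact structure these properties return may differ slightly):
print(m.y.predecessors)   # the Normal factor that generates m.y
print(m.mean.successors)  # the factor(s) that consume m.mean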
The last building blocks of probabilistic models are deterministic functions. The ability to define sophisticated functions allows users to build expressive models with a family of standard probabilistic distributions. As MXNet already provides full functionality for defining a function and automatically evaluating its gradients, Functions in MXFusion are a wrapper over the functions in MXNet’s Gluon interface. Functions are defined in standard MXNet syntax and provided to the MXFusion Function wrapper as below.
First we define a function in MXNet Gluon syntax using a Block object:
import mxnet as mx

class F(mx.gluon.Block):
    def forward(self, x, y):
        return x * 2 + y
Then we create an MXFusion Function instance by passing in our Gluon function instance:
f_gluon = F()
m.f_mf = MXFusionGluonFunction(f_gluon)
This MXFusion Function can then be called with MXFusion variables, and its outcome will be one or more variables representing the output of the function:
m.x = Variable(shape=(2,))
m.y = Variable(shape=(2,))
m.f = m.f_mf(m.x, m.y)
FAQ¶
- Why don’t you support undirected graphical models (UGM)?
- A UGM is typically defined in terms of a set of potential functions. Each potential function is a non-negative function defined on a subset of the variables in a model. The joint probability distribution of a UGM is defined as the product of all the potential functions divided by a normalization term (known as the partition function).
- The notations of DGMs and UGMs can be unified into a factor graph, where a factor can be either a probabilistic distribution or a potential function. In our implementation, the distribution UI inherits from the Factor abstract class, which enables a future extension to support UGMs, although the inference algorithms for UGMs will be completely different.
Inference¶
Overview¶
Inference in MXFusion is broken down into a few logical pieces that can be combined together as necessary.
The highest level object you’ll deal with will be derived from the mxfusion.inference.Inference class. This is the outer loop that drives the inference algorithm, holds the relevant parameters and models for training, and handles serialization after training. At a minimum, Inference objects take as input the InferenceAlgorithm to run. On creation, an InferenceParameters object is created and attached to the Inference method, which will store and manage (MXNet) parameters during inference.
Currently there are two main Inference subclasses: GradBasedInference and TransferInference. An obvious third choice would be some kind of MCMC sampling Inference method.
The first primary class of Inference methods is GradBasedInference, which is for methods that involve gradient-based optimization. For now we only support the gradient optimizers available in MXNet. When using a gradient-based inference method (GradBasedInference), the Inference class takes in a GradLoop in addition to the InferenceAlgorithm. The GradLoop determines how the gradient computed in the InferenceAlgorithm is used to update model parameters. The two available implementations of GradLoop are BatchInferenceLoop and MinibatchInferenceLoop, which correspond to gradient-based optimization in batch or mini-batch mode.
The second type of Inference method is TransferInference. These are methods that take as an additional parameter the InferenceParameters object from a previous Inference method. An example of a TransferInference method is VariationalPosteriorForwardSampling, which takes as input a VariationalInference method that has already been trained and performs forward sampling through the variational posterior.
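As a sketch of that pattern, reusing a trained GradBasedInference object infr like the one built in the example below (the constructor arguments shown here are an assumption for illustration; check the mxfusion.inference module documentation for the exact signature):
# Draw 10 samples of m.y through the learned variational posterior,
# given new observations x_new (hypothetical data). Argument order is assumed.
infr_pred = VariationalPosteriorForwardSampling(10, [m.x], infr, [m.y])
samples = infr_pred.run(x=x_new)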
A basic example to run variational inference with a meanfield posterior over some model looks like the following. See the next section for mathematical details on variational inference.
First Example¶
First we create the model. The model creation function is a dummy here, but this applies to almost any model. See the Model Definition section for details on model creation. Then we define the observed variables in our model, apply the convenience method for creating a factorized Gaussian posterior to that model, and get the posterior q.
m = make_model()
observed = [m.y, m.x]
q = create_Gaussian_meanfield(model=m, observed=observed)
Then we define what InferenceAlgorithm we want to run, and initialize it with the model, posterior, and observation pattern we defined above. This is used to initialize the GradBasedInference object, which creates a data structure to manage the parameters of the model at this stage.
alg = StochasticVariationalInference(model=m, observed=observed, posterior=q)
infr = GradBasedInference(inference_algorithm=alg)
Then, we run the Inference method, passing in the data as keyword arguments, matching the observation pattern we defined previously. This will create and initialize parameters for the variational posterior and any model parameters, and optimize the standard KL-divergence loss function to match the variational posterior to the model’s posterior. We run it for 1000 iterations.
infr.run(max_iter=1000, y=y, x=x)
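Once the run finishes, the learned values are stored in the Inference object’s InferenceParameters and can be read back through infr.params. The attribute path below is hypothetical and depends on how your model and posterior are defined:
# Hypothetical: read back the learned mean of the posterior factor over q.mu
# (replace with the variables that exist in your own model and posterior).
learned_mean = infr.params[q.mu.factor.mean].asnumpy()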
Inference Algorithms¶
MXFusion currently supports stochastic variational inference. We provide a convenience method to generate a Gaussian meanfield posterior for your model, but the interface is flexible enough to allow defining a specialized posterior over your model as required. See the mxfusion.inference
module of the documentation for a full list of supported inference methods.
Variational Inference¶
Variational inference is an approximate inference method that can serve as the inference method over generic models built in MXFusion. The main idea of variational inference is to approximate the (often intractable) posterior distribution of our model with a simpler parametric approximation, referred to as a variational posterior distribution. The goal is then to optimize the parameters of this variational posterior distribution to best approximate our true posterior distribution. This is typically done by maximizing a lower bound of the logarithm of the marginal distribution:
\begin{equation} \log p(y|z) = \log \int_x p(y|x) p(x|z) \geq \int_x q(x|y,z) \log \frac{p(y|x) p(x|z)}{q(x|y,z)} = \mathcal{L}(y,z), \label{eqn:lower_bound_1} \end{equation}
where $p(y|x) p(x|z)$ forms a probabilistic model with $x$ as a latent variable, $q(x|y,z)$ is the variational posterior distribution, and the lower bound is denoted as $\mathcal{L}(y,z)$. By taking the natural exponential of $\mathcal{L}(y,z)$, we get a lower bound of the marginal probability, denoted as $\tilde{p}(y|z) = e^{\mathcal{L}(y,z)}$.
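Expanding the integrand shows that this lower bound is the familiar evidence lower bound (ELBO): an expected log-likelihood minus a KL term. Maximizing it is equivalent to minimizing the KL divergence between the variational posterior and the true posterior, which is the loss optimized in the example above:
\begin{equation} \mathcal{L}(y,z) = \mathbb{E}_{q(x|y,z)}\left[\log p(y|x)\right] - \mathrm{KL}\left(q(x|y,z)\,\|\,p(x|z)\right). \end{equation}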
A technical challenge with VI is that the integral of the lower bound of a probabilistic module with respect to external latent variables may not always be tractable. Stochastic variational inference (SVI) offers an approximate solution to this intractability by applying Monte Carlo integration. Monte Carlo integration is applicable to generic probabilistic distributions and lower bounds, as long as we are able to draw samples from the variational posterior.
In this case, the lower bound is approximated as \begin{equation} \mathcal{L}(l, z) \approx \frac{1}{N} \sum_i \log \frac{p(l|y_i)e^{\mathcal{L}(y_i,z)}}{q(y_i|z)}, \quad \mathcal{L}(y_i, z) \approx \frac{1}{M} \sum_j \log \frac{p(y_i|x_j) p(x_j|z)}{q(x_j|y_i, z)} , \end{equation} where $y_i|z \sim q(y|z)$, $x_j|y_i,z \sim q(x|y_i,z)$ and $N$ is the number of samples of $y$ and $M$ is the number of samples of $x$ given $y$. Note that if there is a closed form solution of $\tilde{p}(y_i|z)$, the calculation of $\mathcal{L}(y_i,z)$ can be replaced with the closed-form solution.
Let’s look at a simple model and then see how we apply stochastic variational inference to it in practice using MXFusion.
Creating a Posterior¶
Variational inference is based on the idea that you can approximate your true model’s (possibly complex) posterior distribution with an approximate variational posterior that is easy to compute. A common choice of approximate posterior is the Gaussian meanfield, which factorizes each variable as being drawn from a Normal distribution independently of the rest.
This can be done easily for a given model by calling the mxfusion.inference.create_Gaussian_meanfield function and passing in your model.
You can also define more complex posterior distributions to perform inference over if you know something more about your problem. See the PPCA tutorial (../../examples/notebooks/ppca_tutorial.ipynb) for a detailed example of this process.
Saving and Loading Inference Results¶
Saving and reloading inference results is managed at the Inference level in MXFusion. Once you have an Inference object that has been trained, you save the whole thing by running:
inference.save('my_inference_prefix')
This will save all relevant pieces of the inference algorithm to files beginning with the prefix passed in at save time. These files include the MXNet parameter files, JSON files containing the model’s topology, and any Inference configuration such as the number of samples it was run with.
When reloading a saved inference method, you must re-run the code used to generate the original models and Inference method, and then load the saved parameters back into the new objects. An example is shown below:
In process 1:
PREFIX = 'my_inference_prefix'
x = np.random.rand(1000, 1)
y = np.random.rand(1000, 1)
m = make_model()
observed = [m.y, m.x]
q = create_Gaussian_meanfield(model=m, observed=observed)
alg = StochasticVariationalInference(num_samples=3, model=m, observed=observed, posterior=q)
infr = GradBasedInference(inference_algorithm=alg, grad_loop=BatchInferenceLoop())
infr.initialize(y=y, x=x)
infr.run(max_iter=1, learning_rate=1e-2, y=y, x=x)
infr.save(prefix=PREFIX)
At some future time, in another process:
PREFIX = 'my_inference_prefix'
x = np.random.rand(1000, 1)
y = np.random.rand(1000, 1)
m2 = make_model()
observed2 = [m2.y, m2.x]
q2 = create_Gaussian_meanfield(model=m2, observed=observed2)
alg2 = StochasticVariationalInference(num_samples=3, model=m2, observed=observed2, posterior=q2)
infr2 = GradBasedInference(inference_algorithm=alg2, grad_loop=BatchInferenceLoop())
infr2.initialize(y=y, x=x)
# Load previous parameters
infr2.load(primary_model_file=PREFIX+'_graph_0.json',
secondary_graph_files=[PREFIX+'_graph_1.json'],
parameters_file=PREFIX+'_params.json',
inference_configuration_file=PREFIX+'_configuration.json',
mxnet_constants_file=PREFIX+'_mxnet_constants.json',
variable_constants_file=PREFIX+'_variable_constants.json')
Design Choices¶
Design Documents¶
Changing the design of variable broadcasting for factors¶
Zhenwen Dai (2018-10-26)
Motivation¶
The current design of the interface for creating a probabilistic distribution in MXFusion supports automatic broadcasting if the shapes of some or all of the input variables are smaller than the expected shape. For example, we can create a random variable of size (3, 4) with the shape of its mean being (1,) and the shape of its variance being (4,) as follows:
m = Model()
m.mean = Variable(shape=(1,))
m.variance = Variable(shape=(4,))
m.x = Normal.define_variable(mean=m.mean, variance=m.variance, shape=(3, 4))
Although it is a handy feature, it causes confusion for some users who are not familiar with the common broadcasting rules.
It also creates a big challenge when implementing new distributions. The current design requires developers to write one function decorator for each of the two main APIs, log_pdf and draw_samples, for any multi-variate distribution. Such a function decorator does two things:
- If an input variable has a shape that is smaller than the expected one (computed from the shape of the random variable), it broadcasts the shape of the input variable to the expected shape.
- It handles the sample dimension: if more than one variable carries multiple samples, their numbers of samples are assumed to be the same, and the size of the first (sample) dimension of any variable with a single sample is broadcast to that number of samples.
This mechanism is complicated and makes it hard to get contributions.
Proposed Changes¶
To simplify the logic of broadcasting, a natural choice is to let users take care of the first part of the broadcasting job. We still need to take care of the second part, as it is invisible to users. After the change, the example above written with the new interface looks like:
m = Model()
m.mean = Variable(shape=(1,))
m.variance = Variable(shape=(4,))
m.x = Normal.define_variable(mean=broadcast_to(m.mean, (3, 4)),
variance=broadcast_to(m.variance, (3, 4)), shape=(3, 4))
In the above example, the operator broadcast_to takes care of broadcasting a variable to another shape, and the Normal instance expects that all the inputs have the same shape as the random variable, which is (3, 4) in this case. This simplifies the implementation of distributions.
For broadcasting the number of samples, the mechanism for distributions and function evaluations can be unified. A boolean class attribute broadcastable will be added to Factor. This attribute controls whether to expose the extra sample dimension to the internal computation logic. With this attribute set to False, a developer can implement the computation without the extra sample dimension, which is much more straightforward.
With this simplification, the function decorator is not necessary anymore. The class structure of a distribution will look like:
class Distribution(Factor):
    def log_pdf(self, F, variables, targets=None):
        # Take care of broadcasting the number of samples,
        # or loop through the number of samples,
        # then call _log_pdf_implementation.
        ...

    def _log_pdf_implementation(self, F, **kwargs):
        # The inherited classes will implement the real computation here.
        ...
Rejected Alternatives¶
A possible solution to reduce the complexity of the broadcasting implementation without changing the interface is to add a shape inference function to each distribution. The shape inference would return the expected shapes of the individual input variables given the shape of the random variable. Then, the broadcast_to operator just needs to be called inside the log_pdf function to hide the shape-broadcasting step.
The drawbacks of this approach are:
- The shape inference needs to be implemented for each multi-variate distribution.
- The broadcasting logic is not consistent with function evaluations and modules, which causes confusion for users.
Overview¶
If you want to propose a major change to the codebase rather than a simple feature addition, it’s helpful to fill out and send around a design proposal document before you go through all the work of implementing it. This allows the community to better evaluate the idea, highlight any potential downsides, or propose alternative solutions ahead of time, saving unneeded effort.
For smaller feature requests just file an issue and fill out the feature request template.
What is considered a “major change” that needs a design proposal?¶
Any of the following should be considered a major change:
- Anything that changes a public facing API such as model definition or inference.
- Any other major new feature, subsystem, or piece of functionality
Example issues that might need design proposals include #75, #40, #24, or #23.
Process to submit a design proposal¶
Fill out the template below, add it to this folder on your fork, and make a pull request against the main repo. If it is helpful, feel free to include some proof-of-concept or mockup code to demonstrate the idea more fully. The point isn’t to have fully functioning or tested code for your idea, but to help communicate to the rest of the community what the idea is and how it might look.
Template¶
The basic template should include the following things:
Motivation¶
Describe the problem to be solved.
Public Interfaces¶
Describe how this changes the public interfaces of the library (if at all). Also describe any backwards compatibility strategies here if this will break an existing API.
Proposed Changes¶
Describe the new thing you want to do. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences, depending on the scope of the change.
Rejected Alternatives¶
What are the other alternatives you considered and why are they worse? The goal of this section is to help people understand why this is the best solution now, and also to prevent churn in the future when old alternatives are reconsidered.
Acknowledgements¶
This process is heavily inspired by and adapted from the Kafka Improvement Process.