# Solve Partial Differential Equation with LBFGS Method and Deep Learning

This example shows how to train a Physics Informed Neural Network (PINN) to numerically compute the solution of the Burger's equation by using the limited-memory BFGS (LBFGS) algorithm.

The Burger's equation is a partial differential equation (PDE) that arises in different areas of applied mathematics. In particular, fluid mechanics, nonlinear acoustics, gas dynamics, and traffic flows.

Given the computational domain$\left[-1,1\right]×\left[0,1\right]$, this examples uses a physics informed neural network (PINN) [1] and trains a multilayer perceptron neural network that takes samples $\left(x,t\right)$ as input, where $x\in \left[-1,1\right]$ is the spatial variable, and $t\in \left[0,1\right]$ is the time variable, and returns $u\left(x,t\right)$, where u is the solution of the Burger's equation:

`$\frac{\partial u}{\partial t}+u\frac{\partial u}{\partial x}-\frac{0.01}{\pi }\frac{{\partial }^{2}u}{\partial {x}^{2}}=0,$`

with $u\left(x,t=0\right)=-sin\left(\pi x\right)$as the initial condition, and $u\left(x=-1,t\right)=0$ and $u\left(x=1,t\right)=0$ as the boundary conditions.

Instead of training the network using the `trainNetwork` function, or using a custom training loop that updates parametes using `sgdmupdate` or similar functions, this example estimates the learnable parameters by using the `fmincon` function (requires Optimization Toolbox™). The `fmincon` function finds the minimum of constrained nonlinear multivariable functions.

The example trains the model by enforcing that given an input $\left(x,t\right)$, the output of the network $u\left(x,t\right)$ fulfills the Burger's equation, the boundary conditions, and the initial condition. To train the model, the exmaple uses the limited-memory BFGS (LBFGS) algorithm which is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno algorithm.

Training this model does not require collecting data in advance. You can generate data using the definition of the PDE and the constraints.

### Generate Training Data

Training the model requires a data set of collocation points that enforce the boundary conditions, enforce the initial conditions, and fulfill the Burger's equation.

Select 25 equally spaced time points to enforce each of the boundary conditions $u\left(x=-1,t\right)=0$ and $u\left(x=1,t\right)=0$.

```numBoundaryConditionPoints = [25 25]; x0BC1 = -1*ones(1,numBoundaryConditionPoints(1)); x0BC2 = ones(1,numBoundaryConditionPoints(2)); t0BC1 = linspace(0,1,numBoundaryConditionPoints(1)); t0BC2 = linspace(0,1,numBoundaryConditionPoints(2)); u0BC1 = zeros(1,numBoundaryConditionPoints(1)); u0BC2 = zeros(1,numBoundaryConditionPoints(2));```

Select 50 equally spaced spatial points to enforce the initial condition $u\left(x,t=0\right)=-sin\left(\pi x\right)$.

```numInitialConditionPoints = 50; x0IC = linspace(-1,1,numInitialConditionPoints); t0IC = zeros(1,numInitialConditionPoints); u0IC = -sin(pi*x0IC);```

Group together the data for initial and boundary conditions.

```X0 = [x0IC x0BC1 x0BC2]; T0 = [t0IC t0BC1 t0BC2]; U0 = [u0IC u0BC1 u0BC2];```

Select 10,000 points to enforce the output of the network to fulfill the Burger's equation.

```numInternalCollocationPoints = 10000; pointSet = sobolset(2); points = net(pointSet,numInternalCollocationPoints); dataX = 2*points(:,1)-1; dataT = points(:,2);```

Create an array datastore containing the training data.

`ds = arrayDatastore([dataX dataT]);`

### Define Deep Learning Model

Define a multilayer perceptron architecture with 9 fully connect operations with 20 hidden neurons. The first fully connect operation has two input channels corresponding to the inputs $x$ and $t$. The last fully connect operation has one output $u\left(x,t\right)$.

#### Define and Initialize Model Parameters

Define the parameters for each of the operations and include them in a structure. Use the format `parameters.OperationName_ParameterName` where `parameters` is the structure, O`perationName` is the name of the operation (for example "fc1") and `ParameterName` is the name of the parameter (for example, "Weights").

The algorithm in this example requires learnable parameters to be in the first level of the stucture, so do not use nested structures in this step. The `fmincon` function requires the learnable to be doubles.

Specify the number of layers and the number of neurons for each layer.

```numLayers = 9; numNeurons = 20;```

Initialize the parameters for the first fully connect operation. The first fully connect operation has two input channels.

```parameters = struct; sz = [numNeurons 2]; parameters.fc1_Weights = initializeHe(sz,2,"double"); parameters.fc1_Bias = initializeZeros([numNeurons 1],"double");```

Initialize the parameters for each of the remaining intermediate fully connect operations.

```for layerNumber=2:numLayers-1 name = "fc"+layerNumber; sz = [numNeurons numNeurons]; numIn = numNeurons; parameters.(name + "_Weights") = initializeHe(sz,numIn,"double"); parameters.(name + "_Bias") = initializeZeros([numNeurons 1],"double"); end```

Initialize the parameters for the final fully connect operation. The final fully connect operation has one output channel.

```sz = [1 numNeurons]; numIn = numNeurons; parameters.("fc" + numLayers + "_Weights") = initializeHe(sz,numIn,"double"); parameters.("fc" + numLayers + "_Bias") = initializeZeros([1 1],"double");```

View the network parameters.

`parameters`
```parameters = struct with fields: fc1_Weights: [20×2 dlarray] fc1_Bias: [20×1 dlarray] fc2_Weights: [20×20 dlarray] fc2_Bias: [20×1 dlarray] fc3_Weights: [20×20 dlarray] fc3_Bias: [20×1 dlarray] fc4_Weights: [20×20 dlarray] fc4_Bias: [20×1 dlarray] fc5_Weights: [20×20 dlarray] fc5_Bias: [20×1 dlarray] fc6_Weights: [20×20 dlarray] fc6_Bias: [20×1 dlarray] fc7_Weights: [20×20 dlarray] fc7_Bias: [20×1 dlarray] fc8_Weights: [20×20 dlarray] fc8_Bias: [20×1 dlarray] fc9_Weights: [1×20 dlarray] fc9_Bias: [1×1 dlarray] ```

#### Define Model and Model Loss Functions

Create the function `model`, listed in the Model Function section at the end of the example, that computes the outputs of the deep learning model. The function `model` takes as input the model parameters and the network inputs, and returns the model output.

Create the function `modelLoss`, listed in the Model Loss Function section at the end of the example, that takes as input the model parameters, the network inputs, and the initial and boundary conditions, and returns the gradients of the loss with respect to the learnable parameters and the corresponding loss.

#### Define `fmincon` Objective Function

Create the function `objectiveFunction`, listed in the `fmincon` Objective Function section of the example that returns the loss and gradients of the model. The function `objectiveFunction` takes as input, a vector of learnable parameters, the network inputs, the initial conditions, and the names and sizes of the learnable parameters, and returns the loss to be minimized by the `fmincon` function and the gradients of the loss with respect to the learnable parameters.

### Specify Optimization Options

Specify the optimization options:

• Optimize using the `fmincon` optmizer with the LBFGS algorithm for no more than 7500 iterations and function evaluations.

• Evaluate with optimality tolerance `1e-5`.

• Provide the gradients to the algorithm.

```options = optimoptions("fmincon", ... HessianApproximation="lbfgs", ... MaxIterations=7500, ... MaxFunctionEvaluations=7500, ... OptimalityTolerance=1e-5, ... SpecifyObjectiveGradient=true);```

### Train Network Using `fmincon`

Train the network using the `fmincon` function.

The `fmincon` function requires the learnable parameters to be specified as a vector. Convert the parameters to a vector using the `paramsStructToVector` function (attached to this example as a supporting file). To convert back to a structure of parameters, also return the parameter names and sizes.

```[parametersV,parameterNames,parameterSizes] = parameterStructToVector(parameters); parametersV = extractdata(parametersV);```

Convert the training data to `dlarray` objects. Specify that the inputs `X` and `T` have format `"BC"` (batch, channel) and that the initial conditions have format `"CB"` (channel, batch).

```X = dlarray(dataX,"BC"); T = dlarray(dataT,"BC"); X0 = dlarray(X0,"CB"); T0 = dlarray(T0,"CB"); U0 = dlarray(U0,"CB");```

Create a function handle with one input that defines the objective function.

`objFun = @(parameters) objectiveFunction(parameters,X,T,X0,T0,U0,parameterNames,parameterSizes);`

Update the learnable parameters using the `fmincon` function. Depending on the number of iterations, this can take a while to run. To enable a detailed verbose output, set the `Display` optimization option to `"iter-detailed"`.

`parametersV = fmincon(objFun,parametersV,[],[],[],[],[],[],[],options);`
```Solver stopped prematurely. fmincon stopped because it exceeded the function evaluation limit, options.MaxFunctionEvaluations = 7.500000e+03. ```

For prediction, convert the vector of parameters to a structure using the `parameterVectorToStruct` function (attached to this example as a supporting file).

`parameters = parameterVectorToStruct(parametersV,parameterNames,parameterSizes);`

### Evaluate Model Accuracy

For values of $t$ at 0.25, 0.5, 0.75, and 1, compare the predicted values of the deep learning model with the true solutions of the Burger's equation using the relative ${l}^{2}$ error.

Set the target times to test the model at. For each time, calculate the solution at 1001 equally spaced points in the range [-1,1].

```tTest = [0.25 0.5 0.75 1]; numPredictions = 1001; XTest = linspace(-1,1,numPredictions); XTest = dlarray(XTest,"CB");```

Test the model.

```figure for i=1:numel(tTest) t = tTest(i); TTest = t*ones(1,numPredictions); TTest = dlarray(TTest,"CB"); % Make predictions. UPred = model(parameters,XTest,TTest); % Calcualte true values. UTest = solveBurgers(extractdata(XTest),t,0.01/pi); % Calculate error. UPred = extractdata(UPred); err = norm(UPred - UTest) / norm(UTest); % Plot predictions. subplot(2,2,i) plot(XTest,UPred,"-",LineWidth=2); ylim([-1.1, 1.1]) % Plot true values. hold on plot(XTest, UTest,"--",LineWidth=2) hold off title("t = " + t + ", Error = " + gather(err)); end subplot(2,2,2) legend(["Predicted" "True"])```

The plots show how close the predictions are to the true values.

### Solve Burger's Equation Function

The `solveBurgers` function returns the true solution of Burger's equation at times `t` as outlined in [2].

```function U = solveBurgers(X,t,nu) % Define functions. f = @(y) exp(-cos(pi*y)/(2*pi*nu)); g = @(y) exp(-(y.^2)/(4*nu*t)); % Initialize solutions. U = zeros(size(X)); % Loop over x values. for i = 1:numel(X) x = X(i); % Calculate the solutions using the integral function. The boundary % conditions in x = -1 and x = 1 are known, so leave 0 as they are % given by initialization of U. if abs(x) ~= 1 fun = @(eta) sin(pi*(x-eta)) .* f(x-eta) .* g(eta); uxt = -integral(fun,-inf,inf); fun = @(eta) f(x-eta) .* g(eta); U(i) = uxt / integral(fun,-inf,inf); end end end```

### `fmincon` Objective Function

The `objectiveFunction` function defines the objective function to be used by the LBFGS algorithm.

This `objectiveFunction` funtion takes as input, a vector of learnable parameters `parametersV`, the network inputs, `X` and `T`, the initial and boundary conditions `X0`, `T0`, and `U0`, and the names and sizes of the learnable parameters `parameterNames` and `parameterSizes`, respectively, and returns the loss to be minimized by the `fmincon` function `loss` and a vector containing the gradients of the loss with respect to the learnable parameters `gradientsV`.

```function [loss,gradientsV] = objectiveFunction(parametersV,X,T,X0,T0,U0,parameterNames,parameterSizes) % Convert parameters to structure of dlarray objects. parametersV = dlarray(parametersV); parameters = parameterVectorToStruct(parametersV,parameterNames,parameterSizes); % Evaluate model loss and gradients. [loss,gradients] = dlfeval(@modelLoss,parameters,X,T,X0,T0,U0); % Return loss and gradients for fmincon. gradientsV = parameterStructToVector(gradients); gradientsV = extractdata(gradientsV); loss = extractdata(loss); end```

### Model Loss Function

The model is trained by enforcing that given an input $\left(x,t\right)$ the output of the network $u\left(x,t\right)$ fulfills the Burger's equation, the boundary conditions, and the intial condition. In particular, two quantities contribute to the loss to be minimized:

$\text{loss}={\text{MSE}}_{f}+{\text{MSE}}_{u}$,

where ${\text{MSE}}_{f}=\frac{1}{{N}_{f}}\sum _{i=1}^{{N}_{f}}{|f\left({x}_{f}^{i},{t}_{f}^{i}\right)|}^{2}$ and ${\text{MSE}}_{u}=\frac{1}{{N}_{u}}\sum _{i=1}^{{N}_{u}}{|u\left({x}_{u}^{i},{t}_{u}^{i}\right)-{u}^{i}|}^{2}$.

Here, $\left\{{x}_{u}^{i},{t}_{u}^{i}{\right\}}_{i=1}^{{N}_{u}}$ correspond to collocation points on the boundary of the computational domain and account for both boundary and initial condition. $\left\{{x}_{f}^{i},{t}_{f}^{i}{\right\}}_{i=1}^{{N}_{f}}$ are points in the interior of the domain.

Calculating ${\text{MSE}}_{f}$ requires the derivatives $\frac{\partial u}{\partial t},\frac{\partial u}{\partial x},\frac{{\partial }^{2}u}{\partial {x}^{2}}$ of the output $u$ of the model.

The function `modelLoss` takes as input, the model parameters `parameters`, the network inputs `X` and `T`, the initial and boundary conditions `X0`, `T0`, and `U0`, and returns the loss and the gradients of the loss with respect to the learnable parameters.

```function [loss,gradients] = modelLoss(parameters,X,T,X0,T0,U0) % Make predictions with the initial conditions. U = model(parameters,X,T); % Calculate derivatives with respect to X and T. gradientsU = dlgradient(sum(U,"all"),{X,T},EnableHigherDerivatives=true); Ux = gradientsU{1}; Ut = gradientsU{2}; % Calculate second-order derivatives with respect to X. Uxx = dlgradient(sum(Ux,"all"),X,EnableHigherDerivatives=true); % Calculate mseF. Enforce Burger's equation. f = Ut + U.*Ux - (0.01./pi).*Uxx; zeroTarget = zeros(size(f),"like",f); mseF = l2loss(f, zeroTarget); % Calculate mseU. Enforce initial and boundary conditions. U0Pred = model(parameters,X0,T0); mseU = l2loss(U0Pred, U0); % Calculated loss to be minimized by combining errors. loss = mseF + mseU; % Calculate gradients with respect to the learnable parameters. gradients = dlgradient(loss,parameters); end```

### Model Function

The model trained in this example consists of a series of fully connect operations with a tanh operation between each one.

The model function takes as input the model parameters `parameters` and the network inputs `X` and `T`, and returns the model output `U`.

```function U = model(parameters,X,T) XT = [X; T]; numLayers = numel(fieldnames(parameters))/2; % First fully connect operation. weights = parameters.fc1_Weights; bias = parameters.fc1_Bias; U = fullyconnect(XT,weights,bias); % tanh and fully connect operations for remaining layers. for i=2:numLayers name = "fc" + i; U = tanh(U); weights = parameters.(name + "_Weights"); bias = parameters.(name + "_Bias"); U = fullyconnect(U, weights, bias); end end```

### References

1. Maziar Raissi, Paris Perdikaris, and George Em Karniadakis, Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations https://arxiv.org/abs/1711.10561

2. C. Basdevant, M. Deville, P. Haldenwang, J. Lacroix, J. Ouazzani, R. Peyret, P. Orlandi, A. Patera, Spectral and finite difference solutions of the Burgers equation, Computers & fluids 14 (1986) 23–41.