File Exchange

Boosted Generalized Additive Models (bgam) package

version 1.4.0.0 (25.5 KB) by Patrick Mineault


Boosting for Generalized Additive and Linear Models (GAMs and GLMs).

Updated 02 Jul 2011

bgam - Boosted Generalized Additive Models package
---
Implements boosting for Generalized Additive and Linear Models (GAMs and GLMs).
Extensible, fully documented. Implements linear and stub learners, and
least-squares, logistic, and Poisson regression.

The generalized linear model (GLM) is a flexible generalization of ordinary
least squares regression. The GLM generalizes linear regression by allowing
the linear model to be related to the response variable via a link function
and by allowing the magnitude of the variance of each measurement to be a
function of its predicted value. (Wikipedia)

A common example of a GLM is the binomial distribution/logistic inverse-link
GLM (a.k.a. logistic regression), where:

eta = X*w, y ~ Binomial( logistic (eta ))

This GLM allows one to tackle classification problems (where the output is 0 or 1)
in a quasi-linear way.
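As a concrete illustration, independent of bgam, such a model can be fit with glmfit/glmval from MATLAB's Statistics Toolbox (a minimal sketch; the data here are made up):

```matlab
% Fit a logistic-regression GLM: eta = b(1) + b(2)*x, y ~ Binomial(logistic(eta)).
x = (1:20)';                      % single predictor
y = double(x > 10);               % binary 0/1 response
y([8 14]) = 1 - y([8 14]);        % flip two labels to avoid perfect separation
b = glmfit(x, y, 'binomial');     % the logit link is the default for 'binomial'
p = glmval(b, x, 'logit');        % predicted probabilities P(y = 1 | x)
```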

The generalized additive model (GAM) is a generalization of the GLM in which the
linear predictor is replaced by a sum of nonlinear, but nevertheless additive, functions:

eta_i = f_1(X^(i,1)) + f_2(X^(i,2)) + ...

The f_i are known as smoothers or (in the context of boosting) as learners. Boosting is
a method of fitting GAMs, and by extension GLMs, by building up the model (eta) iteratively:
at every iteration, the learner most similar to the gradient of the likelihood
with respect to eta is added to the model. Regularization is usually done by early
stopping, where the optimal number of iterations is determined through validation.
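Schematically, this loop looks as follows for a Gaussian/identity model with a single linear learner. This is a sketch of the general algorithm, not bgam's actual code; the shrinkage factor nu plays the role of a small step size, and in practice early stopping would pick the iteration count by validation:

```matlab
% L2-boosting sketch: repeatedly fit a learner to the gradient of the
% log-likelihood w.r.t. eta (for Gaussian/identity, the residual y - eta)
% and add a shrunken copy of that learner to the model.
n = 100;
X = linspace(-3, 3, n)';
y = sin(X) + 0.1*randn(n, 1);
eta = zeros(n, 1);                 % additive model, starts at zero
nu = 0.1;                          % shrinkage / step size
for iter = 1:200
    r = y - eta;                   % gradient direction at the current fit
    w = (X'*r) / (X'*X);           % least-squares fit of a linear learner to r
    eta = eta + nu * X * w;        % grow the model by a small step
end
```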

bgam is a well-documented package that implements boosting with GAMs.
It currently implements linear learners and stubs (depth-1 trees). Implemented
distribution/link combinations include Gaussian/identity, binomial/logistic, and
Poisson/exponential. The package is object-oriented, and new distribution/link
combinations and learners can be implemented and used with ease. The package
includes facilities for cross-validation, including a parallel implementation
through the Parallel Computing Toolbox. It also allows a subset of the data to
be used at each boosting iteration (stochastic gradient boosting).

Open up TestBgam.m in the editor for several usage examples.

Contributions and requests for new features are welcome.
Author: Patrick Mineault (patrick DOT mineault AT gmail DOT com)

References:

Friedman, J., Hastie, T. and Tibshirani, R. Additive logistic regression: a
statistical view of boosting. Ann. Statist. 28(2):337-407, 2000.
Bühlmann, P. and Hothorn, T. Boosting algorithms: regularization, prediction
and model fitting. Statist. Sci. 22(4):477-505, 2007.
Wood, S. Generalized Additive Models: An Introduction with R. CRC Press, 2006.
Hastie, T. J. and Tibshirani, R. J. Generalized Additive Models.
Chapman & Hall/CRC, 1990.

Mohammad Saber

I have the same question as Michael. I am searching for a package equivalent to the 'gam' or 'mgcv' packages in R. Is 'bgam' equivalent to the above-mentioned R packages?

Michael


Hello Patrick!
I am trying to understand how your bgam package works, and I have a question about learners. I started using MATLAB a few months ago, and my primary tool is R. I am wondering whether it is possible to use your function in the same manner as the additive model in R, where the function call looks like gam(formula, family=gaussian(), data=...), and in the formula you may define the smoothing functions as (x_1^3 + x_1*x_2 + s(x_3) + ts(x_4) + ...), using other functions such as splines and tensor products. My question is: how can I use other smoothing functions in MATLAB with your code, and how can I define which variable from the data is assigned to a certain smoothing function? Thank you.

Patrick Mineault


I replied to Brendan by email, but for others having the same problem: yes, you need to manually add back the offset of the model (thefit.cs(end)) to make predictions, since thefit.evaluate evaluates *only* the learners.

Brendan


I created a toy example in an attempt to understand how the BGAM package works. The results from "evaluate" are not what I would have expected from the documentation. It seems to do a poor job predicting (despite reporting very low deviance) when the output of evaluate is used directly. Adding a constant term (thefit.cs(end)) to the output of evaluate restores the accuracy I would have expected. Am I misunderstanding how the package should be used?

Here's the code I ran:

trainer = bgam.train.Stub();

fparams = bgam.FitParams();
fparams.displayFreq = 1;
fparams.niters = 5;
fparams.beta = 1;
fparams.fitFraction = 1;

% Create a 100x2 input matrix X that lists every combination of
% [1..10] from each column.
[I,J] = ind2sub([10 10], 1:100);
X = [I' J'];

% Y is 1 if the first input is >= 3 or the second input is >= 6
Y = double(X(:,1) >= 3 | X(:,2) >= 6);

thefit = fitbgam(Y,X,trainer,fparams);
Y_predict = thefit.evaluate(X);
% Is there a more direct way of calculating a prediction for Y?
Y_prob = 1 ./ (1 + exp(-Y_predict));
fprintf(1, 'Correlation = %f\n', corr(Y_prob, Y));

Y_predict2 = thefit.evaluate(X) + thefit.cs(end);
Y_prob2 = 1 ./ (1 + exp(-Y_predict2));
fprintf(1, 'Correlation (adding cs) = %f\n', corr(Y_prob2, Y));

% Output from above code
% Iter Deviance D^2
% 0 65.0 0.000
% 1 27.8 0.573
% 2 0.8 0.987
% 3 0.0 0.999
% 4 0.0 1.000
% 5 0.0 1.000
% Correlation = 0.275188
% Correlation (adding cs) = 1.000000

Patrick Mineault


+ is MATLAB's convention for packages, which are a way to organize object-oriented code. This allows the class Fit in this package, for example, to be referred to as bgam.Fit without clashing with other classes also named Fit. You simply need to add the parent folder of +bgam to your path.

http://www.mathworks.com/help/techdoc/matlab_oop/brfynt_-1.html
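For example, assuming a hypothetical install location (the path below is made up, and bgam.FitParams is one of the package's classes, as used in the example further down):

```matlab
% +bgam is assumed to live at /home/me/toolboxes/+bgam (hypothetical path);
% add its PARENT folder to the path, not the +bgam folder itself.
addpath('/home/me/toolboxes');
fparams = bgam.FitParams();   % package members are referenced with the bgam. prefix
```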

Hyun Gu Kang


How do you call functions when they are in the folder +bgam? I realize this is a MATLAB question, but what does the + do?

2 Jul 2011, 1.4.0.0: Support for parallel cross-validation through the Parallel Computing Toolbox
4 Mar 2011, 1.3.0.0: Added .mex file for bgam.train.Stub; tweaked cross-validation code
25 Jan 2011, 1.2.0.0: Updated requirements (2009b)
24 Jan 2011, 1.1.0.0: Removed temporary .m~ files from .zip
MATLAB Release Compatibility
Created with R2009b, compatible with any release
Platform Compatibility
Windows, macOS, Linux