fitrchains
Syntax
Description
Mdl = fitrchains(Tbl,ResponseVarNames) returns a trained multiresponse regression model Mdl by using regression chains. The function trains the model using the predictors in the table Tbl and the response values in the ResponseVarNames table variables. For more information, see Regression Chains.
Mdl = fitrchains(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can specify the type of model to use in the regression chains by setting the Learner name-value argument.
Examples
Create a regression model with more than one response variable by using fitrchains.
Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Displacement, Horsepower, and so on, as well as the response variables Acceleration and MPG. Display the first eight rows of the table.
load carbig
cars = table(Displacement,Horsepower,Model_Year, ...
    Origin,Weight,Acceleration,MPG);
head(cars)
    Displacement    Horsepower    Model_Year    Origin     Weight    Acceleration    MPG
    ____________    __________    __________    _______    ______    ____________    ___
        307            130            70        USA         3504           12        18 
        350            165            70        USA         3693         11.5        15 
        318            150            70        USA         3436           11        18 
        304            150            70        USA         3433           12        16 
        302            140            70        USA         3449         10.5        17 
        429            198            70        USA         4341           10        15 
        454            220            70        USA         4354            9        14 
        440            215            70        USA         4312          8.5        14 
Categorize the cars based on whether they were made in the USA.
cars.Origin = categorical(cellstr(cars.Origin));
cars.Origin = mergecats(cars.Origin,["France","Japan",...
    "Germany","Sweden","Italy","England"],"NotUSA");
Partition the data into training and test sets. Use approximately 85% of the observations to train a multiresponse model, and 15% of the observations to test the performance of the trained model on new data. Use cvpartition to partition the data.
rng("default") % For reproducibility
c = cvpartition(height(cars),"Holdout",0.15);
carsTrain = cars(training(c),:);
carsTest = cars(test(c),:);
Train a multiresponse regression model by passing the carsTrain training data to the fitrchains function. By default, the function uses bagged ensembles of trees in the regression chains.
Mdl = fitrchains(carsTrain,["Acceleration","MPG"])
Mdl = 
  RegressionChainEnsemble
           PredictorNames: {'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
             ResponseName: ["Acceleration"    "MPG"]
    CategoricalPredictors: 4
                NumChains: 2
            LearnedChains: {2×2 cell}
          NumObservations: 338
  Properties, Methods
Mdl is a trained RegressionChainEnsemble model object. You can use dot notation to access the properties of Mdl. For example, you can specify Mdl.Learners to see the bagged ensembles used to train the model.
Evaluate the performance of the regression model on the test set by computing the test mean squared error (MSE). Smaller MSE values indicate better performance. Return the loss for each response variable separately by setting the OutputType name-value argument to "per-response".
testMSE = loss(Mdl,carsTest,["Acceleration","MPG"], ...
    OutputType="per-response")
testMSE = 1×2
    2.4921    9.0568
Predict the response values for the observations in the test set. Return the predicted response values as a table.
predictedY = predict(Mdl,carsTest,OutputType="table")
predictedY=60×2 table
    Acceleration     MPG  
    ____________    ______
       12.573       16.109
        10.78       13.988
       11.282       12.963
       15.185       21.066
       12.203       13.773
       13.216       14.216
       17.117       30.199
       16.478       29.033
       13.439       14.208
       11.552       13.066
       13.398       13.271
       14.848       20.927
       16.552       24.603
       12.501       15.359
       15.778       19.328
       12.343       13.185
      ⋮
Train a multiresponse regression model using regression chains. Specify the type of regression models to use in the regression chains, and train the models with predicted values for response variables used as predictors.
Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Displacement, Horsepower, and so on, as well as the response variables Acceleration and MPG. Display the first eight rows of the table.
load carbig
cars = table(Displacement,Horsepower,Model_Year, ...
    Origin,Weight,Acceleration,MPG);
head(cars)
    Displacement    Horsepower    Model_Year    Origin     Weight    Acceleration    MPG
    ____________    __________    __________    _______    ______    ____________    ___
        307            130            70        USA         3504           12        18 
        350            165            70        USA         3693         11.5        15 
        318            150            70        USA         3436           11        18 
        304            150            70        USA         3433           12        16 
        302            140            70        USA         3449         10.5        17 
        429            198            70        USA         4341           10        15 
        454            220            70        USA         4354            9        14 
        440            215            70        USA         4312          8.5        14 
Categorize the cars based on whether they were made in the USA.
cars.Origin = categorical(cellstr(cars.Origin));
cars.Origin = mergecats(cars.Origin,["France","Japan",...
    "Germany","Sweden","Italy","England"],"NotUSA");
Remove observations with missing values.
cars = rmmissing(cars);
Train a multiresponse regression model by passing the cars data to the fitrchains function. Use regression chains composed of regression support vector machine (SVM) models with standardized numeric predictors. When training the SVM models, use the predicted values for the response variables that are treated as predictors.
Mdl = fitrchains(cars,["Acceleration","MPG"], ...
    Learner=templateSVM(Standardize=true), ...
    ChainPredictedResponse=true);
Mdl is a trained RegressionChainEnsemble model object. You can use dot notation to access the properties of Mdl.
Display the order of the response variables in the regression chains in Mdl, and display the trained regression SVM models in the regression chains.
Mdl.ChainOrders
ans = 2×2
     1     2
     2     1
Mdl.Learners
ans=2×2 cell array
    {1×1 classreg.learning.regr.CompactRegressionSVM}    {1×1 classreg.learning.regr.CompactRegressionSVM}
    {1×1 classreg.learning.regr.CompactRegressionSVM}    {1×1 classreg.learning.regr.CompactRegressionSVM}
In the first regression chain, the first SVM model uses Acceleration as the response variable. The second SVM model uses MPG as the response variable and the predicted values for Acceleration as a predictor variable. The first SVM model provides the predicted Acceleration values used by the second SVM model.
Recall that the SVM models use standardized numeric predictors. Find the means (Mu) and standard deviations (Sigma) used by the second model in the first regression chain.
Chain1Model2 = Mdl.Learners{1,2};
Mdl.PredictorNames
ans = 1×5 cell
    {'Displacement'}    {'Horsepower'}    {'Model_Year'}    {'Origin'}    {'Weight'}
Chain1Model2.ExpandedPredictorNames
ans = 1×7 cell
    {'x1'}    {'x2'}    {'x3'}    {'x4 == 1'}    {'x4 == 2'}    {'x5'}    {'x6'}
Chain1Model2.Mu
ans = 1×7
   1.0e+03 *
    0.1944    0.1045    0.0760         0         0    2.9776    0.0153
Chain1Model2.Sigma
ans = 1×7
  104.6440   38.4912    3.6837    1.0000    1.0000  849.4026    2.2190
The SVM model uses five numeric predictors: Displacement (x1), Horsepower (x2), Model_Year (x3), Weight (x5), and the predicted values for Acceleration (x6). The software uses the corresponding Mu and Sigma values to standardize the predictor data before predicting with the predict object function.
The categorical predictor Origin is split into two variables (x4 == 1 and x4 == 2) after categorical expansion. The corresponding Mu and Sigma values indicate that the two variables are unchanged after standardization.
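As a rough illustration of what this standardization amounts to, the following sketch applies (x - Mu)./Sigma by hand. The Mu and Sigma values are copied from the output displayed above (Mu is shown there with a scale factor of 10^3). The row xRow is a hypothetical expanded observation invented for this illustration, and the sketch does not claim to reproduce the internal computations of predict.

```matlab
% Mu and Sigma copied from the displayed output (Mu rescaled by 1e3)
mu    = 1e3 * [0.1944 0.1045 0.0760 0 0 2.9776 0.0153];
sigma = [104.6440 38.4912 3.6837 1 1 849.4026 2.2190];

% Hypothetical expanded observation, in the order x1, x2, x3,
% x4 == 1, x4 == 2, x5, x6 (x6 is a predicted Acceleration value)
xRow = [307 130 70 1 0 3504 12];

% Element-wise standardization
z = (xRow - mu) ./ sigma;

% Columns 4 and 5 (the dummy variables for Origin) are unchanged,
% because their Mu is 0 and their Sigma is 1
disp(z)
```

The dummy-variable columns pass through untouched, which is what the Mu and Sigma values of 0 and 1 imply.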
Input Arguments
Tbl
Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.
Tbl must contain columns for the response variables and can contain a column for the observation weights. Each response and observation weight variable must be a numeric vector.
You must specify the response variables in Tbl by using ResponseVarNames or formula, and specify the observation weights in Tbl by using Weights.
- When you specify the response variables by using ResponseVarNames, fitrchains uses the remaining variables as predictors. To use a subset of the remaining variables in Tbl as predictors, specify predictor variables by using PredictorNames.
- When you define a model specification by using formula, fitrchains uses a subset of the variables in Tbl as predictor variables and response variables, as specified in formula.
Data Types: table
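A minimal sketch of restricting the predictors of a table input, assuming the carbig data set used in the examples on this page. The choice of Displacement and Weight as the predictor subset is arbitrary and made only for illustration.

```matlab
load carbig
cars = table(Displacement,Horsepower,Weight,Acceleration,MPG);
cars = rmmissing(cars);   % drop rows with missing values

% Horsepower remains in the table but is not used as a predictor
Mdl = fitrchains(cars,["Acceleration","MPG"], ...
    PredictorNames=["Displacement","Weight"]);

Mdl.PredictorNames
```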
ResponseVarNames
Names of the response variables, specified as the names of variables in Tbl. Each response variable must be a numeric vector.
You must specify ResponseVarNames as a string array or a cell array of character vectors. For example, if Tbl stores the response variables Y1 and Y2 as Tbl.Y1 and Tbl.Y2, respectively, then specify ResponseVarNames as ["Y1","Y2"]. Otherwise, the software treats the Y1 and Y2 columns of Tbl as predictors when training the model.
Data Types: string | cell
formula
Explanatory model of the response variables and a subset of the predictor variables, specified as a character vector or string scalar in the form "Y1,Y2~x1+x2+x3". In this form, Y1 and Y2 represent the response variables, and x1, x2, and x3 represent the predictor variables.
To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula, except for observation weights (if specified).
The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB® identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function.
Data Types: char | string
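The following sketch shows one way to check the variable names and then train with a formula, again assuming the carbig data set from the examples on this page. The particular formula is an arbitrary choice for illustration.

```matlab
load carbig
cars = table(Displacement,Horsepower,Weight,Acceleration,MPG);
cars = rmmissing(cars);

% Verify that every table variable name is a valid MATLAB identifier
names = cars.Properties.VariableNames;
allValid = all(cellfun(@isvarname,names));

% Responses go to the left of ~, predictors to the right.
% Horsepower does not appear in the formula, so it is not used.
Mdl = fitrchains(cars,"Acceleration,MPG~Displacement+Weight");
```

If allValid were false, matlab.lang.makeValidName could be applied to the names before training.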
Y
Response data, specified as a numeric matrix or table. Each row corresponds to an observation, and each column corresponds to a response variable. Y must have the same number of rows as the predictor data X.
Data Types: single | double | table
X
Predictor data, specified as a numeric matrix or table. Each row corresponds to an observation, and each column corresponds to a predictor. Optionally, when X is a table, it can contain a column for the observation weights. X and Y must have the same number of rows.
- If X is a matrix, you can specify the names of the predictors in the order of their appearance in X by using the PredictorNames name-value argument.
- If X is a table, you can use a subset of the variables in X as predictors. To do so, specify predictor variables by using PredictorNames.
Data Types: single | double | table
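A minimal sketch of the matrix form of the inputs, assuming the carbig data set from the examples on this page. The split into three predictor columns and two response columns is chosen only for illustration.

```matlab
load carbig

% Stack numeric variables into one matrix and drop rows containing NaN
data = rmmissing([Displacement Horsepower Weight Acceleration MPG]);
X = data(:,1:3);   % predictors
Y = data(:,4:5);   % responses, one column per response variable

% Name the columns, because a matrix carries no variable names
Mdl = fitrchains(X,Y, ...
    PredictorNames=["Displacement","Horsepower","Weight"], ...
    ResponseName=["Acceleration","MPG"]);
```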
Note
The software treats NaN, empty character vector
          (''), empty string (""),
          <missing>, and <undefined> elements as missing
        data. Before training Mdl, the software removes observations with
        missing values in the response data, although the model retains the observations in its data
        properties (for example, Mdl.X and Mdl.Y). The
        treatment of observations with missing values in the predictor data depends on the
        regression model type specified by the Learner
        name-value argument.
Name-Value Arguments
Specify optional pairs of arguments as
      Name1=Value1,...,NameN=ValueN, where Name is
      the argument name and Value is the corresponding value.
      Name-value arguments must appear after other arguments, but the order of the
      pairs does not matter.
    
Example: fitrchains(Tbl,["Y1","Y2"],Learner="svm",ChainPredictedResponse=true)
        creates a support vector machine (SVM) regression model with two response variables and uses
        predicted responses in the regression chains to train the model.
ChainOrder
Order of the response variables in the regression chain, specified as a positive integer vector. For more information, see Regression Chains.
If you specify ChainOrder, Mdl contains only one regression chain.
Example: ChainOrder=[1 3 2]
Data Types: single | double
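As a small sketch of fixing the chain order, the following code trains a single chain in which the second response (MPG) is modeled first. It assumes the carbig data set from the examples on this page and uses decision trees only to keep training quick.

```matlab
load carbig
cars = table(Displacement,Horsepower,Weight,Acceleration,MPG);
cars = rmmissing(cars);

% Model MPG (response 2) first, then Acceleration (response 1)
Mdl = fitrchains(cars,["Acceleration","MPG"], ...
    ChainOrder=[2 1],Learner="tree");

Mdl.NumChains     % a single chain, because ChainOrder is specified
Mdl.ChainOrders
```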
ChainPredictedResponse
Flag to use predicted responses in the regression chains, specified as a numeric or logical 0 (false) or 1 (true).
- A value of 0 trains the models with the observed values of the response variables that are used as predictors.
- A value of 1 trains the models with the predicted values of the response variables that are used as predictors.
For more information, see Regression Chains.
Example: ChainPredictedResponse=true
Data Types: single | double | logical
Learner
Type of regression model to train, specified as one of the values in this table.
| Value | Regression Model Type |
|---|---|
| "bag" or templateEnsemble template (with the method specified as "Bag" and the weak learners specified as "Tree") | Bagged ensemble of trees |
| "gam" or templateGAM template | Generalized additive model (GAM) |
| "gp" or templateGP template | Gaussian process regression (GPR) |
| "kernel" or templateKernel template | Kernel model |
| "linear" or templateLinear template | Linear model |
| "lsboost" or templateEnsemble template (with the method specified as "LSBoost" and the weak learners specified as "Tree") | Boosted ensemble of trees |
| "svm" or templateSVM template | Support vector machine (SVM) |
| "tree" or templateTree template | Decision tree |
Example: Learner="svm"
Example: Learner=templateEnsemble("LSBoost",50,"Tree")
MaxNumChains
Maximum number of regression chains, specified as a positive scalar. Because each regression chain contains one regression model for each response variable, specify MaxNumChains to limit the total number of regression models to train.
Example: MaxNumChains=5
Data Types: single | double
CategoricalPredictors
Categorical predictors list, specified as one of the values in this table.
| Value | Description |
|---|---|
| Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If fitrchains uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. |
| Logical vector | A true entry means that the corresponding predictor is categorical. The length of the vector is p. |
| Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length. |
| String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames. |
| "all" | All predictors are categorical. |
By default, if the predictor data is in a table, fitrchains
              assumes that a variable is categorical if it is a logical vector, categorical vector,
              character array, string array, or cell array of character vectors. However, learners
              that use decision trees assume that mathematically ordered categorical vectors are
              continuous variables. If the predictor data is a matrix,
                fitrchains assumes that all predictors are continuous. To
              identify any other predictors as categorical predictors, specify them by using the
                CategoricalPredictors name-value argument.
The software creates dummy variables based on the Learner
              name-value argument and the underlying fitting function used to create the regression
              models in the Learners property of Mdl. For more information on
              how fitting functions treat categorical predictors, see Automatic Creation of Dummy Variables.
Example: CategoricalPredictors="all"
Data Types: single | double | logical | char | string | cell
Options
Options for computing in parallel and setting random streams, specified as a structure. Create the Options structure using statset. This table lists the option fields and their values.
| Field Name | Value | Default |
|---|---|---|
| UseParallel | Set this value to true to run computations in parallel. | false |
| UseSubstreams | Set this value to true to run computations in a reproducible manner. To compute reproducibly, set Streams to a type that allows substreams: "mlfg6331_64" or "mrg32k3a". | false |
| Streams | Specify this value as a RandStream object or cell array of such objects. Use a single object except when the UseParallel value is true and the UseSubstreams value is false. In that case, use a cell array that has the same size as the parallel pool. | If you do not specify Streams, then fitrchains uses the default stream or streams. |
Note
You need Parallel Computing Toolbox™ to run computations in parallel.
Example: Options=statset(UseParallel=true,UseSubstreams=true,Streams=RandStream("mlfg6331_64"))
Data Types: struct
PredictorNames
Predictor variable names, specified as a string array or a cell array of character vectors.
- If you supply predictor data using a numeric matrix, then you can use PredictorNames to assign names to the predictor variables.
  - The order of the names in PredictorNames must correspond to the order of the columns in the matrix.
  - By default, PredictorNames is {'x1','x2',...}.
- If you supply predictor data using a table, then you can use PredictorNames to specify which variables to use as predictors during training.
  - PredictorNames must be a subset of the variable names in the table and cannot include the names of response variables.
  - By default, PredictorNames contains the names of all predictor variables.
Example: PredictorNames=["SepalLength","SepalWidth","PetalLength","PetalWidth"]
Data Types: string | cell
ResponseName
Response variable names, specified as a string array or a cell array of character vectors.
- If you supply Y, then you can use ResponseName to specify names for the response variables.
- If you supply ResponseVarNames or formula, then you cannot use ResponseName.
Example: ResponseName=["Response1","Response2"]
Data Types: string | cell
Weights
Observation weights, specified as a nonnegative numeric vector or the name of a variable in X or Tbl. The software weights each observation in X or Tbl with the corresponding value in Weights. The length of Weights must equal the number of observations in X or Tbl.
If you specify the input data as a table, then Weights can be
              the name of a variable in the table that contains a numeric vector. In this case, you
              must specify Weights as a character vector or string scalar. For
              example, if the weights vector W is stored as
                Tbl.W, then specify it as "W". Otherwise, the
              software treats the W column of Tbl as a
              predictor during the training process.
By default, Weights is ones(n,1), where
                n is the number of observations in X or
                Tbl.
Before training, fitrchains normalizes the weights to sum to
              1.
Data Types: single | double | char | string
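A minimal sketch of passing weights stored in the training table, assuming the carbig data set from the examples on this page. The weighting scheme (doubling the weight of heavy cars) and the variable name W are hypothetical and serve only to show the mechanics.

```matlab
load carbig
cars = table(Displacement,Horsepower,Weight,Acceleration,MPG);
cars = rmmissing(cars);

% Hypothetical weights: heavy cars count twice as much
cars.W = ones(height(cars),1);
cars.W(cars.Weight > 4000) = 2;

% Pass the weights by variable name so that W is not used as a predictor
Mdl = fitrchains(cars,["Acceleration","MPG"], ...
    Weights="W",Learner="tree");

Mdl.PredictorNames
```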
Output Arguments
Mdl
Multiresponse regression model, returned as a RegressionChainEnsemble model object. To access the properties of Mdl, use dot notation.
Algorithms
A regression chain is a sequence of regression models in which the response variables for previous models become predictor variables for subsequent models. If the training data consists of p predictor variables and k response variables, then a regression chain includes exactly k models, each with a different response variable. The first model has p predictors, the second model has p+1 predictors, and so on, with the last model having p+k–1 predictors.
For example, suppose that the predictor data in X or
          Tbl consists of three variables, x1,
          x2, and x3, and the response data in
          Y or Tbl consists of two variables,
          y1 and y2. A regression chain with the chain order
          [2 1] (ChainOrder) consists of a model trained on
        the predictor data [x1, x2,
            x3] and the response variable y2, followed by a model
        trained on the predictor data [x1, x2, x3,
              y2] and the response variable y1.
If you specify to use predicted responses in regression chains
          (ChainPredictedResponse), the predictor data for the second model is [x1, x2, x3,
              yfit2], where yfit2 contains the predicted responses returned
        by the first model.
In general, fitrchains returns an ensemble of regression chains
          Mdl, where each row of Mdl.Learners corresponds to
        one regression chain.
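The chain with order [2 1] described above can be sketched by hand with two decision trees. This is an illustration of the idea on synthetic data, not a description of how fitrchains is implemented. In particular, the sketch uses in-sample predictions for yfit2 purely for simplicity, and this page does not state how fitrchains computes the predicted responses. All variable names are invented for the example.

```matlab
rng("default")   % for reproducibility
n  = 100;
X  = randn(n,3);                         % synthetic x1, x2, x3
y2 = X(:,1) + 0.1*randn(n,1);            % synthetic response y2
y1 = X(:,2) + 0.5*y2 + 0.1*randn(n,1);   % synthetic response y1

% First model in the chain: y2 from [x1 x2 x3]
m1 = fitrtree(X,y2);

% Second model, observed-response variant: y1 from [x1 x2 x3 y2]
m2obs = fitrtree([X y2],y1);

% Second model, predicted-response variant: y1 from [x1 x2 x3 yfit2]
yfit2 = predict(m1,X);
m2fit = fitrtree([X yfit2],y1);

% For new data, the first model's prediction is fed to the second model
Xnew  = randn(5,3);
y2new = predict(m1,Xnew);
y1new = predict(m2fit,[Xnew y2new]);
```

The second model has p+1 = 4 predictors, which matches the predictor counts given in the description of a regression chain.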
References
[1] Spyromitros-Xioufis, Eleftherios, Grigorios Tsoumakas, William Groves, and Ioannis Vlahavas. "Multi-Target Regression via Input Space Expansion: Treating Targets as Inputs." Machine Learning 104, no. 1 (July 2016): 55–98. https://doi.org/10.1007/s10994-016-5546-z.
Extended Capabilities
To run in parallel, specify the Options name-value argument in the call to
                        this function and set the UseParallel field of the
                        options structure to true using
                                    statset:
Options=statset(UseParallel=true)
For more information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).
Version History
Introduced in R2024b
See Also
RegressionChainEnsemble | CompactRegressionChainEnsemble | loss | predict