countPredictorsAfterCategoricalEncoding

Number of predictors in tabular data after encoding categorical variables

Since R2026a

    Description

    N = countPredictorsAfterCategoricalEncoding(Tbl) returns the number of variables in Tbl after encoding the categorical variables.

    N = countPredictorsAfterCategoricalEncoding(Tbl,ResponseVarNames) returns the number of variables in Tbl after encoding the categorical variables, ignoring the specified response variables. You can use an array ResponseVarNames to specify multiple response variables.

    N = countPredictorsAfterCategoricalEncoding(Tbl,formula) returns the number of predictors in the specified formula after encoding the categorical variables.

    The formula argument is an explanatory model of the response and a subset of the predictor variables in Tbl used to fit a model.
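
    As a minimal sketch of the formula syntax, the following call counts only the predictors named on the right side of the formula. It assumes the carbig table built in the examples on this page; the choice of Acceleration and Origin as predictors is illustrative. If Origin is encoded as 7 dummy variables, as the Origin example on this page describes, the count would be 1 + 7 = 8.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% Illustrative formula: MPG is the response, and Acceleration and
% Origin are the only predictors to count.
N = countPredictorsAfterCategoricalEncoding(Tbl,"MPG~Acceleration+Origin")
```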

    N = countPredictorsAfterCategoricalEncoding(X) returns the number of predictors in the specified numeric array. This is equivalent to the number of columns of X. To specify the orientation of the data in X or to treat integer-valued data in X as categorical, use the ObservationsIn and CategoricalPredictors name-value arguments, respectively.
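
    The following sketch illustrates the numeric-array syntax with default options. The matrix X is made-up illustrative data, not data from this page. As stated above, the count is expected to equal the number of columns of X.

```matlab
% Illustrative numeric predictor data: 12 observations (rows) and
% 3 predictors (columns).
X = [linspace(0,1,12)' linspace(5,10,12)' repmat([1;2;3],4,1)];

% With default options, every column counts as one continuous predictor.
N = countPredictorsAfterCategoricalEncoding(X)

% Compare the count with the number of columns of X.
isequal(N,size(X,2))
```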

    N = countPredictorsAfterCategoricalEncoding(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, PredictorNames=["var1" "var2"] specifies that "var1" and "var2" are the predictor variables of the input table.

    Examples

    Count the number of predictors in a table after encoding the categorical variables.

    Load the carbig data and convert it to a table.

    load carbig
    Tbl = table(Acceleration,Displacement,Horsepower, ...
        Model_Year,Origin,Weight,MPG);

    View the first few rows of the table. The table has seven variables.

    head(Tbl)
        Acceleration    Displacement    Horsepower    Model_Year    Origin     Weight    MPG
        ____________    ____________    __________    __________    _______    ______    ___
    
              12            307            130            70        USA         3504     18 
            11.5            350            165            70        USA         3693     15 
              11            318            150            70        USA         3436     18 
              12            304            150            70        USA         3433     16 
            10.5            302            140            70        USA         3449     17 
              10            429            198            70        USA         4341     15 
               9            454            220            70        USA         4354     14 
             8.5            440            215            70        USA         4312     14 
    

    The software creates dummy variables using two different schemes, depending on whether a categorical variable is unordered or ordered. The carbig data has one unordered categorical variable (Origin), which is encoded as full dummy variables (also known as one-hot encoded vectors). For more information, see Dummy Variables.

    Count the number of predictors in a table after encoding the categorical variables.

    N = countPredictorsAfterCategoricalEncoding(Tbl)
    N = 
    13
    

    The value of 13 corresponds to the 6 numeric-valued variables of the table plus the 7 categories of the Origin variable.
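
    The arithmetic behind the value 13 can be reproduced by hand. This sketch is an illustration, not a description of how the function works internally. It uses dummyvar (Statistics and Machine Learning Toolbox) to build full dummy variables for Origin; the variable names originCat, D, and manualCount are introduced here for illustration.

```matlab
load carbig

% Origin is a character matrix in carbig. Convert it to a categorical
% vector so that its categories can be listed.
originCat = categorical(cellstr(Origin));
numCategories = numel(categories(originCat))

% dummyvar returns one indicator column per category, that is,
% full dummy variables.
D = dummyvar(originCat);

% Six numeric variables (including the response MPG) plus one
% dummy column for each category of Origin.
numNumeric = 6;
manualCount = numNumeric + size(D,2)
```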

    For neural networks with a complex architecture (such as neural networks with skip connections), you can specify the architecture for the fitrnet and fitcnet functions by using the Network name-value argument with a layer array or dlnetwork (Deep Learning Toolbox) object.

    For data with categorical predictors, the neural network architecture must support inputs in which the categorical predictors are encoded as numeric vectors.

    There are two approaches:

    • Specify a neural network architecture that does not have an input layer. In this case, the software automatically determines the network input size based on the training data and adds an input layer with the appropriate size. This is usually the easiest approach.

    • Specify a neural network architecture that has an input layer with a size that is consistent with the training data after encoding the categorical predictors. Use this option when you want to use functionality provided by the input layer.

    Load the carbig data and convert it to a table.

    load carbig
    Tbl = table(Acceleration,Displacement,Horsepower, ...
        Model_Year,Origin,Weight,MPG);

    Specify Neural Network Without Input Layer

    Create a multilayer perceptron (MLP) neural network with a skip connection. Do not include an input layer.

    outputSize = 1;
    
    net = dlnetwork;
    
    layers = [
        fullyConnectedLayer(12)
        reluLayer(Name="relu1")
        
        fullyConnectedLayer(12)
        
        additionLayer(2,Name="add2")
        reluLayer(Name="relu2")
        
        fullyConnectedLayer(12)
        additionLayer(2,Name="add3")
        reluLayer
        
        fullyConnectedLayer(outputSize)];
    
    net = addLayers(net,layers);
    net = connectLayers(net,"relu1","add2/in2");
    net = connectLayers(net,"relu2","add3/in2");

    Train the neural network.

    Mdl = fitrnet(Tbl,"MPG",Network=net,Standardize=true)
    Mdl = 
      RegressionNeuralNetwork
               PredictorNames: {'Acceleration'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
                 ResponseName: 'MPG'
        CategoricalPredictors: 5
            ResponseTransform: 'none'
              NumObservations: 398
                   LayerSizes: []
                  Activations: ''
        OutputLayerActivation: ''
                       Solver: 'LBFGS'
              ConvergenceInfo: [1×1 struct]
              TrainingHistory: [1000×7 table]
    
      View network information using dlnetwork.
    
    
      Properties, Methods
    
    

    Convert the model to a dlnetwork object and view the first layer. The first layer of the neural network is an input layer of size 12. The 12 channels correspond to the 5 numeric-valued predictors of the table plus the 7 categories of the Origin variable.

    net = dlnetwork(Mdl);
    net.Layers(1)
    ans = 
      FeatureInputLayer with properties:
    
                          Name: 'input'
                     InputSize: 12
            SplitComplexInputs: 0
    
       Hyperparameters
                 Normalization: 'zscore'
        NormalizationDimension: 'auto'
                          Mean: [15.5413 194.4120 104.4694 75.9796 0 0 0 0 0 0 0 2.9776e+03]
             StandardDeviation: [2.7589 104.6440 38.4912 3.6837 1 1 1 1 1 1 1 849.4026]
    
    

    Specify Neural Network With Input Layer

    Alternatively, create the same neural network, but include an input layer. To determine the size of the input layer, use the countPredictorsAfterCategoricalEncoding function. To ensure that the function counts only the predictors, specify the response variable.

    inputSize = countPredictorsAfterCategoricalEncoding(Tbl,"MPG");
    outputSize = 1;
    
    net = dlnetwork;
    
    layers = [
        featureInputLayer(inputSize)
        fullyConnectedLayer(12)
        reluLayer(Name="relu1")
        
        fullyConnectedLayer(12)
        
        additionLayer(2,Name="add2")
        reluLayer(Name="relu2")
        
        fullyConnectedLayer(12)
        additionLayer(2,Name="add3")
        reluLayer
        
        fullyConnectedLayer(outputSize)];
    
    net = addLayers(net,layers);
    net = connectLayers(net,"relu1","add2/in2");
    net = connectLayers(net,"relu2","add3/in2");

    Train the neural network.

    Mdl = fitrnet(Tbl,"MPG",Network=net,Standardize=true)
    Mdl = 
      RegressionNeuralNetwork
               PredictorNames: {'Acceleration'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
                 ResponseName: 'MPG'
        CategoricalPredictors: 5
            ResponseTransform: 'none'
              NumObservations: 398
                   LayerSizes: []
                  Activations: ''
        OutputLayerActivation: ''
                       Solver: 'LBFGS'
              ConvergenceInfo: [1×1 struct]
              TrainingHistory: [1000×7 table]
    
      View network information using dlnetwork.
    
    
      Properties, Methods
    
    

    Copyright 2025 The MathWorks, Inc.

    Input Arguments

    Tbl: Sample data, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

    • Optionally, Tbl can contain columns for the response variables and a column for the observation weights. Each response variable and the weight values must be numeric vectors.

      You must specify the response variables in Tbl by using ResponseVarNames or formula and specify the observation weights in Tbl by using Weights.

      • When you specify the response variables by using ResponseVarNames, countPredictorsAfterCategoricalEncoding uses the remaining variables as predictors. To use a subset of the remaining variables in Tbl as predictors, specify predictor variables by using PredictorNames.

      • When you define a model specification by using formula, countPredictorsAfterCategoricalEncoding uses a subset of the variables in Tbl as predictor variables and response variables, as specified in formula.

    ResponseVarNames: Response variable names, specified as the names of variables in Tbl as a character vector, string array, or cell array of character vectors. The countPredictorsAfterCategoricalEncoding function ignores the response variables when it counts predictors. For example, if the response variable Y is stored as Tbl.Y, then specify it as "Y". Otherwise, the software treats all columns of Tbl, including Y, as predictors.

    Data Types: char | string | cell
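
    As a brief sketch of how response variables affect the count, the following calls use the carbig table from the examples on this page. Treating Weight as a second response is purely illustrative. With MPG excluded, the count matches the 12-channel input layer shown in the neural network example; excluding Weight as well would leave 4 numeric predictors plus the 7 Origin categories.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% Exclude a single response variable from the count.
nOneResponse = countPredictorsAfterCategoricalEncoding(Tbl,"MPG")

% Exclude two response variables by passing a string array.
% Treating Weight as a response is an assumption for illustration.
nTwoResponses = countPredictorsAfterCategoricalEncoding(Tbl,["MPG" "Weight"])
```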

    formula: Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form "Y1,Y2~x1+x2+x3". In this form, Y1 and Y2 represent the response variables, and x1, x2, and x3 represent the predictor variables.

    To specify a subset of variables in Tbl as predictors for training the model, you can use a formula. If you specify a formula, then the countPredictorsAfterCategoricalEncoding function returns the number of predictors in the specified formula after encoding the categorical variables in Tbl.

    The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB® identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function.

    Data Types: char | string
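
    The name check described above can be sketched as follows. The names in this example are hypothetical and chosen so that only the first is a valid MATLAB identifier.

```matlab
% Hypothetical variable names; only the first is a valid identifier.
names = ["Model_Year" "Model Year" "2ndGear"];

% isvarname returns true for names that are valid MATLAB identifiers.
isValid = arrayfun(@isvarname,names)

% matlab.lang.makeValidName converts invalid names to valid ones.
validNames = matlab.lang.makeValidName(names)

% Confirm that every converted name is now a valid identifier.
all(arrayfun(@isvarname,validNames))
```

    For a table, you could assign the converted names back to Tbl.Properties.VariableNames before writing the formula.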

    X: Predictor data, specified as a numeric matrix. The software treats the rows and columns of X according to the ObservationsIn name-value argument.

    Data Types: single | double

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: countPredictorsAfterCategoricalEncoding(Tbl,PredictorNames=["var1" "var2"]) specifies that "var1" and "var2" are the predictor variables of the input table.

    PredictorNames: Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The countPredictorsAfterCategoricalEncoding function ignores the remaining variables when it counts predictors.

    • PredictorNames must be a subset of Tbl.Properties.VariableNames and must not include the name of a response variable.

    • By default, PredictorNames contains the names of all the predictor variables.

    • A good practice is to specify the predictors using either PredictorNames or formula, but not both.

    This argument supports data specified as a table only.

    Example: PredictorNames=["SepalLength","SepalWidth","PetalLength","PetalWidth"]

    Data Types: string | cell
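
    A short sketch of PredictorNames, assuming the carbig table from the examples on this page. The selected predictors are illustrative. Because the function ignores the remaining variables, the count would be 1 numeric predictor plus the 7 Origin categories.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% Count only the two named predictors. All other table variables,
% including MPG, are ignored.
N = countPredictorsAfterCategoricalEncoding(Tbl, ...
    PredictorNames=["Acceleration" "Origin"])
```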

    CategoricalPredictors: Categorical predictors list, specified as one of the values in this table. The descriptions assume that the predictor data has observations in rows and predictors in columns.

    Value: Vector of positive integers
    Description: Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and p, where p is the number of predictors in the input data before encoding.

    If countPredictorsAfterCategoricalEncoding uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. The CategoricalPredictors values do not count any response variable or other variable that the function does not use.

    Value: Logical vector
    Description: A true entry means that the corresponding predictor is categorical. The length of the vector is the number of predictors in the input data before encoding.

    Value: Character matrix
    Description: Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.

    Value: String array or cell array of character vectors
    Description: Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.

    Value: "all"
    Description: All predictors are categorical.

    By default, if the predictor data is in a table (Tbl), countPredictorsAfterCategoricalEncoding assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), countPredictorsAfterCategoricalEncoding assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.

    Example: CategoricalPredictors="all"

    Data Types: single | double | logical | char | string | cell
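
    The index and logical forms of CategoricalPredictors describe the same predictors, so they should give the same count. The sketch below uses a made-up matrix X whose third column holds the integer values 1, 2, and 3. If that column is encoded as one dummy variable per distinct value, as for the unordered Origin variable in the table example, the count would be 2 + 3 = 5; this value is an assumption and is not printed as expected output.

```matlab
% Illustrative data: two continuous columns and one integer-valued
% column with three distinct values.
X = [linspace(0,1,12)' linspace(5,10,12)' repmat([1;2;3],4,1)];

% Index form: the third predictor is categorical.
nIndex = countPredictorsAfterCategoricalEncoding(X,CategoricalPredictors=3)

% Logical form: an equivalent way to mark the same predictor.
nLogical = countPredictorsAfterCategoricalEncoding(X, ...
    CategoricalPredictors=[false false true])

% Both specifications identify the same predictor.
isequal(nIndex,nLogical)
```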

    ObservationsIn: Predictor data observation dimension, specified as one of these values:

    • "rows" — Rows correspond to observations and columns correspond to predictors.

    • "columns" — Columns correspond to observations and rows correspond to predictors.

    This argument supports data specified as a numeric array only.

    Data Types: char | string
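
    As a small sketch of the two orientations, the following code counts the predictors of a made-up matrix X and of its transpose. With ObservationsIn="columns", the rows of the transposed matrix are the predictors, so both counts should be 3.

```matlab
% Illustrative data: 12 observations in rows, 3 predictors in columns.
X = [linspace(0,1,12)' linspace(5,10,12)' repmat([1;2;3],4,1)];
nRows = countPredictorsAfterCategoricalEncoding(X)

% Transposed data: 3 predictors in rows, 12 observations in columns.
Xt = X';
nColumns = countPredictorsAfterCategoricalEncoding(Xt,ObservationsIn="columns")
```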

    Weights: Weights variable name, specified as the name of a variable in Tbl. The countPredictorsAfterCategoricalEncoding function ignores the weights variable when it counts predictors.

    Data Types: char | string
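
    The following sketch adds a hypothetical weights variable W to the carbig table from the examples on this page and names it in Weights. Because the function ignores the weights variable, the count should match the count for the same table without W.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% Count the predictors before adding a weights variable.
nWithoutW = countPredictorsAfterCategoricalEncoding(Tbl,"MPG")

% Hypothetical uniform observation weights stored in the table.
Tbl.W = ones(height(Tbl),1);

% Name the weights variable so that it is not counted as a predictor.
nWithW = countPredictorsAfterCategoricalEncoding(Tbl,"MPG",Weights="W")
```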

    Output Arguments

    N: Number of predictors in the input data after encoding categorical variables, returned as a nonnegative integer.

    Data Types: double

    Algorithms

    Extended Capabilities

    Version History

    Introduced in R2026a