countPredictorsAfterCategoricalEncoding

Number of predictors in tabular data after encoding categorical variables

Since R2026a


Syntax

N = countPredictorsAfterCategoricalEncoding(Tbl)

N = countPredictorsAfterCategoricalEncoding(Tbl,ResponseVarNames)

N = countPredictorsAfterCategoricalEncoding(Tbl,formula)

N = countPredictorsAfterCategoricalEncoding(X)

N = countPredictorsAfterCategoricalEncoding(___,Name=Value)

Description

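N = countPredictorsAfterCategoricalEncoding(Tbl) returns the number of variables in Tbl after encoding the categorical variables. By default, the function treats all variables in Tbl, including any response variable, as predictors.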


N = countPredictorsAfterCategoricalEncoding(Tbl,ResponseVarNames) returns the number of variables in Tbl after encoding the categorical variables, ignoring the specified response variables. To specify multiple response variables, specify ResponseVarNames as an array of variable names.


N = countPredictorsAfterCategoricalEncoding(Tbl,formula) returns the number of predictors in the specified formula after encoding the categorical variables.

The formula input argument specifies an explanatory model of the response and the subset of predictor variables in Tbl that you use to fit a model.

N = countPredictorsAfterCategoricalEncoding(X) returns the number of predictors in the specified numeric array. By default, N is equal to the number of columns of X. To specify the orientation of the data in X or to treat integer-valued data in X as categorical, use the ObservationsIn and CategoricalPredictors name-value arguments, respectively.

N = countPredictorsAfterCategoricalEncoding(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, PredictorNames=["var1" "var2"] specifies that "var1" and "var2" are the predictor variables of the input table.

Examples


Count Categorical Predictors After Encoding


Count the number of predictors in a table after encoding the categorical variables.

Load the carbig data and convert it to a table.

load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

View the first few rows of the table. The table has seven variables.

head(Tbl)

    Acceleration    Displacement    Horsepower    Model_Year    Origin     Weight    MPG
    ____________    ____________    __________    __________    _______    ______    ___

          12            307            130            70        USA         3504     18 
        11.5            350            165            70        USA         3693     15 
          11            318            150            70        USA         3436     18 
          12            304            150            70        USA         3433     16 
        10.5            302            140            70        USA         3449     17 
          10            429            198            70        USA         4341     15 
           9            454            220            70        USA         4354     14 
         8.5            440            215            70        USA         4312     14

The software creates dummy variables using two different schemes, depending on whether a categorical variable is unordered or ordered. The carbig data has one unordered categorical variable (Origin), which is encoded as full dummy variables (also known as one-hot encoded vectors). For more information, see Dummy Variables.

Count the number of predictors in a table after encoding the categorical variables.

N = countPredictorsAfterCategoricalEncoding(Tbl)

N = 
13

The value of 13 corresponds to the 6 numeric-valued variables of the table plus the 7 categories of the Origin variable. Because the call does not specify a response variable, the count includes MPG.
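To see where the value 13 comes from, you can count the two contributions separately. This is a minimal sketch that assumes R2026a or later; the names numOriginCategories, numNumericVars, and expectedN are illustrative only, and the comparison relies on the full dummy variable scheme described in Dummy Variables.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% Origin is a blank-padded character matrix; cellstr trims the padding
numOriginCategories = numel(unique(cellstr(Origin)));

% Every variable except Origin is numeric (MPG included, because no response is specified)
numNumericVars = nnz(varfun(@isnumeric,Tbl,OutputFormat="uniform"));

% Full dummy encoding adds one predictor per Origin category
expectedN = numNumericVars + numOriginCategories
N = countPredictorsAfterCategoricalEncoding(Tbl)
```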

Define Custom Neural Network Architecture for Categorical Predictors


For neural networks with complex architectures (such as neural networks with skip connections), you can specify the architecture for the fitrnet and fitcnet functions by using the Network name-value argument with a layer array or a dlnetwork (Deep Learning Toolbox) object.

For data with categorical predictors, the neural network architecture must support inputs in which the categorical predictors are encoded as numeric vectors.

There are two approaches:

Specify a neural network architecture that does not have an input layer. In this case, the software automatically determines the network input size based on the training data and adds an input layer with the appropriate size. This is usually the easiest approach.
Specify a neural network architecture that has an input layer with a size that is consistent with the training data after encoding the categorical predictors. Use this option when you want to use functionality provided by the input layer.

Load the carbig data and convert it to a table.

load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

Specify Neural Network Without Input Layer

Create a multilayer perceptron (MLP) neural network with a skip connection. Do not include an input layer.

outputSize = 1;

net = dlnetwork;

layers = [
    fullyConnectedLayer(12)
    reluLayer(Name="relu1")
    
    fullyConnectedLayer(12)
    
    additionLayer(2,Name="add2")
    reluLayer(Name="relu2")
    
    fullyConnectedLayer(12)
    additionLayer(2,Name="add3")
    reluLayer
    
    fullyConnectedLayer(outputSize)];

net = addLayers(net,layers);
net = connectLayers(net,"relu1","add2/in2");
net = connectLayers(net,"relu2","add3/in2");

Train the neural network.

Mdl = fitrnet(Tbl,"MPG",Network=net,Standardize=true)

Mdl = 
  RegressionNeuralNetwork
           PredictorNames: {'Acceleration'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
             ResponseName: 'MPG'
    CategoricalPredictors: 5
        ResponseTransform: 'none'
          NumObservations: 398
               LayerSizes: []
              Activations: ''
    OutputLayerActivation: ''
                   Solver: 'LBFGS'
          ConvergenceInfo: [1×1 struct]
          TrainingHistory: [1000×7 table]

  View network information using dlnetwork.


  Properties, Methods

Convert the model to a dlnetwork object and view the first layer. The first layer of the neural network is an input layer of size 12. The 12 channels correspond to the 5 numeric-valued predictors of the table plus the 7 categories of the Origin variable.

net = dlnetwork(Mdl);
net.Layers(1)

ans = 
  FeatureInputLayer with properties:

                      Name: 'input'
                 InputSize: 12
        SplitComplexInputs: 0

   Hyperparameters
             Normalization: 'zscore'
    NormalizationDimension: 'auto'
                      Mean: [15.5413 194.4120 104.4694 75.9796 0 0 0 0 0 0 0 2.9776e+03]
         StandardDeviation: [2.7589 104.6440 38.4912 3.6837 1 1 1 1 1 1 1 849.4026]

Specify Neural Network With Input Layer

Alternatively, create the same neural network with an input layer. To determine the size of the input layer, use the countPredictorsAfterCategoricalEncoding function. To exclude the response variable from the count, specify it as the second input argument.

inputSize = countPredictorsAfterCategoricalEncoding(Tbl,"MPG");
outputSize = 1;

net = dlnetwork;

layers = [
    featureInputLayer(inputSize)
    fullyConnectedLayer(12)
    reluLayer(Name="relu1")
    
    fullyConnectedLayer(12)
    
    additionLayer(2,Name="add2")
    reluLayer(Name="relu2")
    
    fullyConnectedLayer(12)
    additionLayer(2,Name="add3")
    reluLayer
    
    fullyConnectedLayer(outputSize)];

net = addLayers(net,layers);
net = connectLayers(net,"relu1","add2/in2");
net = connectLayers(net,"relu2","add3/in2");

Train the neural network.

Mdl = fitrnet(Tbl,"MPG",Network=net,Standardize=true)

Mdl = 
  RegressionNeuralNetwork
           PredictorNames: {'Acceleration'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
             ResponseName: 'MPG'
    CategoricalPredictors: 5
        ResponseTransform: 'none'
          NumObservations: 398
               LayerSizes: []
              Activations: ''
    OutputLayerActivation: ''
                   Solver: 'LBFGS'
          ConvergenceInfo: [1×1 struct]
          TrainingHistory: [1000×7 table]

  View network information using dlnetwork.


  Properties, Methods

Input Arguments


`Tbl` — Sample data
table

Sample data, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

Optionally, Tbl can contain columns for the response variables and a column for the observation weights. Each response variable and the weight values must be numeric vectors.
For the function to exclude these columns from the count, specify the response variables in Tbl by using ResponseVarNames or formula, and specify the observation weights in Tbl by using Weights.
- When you specify the response variables by using ResponseVarNames, countPredictorsAfterCategoricalEncoding uses the remaining variables as predictors. To use a subset of the remaining variables in Tbl as predictors, specify the predictor variables by using PredictorNames.
- When you define a model specification by using formula, countPredictorsAfterCategoricalEncoding uses a subset of the variables in Tbl as predictor variables and response variables, as specified in formula.

`ResponseVarNames` — Response variable names
names of variables in `Tbl`

Response variable names, specified as the names of variables in Tbl as a character vector, string array, or cell array of character vectors. The countPredictorsAfterCategoricalEncoding function ignores the response variables when it counts predictors. For example, if the response variable Y is stored as Tbl.Y, then specify it as "Y". If you do not specify the response variables, the software treats all columns of Tbl, including Y, as predictors.

Data Types: char | string | cell
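As an illustration of multiple response variables, this sketch (assuming R2026a or later) treats both MPG and Weight as responses purely for demonstration. Under the encoding rules on this page, the remaining predictors are four numeric variables plus the seven Origin categories, so you would expect a count of 11.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% Both responses are excluded from the count:
% Acceleration, Displacement, Horsepower, Model_Year + 7 Origin categories
N = countPredictorsAfterCategoricalEncoding(Tbl,["MPG" "Weight"])
```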

`formula` — Explanatory model of response variable and subset of predictor variables
character vector | string scalar

Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form "Y1,Y2~x1+x2+x3". In this form, Y1 and Y2 represent the response variables, and x1, x2, and x3 represent the predictor variables.

To specify a subset of variables in Tbl as predictors for training the model, you can use a formula. If you specify a formula, then the countPredictorsAfterCategoricalEncoding function returns the number of predictors in the specified formula after encoding the categorical variables in Tbl.

The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB^® identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function.

Data Types: char | string
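A formula restricts the count to the predictors that appear on its right-hand side. In this sketch, which assumes R2026a or later and the carbig table used in the examples, the formula keeps two numeric predictors and Origin, so the documented full dummy scheme gives 2 + 7 predictors.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% Only Acceleration, Horsepower, and Origin (7 categories) are counted
N = countPredictorsAfterCategoricalEncoding(Tbl,"MPG~Acceleration+Horsepower+Origin")
```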

`X` — Predictor data
numeric matrix

Predictor data, specified as a numeric matrix. The software treats the rows and columns of X according to the ObservationsIn name-value argument.

Data Types: single | double
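For a numeric matrix, the count equals the number of columns unless you mark some columns as categorical. The matrix below is hypothetical: column 1 holds integer codes for a grouping with three levels. Assuming R2026a or later and full dummy encoding for that column, marking it as categorical replaces one column with three dummy variables.

```matlab
% Hypothetical data: 6 observations (rows) by 3 predictors (columns)
X = [1 0.5 10;
     2 0.7 12;
     3 0.1 11;
     1 0.9 15;
     2 0.3 14;
     3 0.8 13];

% By default, every column of a numeric matrix is a continuous predictor
N1 = countPredictorsAfterCategoricalEncoding(X)

% Column 1 as categorical: 3 dummy variables + 2 continuous predictors
N2 = countPredictorsAfterCategoricalEncoding(X,CategoricalPredictors=1)
```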

Name-Value Arguments


Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: countPredictorsAfterCategoricalEncoding(Tbl,PredictorNames=["var1" "var2"]) specifies that "var1" and "var2" are the predictor variables of the input table.

`PredictorNames` — Predictor variable names
string array of unique names | cell array of unique character vectors

Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The countPredictorsAfterCategoricalEncoding function ignores the remaining variables when it counts predictors.

PredictorNames must be a subset of Tbl.Properties.VariableNames and must not include the name of a response variable.
By default, PredictorNames contains the names of all the predictor variables.
A good practice is to specify the predictors using either PredictorNames or formula, but not both.

This argument supports data specified as a table only.

Example: PredictorNames=["SepalLength","SepalWidth","PetalLength","PetalWidth"]

Data Types: string | cell
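The following sketch, assuming R2026a or later, uses PredictorNames to count only Weight and Origin from the carbig table. The function ignores every other variable, including MPG, so the expected count under the documented encoding is 1 + 7.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% Weight (numeric) + 7 Origin categories; all other variables are ignored
N = countPredictorsAfterCategoricalEncoding(Tbl,PredictorNames=["Weight" "Origin"])
```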

`CategoricalPredictors` — Categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | `"all"`

Categorical predictors list, specified as one of the values in this table. The descriptions assume that the predictor data has observations in rows and predictors in columns.

Value	Description
Vector of positive integers	Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and `p`, where `p` is the number of predictors in the input data before encoding. If `countPredictorsAfterCategoricalEncoding` uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. The `CategoricalPredictors` values do not count any response variable or other variable that the function does not use.
Logical vector	A `true` entry means that the corresponding predictor is categorical. The length of the vector is the number of predictors in the input data before encoding.
Character matrix	Each row of the matrix is the name of a predictor variable. The names must match the entries in `PredictorNames`. Pad the names with extra blanks so each row of the character matrix has the same length.
String array or cell array of character vectors	Each element in the array is the name of a predictor variable. The names must match the entries in `PredictorNames`.
`"all"`	All predictors are categorical.

By default, if the predictor data is in a table (Tbl), countPredictorsAfterCategoricalEncoding assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), countPredictorsAfterCategoricalEncoding assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.

Example: CategoricalPredictors="all"
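Treating an integer-valued table variable as categorical can change the count substantially. This sketch, assuming R2026a or later, marks Model_Year (model years 70 through 82) as categorical alongside Origin; Origin is listed explicitly as a precaution, since it is not certain how a user-supplied list interacts with the default detection. With full dummy encoding, you would expect 4 numeric predictors plus 13 year categories plus 7 Origin categories.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% Number of distinct model years in the data
numYears = numel(unique(Model_Year))

% 4 numeric predictors + numYears dummy variables + 7 Origin dummy variables
N = countPredictorsAfterCategoricalEncoding(Tbl,"MPG", ...
    CategoricalPredictors=["Model_Year" "Origin"])
```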

`ObservationsIn` — Predictor data observation dimension
`"rows"` (default) | `"columns"`

Predictor data observation dimension, specified as one of these values:

"rows" — Rows correspond to observations and columns correspond to predictors.
"columns" — Columns correspond to observations and rows correspond to predictors.

This argument supports data specified as a numeric array only.

Data Types: char | string
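When observations are stored in columns, the count follows the number of rows instead. A minimal sketch with hypothetical random data, assuming R2026a or later:

```matlab
% Hypothetical data: 3 predictors (rows) by 10 observations (columns)
X = rand(3,10);

% With observations in columns, each row is one continuous predictor
N = countPredictorsAfterCategoricalEncoding(X,ObservationsIn="columns")
```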

`Weights` — Weights variable name
`""` (default) | name of variable in `Tbl`

Weights variable name, specified as the name of a variable in Tbl. The countPredictorsAfterCategoricalEncoding function ignores the weights variable when it counts predictors.

Data Types: char | string
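This sketch adds a hypothetical weights variable W to the carbig table and excludes it from the count. Assuming R2026a or later, the result should match the input layer size of 12 from the neural network example, because both MPG and W are excluded.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% Hypothetical observation weights stored in the table
Tbl.W = ones(height(Tbl),1);

% 5 numeric predictors + 7 Origin categories; MPG and W are not counted
N = countPredictorsAfterCategoricalEncoding(Tbl,"MPG",Weights="W")
```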

Output Arguments


`N` — Number of predictors in a table after encoding categorical variables
nonnegative integer

Number of predictors in a table after encoding categorical variables, returned as a nonnegative integer.

Data Types: double

Algorithms


Dummy Variables

Different machine learning algorithms use different schemes to encode dummy variables. For example, the fitting functions fitckernel, fitclinear, fitcnet, fitcsvm, fitrgp, fitrkernel, fitrlinear, fitrnet, and fitrsvm use two different schemes to create dummy variables:

For an unordered categorical variable, these functions represent the categorical variable by using full dummy variables, with one variable for each category. For more details, see Full Dummy Variables.
For an ordered categorical variable, these functions use 1 and –1 values, with more 1s for higher categories, to indicate the ordering. For more details, see Dummy Variables for Ordered Categorical Variable.

The countPredictorsAfterCategoricalEncoding function uses the same schemes as these functions. For more details, see Automatic Creation of Dummy Variables.
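To compare the two schemes, this sketch builds a small hypothetical table with one ordered and one unordered categorical variable, each with three categories. It assumes R2026a or later, and it assumes that the ordered scheme uses one fewer dummy variable than the number of categories (for example, small, medium, and large encoded as [–1 –1], [1 –1], and [1 1]). Under that assumption, the expected count is 2 + 3 + 1.

```matlab
% Hypothetical ordered categorical variable with three levels
ShirtSize = categorical(["small";"large";"medium";"small"], ...
    ["small" "medium" "large"],Ordinal=true);

% Hypothetical unordered categorical variable with three levels
Color = categorical(["red";"blue";"green";"blue"]);

% Hypothetical numeric variable
Price = [3.5; 9.0; 6.2; 4.1];

T = table(ShirtSize,Color,Price);

% Ordered: 2 dummy variables; unordered: 3 dummy variables; numeric: 1
N = countPredictorsAfterCategoricalEncoding(T)
```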

Extended Capabilities


GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

The countPredictorsAfterCategoricalEncoding function fully supports GPU arrays. To run the function on a GPU, specify the input data as a gpuArray (Parallel Computing Toolbox). For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2026a
