countPredictorsAfterCategoricalEncoding
Number of predictors in tabular data after encoding categorical variables
Since R2026a
Syntax
N = countPredictorsAfterCategoricalEncoding(Tbl,ResponseVarNames)
N = countPredictorsAfterCategoricalEncoding(Tbl,formula)
N = countPredictorsAfterCategoricalEncoding(X)
N = countPredictorsAfterCategoricalEncoding(___,Name=Value)
Description
N = countPredictorsAfterCategoricalEncoding(Tbl,ResponseVarNames) returns the number of variables in Tbl after encoding the categorical
variables, ignoring the specified response variables. You can use an array
ResponseVarNames to specify multiple response variables.
N = countPredictorsAfterCategoricalEncoding(Tbl,formula) returns the number of predictors in the specified formula after encoding the categorical
variables.
You can use the input argument formula as an explanatory model of
the response and a subset of the predictor variables in Tbl used to fit
a model.
N = countPredictorsAfterCategoricalEncoding(X) returns the number of predictors in the specified numeric array. This is equivalent to the
number of columns of X. To specify the orientation of the data in
X or to treat integer-valued data in X as
categorical, use the ObservationsIn and
CategoricalPredictors name-value arguments, respectively.
N = countPredictorsAfterCategoricalEncoding(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input
argument combinations in previous syntaxes. For example, PredictorNames=["var1"
"var2"] specifies that "var1" and "var2"
are the predictor variables of the input table.
Examples
Count the number of predictors in a table after encoding the categorical variables.
Load the carbig data and convert it to a table.
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);
View the first few rows of the table. The table has seven variables.
head(Tbl)
Acceleration Displacement Horsepower Model_Year Origin Weight MPG
____________ ____________ __________ __________ _______ ______ ___
12 307 130 70 USA 3504 18
11.5 350 165 70 USA 3693 15
11 318 150 70 USA 3436 18
12 304 150 70 USA 3433 16
10.5 302 140 70 USA 3449 17
10 429 198 70 USA 4341 15
9 454 220 70 USA 4354 14
8.5 440 215 70 USA 4312 14
The software creates dummy variables using two different schemes, depending on whether a categorical variable is unordered or ordered. The carbig data has one unordered categorical variable (Origin), which is encoded as full dummy variables (also known as one-hot encoded vectors). For more information, see Dummy Variables.
Count the number of predictors in a table after encoding the categorical variables.
N = countPredictorsAfterCategoricalEncoding(Tbl)
N = 13
The value of 13 corresponds to the 6 numeric-valued variables of the table plus the 7 categories of the Origin variable.
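The arithmetic behind this count can be sketched by hand. The following is a minimal illustration, not the function's implementation: it uses a hypothetical toy table (x1, x2, and g are made-up names), handles only variables of type categorical, and assumes that every categorical variable is unordered and therefore contributes one column per category.

```matlab
% Hypothetical toy table: two numeric variables and one unordered categorical.
x1 = [1; 2; 3; 4];
x2 = [10; 20; 30; 40];
g  = categorical(["a"; "b"; "c"; "a"]);
Tbl = table(x1,x2,g);

% Hand count, assuming full dummy encoding of unordered categoricals.
expectedN = 0;
for name = string(Tbl.Properties.VariableNames)
    v = Tbl.(name);
    if iscategorical(v)
        % One encoded column per category.
        expectedN = expectedN + numel(categories(v));
    else
        % A numeric variable stays a single column.
        expectedN = expectedN + 1;
    end
end
% expectedN is 5 here: 2 numeric variables plus the 3 categories of g.
```

The same reasoning applied to the carbig table gives 6 numeric variables plus 7 Origin categories.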
For neural networks with complex architecture (such as neural networks with skip connections), you can specify the architecture for the fitrnet and fitcnet functions using the Network name-value argument with a layer array or dlnetwork (Deep Learning Toolbox) object.
For data with categorical predictors, the neural network architecture must support inputs where the categorical predictors are encoded as numeric vectors.
There are two approaches:
Specify a neural network architecture that does not have an input layer. In this case, the software automatically determines the network input size based on the training data and adds an input layer with the appropriate size. This is usually the easiest approach.
Specify a neural network architecture that has an input layer with a size that is consistent with the training data after encoding the categorical predictors. Use this option when you want to use functionality provided by the input layer.
Load the carbig data and convert it to a table.
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);
Specify Neural Network Without Input Layer
Create a multilayer perceptron (MLP) neural network with a skip connection. Do not include an input layer.
outputSize = 1;
net = dlnetwork;
layers = [
fullyConnectedLayer(12)
reluLayer(Name="relu1")
fullyConnectedLayer(12)
additionLayer(2,Name="add2")
reluLayer(Name="relu2")
fullyConnectedLayer(12)
additionLayer(2,Name="add3")
reluLayer
fullyConnectedLayer(outputSize)];
net = addLayers(net,layers);
net = connectLayers(net,"relu1","add2/in2");
net = connectLayers(net,"relu2","add3/in2");
Train the neural network.
Mdl = fitrnet(Tbl,"MPG",Network=net,Standardize=true)
Mdl =
RegressionNeuralNetwork
PredictorNames: {'Acceleration' 'Displacement' 'Horsepower' 'Model_Year' 'Origin' 'Weight'}
ResponseName: 'MPG'
CategoricalPredictors: 5
ResponseTransform: 'none'
NumObservations: 398
LayerSizes: []
Activations: ''
OutputLayerActivation: ''
Solver: 'LBFGS'
ConvergenceInfo: [1×1 struct]
TrainingHistory: [1000×7 table]
View network information using dlnetwork.
Properties, Methods
Convert the model to a dlnetwork object and view the first layer. The first layer of the neural network is an input layer of size 12. The 12 channels correspond to the 5 numeric-valued predictors of the table plus the 7 categories of the Origin variable.
net = dlnetwork(Mdl);
net.Layers(1)
ans =
FeatureInputLayer with properties:
Name: 'input'
InputSize: 12
SplitComplexInputs: 0
Hyperparameters
Normalization: 'zscore'
NormalizationDimension: 'auto'
Mean: [15.5413 194.4120 104.4694 75.9796 0 0 0 0 0 0 0 2.9776e+03]
StandardDeviation: [2.7589 104.6440 38.4912 3.6837 1 1 1 1 1 1 1 849.4026]
Specify Neural Network With Input Layer
Alternatively, create the same neural network that also includes an input layer. To determine the size for the input layer, use the countPredictorsAfterCategoricalEncoding function. To ensure that the function counts the predictors only, specify the response variable.
inputSize = countPredictorsAfterCategoricalEncoding(Tbl,"MPG");
outputSize = 1;
net = dlnetwork;
layers = [
featureInputLayer(inputSize)
fullyConnectedLayer(12)
reluLayer(Name="relu1")
fullyConnectedLayer(12)
additionLayer(2,Name="add2")
reluLayer(Name="relu2")
fullyConnectedLayer(12)
additionLayer(2,Name="add3")
reluLayer
fullyConnectedLayer(outputSize)];
net = addLayers(net,layers);
net = connectLayers(net,"relu1","add2/in2");
net = connectLayers(net,"relu2","add3/in2");
Train the neural network.
Mdl = fitrnet(Tbl,"MPG",Network=net,Standardize=true)
Mdl =
RegressionNeuralNetwork
PredictorNames: {'Acceleration' 'Displacement' 'Horsepower' 'Model_Year' 'Origin' 'Weight'}
ResponseName: 'MPG'
CategoricalPredictors: 5
ResponseTransform: 'none'
NumObservations: 398
LayerSizes: []
Activations: ''
OutputLayerActivation: ''
Solver: 'LBFGS'
ConvergenceInfo: [1×1 struct]
TrainingHistory: [1000×7 table]
View network information using dlnetwork.
Properties, Methods
Copyright 2025 The MathWorks, Inc.
Input Arguments
Tbl
Sample data, specified as a table. Each row of Tbl corresponds
to one observation, and each column corresponds to one predictor variable. Multicolumn
variables and cell arrays other than cell arrays of character vectors are not
allowed.
Optionally, Tbl can contain columns for the response variables and a column for the observation weights. Each response variable and the weight values must be numeric vectors.
You must specify the response variables in Tbl by using ResponseVarNames or formula and specify the observation weights in Tbl by using Weights.
When you specify the response variables by using ResponseVarNames, countPredictorsAfterCategoricalEncoding uses the remaining variables as predictors. To use a subset of the remaining variables in Tbl as predictors, specify predictor variables by using PredictorNames.
When you define a model specification by using formula, countPredictorsAfterCategoricalEncoding uses a subset of the variables in Tbl as predictor variables and response variables, as specified in formula.
ResponseVarNames
Response variable names, specified as the names of variables in
Tbl as a character vector, string array, or cell array of
character vectors. The countPredictorsAfterCategoricalEncoding function ignores the response
variables when it counts predictors. For example, if the response variable
Y is stored as Tbl.Y, then specify it as
"Y". Otherwise, the software treats all columns of
Tbl, including Y, as predictors.
Data Types: char | string
formula
Explanatory model of the response variable and a subset of the predictor variables,
specified as a character vector or string scalar in the form
"Y1,Y2~x1+x2+x3". In this form, Y1 and
Y2 represent the response variables, and x1,
x2, and x3 represent the predictor
variables.
To specify a subset of variables in Tbl as predictors for
training the model, you can use a formula. If you specify a formula, then the
countPredictorsAfterCategoricalEncoding function returns the number of predictors in the
specified formula after encoding the categorical variables in
Tbl.
The variable names in the formula must be both variable names in Tbl
(Tbl.Properties.VariableNames) and valid MATLAB® identifiers. You can verify the variable names in Tbl by
using the isvarname function. If the variable names
are not valid, then you can convert them by using the matlab.lang.makeValidName function.
Data Types: char | string
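As a rough illustration of the formula syntax, the following sketch reuses the carbig table from the examples on this page. The choice of predictors in the formula is arbitrary. Under the encoding rules described on this page, the result would be expected to equal 2 numeric predictors plus the 7 Origin categories, but that value is an inference, not documented output.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% MPG is the response; only Horsepower, Weight, and Origin are counted.
N = countPredictorsAfterCategoricalEncoding(Tbl,"MPG~Horsepower+Weight+Origin");
```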
X
Predictor data, specified as a numeric matrix. The software treats the rows and
columns of X according to the ObservationsIn
name-value argument.
Data Types: single | double
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example: countPredictorsAfterCategoricalEncoding(Tbl,PredictorNames=["var1"
"var2"]) specifies that "var1" and "var2"
are the predictor variables of the input table.
PredictorNames
Predictor variable names, specified as a string array of unique names or cell
array of unique character vectors. The countPredictorsAfterCategoricalEncoding function
ignores the remaining variables when it counts predictors.
PredictorNames must be a subset of Tbl.Properties.VariableNames and must not include the name of a response variable.
By default, PredictorNames contains the names of all the predictor variables.
A good practice is to specify the predictors using either PredictorNames or formula, but not both.
This argument supports data specified as a table only.
Example: PredictorNames=["SepalLength","SepalWidth","PetalLength","PetalWidth"]
Data Types: string | cell
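A short sketch of restricting the count with PredictorNames, again using the carbig table from the examples. The two names chosen here are arbitrary. Based on the rules on this page, the count would be expected to be 1 numeric variable plus the 7 Origin categories, though that figure is an inference and not shown in the original examples.

```matlab
load carbig
Tbl = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);

% Count only the named predictors; all other table variables,
% including MPG, are ignored.
N = countPredictorsAfterCategoricalEncoding(Tbl, ...
    PredictorNames=["Weight","Origin"]);
```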
CategoricalPredictors
Categorical predictors list, specified as one of the values in this table. The descriptions assume that the predictor data has observations in rows and predictors in columns.
| Value | Description |
|---|---|
| Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and the number of predictors. |
| Logical vector | A true entry means that the corresponding predictor is categorical. |
| Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length. |
| String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames. |
| "all" | All predictors are categorical. |
By default, if the
predictor data is in a table (Tbl), countPredictorsAfterCategoricalEncoding
assumes that a variable is categorical if it is a logical vector, categorical vector, character
array, string array, or cell array of character vectors. If the predictor data is a matrix
(X), countPredictorsAfterCategoricalEncoding assumes that all predictors are
continuous. To identify any other predictors as categorical predictors, specify them by using
the CategoricalPredictors name-value argument.
Example: CategoricalPredictors="all"
Data Types: single | double | logical | char | string | cell
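For matrix input, the effect of marking a column as categorical can be sketched with a hand count. This is an illustration under stated assumptions, not the function's implementation: the matrix X and the index catCols are hypothetical, and the sketch assumes that a categorical column in a matrix is treated as unordered and expands to one column per distinct value.

```matlab
% Hypothetical numeric predictor matrix with observations in rows.
% Column 2 holds integer codes that stand for categories.
X = [1.5 1; 2.5 3; 3.5 2; 4.5 3];
catCols = 2;                                  % columns to treat as categorical

numContinuous = size(X,2) - numel(catCols);   % 1 continuous column
numLevels = 0;
for j = catCols
    % Each distinct code becomes one dummy column.
    numLevels = numLevels + numel(unique(X(:,j)));
end
expectedN = numContinuous + numLevels;        % 1 + 3 = 4

% Documented call that this hand count is meant to mirror:
% N = countPredictorsAfterCategoricalEncoding(X,CategoricalPredictors=catCols)
```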
ObservationsIn
Predictor data observation dimension, specified as one of these values:
"rows": Rows correspond to observations and columns correspond to predictors.
"columns": Columns correspond to observations and rows correspond to predictors.
This argument supports data specified as a numeric array only.
Data Types: char | string
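The orientation setting only changes which dimension of the matrix is read as the predictor dimension. A minimal sketch with a hypothetical all-continuous matrix:

```matlab
% Hypothetical data: 3 predictors, 10 observations stored in columns.
X = zeros(3,10);

countIfRows    = size(X,2);   % predictors in columns (ObservationsIn="rows")
countIfColumns = size(X,1);   % predictors in rows (ObservationsIn="columns")

% Documented call for the second case:
% N = countPredictorsAfterCategoricalEncoding(X,ObservationsIn="columns")
```

With no categorical predictors, the count for this matrix would be 10 under "rows" and 3 under "columns".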
Weights
Weights variable name, specified as the name of a variable in
Tbl. The countPredictorsAfterCategoricalEncoding function ignores the
weights variable when it counts predictors.
Data Types: char | string
Output Arguments
N
Number of predictors in a table after encoding categorical variables, returned as a nonnegative integer.
Data Types: double
Algorithms
Different machine learning algorithms use different schemes to encode dummy variables.
For example, the fitting functions fitckernel,
fitclinear, fitcnet,
fitcsvm, fitrgp, fitrkernel,
fitrlinear, fitrnet, and
fitrsvm use two different schemes to create dummy variables:
For an unordered categorical variable, these functions use full dummy variables, with one dummy variable for each category. For more details, see Full Dummy Variables.
For an ordered categorical variable, these functions use 1 and -1 values, with more 1s for higher categories, to indicate the ordering. For more details, see Dummy Variables for Ordered Categorical Variable.
The countPredictorsAfterCategoricalEncoding function uses the same schemes as these
functions. For more details, see Automatic Creation of Dummy Variables.
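The two schemes can be sketched in base MATLAB. This is an illustrative reconstruction from the descriptions above, not the toolbox code: the variables u and g are hypothetical, and the ordered scheme is written under the assumption that a variable with k ordered categories yields k-1 columns, where column j is 1 for categories above the jth level and -1 otherwise.

```matlab
% Unordered categorical: full dummy (one-hot) columns, one per category.
u = categorical(["a"; "b"; "c"; "a"]);
kU = numel(categories(u));
F = double(double(u) == 1:kU);        % 4-by-3 matrix of 0s and 1s

% Ordered categorical: k-1 columns of 1 and -1 values (assumed layout).
g = categorical(["low"; "high"; "mid"; "low"], ...
    ["low","mid","high"],Ordinal=true);
k = numel(categories(g));
codes = double(g);                    % position of each value in the ordering
D = 2*(codes > (1:k-1)) - 1;          % 4-by-2 matrix of -1s and 1s
```

In this sketch, "low" maps to [-1 -1], "mid" to [1 -1], and "high" to [1 1], so higher categories carry more 1s. Under that assumption, an unordered variable adds as many predictors as it has categories, and an ordered variable adds one fewer.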
Extended Capabilities
The countPredictorsAfterCategoricalEncoding function
fully supports GPU arrays. To run the function on a GPU, specify the input data as a gpuArray (Parallel Computing Toolbox). For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2026a