Advantages of Using Tables
Conveniently Store Mixed-Type Data in Single Container
You can use the table data type to collect mixed-type data and metadata properties, such as variable names, row names, descriptions, and variable units, in a single container. Tables are suitable for column-oriented or tabular data that is often stored as columns in a text file or in a spreadsheet. For example, you can use a table to store experimental data, with rows representing different observations and columns representing different measured variables.
Tables consist of rows and column-oriented variables. Each variable in a table can have a different data type and a different size, but each variable must have the same number of rows.
For example, load sample patients data.
load patients
Then, combine the workspace variables, Systolic and Diastolic, into a single BloodPressure variable and convert the workspace variable, SelfAssessedHealthStatus, from a cell array of character vectors to a categorical array.
BloodPressure = [Systolic Diastolic];
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
whos("Age","Smoker","BloodPressure","SelfAssessedHealthStatus")
  Name                            Size      Bytes  Class          Attributes

  Age                           100x1         800  double
  BloodPressure                 100x2        1600  double
  SelfAssessedHealthStatus      100x1         560  categorical
  Smoker                        100x1         100  logical
The variables Age, BloodPressure, SelfAssessedHealthStatus, and Smoker have varying data types and are candidates to store in a table since they all have the same number of rows, 100.
Now, create a table from the variables and display it.
T = table(Age,Smoker,BloodPressure,SelfAssessedHealthStatus)
T=100×4 table
    Age    Smoker    BloodPressure    SelfAssessedHealthStatus
    ___    ______    _____________    ________________________
    38     true       124     93      Excellent
    43     false      109     77      Fair
    38     false      125     83      Good
    40     false      117     75      Fair
    49     false      122     80      Good
    46     false      121     70      Good
    33     true       130     88      Good
    40     false      115     82      Good
    28     false      115     78      Excellent
    31     false      118     86      Excellent
    45     false      114     77      Excellent
    42     false      115     68      Poor
    25     false      127     74      Poor
    39     true       130     95      Excellent
    36     false      114     79      Good
    48     true       130     92      Good
      ⋮
The table displays in a tabular format with the variable names at the top.
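As a quick check of the claims above, the following sketch rebuilds the table and inspects its size and the class of each variable. It assumes the patients sample data that ships with MATLAB is on the path; the names Tcheck, numRows, numVars, and varClasses are illustrative only.

```matlab
% Rebuild the table from the sample data (assumes patients.mat is available).
load patients
BloodPressure = [Systolic Diastolic];
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
Tcheck = table(Age,Smoker,BloodPressure,SelfAssessedHealthStatus);

% A table reports its size as rows by variables, not rows by columns of data,
% so the two-column BloodPressure variable counts once.
[numRows,numVars] = size(Tcheck);

% Apply class to each variable; "cell" output returns one entry per variable.
varClasses = varfun(@class,Tcheck,"OutputFormat","cell");
```

Each variable reports a single class even though the classes differ from one variable to the next.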
Each variable in a table is a single data type. If you add a new row to the table, MATLAB® forces consistency of the data type between the new data and the corresponding table variables. For example, if you try to add information for a new patient where the first column contains the patient's health status instead of age, as in the expression T(end+1,:) = {"Poor",true,[130 84],37}, then you receive the error:
Right hand side of an assignment to a categorical array must be a categorical or text representing a category name.
The error occurs because MATLAB® cannot assign numeric data, 37, to the categorical array, SelfAssessedHealthStatus.
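The behavior described above can be reproduced with a try/catch block, as in this minimal sketch. It works on a copy of the table (the name Tcopy and the flag assignmentFailed are illustrative) so the original T keeps its 100 rows, and it assumes the patients sample data is available.

```matlab
% Rebuild the sample table and work on a copy.
load patients
BloodPressure = [Systolic Diastolic];
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
Tcopy = table(Age,Smoker,BloodPressure,SelfAssessedHealthStatus);

% Values in the wrong order: 37 cannot be assigned to the categorical variable.
assignmentFailed = false;
try
    Tcopy(end+1,:) = {"Poor",true,[130 84],37};
catch err
    assignmentFailed = true;
    disp(err.message)
end

% Values in variable order: the row is appended and "Poor" becomes categorical.
Tcopy(end+1,:) = {37,true,[130 84],"Poor"};
```

The second assignment succeeds because "Poor" is text that names an existing category.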
To compare tables with structures, consider the structure array, StructArray, that is equivalent to the table, T.
StructArray = table2struct(T)
StructArray=100×1 struct array with fields:
Age
Smoker
BloodPressure
SelfAssessedHealthStatus
Structure arrays organize records using named fields. Each field's value can have a different data type or size. Now, display the named fields for the first element of StructArray.
StructArray(1)
ans = struct with fields:
Age: 38
Smoker: 1
BloodPressure: [124 93]
SelfAssessedHealthStatus: Excellent
Fields in a structure array are analogous to variables in a table. However, unlike with tables, you cannot enforce homogeneity within a field. For example, some values of the SelfAssessedHealthStatus field in StructArray can be categorical array elements, such as Poor or Good, others can be strings, such as "Poor" and "Good", and others can be integers, such as 0 or 1.
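To make the lack of homogeneity concrete, this small sketch builds a three-element structure array whose single field holds a different class in each element. The names Records, Status, and statusClasses are hypothetical and not part of the patients data.

```matlab
% Passing a cell array as the field value creates one struct element per cell.
Records = struct("Status",{categorical("Good"),"Poor",0});

% Collect the class of the Status field for every element.
statusClasses = arrayfun(@(r) class(r.Status),Records,"UniformOutput",false);
disp(statusClasses)
```

MATLAB accepts all three elements without complaint, which is exactly what a table variable would not allow.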
Now consider the same data stored in a scalar structure, with four fields each containing one variable from the table.
ScalarStruct = struct(...
    "Age",Age,...
    "Smoker",Smoker,...
    "BloodPressure",BloodPressure,...
    "SelfAssessedHealthStatus",SelfAssessedHealthStatus)
ScalarStruct = struct with fields:
Age: [100x1 double]
Smoker: [100x1 logical]
BloodPressure: [100x2 double]
SelfAssessedHealthStatus: [100x1 categorical]
Unlike with tables, you cannot enforce that the data is rectangular. For example, the field ScalarStruct.Age can be a different length than the other fields.
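The following sketch illustrates the point with a tiny scalar structure whose fields have different lengths, then shows that the table function rejects the same data. The names Loose, fieldLengths, and tableRejected are illustrative only.

```matlab
% A scalar structure accepts fields of unequal length without any error.
Loose.Age = [38; 43; 38];
Loose.Smoker = [true; false];
fieldLengths = structfun(@numel,Loose);   % number of elements in each field

% A table requires every variable to have the same number of rows.
tableRejected = false;
try
    Tbad = table(Loose.Age,Loose.Smoker);
catch err
    tableRejected = true;
    disp(err.message)
end
```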
A table allows you to maintain the rectangular structure (like a structure array) and enforce homogeneity of variables (like fields in a scalar structure). Although cell arrays do not have named fields, they have many of the same disadvantages as structure arrays and scalar structures. If you have rectangular data that is homogeneous in each variable, consider using a table. Then you can use numeric or named indexing, and you can use table properties to store metadata.
Access Data Using Numeric or Named Indexing
You can index into a table using parentheses, curly braces, or dot indexing. Parentheses allow you to select a subset of the data in a table and preserve the table container. Curly braces and dot indexing allow you to extract data from a table. Within each table indexing method, you can specify the rows or variables to access by name or by numeric index.
Consider the sample table from above. Each row in the table, T, represents a different patient. The workspace variable, LastName, contains unique identifiers for the 100 rows. Add row names to the table by setting the RowNames property to LastName and display the first five rows of the updated table.
T.Properties.RowNames = LastName;
T(1:5,:)
ans=5×4 table
                Age    Smoker    BloodPressure    SelfAssessedHealthStatus
                ___    ______    _____________    ________________________
    Smith       38     true       124     93      Excellent
    Johnson     43     false      109     77      Fair
    Williams    38     false      125     83      Good
    Jones       40     false      117     75      Fair
    Brown       49     false      122     80      Good
In addition to labeling the data, you can use row and variable names to access data in the table. For example, use named indexing to display the age and blood pressure of the patients Williams and Brown.
T(["Williams","Brown"],["Age","BloodPressure"])
ans=2×2 table
                Age    BloodPressure
                ___    _____________
    Williams    38      125     83
    Brown       49      122     80
Now, use numeric indexing to return an equivalent subtable. Return the third and fifth rows from the first and third variables.
T([3 5],[1 3])
ans=2×2 table
                Age    BloodPressure
                ___    _____________
    Williams    38      125     83
    Brown       49      122     80
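The equivalence of the two subtables, and the difference between preserving and extracting data, can be checked with a short sketch. It assumes the patients sample data is available and builds a separate table, Tnamed, with row names so that T is left untouched; all variable names introduced here are illustrative.

```matlab
% Build a table with row names from the sample data.
load patients
BloodPressure = [Systolic Diastolic];
Tnamed = table(Age,Smoker,BloodPressure,'RowNames',LastName);

% Parentheses return a table, whether rows and variables are named or numbered.
byName = Tnamed(["Williams","Brown"],["Age","BloodPressure"]);
byNumber = Tnamed([3 5],[1 3]);
sameSubtable = isequal(byName,byNumber);

% Curly braces and dot indexing extract the underlying array instead.
agesByBrace = Tnamed{["Williams","Brown"],"Age"};
ageByDot = Tnamed.Age(3);
```

Here byName and byNumber are tables, while agesByBrace and ageByDot are plain numeric values.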
With cell arrays or structures, you do not have the same flexibility to use named or numeric indexing.
With a cell array, you must use strcmp to find desired named data, and then you can index into the array.
With a scalar structure or structure array, it is not possible to refer to a field by number. Furthermore, with a scalar structure, you cannot easily select a subset of variables or a subset of observations. With a structure array, you can select a subset of observations, but you cannot select a subset of variables.
With a table, you can access data by named index or by numeric index. Furthermore, you can easily select a subset of variables and a subset of rows.
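For comparison, this sketch looks up the same value in a cell array and in a table. The two-row data set and the names C, Tsmall, ageFromCell, and ageFromTable are made up for illustration.

```matlab
% Cell array: find the matching row with strcmp, then index with the result.
C = {'Williams',38; 'Brown',49};
isBrown = strcmp(C(:,1),'Brown');
ageFromCell = C{isBrown,2};

% Table: the row name and variable name serve directly as indices.
Tsmall = table([38;49],'VariableNames',{'Age'},'RowNames',{'Williams','Brown'});
ageFromTable = Tsmall{"Brown","Age"};
```

Both lookups return the same value, but the table version needs no separate search step.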
For more information on table indexing, see Access Data in Tables.
Use Table Properties to Store Metadata
In addition to storing data, tables have properties to store metadata, such as variable names, row names, descriptions, and variable units. You can access a property using T.Properties.PropName, where T is the name of the table and PropName is the name of a table property.
For example, add a table description, variable descriptions, and variable units for Age.
T.Properties.Description = "Simulated Patient Data";
T.Properties.VariableDescriptions = ...
    ["" ...
     "true or false" ...
     "Systolic/Diastolic" ...
     "Status Reported by Patient"];
T.Properties.VariableUnits("Age") = "Yrs";
Individual empty strings within VariableDescriptions indicate that the corresponding variable does not have a description. For more information, see the Properties section of table.
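Properties can be read back the same way they are set. The sketch below uses a small hypothetical table, Tmeta, rather than the patients table, and wraps the property values in cellstr and string so that it does not depend on how MATLAB stores them internally.

```matlab
% Small illustrative table with two variables.
Tmeta = table([38;43],[true;false],'VariableNames',{'Age','Smoker'});

% Set metadata; empty strings mean "no description" or "no units".
Tmeta.Properties.Description = "Simulated Patient Data";
Tmeta.Properties.VariableDescriptions = ["","true or false"];
Tmeta.Properties.VariableUnits = ["Yrs",""];

% Read the metadata back.
units = string(Tmeta.Properties.VariableUnits);
hasDescription = ~cellfun(@isempty,cellstr(Tmeta.Properties.VariableDescriptions));
```

In this sketch hasDescription flags only Smoker, because Age was given an empty description.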
To print a table summary, use the summary function.
summary(T)
Description:  Simulated Patient Data

Variables:

    Age: 100x1 double

        Properties:
            Units:  Yrs
        Values:

            Min          25
            Median       39
            Max          50

    Smoker: 100x1 logical

        Properties:
            Description:  true or false
        Values:

            True        34
            False       66

    BloodPressure: 100x2 double

        Properties:
            Description:  Systolic/Diastolic
        Values:
                      Column 1    Column 2
                      ________    ________

            Min         109           68
            Median      122         81.5
            Max         138           99

    SelfAssessedHealthStatus: 100x1 categorical

        Properties:
            Description:  Status Reported by Patient
        Values:

            Excellent       34
            Fair            15
            Good            40
            Poor            11
Structures and cell arrays do not have properties for storing metadata.