stats::frequency

Tally numerical data into classes and count frequencies

MuPAD® notebooks will be removed in a future release. Use MATLAB® live scripts instead.

MATLAB live scripts support most MuPAD functionality, though there are some differences. For more information, see Convert MuPAD Notebooks to MATLAB Live Scripts.

Syntax

stats::frequency(data, <ClassesClosed = Left | Right>)
stats::frequency(data, n, <ClassesClosed = Left | Right>)
stats::frequency(data, [n], <ClassesClosed = Left | Right>)
stats::frequency(data, [a1 .. b1, a2 .. b2, …], <ClassesClosed = Left | Right>)
stats::frequency(data, [[a1, b1], [a2, b2], …], <ClassesClosed = Left | Right>)
stats::frequency(data, Classes = n, <ClassesClosed = Left | Right>)
stats::frequency(data, Classes = [n], <ClassesClosed = Left | Right>)
stats::frequency(data, Classes = [a1 .. b1, a2 .. b2, …], <ClassesClosed = Left | Right>)
stats::frequency(data, Classes = [[a1, b1], [a2, b2], …], <ClassesClosed = Left | Right>)
stats::frequency(data, Cells = n, <CellsClosed = Left | Right>)
stats::frequency(data, Cells = [n], <CellsClosed = Left | Right>)
stats::frequency(data, Cells = [a1 .. b1, a2 .. b2, …], <CellsClosed = Left | Right>)
stats::frequency(data, Cells = [[a1, b1], [a2, b2], …], <CellsClosed = Left | Right>)

Description

stats::frequency(data, [[a1, b1], [a2, b2], …]) tallies numerical data into different classes given by the semi-open intervals (a1, b1], (a2, b2], …. It counts how many data elements fall into each class.

All data elements must be real numerical values. Exact numerical values such as π are allowed if they can be converted to real floating-point numbers via float. An error is raised if the data contain symbolic values that cannot be converted to real floating-point numbers.
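As a small illustration of this conversion rule, the following sketch mixes an integer, a rational number, and the exact values sqrt(2) and PI in one data list. The data and the two classes are invented for this illustration. With the default setting ClassesClosed = Right, sqrt(2) ≈ 1.41 belongs to the class (0, 2] and PI ≈ 3.14 to the class (2, 4]:

```mupad
// Illustrative data: an integer, two exact irrational values, and a rational
data := [1, sqrt(2), PI, 7/2]:
// sqrt(2) and PI can be converted via float, so they are tallied like 1 and 7/2.
// A symbolic entry without a real float value, such as an unassigned
// identifier, would raise an error instead.
stats::frequency(data, [[0, 2], [2, 4]])

delete data:
```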

Note

stats::frequency is fast if all data elements are integers, rational numbers, or floating-point numbers. Exact numerical values such as π are processed, but they noticeably reduce the efficiency of stats::frequency.

Data given by an array, a table etc. are internally treated like a list containing all operands of the data container. In particular, all rows and columns of arrays, matrices and stats::sample objects are taken into account. A stats::sample object must not contain any text entries.
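To illustrate the treatment of data containers, the following sketch passes a small matrix instead of a list. The matrix entries and the classes are chosen only for this illustration. According to the rule above, all six entries are tallied, regardless of their row or column:

```mupad
// Illustrative 2 x 3 matrix; it is treated like the list of all its entries
A := matrix([[1, 2, 3], [4, 5, 6]]):
// With ClassesClosed = Right, the classes are (0, 3] and (3, 6]
stats::frequency(A, [[0, 3], [3, 6]])

delete A:
```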

For the specification of the classes, stats::frequency accepts either a single positive integer (or, equivalently, a list of one positive integer), or a list of classes given as ranges or lists of two elements.

A single integer n in the specification Classes = n or Classes = [n] is interpreted as “subdivide the range from min(data) to max(data) into n classes of equal size”. The left border of the first class is set to -∞.
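The subdivision can also be written out by hand. The following is only a sketch under the assumption that the description above is taken literally: the names xmin, xmax, h and classes are introduced for this illustration, and the page does not state how stats::frequency computes the borders internally. If the assumption holds, both calls should use the same classes:

```mupad
data := [0, 1, 2, 3, 4, 5, 6, 7, 8]:
n := 4:
xmin := min(op(data)):
xmax := max(op(data)):
h := (xmax - xmin)/n:    // common class width
// Borders of the n equal-width classes; the first left border is -infinity
classes := [[-infinity, xmin + h]] . [[xmin + (i - 1)*h, xmin + i*h] $ i = 2..n]:
stats::frequency(data, Classes = classes);
// For comparison: let stats::frequency choose the n classes itself
stats::frequency(data, Classes = n)

delete data, n, xmin, xmax, h, classes, i:
```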

The classes may be specified directly as in Classes = [[a1, b1], [a2, b2], …] or Classes = [a1 .. b1, a2 .. b2, …].

Note

With the default setting ClassesClosed = Right, the i-th class is the semi-open interval (ai, bi], i.e., a datum x is tallied into the i-th class if ai < x ≤ bi is satisfied.

With ClassesClosed = Left, the i-th class is the semi-open interval [ai, bi), i.e., a datum x is tallied into the i-th class if ai ≤ x < bi is satisfied.

The class boundaries must be numerical real values satisfying a1 ≤ b1 ≤ a2 ≤ b2 ≤ a3 ≤ …. In most applications, b1 = a2, b2 = a3 etc. is appropriate.

Exact values such as π are accepted and processed.

The classes need not cover the entire data range. Data are ignored if they do not fall into one of the specified classes.

If the classes are given directly, the leftmost border may be -∞ and the rightmost border may be ∞ (entered as -infinity and infinity).
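Because data outside all classes are silently ignored, it can be useful to check how many values were dropped. The following sketch does this by comparing the number of data with the sum of the class counts. The data, the classes, and the names tallied and ignored are invented for this illustration. With the default ClassesClosed = Right, the values 0 and 12 lie outside both classes (0, 5] and (5, 10]:

```mupad
data := [0, 1, 2, 5, 12]:
T := stats::frequency(data, [[0, 5], [5, 10]]):
// Number of data tallied into some class: the sum of the class counts T[i][2]
tallied := _plus(T[i][2] $ i = 1..2):
// Ignored data do not appear in T; they show up only in this difference
ignored := nops(data) - tallied

delete data, T, tallied, ignored, i:
```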

Examples

Example 1

We split the following data into 10 classes of equal size (default). The first class covers the values from -∞ to 2:

data := [0, 1, 2, PI, 4, 5, 6, 7, 7.1, 20]:
T := stats::frequency(data)

We split the information on the classes into 3 separate tables:

TheClasses = map(T, op, 1)

TheFrequencies = map(T, op, 2)

TheValues = map(T, op, 3)

Next, we specify the classes explicitly:

classes:= [[0, 5], [5, 10], [10, 20]]:
stats::frequency(data, classes)

Note that the value 0 is not tallied into any of the classes (the first class represents the semi-open interval (0, 5])! In order to include all values, we use -∞ and ∞ as the outermost class boundaries:

classes:= [[-infinity, 5], [5, 10], [10, infinity]]:
stats::frequency(data, classes)

delete data, T, classes:

Example 2

We demonstrate the difference between the options ClassesClosed = Left and ClassesClosed = Right. In the first case, the value 1 is tallied into the second class:

stats::frequency([0, 1, 2], Classes = [-infinity..1, 1..infinity],
                          ClassesClosed = Left)

With ClassesClosed = Right, the value 1 is tallied into the first class:

stats::frequency([0, 1, 2], Classes = [-infinity..1, 1..infinity],
                          ClassesClosed = Right)

The default setting is ClassesClosed = Right:

stats::frequency([0, 1, 2], Classes = [-infinity..1, 1..infinity])

Example 3

We create a sample of 1000 normally distributed data points:

X := stats::normalRandom(0, 10):
data := [X() $ i = 1..1000]:

These data are tallied into 5 different classes of equal width:

T := stats::frequency(data, 5):

We determine the number of data values in each class:

for i from 1 to 5 do
    print(Class = T[i][1], NumberOfElements = T[i][2]);
end_for:

We determine the outliers of the data sample by collecting the values smaller than -9 and the values larger than 10:

classes := [[-infinity, -9], [10, infinity]]:
T := stats::frequency(data, classes);

delete X, data, T, i, classes:

Parameters

data

The statistical data: a list, a set, a table, an array, a matrix, or an object of type stats::sample containing numerical real data values

n

The number of classes (cells): a positive integer. If not specified, n = 10 is used.

a1, b1, a2, …

The class boundaries: real numerical values satisfying a1 ≤ b1 ≤ a2 ≤ b2 ≤ ….

Also, -∞ and ∞ are allowed as class boundaries.

Return Values

A table is returned with integer indices from 1 through the number of classes. The i-th entry of the table T = stats::frequency(data, ...) is the list T[i] = [[ai, bi], ni, [v1, v2, …]], where [ai, bi] is the i-th class, ni is the number of data falling in this class, and [v1, v2, …] is the sorted list of all data in this class (i.e., with the default ClassesClosed = Right, ai < vj ≤ bi for all j from 1 through ni).
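The three components of each table entry can be accessed by indexing, as in the following sketch. The data, the classes, and the name relfreq are invented for this illustration; the relative frequencies are not part of the return value but are computed here from the class counts:

```mupad
data := [1, 2, 3, 4, 5, 6]:
T := stats::frequency(data, [[0, 2], [2, 6]]):
for i from 1 to 2 do
    // T[i][1]: class borders, T[i][2]: count, T[i][3]: sorted data in the class
    print(Class = T[i][1], NumberOfElements = T[i][2], TheValues = T[i][3]);
end_for:
// Relative frequency of each class: count divided by the total number of data
relfreq := [T[i][2]/nops(data) $ i = 1..2]

delete data, T, relfreq, i:
```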
