Data normalization using robust scaling

Hello all, I am trying to implement robust scaling, but I am confused. Should I use the "all" argument for the median and iqr functions?
Thanks for the help.
DataSet = readtable('Datasets/Test.csv');
DataSet = table2array(DataSet); % 7195 rows x 22 columns
RScaling = (DataSet - median(DataSet))./iqr(DataSet)

Accepted Answer

Voss on 4 Jun 2024

1 vote

If you want to normalize all columns the same way (i.e., using the median and interquartile range of the entire data set), then use "all".
If you want to normalize each column separately (i.e., using each column's own median and interquartile range), then do not use "all". In that case, it's best to pass the dim argument as 1 to state explicitly that you want the median and IQR of each column; that way the code still works by column even if your data set has only one row.
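A minimal sketch of the two options, using a small hypothetical matrix X in place of DataSet (iqr requires the Statistics and Machine Learning Toolbox, and the 'all' option assumes R2018b or later):

```matlab
% Hypothetical 3x2 data set standing in for DataSet; column 2 is 10x column 1
X = [1 10; 2 20; 4 40];

% Option 1: one global median and one global IQR for the whole data set ('all')
globalScaled = (X - median(X, 'all')) ./ iqr(X, 'all');

% Option 2: each column scaled by its own median and IQR (dim = 1)
colScaled = (X - median(X, 1)) ./ iqr(X, 1);

disp(globalScaled)
disp(colScaled)
```

With global scaling, column 2 keeps ten times the spread of column 1; with per-column scaling, both columns end up on the same scale, which is often preferable before distance-based clustering when features have different units.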

4 Comments

MB on 4 Jun 2024
Edited: MB on 4 Jun 2024
Thank you for your answer. So, I can normalize each column separately or all columns together. I want to explore the effects of various normalization techniques on clustering. I've experimented with the methods available in the normalize function without specifying the dim argument. If I understand correctly, this normalizes each column separately: "If A is a matrix, then normalize operates on each column of A separately."
RScaling = (DataSet - median(DataSet, 1))./iqr(DataSet, 1)
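For comparison, normalize can apply per-column robust scaling in one call through its 'medianiqr' method, which centers each column on its median and scales it by its interquartile range, the same idea as the formula above. This sketch assumes R2019b or later, where that method is available, and uses a hypothetical matrix X in place of DataSet:

```matlab
% Hypothetical stand-in for DataSet
X = [1 10; 2 20; 4 40];

% Per-column robust scaling via normalize (default dim is 1 for a non-vector matrix)
N1 = normalize(X, 'medianiqr');

% Same call with dim stated explicitly, which also covers a one-row data set
N2 = normalize(X, 1, 'medianiqr');

disp(N1)
```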
Voss on 4 Jun 2024
Edited: Voss on 4 Jun 2024
You're welcome!
"If I understand correctly, this normalizes each column separately. 'If A is a matrix, then normalize operates on each column of A separately.'"
That's right. For a matrix that is not a vector, the default dim is 1, so you don't have to specify it (though specifying it does no harm). However, if your data set ever had only one row, you would need to specify dim as 1 to normalize by column; otherwise the row vector is normalized along its length, all elements together. It's therefore a good idea to always pass dim as 1. That's all I was suggesting.
Example with a matrix:
data = [1 2 3; 4 5 6] % non-vector matrix
data = 2x3
     1     2     3
     4     5     6

normalize(data) % normalize each column
ans = 2x3
   -0.7071   -0.7071   -0.7071
    0.7071    0.7071    0.7071

normalize(data,1) % same
ans = 2x3
   -0.7071   -0.7071   -0.7071
    0.7071    0.7071    0.7071

normalize(data,2) % normalize each row
ans = 2x3
    -1     0     1
    -1     0     1

Row vector (matrix with one row):
data = [1 2 3] % row vector
data = 1x3
     1     2     3

normalize(data) % without dim, a row vector is normalized along its length (all elements together)
ans = 1x3
    -1     0     1

normalize(data,1) % normalize each column
ans = 1x3
   NaN   NaN   NaN

normalize(data,2) % normalize each row (same as all together in this case)
ans = 1x3
    -1     0     1

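The same one-row pitfall applies to median and iqr in the robust-scaling formula from the question. A minimal sketch with a hypothetical one-row data set (iqr requires the Statistics and Machine Learning Toolbox):

```matlab
% Hypothetical data set that happens to have a single row
X = [1 2 3];

% Without dim, median works along the row (all elements together)
mAll = median(X);     % scalar median of the whole row
mCol = median(X, 1);  % one median per column: here just X itself

% With dim = 1, each column holds a single value, so its IQR is 0 and 0/0 gives NaN
perCol = (X - median(X, 1)) ./ iqr(X, 1);

disp(mAll)
disp(mCol)
disp(perCol)
```

This mirrors the NaN result of normalize(data,1) above: per-column scaling only makes sense when there is more than one row.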
MB on 4 Jun 2024
Many thanks.
Voss on 4 Jun 2024
You're welcome!


More Answers (0)

Release

R2024a

Asked:

MB
4 Jun 2024

Commented:

4 Jun 2024
