MATLAB Answers

## Find columns of a Matrix (n x m) that fit best to a Vector (n x 1)

Asked by Alexander R2018b on 28 Mar 2019

Edited by John D'Errico on 28 Mar 2019

Accepted Answer by John D'Errico

Hello,
I have a matrix of 24 x 365 values (the hourly electricity consumption over one year). I want to find the m days whose data fit my forecast vector (24 x 1) best, and collect them in a 24 x m matrix. For example, to find the 6 best-fitting days of the year, the new matrix should have 24 x 6 values.
How could that be implemented? Are there any predefined functions?
greetings
Alex


## 3 Answers

### John D'Errico

28 Mar 2019
Accepted Answer

You need first to decide what it means to "fit best". For example, I'll just make up some data here.
data = rand(3,20);
data'
ans =
0.075854 0.05395 0.5308
0.77917 0.93401 0.12991
0.56882 0.46939 0.011902
0.33712 0.16218 0.79428
0.31122 0.52853 0.16565
0.60198 0.26297 0.65408
0.68921 0.74815 0.45054
0.083821 0.22898 0.91334
0.15238 0.82582 0.53834
0.99613 0.078176 0.44268
0.10665 0.9619 0.0046342
0.77491 0.8173 0.86869
0.084436 0.39978 0.25987
0.80007 0.43141 0.91065
0.18185 0.2638 0.14554
0.13607 0.86929 0.5797
0.54986 0.14495 0.85303
0.62206 0.35095 0.51325
0.40181 0.075967 0.23992
0.12332 0.18391 0.23995
As you can see, I transposed data to display it, so it will be easier to read.
Now let us pretend that I have 20 days' worth of data, sampled 3 times per day, and that I want to find the subset of 3 days among those 20 that best fit some prototypical day stored in the variable target:
target = [0.4; 0.5; 0.6]
target =
0.4
0.5
0.6
I'll use some tools below that come from more recent MATLAB releases. So if the code I wrote does not work for you, then tell me what release you have, and I'll explain how to fix it to work in older releases. (Best is if you always say what release you are using.)
What we need to decide now is how to measure how different any specific day's data is from your target. The simplest choice might be the sum of the absolute values of the differences. Then find the three smallest such sums, and report which days they belong to.
[err,ind] = mink(sum(abs(data - target),1),3)
err =
0.45785 0.49309 0.55167
ind =
18 6 5
So, if you look at the one-line computation above, it takes the differences, then their absolute values, adds them up for each column of data, and finally finds the 3 smallest such sums. Days 18, then 6, then 5 were the closest by that measure.
[target, data(:,ind)]
ans =
0.4 0.62206 0.60198 0.31122
0.5 0.35095 0.26297 0.52853
0.6 0.51325 0.65408 0.16565
To be honest, the first column shown here (the target) is not matched all that well by the other three. But by the above measure, those 3 days were the best "fit".
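For an older release, the same 1-norm ranking can be sketched without `mink` and without implicit expansion. As far as I know, `mink` arrived in R2017b and implicit expansion in R2016b, so this minimal sketch assumes neither is available and falls back on `bsxfun` and `sort`. The names `d`, `errAll` and `order` are made up for illustration, and the data are the 20 sample days as displayed above (rounded to the digits shown).

```matlab
% The 20 sample days from above, one row per day as displayed,
% transposed so that each column is one day (3 x 20).
data = [0.075854 0.05395  0.5308
        0.77917  0.93401  0.12991
        0.56882  0.46939  0.011902
        0.33712  0.16218  0.79428
        0.31122  0.52853  0.16565
        0.60198  0.26297  0.65408
        0.68921  0.74815  0.45054
        0.083821 0.22898  0.91334
        0.15238  0.82582  0.53834
        0.99613  0.078176 0.44268
        0.10665  0.9619   0.0046342
        0.77491  0.8173   0.86869
        0.084436 0.39978  0.25987
        0.80007  0.43141  0.91065
        0.18185  0.2638   0.14554
        0.13607  0.86929  0.5797
        0.54986  0.14495  0.85303
        0.62206  0.35095  0.51325
        0.40181  0.075967 0.23992
        0.12332  0.18391  0.23995]';
target = [0.4; 0.5; 0.6];
nbest = 3;

% bsxfun subtracts the 3 x 1 target from every column,
% which is what data - target does in newer releases.
d = bsxfun(@minus, data, target);

% Sum of absolute differences per day, then a full ascending sort
% stands in for mink.
[errAll, order] = sort(sum(abs(d), 1));
err = errAll(1:nbest);
ind = order(1:nbest);   % days 18, 6 and 5 for this data
```

Sorting all columns does more work than `mink`, but for a few hundred days the difference should not matter.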
Alternatively, you might care only about the square root of the sum of squares of those differences. This tends to emphasize the larger differences as important, and it is a rather classic way to look at such a fit.
[err,ind] = mink(sqrt(sum((data - target).^2,1)),3)
err =
0.28116 0.31608 0.39474
ind =
18 6 4
As you can see, the 3rd best such day is now a different choice.
[target, data(:,ind)]
ans =
0.4 0.62206 0.60198 0.33712
0.5 0.35095 0.26297 0.16218
0.6 0.51325 0.65408 0.79428
Finally, what we might care about could just be the largest difference. That too is easy to locate.
[err,ind] = mink(max(abs(data - target),[],1),3)
err =
0.22206 0.23703 0.28921
ind =
18 6 7
[target, data(:,ind)]
ans =
0.4 0.62206 0.60198 0.68921
0.5 0.35095 0.26297 0.74815
0.6 0.51325 0.65408 0.45054
Again, the two best days are the same as before, but the 3rd best day has changed once more.
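To see why the three measures can disagree, here is a small made-up illustration (the target and the two candidate columns are invented, not taken from the data above). Both candidates have the same total absolute error, but one spreads it evenly while the other puts it all into a single sample.

```matlab
target = [0; 0; 0];
cand = [0.3 0.9
        0.3 0.0
        0.3 0.0];   % column 1: evenly spread error, column 2: one large error

e1   = sum(abs(cand - target), 1);         % 1-norm: both columns give 0.9
e2   = sqrt(sum((cand - target).^2, 1));   % 2-norm: about 0.52 versus 0.9
einf = max(abs(cand - target), [], 1);     % infinity-norm: 0.3 versus 0.9
```

The 1-norm treats the two candidates as equally good, while the 2-norm and the infinity-norm both prefer the first column, because they penalize the single large difference more heavily.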
So, if your data is really 24x365, and your forecast is 24x1, and you want to find the best 6 days, then the solution just depends on which metric you need to use.
The 1-norm (sum of absolute values)
nbest = 6;
[err,ind] = mink(sum(abs(data - target),1),nbest);
The 2-norm (square root of the sum of squares)
nbest = 6;
[err,ind] = mink(sqrt(sum((data - target).^2,1)),nbest);
The infinity-norm (maximum of absolute values)
nbest = 6;
[err,ind] = mink(max(abs(data - target),[],1),nbest);
Your choice.
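Whichever metric you pick, the 24 x m matrix asked for in the question is just the selected columns of data. A minimal sketch follows, assuming random stand-in data (the `rand` calls and the names `p` and `bestDays` are placeholders, not part of the original problem) and using `vecnorm`, which to my knowledge is available from R2017b, so that a single parameter `p` switches between the three norms.

```matlab
rng(0);                     % fixed seed so the sketch is repeatable
data   = rand(24, 365);     % stand-in for the hourly consumption, one column per day
target = rand(24, 1);       % stand-in for the forecast
nbest  = 6;
p      = 2;                 % 1, 2 or Inf selects the metric

% p-norm of each column of differences, then the nbest smallest values.
[err, ind] = mink(vecnorm(data - target, p, 1), nbest);

% Columns of the best-fitting days, ordered from best to worst fit.
bestDays = data(:, ind);    % 24 x 6
```

With `p = 2` this should give the same ranking as the explicit `sqrt(sum(...))` expression above. Setting `p` to 1 or Inf corresponds to the other two metrics.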


### Andrei Bobrov

28 Mar 2019

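% A: your data array (24 x 365)
% B: your forecast vector (24 x 1)
m = 6;                          % number of best-fitting days to keep
ii = sqrt(sum((A - B).^2, 1));  % Euclidean distance of each day (column) to B
[~,ij] = mink(ii,m);            % indices of the m smallest distances
out = A(:,ij);                  % 24 x m matrix of the best-fitting days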


### KSSV

28 Mar 2019

Read about ismember.
