Create (equal density) spaced vector in MATLAB

Question


I would like to create an unevenly spaced x-axis vector. linspace() produces an equally spaced vector, but applying it to my data (image shown below) loses a major chunk of information in the high-density region. So I would like to produce an unequally spaced vector whose spacing adapts to the data density. (Please suggest any other method you think would work better for my dataset.)

Thanks,


Answer 1 (accepted answer)

William Rose, 27 September 2024

Edited: William Rose, 27 September 2024

[Sorry if this is a duplicate answer. My first attempt to post produced an error message.]

@Abinesh G,

Here are commands that work for me. I can't run them in this window, since the data file, prec_ev_data.mat, is too big to attach, even if I zip it.

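data=load('prec_ev_data.mat');   % load returns a struct; data.prec_ev_data holds the x,y pairs
x=data.prec_ev_data(:,1);        % first column: x values
y=data.prec_ev_data(:,2);        % second column: y values
xs=sort(x);                      % x values in ascending order
idx=1:10000:length(x);           % every 10000th index into the sorted vector
xp=xs(idx);                      % x values at those ranks
xp=[xp;max(x)];                  % add the max value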

After the commands above, xp (x for plotting) is a column vector of length 1448. The first and last elements are the minimum and maximum x values in the data. The other values are spaced so that there are 10000 data points between successive values of xp.
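A minimal, self-contained sketch of the same steps on made-up data (the skewed vector x and the step k = 100 are hypothetical stand-ins, since prec_ev_data.mat is not attached): sort, take every k-th sorted value, and append the maximum.

```matlab
% Hypothetical skewed data: most values are small, a few are large
N = 1000;
x = floor(((1:N)'/N).^3 * 1000);   % cubic ramp, concentrated near 0, with many ties
x = x(randperm(N));                % shuffle; the result does not depend on the order

k  = 100;                          % samples per step (10000 in the answer above)
xs = sort(x);                      % ascending order
xp = xs(1:k:N);                    % every k-th sorted value, starting at the minimum
xp = [xp; max(x)];                 % append the maximum so the full range is covered

disp(numel(xp))                    % 1:100:1000 gives 10 indices, plus the max -> 11
```

Because the points are taken at equally spaced ranks, they crowd together where the data are dense and spread out where the data are sparse, which is the behavior the question asks for.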

The trouble with the result above is that some x values are shared by more than 10000 data pairs. Therefore the first 4 values of xp() are identical, the next 7 values of xp() are identical, and so on. Eliminate the duplicate values in xp:

xpu=unique(xp);
disp([length(xp),length(xpu)])

1448 1033

Now xpu has 1033 unique x-values for plotting. They are unevenly spaced and increasing. There are 10000, or sometimes more, data pairs with x-values between successive values of xpu.
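A toy example (hypothetical data with 450 tied zeros) showing why unique() is needed, and how many samples actually fall in each bin afterwards. The binning uses discretize with an extra Inf edge so that bin i is [xpu(i), xpu(i+1)) and the last bin holds only x == max(x), the same convention as the loop in the next comment.

```matlab
% Hypothetical data with heavy ties: 450 copies of 0, then the values 1..550
x  = [zeros(450,1); (1:550)'];
k  = 100;                          % samples per step
xs = sort(x);
xp = [xs(1:k:end); max(x)];        % 11 values; the first five are all 0
xpu = unique(xp);                  % drop duplicate edges (unique also sorts)

% Count the samples in each bin [xpu(i), xpu(i+1)); the Inf edge makes
% the last bin [xpu(end), Inf], i.e. only the samples equal to max(x)
bin    = discretize(x, [xpu; Inf]);
counts = accumarray(bin, 1);
disp([numel(xp), numel(xpu)])
disp(counts')
```

Here xp is [0 0 0 0 0 51 151 251 351 451 550], xpu is [0 51 151 251 351 451 550], and the bin counts are [500 100 100 100 100 99 1]. The first bin is large because the tied zeros merge five edges into one; the bin just before the appended maximum can fall slightly short of k, and the last bin holds only the maximum.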

10 comments

William Rose, 27 September 2024

Edited: William Rose, 27 September 2024

@Abinesh G,

Here is an example of how you could use the vector xpu in your analysis: Find the mean y-value of the samples in each bin.

This code doesn't run in this window, since the data file is too big to attach. It assumes that the data vectors x and y are available from the commands listed above and that xpu has been computed with those commands (size(xpu) = 1033x1).

ymn=zeros(size(xpu));  % allocate vector for mean values of y
for i=1:length(xpu)-1
    ymn(i)=mean(y(x>=xpu(i) & x<xpu(i+1)));  % mean of y over bin [xpu(i), xpu(i+1))
end
ymn(end)=mean(y(x==xpu(end))); % last value in ymn: samples with x == max(x)

Plot the results. Include a plot of all the mean values and a separate plot of the low-x range, where most of the data is concentrated.

figure; subplot(211); plot(xpu,ymn,'-r.');
grid on; xlabel('X'); ylabel('mean(Y)')
subplot(212); plot(xpu,ymn,'-r.');
grid on; xlabel('X'); ylabel('mean(Y)'); xlim([0 5e5])

The commands above produce the figure below.

Most of the bins have 10000 elements in them, but some bins have more, and the last bin has only one sample in it. The low end x-value of each bin is used as the x-value for plotting.
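If the loop is slow on this many samples, a vectorized variant is possible. The sketch below is a standalone toy example (hypothetical x, y and xpu, so it overwrites those names if run after the commands above); it assigns each sample to a bin once with discretize, averages with accumarray, and checks the result against the loop.

```matlab
% Hypothetical stand-ins for x, y and xpu (replace with the real vectors)
x   = [zeros(450,1); (1:550)'];
y   = 2*x + 1;                       % made-up response so the means are easy to check
xpu = [0; 51; 151; 251; 351; 451; 550];

% Bin i covers [xpu(i), xpu(i+1)); the extra Inf edge makes the last bin
% [xpu(end), Inf], i.e. only x == max(x), matching the loop above
bin = discretize(x, [xpu; Inf]);
ymn = accumarray(bin, y, [numel(xpu) 1], @mean);

% Same quantity computed with the loop from the comment above, for comparison
ymnLoop = zeros(size(xpu));
for i = 1:numel(xpu)-1
    ymnLoop(i) = mean(y(x >= xpu(i) & x < xpu(i+1)));
end
ymnLoop(end) = mean(y(x == xpu(end)));
disp(max(abs(ymn - ymnLoop)))
```

The vectorized version scans x once instead of once per bin. Since every value of xpu is itself a data value, no bin is empty, so accumarray's default fill value of 0 never appears.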

William Rose, 27 September 2024

prec_ev_data.mat

@Abinesh G,

"extract the data points existing outside the 90 percentile line"

The commands below extract the x,y pairs with the bottom 5%, middle 90%, and top 5% of x values. I realize this is not exactly what you want, but it is related.

data=load('prec_ev_data.mat');
x=data.prec_ev_data(:,1);
y=data.prec_ev_data(:,2);
N=length(x);
[xs,xOrder]=sort(x);
ys=y(xOrder); % y, sorted by sort order of x
xsLo=xs(1:round(N/20)); % lowest 5% of x values
ysLo=ys(1:round(N/20)); % y values corresponding to xsLo
xsMid=xs(round(N/20)+1:round(0.95*N)); % middle 90% of x values
ysMid=ys(round(N/20)+1:round(0.95*N)); % y values corresponding to xsMid
xsHi=xs(round(0.95*N)+1:end); % top 5% of x values
ysHi=ys(round(0.95*N)+1:end); % y values corresponding to xsHi

figure;
subplot(311), scatter(xsLo,ysLo,24,'r')
xlabel('X_{Low}'); ylabel('Y'); grid on
subplot(312), scatter(xsMid,ysMid,24,'g')
xlabel('X_{Mid}'); ylabel('Y'); grid on
subplot(313), scatter(xsHi,ysHi,24,'b')
xlabel('X_{High}'); ylabel('Y'); grid on
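The code above returns the three groups sorted by x. If you would rather keep the original sample order, one option is to convert sorted positions into logical masks. A sketch on hypothetical data, using the same round(N/20) and round(0.95*N) cut points:

```matlab
% Hypothetical data in place of prec_ev_data: a permutation of 1..N (no ties)
N = 1000;
x = mod((1:N)'*7919, N) + 1;       % 7919 is coprime with 1000, so this permutes 1..N
y = sqrt(x);

[~, xOrder] = sort(x);
rankOf = zeros(N,1);
rankOf(xOrder) = (1:N)';           % rankOf(j) = position of x(j) in sorted order

isLo  = rankOf <= round(N/20);     % bottom 5% of x
isHi  = rankOf >  round(0.95*N);   % top 5% of x
isMid = ~isLo & ~isHi;             % middle 90%

xLo  = x(isLo);   yLo  = y(isLo);  % groups stay in the original sample order
xMid = x(isMid);  yMid = y(isMid);
xHi  = x(isHi);   yHi  = y(isHi);
```

The masks can also be reused directly, for example scatter(x(isHi), y(isHi)), without building separate sorted copies.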

William Rose, 28 September 2024

prec_ev_data.mat

@Abinesh G, you're welcome. The suggestions of @Star Strider and @Image Analyst are always valuable.

Are you trying to make a decision tree (or a decision tree forest) to predict Y from X, where X and Y are vectors?

data=load('prec_ev_data.mat');
x1=data.prec_ev_data(:,1);
y1=data.prec_ev_data(:,2);
N1=length(x1);
% There are 900K x1,y1 pairs. Use a random 1% of them for this example.
idx1=randperm(N1); % random rearrangement of indices
x=x1(idx1(1:round(N1/100)));  % random 1% of the x1 values
y=y1(idx1(1:round(N1/100)));  % corresponding y1 values
Mdl1= TreeBagger(50,x,y,Method="regression")
Mdl1 = 
  TreeBagger
Ensemble with 50 bagged decision trees:
                    Training X:             [9000x1]
                    Training Y:             [9000x1]
                        Method:           regression
                 NumPredictors:                    1
         NumPredictorsToSample:                    1
                   MinLeafSize:                    5
                 InBagFraction:                    1
         SampleWithReplacement:                    1
          ComputeOOBPrediction:                    0
 ComputeOOBPredictorImportance:                    0
                     Proximity:                   []

For predX, use 21 x-values, chosen so there are approximately equal numbers of samples between successive values of predX.

N=length(x);
xs=sort(x);
idx=[1,round(N*(.05:.05:1))];
predX=xs(idx);
%predX=linspace(min(x),max(x),21)'
YQuantiles = quantilePredict(Mdl1,predX,'Quantile',[0.05,0.5,0.95]);
Error using sparse
Third input must be double or logical.

Error in CompactTreeBagger>localGetTrainingNodes (line 2224)
    S = sparse(1:sum(ibtf),tnode,tw(ibtf));

Error in CompactTreeBagger/quantileNode (line 1880)
            [trainNode,TW] = localGetTrainingNodes(trees,Xtrain,wtrain,ibIdx);

Error in CompactTreeBagger/quantilePredictCompact (line 1924)
                quantileNode(bagger,X,Xtrain,Wtrain,varargin{:});

Error in TreeBagger/quantilePredict (line 1537)
            [varargout{1:nargout}] = quantilePredictCompact(bagger.Compact,X,...
figure; hold on
plot(x,y,"r.");
plot(predX,YQuantiles)
xlabel('X'); ylabel('Y');
legend('Data','5%','Median','95%')

I am not sure why this error happens. Maybe you can figure it out. I have tried various changes (a different number of trees, a different length of predX, different values in the Quantile vector, changing "...,'Quantile',[...])" to "...,Quantile=[...])", changing predX to a linspace vector, etc.). Nothing helped. I also inspected the vectors idx and predX to confirm that they look reasonable.
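One untested guess about the error, not confirmed in this thread: the message comes from sparse, whose third input must be double or logical. If prec_ev_data is stored as single, the values that quantilePredict passes to sparse may inherit that class. A quick check is to print class(x) and class(y) and, if needed, convert to double before training. The data below are hypothetical, and the TreeBagger calls are left as comments because they need the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical single-precision data standing in for prec_ev_data
x = single((1:100)');
y = single(2*(1:100)');

% Print the storage class; single data is the case this guess is about
fprintf('class(x) = %s, class(y) = %s\n', class(x), class(y));

% Convert to double before training (harmless if already double);
% predX computed from the converted x will then also be double
x = double(x);
y = double(y);

% Then retrain and predict as before:
% Mdl1 = TreeBagger(50, x, y, Method="regression");
% YQuantiles = quantilePredict(Mdl1, predX, Quantile=[0.05 0.5 0.95]);
```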

William Rose, 29 September 2024

@Abinesh G, @Image Analyst suggested above that you find the cumulative distribution function, then invert it. That is what my code above does, using your data. By sorting the x-data, then taking equally spaced values (equal in terms of indices along the sorted x-vector), then finding the corresponding x-values at those indices, you are, in effect, finding equally spaced points on the vertical axis of the CDF, then finding the corresponding horizontal axis values (i.e. x-axis values).
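The same inverse-CDF idea can be written explicitly: build the empirical CDF of the sorted data, choose equally spaced levels on the CDF axis, and map them back to x with interp1. A sketch on hypothetical data, with nPts = 21 to match predX above; the 'previous' method returns actual data values rather than interpolating between them.

```matlab
% Hypothetical skewed data in place of prec_ev_data
N  = 1000;
x  = floor(((1:N)'/N).^3 * 1000);
xs = sort(x);
F  = (1:N)'/N;                       % empirical CDF value at each sorted sample

nPts    = 21;
targets = linspace(F(1), 1, nPts)';  % equally spaced levels on the CDF (vertical) axis

% Map each level back to the x axis. F is strictly increasing, so interp1
% accepts it as sample points even though xs contains ties; 'previous' picks
% the sample whose CDF value is at or just below each level
predX = interp1(F, xs, targets, 'previous');
```

Points chosen this way bunch up where the CDF rises steeply (dense data) and spread out where it is flat (sparse data).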

Abinesh G, 30 September 2024

Dear @Image Analyst

Thanks a lot for responding. I have gone through your demo and understood the concept of inverse transform sampling from the cumulative distribution function.

@William Rose: Agreed. Your approach of deriving the intervals from the sorted values roughly performs this inverse transform.

Although the concept is straightforward, it didn't occur to me. Thanks to both of you for an elegant solution.
