Create (equal density) spaced vector in MATLAB
I would like to create an unevenly spaced x-axis vector. linspace() produces an equally spaced vector. However, applying linspace to my data (image shown below) loses a major chunk of information in the high-density area. So I would like to produce an unequally spaced vector whose spacing is adjusted to the data density. (Please suggest any other method that you feel would work better for my dataset.)
Thanks,

Accepted Answer
William Rose
27 Sep 2024
Edited: William Rose
27 Sep 2024
[Sorry if this is a duplicate answer. My first attempt to post produced an error message.]
Here are commands that work for me. I can't run them in this window, since the data file, prec_ev_data.mat, is too big to attach, even if I zip it.
data=load('prec_ev_data.mat');
x=data.prec_ev_data(:,1);
y=data.prec_ev_data(:,2);
xs=sort(x);
idx=1:10000:length(x);
xp=xs(idx);
xp=[xp;max(x)]; % add the max value
After the commands above, xp (x for plotting) is a column vector of length 1448. The first and last elements are the minimum and maximum x values in the data. The other values are spaced so there will be 10000 data points between each value in xp.
The trouble with the result above is that you have more than 10000 data pairs with the same x value. Therefore the first 4 values of xp() are identical, the next 7 values of xp() are identical, and so on. Eliminate the duplicate values in xp:
xpu=unique(xp);
disp([length(xp),length(xpu)])
1448 1033
Now xpu has 1033 unique x-values for plotting. They are unevenly spaced and increasing. There are 10000, or sometimes more, data pairs with x-values between each value in xpu.
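Since the data file is not attached here, the steps above can be tried on synthetic data. The following is only a minimal sketch: xDemo, step, and the other names are hypothetical, and the skewed, coarsely rounded sample is an assumed stand-in for the real x column, chosen so that tied values occur.

```matlab
% Synthetic, strongly skewed stand-in for the real x data (assumption).
rng(1);                                       % reproducible random numbers
xDemo = 100*round(-10*log(rand(50000,1)));    % exponential-like, rounded so ties occur
step  = 500;                                  % target number of samples between grid points

xsDemo  = sort(xDemo);                        % ascending order
xpDemo  = xsDemo(1:step:end);                 % every step-th sorted value
xpDemo  = [xpDemo; xsDemo(end)];              % append the maximum
xpuDemo = unique(xpDemo);                     % drop repeated grid values

% Count the samples in each interval between successive grid values.
% histcounts includes the right edge in the last interval only.
counts = histcounts(xDemo, xpuDemo);
disp([numel(xpDemo), numel(xpuDemo)])         % grid length before and after unique
disp([min(counts), max(counts)])              % smallest and largest interval count
```

Intervals that start at a heavily repeated value hold more than step samples, which is the same effect described above for the real data.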
10 Comments
William Rose
27 Sep 2024
@Abinesh G, if you want a finer grid of x-values for plotting and analysis, adjust idx in the code above, or combine 2 lines. For example,
% idx=1:10000:length(x);
% xp=xs(idx);
xp=xs(1:1000:length(x)); % 1000 data pairs between xp values
You will still need to use xpu=unique(xp); to remove duplicates.
William Rose
27 Sep 2024
Edited: William Rose
27 Sep 2024
Here is an example of how you could use the vector xpu in your analysis: Find the mean y-value of the samples in each bin.
This code doesn't run in this window, since the data file is too big to attach. It assumes the data vectors x and y are available, from commands listed above, and xpu has been computed using the commands above. size(xpu)=1033x1.
ymn=zeros(size(xpu)); % allocate vector for mean values of y
for i=1:length(xpu)-1, ymn(i)=mean(y(x>=xpu(i) & x<xpu(i+1))); end
ymn(end)=mean(y(x==xpu(end))); % last value in ymn
Plot the results. Include a plot of all the mean values and a separate plot of the low-x range, where most of the data is concentrated.
figure; subplot(211); plot(xpu,ymn,'-r.');
grid on; xlabel('X'); ylabel('mean(Y)')
subplot(212); plot(xpu,ymn,'-r.');
grid on; xlabel('X'); ylabel('mean(Y)'); xlim([0 5e5])
The commands above produce the figure below.

Most of the bins have 10000 elements in them, but some bins have more, and the last bin has only one sample in it. The low end x-value of each bin is used as the x-value for plotting.
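For a large data set, the loop above rescans all of x once per bin. A possible alternative, offered only as a sketch, is to assign each sample to a bin once with discretize and then average with accumarray. The data and the names xDemo, yDemo, and edges below are synthetic and hypothetical. Note one difference from the loop: here the last bin includes its right edge, so there is no separate final entry for x equal to the maximum.

```matlab
% Synthetic stand-ins for x, y, and the bin edges xpu (assumptions).
rng(2);
xDemo  = -1000*log(rand(20000,1));            % skewed x values
yDemo  = sqrt(xDemo) + randn(size(xDemo));    % arbitrary y that depends on x
xsDemo = sort(xDemo);
edges  = unique([xsDemo(1:1000:end); xsDemo(end)]);  % density-adapted bin edges

% Bin i covers [edges(i), edges(i+1)); the last bin also includes its right edge.
bin   = discretize(xDemo, edges);
nBins = numel(edges) - 1;

% Mean of y and number of samples in each bin, computed in one pass.
ymnDemo = accumarray(bin, yDemo, [nBins 1], @mean);
nPerBin = accumarray(bin, 1, [nBins 1]);

figure; plot(edges(1:end-1), ymnDemo, '-r.');  % plot against the low edge of each bin
grid on; xlabel('X'); ylabel('mean(Y)')
```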
Abinesh G
27 Sep 2024
Thanks a lot for the detailed response. I am attaching a sample of the data that fits within the 5 MB limit. My goal is to fit a quantile decision tree, use 'quantilePredict', and extract the data points lying outside the 90th percentile line. 'quantilePredict' needs predictor data, which I initially generated with 'linspace'. However, that gives an equally spaced vector, so I cannot capture the behaviour of the data in the high-density regions.
Your suggestion of indexing the sorted data and keeping the unique values is sensible. I will give it a try. If you have any other suggestions for my problem, kindly let me know.
Image Analyst
27 Sep 2024
I could be wrong (because I don't fully understand your problem) but I think if you wanted to spread out the x axis to make it more uniform you'd want to use inverse transform sampling. Basically you find the CDF of your data and invert it. See https://en.wikipedia.org/wiki/Inverse_transform_sampling for more info.
However I suspect this may be an XY Problem ( https://en.wikipedia.org/wiki/XY_problem ) where you're asking us to solve something that is not ultimately what you should be wanting to do.
"extract the data points existing outside the 90 percentile line"
The commands below extract the x,y pairs with the bottom 5%, middle 90%, and top 5% of x values. I realize this is not exactly what you want, but it is related.
data=load('prec_ev_data.mat');
x=data.prec_ev_data(:,1);
y=data.prec_ev_data(:,2);
N=length(x);
[xs,xOrder]=sort(x);
ys=y(xOrder); % y, sorted by sort order of x
xsLo=xs(1:round(N/20)); % lowest 5% of x values
ysLo=ys(1:round(N/20)); % y values corresponding to xsLo
xsMid=xs(round(N/20)+1:round(0.95*N)); % middle 90% of x values
ysMid=ys(round(N/20)+1:round(0.95*N)); % y values corresponding to xsMid
xsHi=xs(round(0.95*N)+1:end); % top 5% of x values
ysHi=ys(round(0.95*N)+1:end); % y values corresponding to xsHi
figure;
subplot(311), scatter(xsLo,ysLo,24,'r')
xlabel('X_{Low}'); ylabel('Y'); grid on
subplot(312), scatter(xsMid,ysMid,24,'g')
xlabel('X_{Mid}'); ylabel('Y'); grid on
subplot(313), scatter(xsHi,ysHi,24,'b')
xlabel('X_{High}'); ylabel('Y'); grid on

Abinesh G
27 Sep 2024
Thanks for responding. I may not have communicated my problem properly. But inverse transform sampling, as mentioned by @Image Analyst, and your comment on indexing the unique sorted data may help me produce an unequally spaced ordered vector (with some modifications). So for now I am closing the question by accepting your answer.
@Abinesh G, you're welcome. The suggestions of @Star Strider and @Image Analyst are always valuable.
Are you trying to make a decision tree (or a decision tree forest) to predict Y from X, where X and Y are vectors?
data=load('prec_ev_data.mat');
x1=data.prec_ev_data(:,1);
y1=data.prec_ev_data(:,2);
N1=length(x1);
% There are 900K x1,y1 pairs. Use a random 1% of them for this example.
idx1=randperm(N1); % random rearrangement of indices
x=x1(idx1(1:round(N1/100))); % random 1% of the x1 values
y=y1(idx1(1:round(N1/100))); % corresponding y1 values
Mdl1= TreeBagger(50,x,y,Method="regression")
Mdl1 =
TreeBagger
Ensemble with 50 bagged decision trees:
Training X: [9000x1]
Training Y: [9000x1]
Method: regression
NumPredictors: 1
NumPredictorsToSample: 1
MinLeafSize: 5
InBagFraction: 1
SampleWithReplacement: 1
ComputeOOBPrediction: 0
ComputeOOBPredictorImportance: 0
Proximity: []
For predX, use 21 x-values, chosen so there are approximately equal numbers of samples between successive values of predX.
N=length(x);
xs=sort(x);
idx=[1,round(N*(.05:.05:1))];
predX=xs(idx);
%predX=linspace(min(x),max(x),21)'
YQuantiles = quantilePredict(Mdl1,predX,'Quantile',[0.05,0.5,0.95]);
Error using sparse
Third input must be double or logical.
Error in CompactTreeBagger>localGetTrainingNodes (line 2224)
S = sparse(1:sum(ibtf),tnode,tw(ibtf));
Error in CompactTreeBagger/quantileNode (line 1880)
[trainNode,TW] = localGetTrainingNodes(trees,Xtrain,wtrain,ibIdx);
Error in CompactTreeBagger/quantilePredictCompact (line 1924)
quantileNode(bagger,X,Xtrain,Wtrain,varargin{:});
Error in TreeBagger/quantilePredict (line 1537)
[varargout{1:nargout}] = quantilePredictCompact(bagger.Compact,X,...
figure; hold on
plot(x,y,"r."); % x and y are the lowercase training vectors defined above
plot(predX,YQuantiles)
xlabel('X'); ylabel('Y');
legend('Data','5%','Median','95%')
I am not sure why this error happens. Maybe you can figure it out. I have tried various changes (different number of trees, different length of predX, different values for Quantile vector, changing "...,'Quantile',[...])" to "...,Quantile=[...])", changing predX to a linspace vector, etc. Nothing helped. I also viewed vectors idx and predX to confirm that they look reasonable.
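One thing that might be worth checking, purely as a guess that is not confirmed anywhere in this thread, is the class of the data. The message says the third input to sparse must be double or logical, so if the columns of prec_ev_data are stored as single, casting them to double before training could be a cheap experiment. The sketch below uses synthetic single-precision data (xRaw and yRaw are hypothetical stand-ins, not the real file) and requires Statistics and Machine Learning Toolbox. If the real data is already double, the cause lies elsewhere.

```matlab
% Synthetic single-precision data standing in for the real x and y (assumption).
rng(3);
xRaw = single(-1000*log(rand(2000,1)));
yRaw = single(sqrt(double(xRaw)) + randn(2000,1));
disp(class(xRaw))                      % check the class of the loaded data first

% Cast to double before training. Whether this avoids the error above is untested.
x = double(xRaw);
y = double(yRaw);
Mdl = TreeBagger(50, x, y, Method="regression");

% Predictor grid with roughly equal numbers of samples between points.
N = numel(x);
xs = sort(x);
predX = xs([1, round(N*(0.05:0.05:1))]);

YQuantiles = quantilePredict(Mdl, predX, 'Quantile', [0.05 0.5 0.95]);

figure; hold on
plot(x, y, "r.");
plot(predX, YQuantiles)
xlabel('X'); ylabel('Y');
legend('Data','5%','Median','95%')
```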
Image Analyst
28 Sep 2024
@Abinesh G for what it's worth, I'm attaching my demo of using inverse transform sampling to draw samples from a Rayleigh distribution using the formula for one input, and a uniform sampling of random numbers for the other input. Output is random numbers as if they were drawn from a Rayleigh distribution even though we started with random numbers drawn from a uniform distribution.
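The attached demo is not reproduced here. As a rough illustration of the same idea, and not Image Analyst's actual code, the sketch below pushes uniform random numbers through the inverse of the Rayleigh CDF, F(r) = 1 - exp(-r^2/(2*sigma^2)). The value of sigma and the sample size are arbitrary choices.

```matlab
rng(4);
sigma = 2;                          % Rayleigh scale parameter (arbitrary choice)
u = rand(100000,1);                 % uniform samples on (0,1)
r = sigma*sqrt(-2*log(1 - u));      % inverse CDF applied to the uniform samples

% Sanity check: the sample mean should be close to sigma*sqrt(pi/2).
disp([mean(r), sigma*sqrt(pi/2)])

figure; histogram(r, 'Normalization', 'pdf');
xlabel('r'); ylabel('Estimated PDF')
```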
William Rose
29 Sep 2024
@Abinesh G, @Image Analyst suggested above that you find the cumulative distribution function, then invert it. That is what my code above does, using your data. By sorting the x-data, then taking equally spaced values (equal in terms of indices along the sorted x-vector), then finding the corresponding x-values at those indices, you are, in effect, finding equally spaced points on the vertical axis of the CDF, then finding the corresponding horizontal axis values (i.e. x-axis values).
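To make that equivalence concrete, here is a small sketch on synthetic data where the true inverse CDF is known. It draws a Rayleigh sample (an arbitrary choice of distribution), reads off the sorted values at equally spaced index fractions, and compares them with the analytic inverse CDF at the same levels. All names are hypothetical.

```matlab
rng(5);
sigma = 2;                                    % Rayleigh scale parameter (arbitrary)
xDemo = sigma*sqrt(-2*log(rand(200000,1)));   % Rayleigh-distributed sample

xsDemo = sort(xDemo);
N = numel(xsDemo);
p = (0.05:0.05:0.95)';                        % equally spaced levels on the CDF axis

xEmp    = xsDemo(round(p*N));                 % empirical inverse CDF via sorted indices
xTheory = sigma*sqrt(-2*log(1 - p));          % analytic inverse CDF at the same levels

disp([p, xEmp, xTheory])                      % the last two columns should nearly agree
```

The spacing of xEmp is uneven in x but even in probability, which is the property wanted for the plotting grid.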
Abinesh G
30 Sep 2024
Thanks a lot for responding. I have gone through your demo. I understand the concept of inverse transform sampling from the cumulative distribution function.
@William Rose: Agreed. Your approach of deriving the intervals from the sorted values effectively performs this inverse transform.
Although the concept is straightforward, it didn't occur to me. Thanks to both of you for an elegant solution.