How to get area between 2 cdf curves?
I am writing a script that automatically saves n cdfplots, and I need it to pick the best of those n plots automatically. I am stuck on how to identify the best cdfplot. Here are the approaches I have tried so far:
1. Interpolation method: For each of the n cdfplots, I interpolate curve 1 and curve 2 onto common points, take the pointwise difference (curve1(:,1) - curve2(:,1)), and count how many entries of the difference array fall below a threshold. I repeat this for all n plots, take max(count), find the index of that maximum, and pick the plot at that index as the best one. This method does not give me the desired results.
So I am now trying the following approach:
2. Area method: Find the area between the 2 cdf curves for each plot, and choose the plot with the minimum area as the best cdfplot.
Finding the area between curves is easy with plot(), but I am finding it difficult with cdfplot(). Could someone please help me solve this?
Here is my code for plotting the cdf:
In the figure above, I am trying to find the area between the curves, as marked by the lines drawn between the curves in the attached figure below.
Thanks in advance.
John D'Errico on 3 Jun 2023
If the curves cross, then you need to decide what the "area between" means. That is, is the area in one part negative? Or do you just want to compute the absolute value of the area between the curves?
Regardless, it is pretty simple in any case. You can do multiple things, all of which are easy, AND correct. But first, you NEED to decide which area you intend to compute.
The area between two curves is simply the integral of the difference. Essentially you just make sure all of the curves are extended at the top end to the same point. So some may need to be extrapolated. That allows you to evaluate them at the same points.
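As a rough sketch of that step, assuming you still have the raw samples behind each curve, you can evaluate both empirical CDFs directly on one shared grid instead of reading points back from the cdfplot figure. The data and the variable names (x1, x2, t, F1, F2, d) are made up for illustration, and the comparison relies on implicit expansion (R2016b or later).

```matlab
% Two samples with different ranges (made-up data for illustration).
x1 = [1 2 3 5];
x2 = [0 2 4 8 9];

% Common evaluation grid spanning both samples.
lo = min([x1 x2]);
hi = max([x1 x2]);
t  = linspace(lo, hi, 1001);

% Empirical CDF at each grid point: fraction of samples <= t.
% Below a sample's minimum this gives 0 and above its maximum it
% gives 1, which extends each curve to the shared end points.
F1 = mean(x1(:) <= t, 1);
F2 = mean(x2(:) <= t, 1);

% Difference of the two curves on the common grid, ready to integrate.
d = F1 - F2;
```

Because both curves now live on the same t, the difference d is defined everywhere on the grid, including the stretch where only one sample has data.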
Note that the integral of the difference will, IF the curves cross, have some parts as negative, and others positive. They will negate each other, unless you decide to compute the integral of the absolute value of the difference.
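To make the cancellation concrete, here is a small sketch with made-up samples that have the same mean, so their empirical CDFs cross. trapz on a fine grid is used only as a convenient integrator here; the names signedArea and absArea are illustrative.

```matlab
% Made-up samples with equal means, so their empirical CDFs cross.
x1 = [1 2 3];
x2 = [0 2 4];

% Fine common grid over the combined range.
t  = linspace(min([x1 x2]), max([x1 x2]), 4001);

% Empirical CDFs evaluated on the grid (fraction of samples <= t).
F1 = mean(x1(:) <= t, 1);
F2 = mean(x2(:) <= t, 1);

% Signed area: negative and positive parts cancel (close to 0 here).
signedArea = trapz(t, F1 - F2);

% Absolute area: every part counts as positive (close to 2/3 here).
absArea = trapz(t, abs(F1 - F2));

fprintf('signed = %.4f, absolute = %.4f\n', signedArea, absArea);
```

If the goal is to rank plots by how close two curves are, the absolute version is usually the one that behaves as expected, since a signed area near zero only says that the means are close.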
How should you compute the integrals? That part is also trivial. Since these are empirical CDFs, it probably makes the most sense to use a rectangle rule, but that may not be crucially important. These CDFs are fairly bumpy as empirical CDFs, so getting a high order of integration is probably not that important. You could just use trapz, as an easy solution. Or you could even get tricky and use polyshapes. So there are many ways to do this.
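A minimal sketch of the rectangle rule, again on made-up samples: an empirical CDF is constant between consecutive sample values, so evaluating both curves at every distinct sample value and multiplying by the gap to the next one integrates the step functions without discretization error. trapz on the same breakpoints is shown only for comparison; the variable names are illustrative.

```matlab
% Made-up samples for illustration.
x1 = [1 2 3 5];
x2 = [0 2 4 8 9];

% Breakpoints: every distinct sample value, sorted.
t = unique([x1 x2]);

% Empirical CDFs at the breakpoints (fraction of samples <= t).
F1 = mean(x1(:) <= t, 1);
F2 = mean(x2(:) <= t, 1);

% Each curve holds its left-hand value until the next breakpoint,
% so use the value at t(k) over the interval [t(k), t(k+1)).
w = diff(t);                                  % interval widths
absDiff  = abs(F1 - F2);
areaRect = sum(absDiff(1:end-1) .* w);        % rectangle rule

% trapz joins the breakpoint values with sloped segments instead,
% so on this coarse grid it does not match the step shape.
areaTrapz = trapz(t, absDiff);

fprintf('rectangle = %.4f, trapz = %.4f\n', areaRect, areaTrapz);
```

For this data the rectangle rule gives 2.25 and trapz on the same breakpoints gives 1.95. The gap shrinks if trapz is fed a much finer grid, which is why the choice matters less once the grid is dense.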
Honestly, this is not a difficult problem. (I think Star was a bit confused when he said it would not be possible.) Without any data posted, or knowing in what sense you are asking to compute the area "between" the curves, I won't go any further.