Calculating duration in large dataset based on conditions
2 ビュー (過去 30 日間)
I have a large dataset with 3 columns (datetime, Var1 & Var2). The data is over a month with a datapoint every 2 seconds.
I want to determine the duration when Var2 is above 1000 and the average value of Var1 when this is the case. The problem I have is not counting the duration from one true datapoint to the next true point (Var2 >1000).
Can someone help me with some code to determine this? Alternatively, a different way of determining the same thing would be appreciated.
Divija Aleti 2021 年 6 月 22 日
As far as I understand, the code given by Salah will help you solve your issue. I can try to explain it more clearly with a slight modification.
I created a small table with 3 columns, with a datapoint every 2 seconds, for explanation purposes.
The first line, which is,
index = find(Var2 > 1000)
identifies the datapoints where Var2 > 1000. It can be seen clearly that the 1st, 4th and 5th datapoints have Var2 > 1000. Sure enough the output of the above line is:
The second line (with a slight change), which is,
duration = 2*length(index)
first calculates the number of datapoints where Var2 > 1000, which is nothing but the number of elements in index (length(index)), which is 3. As each datapoint is of 2 seconds duration, by multiplying the obtained length with 2, we get the total duration in seconds. In this case it is 6 seconds.
As your dataset is over a period of one month, you will have to convert the obtained duration (in seconds) to days, hours and minutes using the corresponding conversion rates. (1 sec = (1/60) min = (1/3600) hours = (1/86400) days)
The third line, which is,
Var1_Avg = mean(Var1(index))
returns the average values of Var1 at those datapoints where Var2 > 1000. In this case, it is (100+250+300)/3 = 216.6667.
Hope this helps!