EXTRACTING NETCDF DATA BASED ON TIME

Good afternoon:
I am operating on a NetCDF file that contains data for 20 variables over a period of 8 months. This is too much data, so I am trying to extract data based on the time of day. That is, to extract data for all variables from 11pm to 4am for each day in the data file. I have been able to pull out the date / time in the format "dd-mmm-yyyy hh:mm:ss". I can extract and work on ranges of time, but not a range of time per day, for many days.
I can see in my head how to do this, but I am unsure of an efficient way to code it. Experimenting with different time functions (datenum, hours, datetime, datevec) along with other structure and NetCDF tools has been unsuccessful. I could use a shove in the right direction. Thank you.

Accepted Answer

Walter Roberson on 5 Jul 2017

0 votes

See https://www.mathworks.com/matlabcentral/answers/312198-how-to-extract-data-from-nc-file-based-on-latitude-longitude-time-and-wind#comment_464820 and note that in my sample "expanding the selection" code that you could code hours and minutes into the from date and to date strings.

21 comments

NATHAN MURRY on 5 Jul 2017
Edited: NATHAN MURRY on 5 Jul 2017
Hi Walter:
It appears I spoke a bit too soon. I have been working with the 'expanded selection' code you referenced in the previous post. I added an experimental time to the 'from_' and 'to_' strings as follows.
from_date = '2015-07-01 1:0:0'; to_date = '2015-08-31 5:0:0';
time_datenum = time / (60*60*24) + datenum('1900-01-01 0:0:0')
date_match = time_datenum >= datenum(from_date) && time_datenum <= datenum(to_date);
This results in an "Operands to the || and && operators must be convertible to logical scalar values" error.
The time calculations prior to the 'date_match' statement return the correct dates and times. The time variable in my NetCDF file is as follows:
time
    Size:       10964x1
    Dimensions: obs
    Datatype:   double
    Attributes:
        _FillValue    = -9999999
        long_name     = 'time'
        standard_name = 'time'
        units         = 'seconds since 1900-01-01 0:0:0'
        calendar      = 'gregorian'
        axis          = 'T'
The idea makes sense, but the syntax is tripping me up. Also, I am unsure how the 'date_match' statement will grab the data for the same block of time for every day in the dataset. Thank you for your assistance.
Walter Roberson on 5 Jul 2017
date_match = time_datenum >= datenum(from_date) & time_datenum <= datenum(to_date);
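The difference between the two operators can be seen in a small sketch. The sample times below are invented for illustration and are not taken from the file in question: `&&` is the short-circuit operator and only accepts scalar operands, while `&` compares element by element and returns a logical vector.

```matlab
% Hypothetical sample times (serial date numbers): six hourly samples, 00:00 to 05:00
time_datenum = datenum(2015, 7, 1, 0, 0, 0) + (0:5)' / 24;

% Bounds placed on the half hour so the comparison does not hinge on rounding
from_dn = datenum(2015, 7, 1, 0, 30, 0);
to_dn   = datenum(2015, 7, 1, 3, 30, 0);

% Element-wise AND of two logical vectors; using && here raises the error quoted above
date_match = time_datenum >= from_dn & time_datenum <= to_dn;

% Logical indexing keeps the samples at 01:00, 02:00 and 03:00
selected = time_datenum(date_match);
```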
Walter Roberson on 5 Jul 2017
date_match will be a logical vector the same length as the time dimension. All "slices" at that same index in that dimension of the array are for the same time. So for example,
squeeze( data(:, :, date_match, :) )
if the time happens to be the third index.
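As a rough illustration of slicing along the time dimension, here is a sketch with a made-up 4-D array. The array sizes and the dimension order (time third) are assumptions for the example only.

```matlab
% Hypothetical 4-D array: 2 x 3 x 5 x 2, with time as the third dimension
data = reshape(1:(2*3*5*2), [2 3 5 2]);

% Logical mask over the 5 time steps
mask = logical([0 1 1 0 1]);

% Keep only time steps 2, 3 and 5; the other dimensions are untouched
subset = data(:, :, mask, :);

% With a single selected time step the third dimension becomes a singleton,
% which squeeze then removes
single_step = squeeze(data(:, :, logical([0 0 1 0 0]), :));
```

Here `subset` is 2 x 3 x 3 x 2 and `single_step` is 2 x 3 x 2, so squeeze only matters when the selection leaves a singleton dimension.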
NATHAN MURRY on 12 Jul 2017
Hi Walter:
Thank you for your responses. I am back at this again, and I can now define time periods and retrieve data accordingly.
% VARIABLES
nctime = ncread(ncfile,'time');
dtime = nctime/(60*60*24)+datenum(1900,1,1);
pressure = ncread(ncfile,'ctdpf_ckl_seawater_pressure');
temperature = ncread(ncfile,'ctdpf_ckl_seawater_temperature');
salinity = ncread(ncfile,'ctdpf_ckl_sci_water_pracsal');
% TIME BLOCK
from_date = '2015-06-01 02:30:00';
to_date = '2015-06-01 05:30:00';
time_match = dtime >= datenum(from_date) & dtime <= datenum(to_date);
select_dtime = dtime(time_match);
select_pressure = pressure(time_match);
select_temperature = temperature(time_match);
select_salinity = salinity(time_match);
However, I am still unable to retrieve data for a particular time block for all the days contained in a given data file. The squeeze command makes sense, but only for one variable at a time, for example salinity:
select_salinity = squeeze(salinity(time_match))
Adding (:, :, ....) as in your example generates an 'Index exceeds matrix dimensions' error, which I would expect since my select_ statements only address one variable. I am not sure how to read multiple NetCDF variables in one statement such that I can use 'squeeze' as you did:
select_data = squeeze(all-required-variables(:, :, (however many indices), time_match, :, .....))
A solution could be to write a loop to step through all days in a data set, 'Squeezing' the data as per the time block defined above, for all required variables. However, that doesn't seem like efficient coding. I am looking for another shove in the right direction. Thanks.
Walter Roberson on 12 Jul 2017
Could you post the ncinfo() results for the netcdf file?
NATHAN MURRY on 12 Jul 2017
Name: '/'
Dimensions: [1x4 struct]
Variables: [1x20 struct]
Attributes: [1x55 struct]
Groups: []
Format: 'netcdf4'
Walter Roberson on 12 Jul 2017
Please post the Dimensions and Variables components... I might need Attributes as well.
I am trying to figure out how the various parts are indexed.
NATHAN MURRY on 12 Jul 2017
Please see the attached text file. This is the result of a ncdisp(ncfile), where ncfile = the netcdf data file in question.
Also, please know that I am not merely fishing for answers. I would like to learn from this as much as I can, and do as much of the spade work as possible. It is just nice to have the experts point me in the right direction when I am banging my head. Thanks again.
Walter Roberson on 13 Jul 2017
My talking about slices earlier was against the idea that you were accessing an array with multiple dimensions, for which the time formed one of the directions. I see from the variable descriptions that is not the case at all: for the non-string variables you just have vectors of readings. You would use the same mask to index all of the vectors, and you would not use squeeze.
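A minimal sketch of that idea follows, using short stand-in vectors in place of the ncread results (the values are invented; only the variable names follow the code posted above).

```matlab
% Stand-in vectors; in practice these would come from ncread as in the code above
dtime       = datenum(2015, 6, 1, 0, 0, 0) + (0:7)' / 24;   % hourly samples, hypothetical
pressure    = (10:17)';
temperature = (20:27)';
salinity    = (30:37)';

% One logical mask built from the time vector
time_match = dtime >= datenum(2015, 6, 1, 2, 30, 0) & ...
             dtime <= datenum(2015, 6, 1, 5, 30, 0);

% The same mask indexes every vector; no squeeze is needed for vectors
select_data = [dtime(time_match), pressure(time_match), ...
               temperature(time_match), salinity(time_match)];
```

Because all of the vectors share the same observation dimension, row k of `select_data` holds the readings taken at the same instant.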
NATHAN MURRY on 21 Jul 2017
Edited: NATHAN MURRY on 21 Jul 2017
Hi Walter:
I have attacked this problem two ways. I believe the first is close. When using the ENTIRE data file, the script pulls the hours and data I want, exactly as I expect. However, when I attempt to subset the date range, something goes haywire:
%----VECTORIZE DATA FILE TIME VARIABLE----
dtime_vec = datevec(dtime); % Vectorize entire data file time variable
%----SELECT DATE RANGE IF DESIRED----
from_date = '2015-05-01 00:00:00';
to_date = '2015-05-02 00:00:00';
date_match = dtime >= datenum(from_date) & dtime <= datenum(to_date);
date_range = dtime(date_match);
date_range_vec = datevec(date_range);
%----SELECT DATA BY TIME----
from_hour = 2;
to_hour = 4;
%****IN 'time_match' STATEMENT BELOW, REPLACE 'date_range_vec'
%**** WITH 'dtime_vec' IF ENTIRE DATA FILE IS TO BE USED
time_match = date_range_vec(:,4) >= from_hour & date_range_vec(:,4) <= to_hour ;
time_range = datenum(datevec(dtime(time_match)));
time_range_pressure = pressure(time_match);
time_range_temperature = temperature(time_match);
time_range_salinity = salinity(time_match);
time_range_data = [time_range time_range_pressure time_range_temperature time_range_salinity];
Again, this method works perfectly when using an entire data file, without the date subsetting.
I am still working with the second method, which is adapting a stock script found elsewhere. The idea was to vectorize the full 'dtime' variable as above, use 'find' to isolate/match the 'hour' data, and then use 'ncread' to pull in the corresponding data. This works perfectly for any date range (the date match statement is commented out below), but not for pulling selected hour ranges:
%----START / END DATES & TIMES, AND MATCHING----
dtime_vec = datevec(dtime);
start_dt = datenum(2015,5,1,6,00,0);
start_dt_vec = datevec(start_dt);
end_dt = datenum(2015,5,1,6,30,0);
end_dt_vec = datevec(end_dt);
%----FIND DATA IN TIME RANGE----
% tmindex = find(dtime>=start_dt & dtime<=end_dt) %--SUBSET BY DATE--
tmindex = find(dtime_vec(:,4) >= start_dt_vec(:,4) & dtime_vec(:,4) <= end_dt_vec(:,4)) % --SUBSET BY TIME--
dtime = dtime(tmindex)
%----READ VARIABLES WITHIN THE DEFINED TIME RANGE----
pressure = ncread(ncfile,'ctdpf_ckl_seawater_pressure',tmind(1),tmind(end)-tmind(1)+1,1);
%--------
I don't think I can use 'find' the way I am attempting to, but I am not sure if this is close as well. I could use another shove. Thank you.
Walter Roberson on 21 Jul 2017
Could you post a link to the netcdf file you are using; also it would help to have all of the code in one place so I do not have to make assumptions about how you are reading the file to create the time vector.
NATHAN MURRY on 24 Jul 2017
The 'v2' script contains code for the first example I listed above.
The 'v3' script contains code for the second example I listed above.
Walter Roberson on 24 Jul 2017
The scripts lead to "not found".
NATHAN MURRY on 24 Jul 2017
Sorry, IIS glitch. Try it now.
Walter Roberson on 24 Jul 2017
When you do
date_range = dtime(date_match);
date_range_vec = datevec(date_range);
then you create date_range and date_range_vec as a subset of the original data. Then when you use
time_match = date_range_vec(:,4) >= from_hour & date_range_vec(:,4) <= to_hour ;
you are doing that relative to the subset, creating a logical subset of the subset. But then you do
time_range_pressure = pressure(time_match);
which is indexing the full data with the mask that is only relevant to the subset.
You should be doing something like
%----SELECT DATA BY TIME----
from_hour = 2;
to_hour = 4;
date_vec = datevec(dtime);
date_match = dtime >= datenum(from_date) & dtime <= datenum(to_date) & date_vec(:,4) >= from_hour & date_vec(:,4) <= to_hour;
Now extract from the full data using date_match as the logical index.
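A self-contained sketch of the combined mask is below. The time vector and pressure values are stand-ins invented for the example, and the date bounds are given numerically rather than as strings. The last line is an assumption about how the window in the original question (11 pm to 4 am) could be written: since it wraps past midnight, the two hour tests are joined with `|` instead of `&`.

```matlab
% Hypothetical stand-in data: one sample per hour at HH:30 over three days
dtime    = datenum(2015, 5, 1, 0, 30, 0) + (0:71)' / 24;
pressure = (1:72)';                      % same length as dtime

% Date range (first two days) and hour-of-day range
from_dn   = datenum(2015, 5, 1, 0, 0, 0);
to_dn     = datenum(2015, 5, 3, 0, 0, 0);
from_hour = 2;
to_hour   = 4;

% Hour of day for every sample in the FULL time vector
date_vec = datevec(dtime);
hr = date_vec(:, 4);

% One mask, the same length as the full data, combining both conditions
date_match = dtime >= from_dn & dtime <= to_dn & hr >= from_hour & hr <= to_hour;
select_pressure = pressure(date_match);

% A window that wraps past midnight (23:00 up to, but not including, 04:00)
night_match = hr >= 23 | hr < 4;
```

Note that `hr <= to_hour` keeps every sample whose hour field equals 4, so readings up to 04:59 are included; use `hr < to_hour` if the window should stop at 04:00.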
NATHAN MURRY on 24 Jul 2017
I also forgot to mention that the ...987390.nc is what I have been operating on. The ...988564.nc is another data file of roughly the same content I have used for comparison.
Walter Roberson on 24 Jul 2017
I do not have your "startup" routine.
Note: you should avoid using "clear all". Especially in code you are sending around to other people, who might not appreciate having their workspaces wiped out :( "clear all" clears everything short of closing active graphics, including all pre-parsed files, all loaded DLLs, and all persistent variables.
"clear all" is more or less Wile E. Coyote's trick of exploding the rock he is standing on: he might be okay for a moment, but only until he looks down.
Basically you should only use "clear all" in a script if it is a script that does nothing other than resetting internal state and you do not want to have to remember "clear all" as a command name.
If you are using it to avoid name clashes with variables in your script, then chances are you should be using a function instead of a script.
NATHAN MURRY on 25 Jul 2017
Hi Walter: Thank you for all the tips. I will attack it again tomorrow.
I rarely use 'clear all', and am aware of its "power". It was meant to be part of some tests with this particular script on my machine only. I meant to eliminate that line before I posted the script for your inspection; sorry about that.
When I start working on a new operation, I usually script it to prove the concept, and then clean it up and make it a function (if required) before it moves on. I don't always follow my own best practices and neatness at that stage. I appreciate any guidance I can get, but do know that it is not my habit to release scripts or functions in this rough form.
Thanks again for all of your pointers.
Walter Roberson on 25 Jul 2017
I think I still need the startup code? Or can I just comment that out?
NATHAN MURRY on 25 Jul 2017
There is a copy of it in the web directory I listed above. However, I do not believe you will find anything in it critical to solving the issue at hand.
NATHAN MURRY on 1 Aug 2017
Hi Walter: So I see I had the correct two statements, but I didn't try to join them together in a larger logical statement as you showed. With some further experimentation and additions, the function works great.
Thank you again for all of your help. I learned quite a bit in wrestling through this problem. Take care.
--NMM

More Answers (2)

Tanziha Mahjabin on 29 Jan 2020
Edited: Walter Roberson on 29 Jan 2020

0 votes

Hi,
I want to cut out part of the time range from a big data set, using ncread(source,varname,start,count).
for your information,
UCUR_sd
    Size:       69x69x45588
    Dimensions: J,I,TIME
    Datatype:   single
    Attributes:
        long_name           = 'Standard deviation of sea water velocity U component values in 1 hour.'
        units               = 'm s-1'
        valid_min           = -10
        valid_max           = 10
        cell_methods        = 'TIME: standard_deviation'
        coordinates         = 'TIME LATITUDE LONGITUDE'
        _FillValue          = 999999
        ancillary_variables = 'NOBS1 NOBS2 UCUR_quality_control'
Now if I write,
u=ncread(ncfile,'UCUR',[1 1 1],[Inf Inf 44931]);
the read starts from the first time step.
But what should I write if I want to cut out a time range from somewhere in the middle?
I tried to define an index,
ind=find(time>=datenum(2017,02,16,0,0,0)&time<=datenum(2017,02,17,0,0,0))
u=ncread(ncfile,'UCUR',[1 1 ind],[Inf Inf 44931]);
But it is not working. Any helpful suggestions, please?

1 comment

Walter Roberson on 29 Jan 2020
NetCDF times are never in MATLAB serial datenum form. Instead they are in some time unit relative to a particular epoch that is defined in the attributes, such as "seconds since Jul 1, 1983 00:00:00 UTC". You need to examine the attributes for the TIME coordinate and do the conversion.
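One way to do that conversion is sketched below. It uses the units string shown for the 'time' variable earlier in this thread; in practice the string could be read with ncreadatt(ncfile, 'time', 'units'). The regular expression is an assumption that only covers the "unit since yyyy-mm-dd h:m:s" pattern, so other layouts (time zones, a 'T' separator, fractional seconds) would need extra handling, and the raw values are invented.

```matlab
% Units string as shown earlier in this thread for the 'time' variable
units_str = 'seconds since 1900-01-01 0:0:0';

% Split into the unit name and the six epoch fields
tok = regexp(units_str, '^(\w+) since (\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)', ...
             'tokens', 'once');
unit_name = tok{1};
v = str2double(tok(2:7));                 % [year month day hour minute second]
epoch = datenum(v(1), v(2), v(3), v(4), v(5), v(6));

% Number of file time units in one day
switch lower(unit_name)
    case 'seconds'
        per_day = 86400;
    case 'minutes'
        per_day = 1440;
    case 'hours'
        per_day = 24;
    case 'days'
        per_day = 1;
    otherwise
        error('Unhandled time unit: %s', unit_name);
end

% Hypothetical raw values as they might come back from ncread
raw_time = [0; 86400; 129600];
dnum = raw_time / per_day + epoch;        % MATLAB serial date numbers
```

With these inputs the three samples fall 0, 1 and 1.5 days after the epoch.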

Tanziha Mahjabin on 30 Jan 2020

0 votes

Hi Walter,
Thanks for the comment. I did the conversion.
ncfile='IMOS_aggregation_20200124T074252Z.nc';
rtime=ncread(ncfile,'TIME');
time=datenum(rtime+datenum(1950,1,1,0,0,0));
When I write something like this, ru=ncread(ncfile,'UCUR',[1 1 1],[Inf Inf 931]); it works, as the time starts from the beginning.
But I want to start the time from somewhere else, as I mentioned in my question. So I defined an index and tried to start from that.
ind=find(time>=datenum(2017,02,16,0,0,0)&time<=datenum(2017,02,17,0,0,0))
u=ncread(ncfile,'UCUR',[1 1 ind],[Inf Inf 44931]);
It didn't work.

8 comments

Walter Roberson on 30 Jan 2020
Is the time in the file in days since that particular date? That is not impossible for NetCDF, but it tends to be a different time unit.
Tanziha Mahjabin on 30 Jan 2020
No, it is an hourly average.
Walter Roberson on 30 Jan 2020
You coded
time=datenum(rtime+datenum(1950,1,1,0,0,0));
When you used datenum(1950,1,1,0,0,0) you would have gotten back a MATLAB serial date number, which is in full days since Jan 1 0000. When you add rtime to that, you are adding rtime days, so for example 27 would be 27 days after the base time, not 27 hours. You would have to use
time=datenum(rtime/24+datenum(1950,1,1,0,0,0));
However, unless you have special reason otherwise, I would recommend that you switch to using datetime objects:
time = datetime(1950,1,1,0,0,0) + hours(rtime);
ind = isbetween(time, datetime(2017,02,16,0,0,0), datetime(2017,02,17,0,0,0));
Tanziha Mahjabin on 30 Jan 2020
Sorry to bother you again. I tried both, but they didn't work. They give an empty matrix for ind.
I guess that is because they are already hourly values.
If I write it according to my code, ind=find(time>=datenum(2017,02,16,0,0,0)&time<=datenum(2017,02,17,0,0,0))
it gives me the correct indices for 16 Feb 2017 to 17 Feb 2017.
It only fails when I use it in this line:
u=ncread(ncfile,'UCUR',[1 1 ind],[Inf Inf 44931]);
Walter Roberson on 30 Jan 2020
ncread cannot read selected indices from an array, except that it can read a consecutive subsection.
mask = time>=datenum(2017,02,16,0,0,0) & time<=datenum(2017,02,17,0,0,0);
firstind = find(mask,1,'first');
lastind = find(mask,1,'last');
reduced_mask = mask(firstind:lastind);
most_u = ncread(ncfile,'UCUR',[1 1 firstind],[Inf Inf lastind-firstind+1]);
u = most_u(:,:,reduced_mask);
In the special case that time is sorted, the true entries in mask would be consecutive and the result of ncread could be used without further subselection.
mask = time>=datenum(2017,02,16,0,0,0) & time<=datenum(2017,02,17,0,0,0);
firstind = find(mask,1,'first');
lastind = find(mask,1,'last');
u = ncread(ncfile,'UCUR',[1 1 firstind],[Inf Inf lastind-firstind+1]);
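The start/count arithmetic can be checked without a file. The sketch below uses an invented in-memory array in place of the NetCDF variable and an invented daily time vector. The points it illustrates are that start must have one entry per dimension (so a vector such as ind cannot be dropped into it), that start indices are 1-based, and that count is a number of elements to read from start rather than an end index.

```matlab
% Hypothetical stand-ins: 10 daily time stamps (10 to 19 Feb 2017) and a 2 x 2 x 10 array
time   = datenum(2017, 2, 10, 0, 0, 0) + (0:9)';
stored = reshape(1:40, [2 2 10]);        % pretend this is the variable in the file

mask = time >= datenum(2017, 2, 16, 0, 0, 0) & time <= datenum(2017, 2, 17, 0, 0, 0);
firstind = find(mask, 1, 'first');
lastind  = find(mask, 1, 'last');

% Arguments that would be passed to ncread(ncfile, 'UCUR', start, count)
start = [1 1 firstind];                      % one entry per dimension, 1-based
count = [Inf Inf lastind - firstind + 1];    % how many steps to read, not where to stop

% In-memory equivalent of the block ncread would return
u = stored(:, :, firstind:lastind);
```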
Tanziha Mahjabin on 30 Jan 2020
Edited: Walter Roberson on 24 Oct 2022
I did manage to do it. Thank you for your time.
ncfile='IMOS_aggregation_20200124T074252Z.nc';
time=ncread(ncfile,'TIME');
begin = datenum('01 January 1950 00:00:00');
time=time+begin;
date_start=datestr(time(1));
date_end=datestr(time(end));
t0=find((datenum('16 February 2017 00:00:00'))==time);
t1=find((datenum('16 February 2017 23:00:00'))==time);
mtime=time(t0:t1);
date=datestr(mtime);
coo=0; nt=length(mtime);
for i=1:nt;
coo=coo+1;
% get each hours data
rtime=datenum(mtime(i));
[y m d hr min sec]=datevec(rtime);
start=datenum(y,m,d,hr,0,0);
ru=single(ncread(ncfile,'UCUR',[1 1 t0-1],[Inf Inf nt]));
Walter Roberson on 30 Jan 2020
I don't think you want that read inside a for loop??
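A sketch of the alternative: read the block once, then loop over the slices already in memory. The array below is an invented stand-in for the result of a single ncread call, and the per-hour mean is only a placeholder computation.

```matlab
% Stand-in for the block returned by ONE ncread call (hypothetical 2 x 2 x 24 array)
ru = reshape(1:96, [2 2 24]);
nt = size(ru, 3);

hourly_mean = zeros(nt, 1);
for i = 1:nt
    slice = ru(:, :, i);               % data for hour i, already in memory
    hourly_mean(i) = mean(slice(:));   % placeholder per-hour computation
end
```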
Mehak S on 8 Apr 2024
Why 't0-1' and not t0 while reading the file?
