Combine data from multiple netCDF files
Dear reader,
I have multiple datasets of hourly wave height in the NE Atlantic Ocean, stored in one netCDF file per month (Jan 1990 to Dec 2019). To work with the data properly, I wish to combine these datasets so that I get the following output:
Vectors for longitude and latitude
Datetime array from Jan 1990 till Dec 2019
3D data for wave height stored as longitude x latitude x time
For this, I have written a script (see below). However, at the moment the script is very inefficient and takes hours to do the job. Is there any way to optimize this script, or is there another way to efficiently combine data from multiple netCDF files?
Your help would be very much appreciated.
param = 'hs';
year  = (1990:2019)';
years = num2str(year);
month = ['01'; '02'; '03'; '04'; '05'; '06'; '07'; '08'; '09'; '10'; '11'; '12'];
%% Step 2: Process the data
% Longitude and latitude are the same in every file, so read them once
% from the first file (assumes the Jan 1990 file exists).
fileIN    = fullfile(Directory_load, ['WW3-ATNE-10M_' years(1,:) month(1,:) '_hs.nc']);
longitude = ncread(fileIN, 'longitude');
latitude  = ncread(fileIN, 'latitude');
% Create timeTotal and paramDataTotal here (the placeholder first entries
% are removed after the loop).
timeTotal      = datetime(1900,1,1,0,0,0, 'Format','dd-MMM-yyyy HH:mm:ss');
paramDataTotal = zeros(length(longitude), length(latitude), 1);
% The following loops first check whether the file exists, then extract
% the time and param arrays and append them to the "total dataset."
for i = 1:size(years,1)
    for j = 1:size(month,1)
        fileIN = fullfile(Directory_load, ['WW3-ATNE-10M_' years(i,:) month(j,:) '_hs.nc']);
        if isfile(fileIN)
            % Load the time array (stored as days since the epoch).
            time  = ncread(fileIN, 'time');
            epoch = datetime(1990,1,1);
            time  = epoch + days(time);
            % Append to the total time array.
            timeTotal = cat(1, timeTotal, time);
            % Load the parameter data and append along the third dimension.
            paramData      = ncread(fileIN, param);
            paramDataTotal = cat(3, paramDataTotal, paramData);
        end
    end
end
% Remove the placeholder first entries.
timeTotal(1)          = [];
paramDataTotal(:,:,1) = [];
Accepted Answer
Seth Furman
8 Nov 2021
Take a look at the documentation for reading large files and big data.
One simple approach would be to read your data into a tall timetable, e.g.
% Create some sample NetCDF files.
createSampleNetCDFFiles();
ls data
% Read data as a tall timetable.
ds = fileDatastore("data", "ReadFcn", @readNetCDF, "UniformRead", true);
tTall = tall(ds)
function t = readNetCDF(filename)
    % Times are stored as days since the 1990-01-01 epoch.
    Time      = datetime(1990, 1, 1) + days(ncread(filename, "time"));
    Latitude  = ncread(filename, "latitude");
    Longitude = ncread(filename, "longitude");
    Data      = ncread(filename, "data");
    t = timetable(Time, Latitude, Longitude, Data);
end
function createSampleNetCDFFiles()
    rng default
    y = 1990;
    latitude  = sort(180 * rand([10 1]) - 90);
    longitude = sort(360 * rand([10 1]) - 180);
    mkdir data
    cd data
    % Write one sample file per month of 1990.
    for m = 1:12
        filename = sprintf("WW3-ATNE-10M_%d%02d_hs.nc", y, m);
        d    = 28 * sort(rand([10 1]));
        time = days(datetime(y, m, 0) + days(d-1) - datetime(1990, 1, 1));
        data = 1000 * rand(10, 2);
        nccreate(filename, "time",      "Dimensions", {'r', size(time,1), 'c', size(time,2)});
        nccreate(filename, "data",      "Dimensions", {'r', 'c1', size(data,2)});
        nccreate(filename, "latitude",  "Dimensions", {'r', 'c2', size(latitude,2)});
        nccreate(filename, "longitude", "Dimensions", {'r', 'c3', size(longitude,2)});
        ncwrite(filename, "time",      time);
        ncwrite(filename, "data",      data);
        ncwrite(filename, "latitude",  latitude);
        ncwrite(filename, "longitude", longitude);
    end
    cd ..
end
More Answers (1)
Kelly Kearney
8 Nov 2021
What are the dimensions of the total dataset? And do you really need to hold the entire thing in memory at once?
I think you could speed things up in your current script by preallocating the third dimension of paramDataTotal; right now, the constant expansion of the array in each loop iteration requires MATLAB to reallocate memory, which can slow things down considerably. You may want to take a look at some of the netCDF-reading tools in the Climate Data Toolbox (particularly ncstruct.m); they are designed to optimize reading netCDF data from a series of files.
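For instance, a minimal two-pass sketch of that preallocation (assuming every monthly file stores hs as longitude x latitude x time, and reusing the Directory_load variable from the question):
% First pass: count the total number of time steps across all files.
files = dir(fullfile(Directory_load, 'WW3-ATNE-10M_*_hs.nc'));
nTime = 0;
for k = 1:numel(files)
    info  = ncinfo(fullfile(files(k).folder, files(k).name), 'hs');
    nTime = nTime + info.Size(3);   % third dimension is time
end
% Preallocate once, then fill in a second pass.
paramDataTotal = NaN(numel(longitude), numel(latitude), nTime);
timeTotal      = NaT(nTime, 1);
idx = 0;
for k = 1:numel(files)
    fname = fullfile(files(k).folder, files(k).name);
    t = datetime(1990,1,1) + days(ncread(fname, 'time'));
    n = numel(t);
    timeTotal(idx+1:idx+n) = t;
    paramDataTotal(:,:,idx+1:idx+n) = ncread(fname, 'hs');
    idx = idx + n;
end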
That said, reading in the entirety of a large dataset may be inherently slow regardless of how optimized your code is, simply due to the size. Which brings me back to the question of whether you really need to hold the entire dataset in memory at once. I find it often makes more sense to rethink the rest of my calculations and only read in specific hyperslabs of data as needed (again, ncstruct may simplify the reading-in syntax for you); see the example below. This read-as-needed technique is similar to how tall arrays and other MATLAB large-dataset tools work under the hood... unfortunately, tabular constructs aren't ideal for working with 3D data, in my opinion.
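For example, ncread takes optional start and count arguments for reading just a hyperslab; in this sketch, iLon0, iLat0, nLon, and nLat are hypothetical indices marking the sub-region of interest:
% Read a lon/lat sub-box for a single month, leaving the rest on disk.
fname = fullfile(Directory_load, 'WW3-ATNE-10M_199501_hs.nc');
start = [iLon0 iLat0 1];      % hypothetical 1-based start indices (lon, lat, time)
count = [nLon  nLat  Inf];    % Inf reads through the end of the time dimension
hsSub = ncread(fname, 'hs', start, count);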