- Use tabularTextDatastore to read all the data from the CSV files into a single table, which will be your big dataset.
- Create timestamps for all the measurements. It's important to make sure that the timestamps match the order of the measurements.
- Add the timestamps as a new variable (column) in your table.
- Finally, save the table as a single CSV file for machine learning.
How can I load multiple CSV files from class-type data into one big dataset?
Hello, I have class-type data called "mSWW" with several properties (see the code below). It refers to many .csv files with data from smart meters for electric load forecasting. The class was generated with MATLAB code that sorted the different properties into several vectors (fName, fYear, fMonth, etc.). In the folder, these files are sorted by day (17 days in total). On each day, the smart meters took a measurement every minute, so there are a lot of CSV files per day (around 1440).
classdef MeasureSWW
properties
% File Path Information
path % Main File Path
fName % All File Names
fFullPath % All File Names (Full Path)
% File Date Number Information
fYear % File Year Identifier
fMonth % File Month Identifier
fDay % File Day Identifier
fHour % File Hour Identifier
fMinute % File Minute Identifier
fSecond % File Second Identifier
% Start and End Date Information
START_DATE % Start Date Information
END_DATE % End Date Information
end
My goal is to:
1) make one big .csv file (for machine learning purposes) out of these files;
2) make 17 .csv files, one for each day.
Any suggestions on how best to do this? Please share your experience.
Right now, I use the datastore() function to read some of the data. Here is my code so far:
%Loading files from mSWW class
%For creating mSWW, the LoadMeasurements.m was used (with some adjustments)
load mSWW.mat
ds = tabularTextDatastore(mSWW.fFullPath(1:1439, 1),'FileExtensions','.csv')
% creating a timestamp (readable time for Matlab) - datetime type value
% from Year, Month, Day, Hour, Minute and Second
timeStamp = datetime(mSWW.fYear(1:1439, 1),mSWW.fMonth(1:1439, 1), ...
mSWW.fDay(1:1439, 1),mSWW.fHour(1:1439, 1),mSWW.fMinute(1:1439, 1),mSWW.fSecond(1:1439, 1))
%Creating a big dataset (raw version)
all_days = readall(ds)
% data for day 1 (example)
day1 = all_days(1:1439,:)
day1.Properties.Description
My problem is that with this approach I can't cover all the data in one big file and concatenate it with the timestamps, because the size of timeStamp is 1439x1 and the size of all_days is 109363x12.
Thank you very much in advance!
Answers (1)
Rishav
11 Oct 2023
Hi Magsud,
I understand that you want to create one big CSV file and also 17 separate CSV files, one for each day.
Create a Big CSV File
Here is how you can do it:
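% Assuming mSWW is loaded already
ds = tabularTextDatastore(mSWW.fFullPath, 'FileExtensions', '.csv');
ds.ReadSize = 'file';  % each call to read() returns one whole file
% Create one timestamp per file from the time components
timeStamp = datetime(mSWW.fYear, mSWW.fMonth, mSWW.fDay, ...
    mSWW.fHour, mSWW.fMinute, mSWW.fSecond);
% Read file by file and repeat each file's timestamp on every row of it.
% Assumes ds.Files is in the same order as mSWW.fFullPath.
parts = cell(numel(ds.Files), 1);
for k = 1:numel(ds.Files)
    t = read(ds);
    t.Timestamp = repmat(timeStamp(k), height(t), 1);
    parts{k} = t;
end
% Stack all files into one table
all_data = vertcat(parts{:});
% Save the combined table as a big CSV file
writetable(all_data, 'big_data.csv');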
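The combined table has one row per measurement, while the class stores one timestamp per file, so each timestamp has to be repeated once for every row that came from its file. A minimal sketch with made-up data follows; `rowsPerFile`, `fileStamp` and `rowStamp` are hypothetical names, and in practice the row counts would have to come from reading the files one at a time.

```matlab
% Hypothetical example: 3 files containing 2, 3 and 1 rows respectively
rowsPerFile = [2; 3; 1];

% One timestamp per file (made-up values, one minute apart)
fileStamp = datetime(2023, 10, 1, 0, [0; 1; 2], 0);

% Build a row-to-file index, then pick each row's timestamp through it
fileIdx  = repelem((1:numel(fileStamp))', rowsPerFile);
rowStamp = fileStamp(fileIdx);

% The expanded vector now has one entry per table row
assert(numel(rowStamp) == sum(rowsPerFile));
```

Going through an index vector keeps the mapping from rows to files explicit, which makes it easy to check that the expanded timestamps match the height of the table before adding them as a column.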
Create Daily CSV Files
To create separate CSV files for each of the 17 days, you can use a loop to filter the data for each day and save it as a separate CSV file. Here is how you can do it:
% Calendar day of every row
row_day = dateshift(all_data.Timestamp, 'start', 'day');
unique_dates = unique(row_day);
for i = 1:numel(unique_dates)
    % Select the rows that fall on day i
    date_filter = row_day == unique_dates(i);
    daily_data = all_data(date_filter, :);
    % Save daily data as CSV
    filename = sprintf('day_%02d.csv', i);
    writetable(daily_data, filename);
end
This code will create separate CSV files named day_01.csv, day_02.csv, and so on for each day.
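If the daily files should carry the calendar date instead of a running index, the grouping can be done with `findgroups`. The following is only a sketch on a small made-up table; `demo`, `Value` and the `day_yyyy-MM-dd.csv` naming scheme are assumptions for illustration, not part of the original data.

```matlab
% Hypothetical stand-in for the combined table: 5 measurements over 2 days
Timestamp = datetime(2023, 10, [1 1 1 2 2]', 0, [0 1 2 0 1]', 0);
Value = [10 11 12 20 21]';
demo = table(Timestamp, Value);

% Group rows by calendar day
row_day = dateshift(demo.Timestamp, 'start', 'day');
[g, day_list] = findgroups(row_day);

% Build one file name per day and count the rows that go into each file
file_names = cell(numel(day_list), 1);
rows_per_day = zeros(numel(day_list), 1);
for i = 1:numel(day_list)
    daily = demo(g == i, :);
    rows_per_day(i) = height(daily);
    file_names{i} = sprintf('day_%s.csv', char(day_list(i), 'yyyy-MM-dd'));
    % writetable(daily, file_names{i});   % uncomment to write the file
end
```

Date-based names stay meaningful if days are later added or removed, whereas index-based names shift.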
Thank you,
Rishav Saha