Extracting survey data from individual files and combining

1 回表示 (過去 30 日間)
Erik J
Erik J 2022 年 10 月 18 日
コメント済み: Erik J 2022 年 10 月 19 日
I have folder containing thousands of individuals files. In each file is text (representing a survey response) of the following format:
2022_1_29_14_29_51_913,0, 0
2022_1_29_14_29_56_761,1, 0
2022_1_29_14_30_4_268,2, 3
2022_1_29_14_30_8_6,4, 1
2022_1_29_14_30_11_87,5, 1
2022_1_29_14_30_16_668,6, 0
2022_1_29_14_30_19_470,7, 1
2022_1_29_14_30_23_586,8, 3
2022_1_29_14_30_32_501,16, 4
Where the 2nd to last number indicates the question number on a survey and the last number is the answer value for that question.
I need to combine all of these files into a single table, where each row represents one file (survey) and each answer (last number) is placed in a column that corresponds to the question number (2nd to last number).
I have no problem reading in the files. I do not know how to access just the last 2 values from each line, and use the 2nd to last number as the location of the last number in a table (with NaN values where there are skipped questions).
Any ideas would be very useful. Thank you.
  2 件のコメント
David Hill
David Hill 2022 年 10 月 18 日
Attaching a file or two would be helpful. Do all the files have the same number of table columns?
Erik J
Erik J 2022 年 10 月 18 日
There are 16 possible questions, but not all files have all 16 answers because of skip logic in the survey. So, I need to use the 2nd to last number to place the last number in that column, and NaN elsewhere.
I have attached a couple files.

サインインしてコメントする。

採用された回答

Karim
Karim 2022 年 10 月 18 日
編集済み: Karim 2022 年 10 月 19 日
I used the text files from your comment to update my answer, to (hopfully) better fit it to what you are looking for.
I was not sure what to do with question no "0", hence i added a column for that one as well. I added comments in the code to explain the idea.
Hope it helps :)
% list the files
MyFiles = ["https://www.mathworks.com/matlabcentral/answers/uploaded_files/1161118/003_003_0_2022_1_29_14_29_40_741_SA.txt";
"https://www.mathworks.com/matlabcentral/answers/uploaded_files/1161123/003_003_0_2022_2_6_12_31_6_722_SA.txt"];
MyFileNames = ["003_003_0_2022_1_29_14_29_40_741_SA";
"003_003_0_2022_2_6_12_31_6_722_SA"];
% allocate a table
sz = [numel(MyFiles) 18]; % 16 questions + survey name
varTypes = ["string",repmat("double",1,sz(2)-1)];
varNames = ["Survey","Q0","Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8","Q9","Q10","Q11","Q12","Q13","Q14","Q15","Q16"];
MyTable = table('Size',sz,'VariableTypes',varTypes,'VariableNames',varNames);
% set all answers tot nan
MyTable{:, 2:end} = NaN;
% have a look at the empty table
MyTable
MyTable = 2×18 table
Survey Q0 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 _________ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ <missing> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN <missing> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
for i = 1:numel(MyFiles)
% read new file
currTable = readtable(MyFiles(i),'Format','%s %f %f');
% use file name as survey name
MyTable{i, 1 } = MyFileNames(i);
% append the new data into the large table
Q_num = currTable{:,2}+2; % -> determine the column number in the table
Q_val = currTable{:,3}; % -> values for the answers
MyTable{i, Q_num } = Q_val.'; % -> note the transpose to switch from column to row vector
end
% have a look at the final table
MyTable
MyTable = 2×18 table
Survey Q0 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 _____________________________________ __ __ __ ___ ___ __ __ __ __ ___ ___ ___ ___ ___ ___ ___ ___ "003_003_0_2022_1_29_14_29_40_741_SA" 0 0 3 NaN 1 1 0 1 3 NaN NaN NaN NaN NaN NaN NaN 4 "003_003_0_2022_2_6_12_31_6_722_SA" 0 0 1 1 NaN 1 1 1 3 2 0 4 0 4 2 4 4

その他の回答 (0 件)

カテゴリ

Help Center および File ExchangeIntroduction to Installation and Licensing についてさらに検索

タグ

製品


リリース

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by