Seeking suggestions to speed up JSON parsing to table.

Roger Pierson, 28 Dec 2019
Commented: Roger Pierson, 2 Jan 2020
I am seeking assistance in speeding up the processing time needed to parse very large JSON files and end up with a flat table of values. I have a solution that works, but it takes on the order of 3-6 minutes to process even a relatively small (for the sensor data in question) file of 57,360 JSON strings.
The original file contains many thousands of individual track reports from a radar system. To aid further processing and analysis work, I want to get these track reports into a flat table of 130 variables. jsondecode faithfully decodes the strings and produces a struct with all the data. Unfortunately, the data itself includes many nested structures, so simply calling struct2table isn't the whole answer.
for i = 1:lengthOfArray
% Decode the current row of the character array containing the JSON
currentRow = jsondecode(importedFile{1,1}{i,1});
end
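One option that might be worth profiling is to avoid calling jsondecode once per line: join the lines into a single JSON array and decode it in one call. This is only a minimal sketch, assuming every cell holds one complete JSON object. The sample strings and the variable names jsonLines, jsonArray and allRows are made up for illustration.

```matlab
% Hypothetical sample data standing in for importedFile{1,1}
jsonLines = { ...
    '{"id":1,"data":{"Q_Value":0.9}}'; ...
    '{"id":2,"data":{"Q_Value":0.7}}'};

% Wrap the comma-joined objects in brackets so they form one JSON array
jsonArray = ['[' strjoin(jsonLines(:)', ',') ']'];

% A single decode call. jsondecode returns a struct array when every
% object has the same field names, otherwise a cell array of structs.
allRows = jsondecode(jsonArray);
```

If the reports do not all share the same fields, the result is a cell array and each element still has to be handled individually, so the gain depends on the data.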
In fact, trying different variations of converting each sub-structure to a table and concatenating them into one final flat table appears to be even more time-consuming than my current solution, which is to brute-force read each field of each structure/sub-structure, assign it to a flat temporary holding structure, and at the end convert that temporary structure to a table. Here is a short excerpt to show you what I mean:
tempStruct(i).trackQuality = currentRow.data.TrackData.Q_Value; %Track quality
tempStruct(i).covarianceType = currentRow.data.Track_Data.Covariance_Data.x_discriminator; %Discriminator = Cartesian or spherical
tempStruct(i).coVarCartesian_varX = currentRow.data.Track_Data.Covariance_Data.Cartesian_Data.Var_X; % Track Variance for X. Meters^2
tempStruct(i).coVarCartesian_covXY = currentRow.data.Track_Data.Covariance_Data.Cartesian_Data.Cov_X_Y; % Track Covariance for X & Y
tempStruct(i).coVarCartesian_covXZ = currentRow.data.Track_Data.Covariance_Data.Cartesian_Data.Cov_X_Z; % Track CoVar for X & Z
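The field-by-field copying in the excerpt above could in principle be generated by a recursive helper instead of being written out by hand. The following is a sketch only: flattenStruct is a hypothetical function, not part of MATLAB, and it names each flat field by joining the nested field names with underscores rather than using the shorter names chosen in the excerpt. Whether it is faster than the explicit assignments would need to be profiled.

```matlab
function flat = flattenStruct(s, prefix)
% flattenStruct  Hypothetical helper: copy every leaf of a nested scalar
% struct into one flat struct, joining nested names with underscores.
    if nargin < 2
        prefix = '';
    end
    flat = struct();
    names = fieldnames(s);
    for k = 1:numel(names)
        value = s.(names{k});
        if isempty(prefix)
            flatName = names{k};
        else
            flatName = [prefix '_' names{k}];
        end
        if isstruct(value) && isscalar(value)
            % Recurse into the sub-structure and merge its leaves
            sub = flattenStruct(value, flatName);
            subNames = fieldnames(sub);
            for j = 1:numel(subNames)
                flat.(subNames{j}) = sub.(subNames{j});
            end
        else
            % Leaf value (number, text, array): copy it as-is
            flat.(flatName) = value;
        end
    end
end
```

For example, flattenStruct(struct('a',1,'b',struct('c',2))) gives a struct with fields a and b_c. MATLAB limits field names to namelengthmax characters, so deeply nested data may need shorter prefixes.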
Before I continue, here are some relevant profiler results:
JsonParser_Profile.png
I think the tabular display must be the final output of the struct2table function to the command window. I can't figure out how to suppress that.
trackReportTable = struct2table(tempStruct);
Time conversions are killing me. The radar reports seconds and microseconds in separate fields every time a timestamp is required. There are 25 timestamps in each JSON string. So for every iteration of the loop that parses the JSON file, I have to call a custom function, ambTime2mat, 25 times to convert the timestamps to datenum.
function [serialDateNum] = ambTime2mat (epochSeconds,microSeconds)
%This function returns date and time in matlab date serial number format
%from a given input of seconds since 1/1/1970 and microseconds since the
%last second.
%
% **** NOTE: Due to the mechanism used to combine the fields, the result only
% has millisecond resolution. ****
%
%INPUTS:
% epochSeconds = Seconds since January 1st, 1970
% microSeconds = Microseconds since value in seconds.
%
%OUTPUTS:
% serialDateNum = date and time in Matlab date serial number format.
%
%EXAMPLE 01: ambTime2mat(timeInEpochSeconds,microseconds);
%% CONVERT
%Check to see that something was passed.
if nargin == 0
error('No data passed, nothing to convert');
end
% Epoch seconds to date serial
dnum = datenum(1970,1,1,0,0,epochSeconds);
% No apparent way to add microseconds to datenum, so convert to milliseconds
% and accept loss of resolution :(
partialSeconds = round(microSeconds/1000);
% Add the milliseconds to the date serial number
serialDateNum = addtodate(dnum,partialSeconds,'millisecond');
end
I think there is substantial room for improvement here, but I can't identify it. The goal is to get seconds and microseconds from separate fields converted into a single serial date number, losing some resolution if necessary.
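Because a serial date number is simply a count of days, the conversion described above can be done with plain arithmetic on whole vectors, without datenum(…,epochSeconds) or addtodate per element. This is a minimal sketch, assuming the seconds and microseconds are already available as numeric vectors. The sample values are made up for illustration.

```matlab
% Hypothetical sample inputs: whole seconds since 1970-01-01 and the
% microseconds past each second, as numeric column vectors
epochSeconds = [0; 86400; 1577491200];
microSeconds = [0; 500000; 250000];

% Convert the combined seconds to days and add the serial date number
% of the epoch itself
secondsPerDay = 86400;
serialDateNum = datenum(1970,1,1) + ...
    (epochSeconds + microSeconds/1e6) / secondsPerDay;

% Alternative: datetime accepts fractional POSIX seconds directly
t = datetime(epochSeconds + microSeconds/1e6, 'ConvertFrom', 'posixtime');
```

For present-day dates a double-precision serial date number resolves to roughly 10 microseconds, so this keeps more resolution than rounding to milliseconds. If the downstream tools can accept datetime, the second form avoids datenum altogether.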
Thoughts, opinions, suggestions all welcome. Keep in mind this parser is part of a larger set of analysis tools and so doing things like creating a flat table or converting to datenum is simply to get the data into a common format the other tool components expect.
Thanks,
Roger
  4 Comments
Roger Pierson, 2 Jan 2020
I should note that I've made some progress on the time conversion portion of my problem. I figured out that I can extract a field from a structure as a vector.
% Create a row vector from field trackTime in all elements of tempStruct
trackTimeVector = [tempStruct.trackTime]
or, in one case where I needed to convert from text to datetime, as a column vector:
% Create a column vector from field logTime in all elements of tempStruct
logTimeVector = vertcat(tempStruct.logTime)
From there I was able to perform conversions on the time using the vectors as input.
logTime = datetime(logTimeVector,'InputFormat',LOG_TIMESTAMP_FORMAT,'Format',AVENGER_TIMESTAMP_FORMAT);
This worked equally well for datenum.
After converting all timestamps in this manner and removing the corresponding fields from the struct (to avoid needlessly converting them to table variables), I converted the rest of the structure to a table.
trackReportTable = struct2table(tempStruct);
Finally, I added the timestamps, which now exist as separate vectors, to the final table to achieve my desired end state of one flat table with all the data in it.
% Add logTime to the table. logTime was created with vertcat, so it is already in the correct shape.
trackReportTable.logTime = (logTime);
% Add trackTime to the table. It is a row vector, so it needs ' to transpose it into a column.
trackReportTable.trackTime = (trackTime');
With 25 timestamps to deal with, this is far more clumsy than my original approach, but it is MUCH, MUCH faster. Seriously, using vectors is just insanely fast. An operation that took a few minutes now takes a few seconds.
I'm left with figuring out how to stop struct2table from outputting the result to the command window, as well as exploring more efficient ways of parsing the JSON.
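The per-timestamp steps described in this comment (extract as a vector, convert, remove from the struct, add to the table) could be wrapped in a loop over field names so that 25 timestamps do not need 25 hand-written blocks. This is a sketch under assumptions: the field names ending in Sec and Usec, the list timeNames, and the sample data are all hypothetical and only illustrate the pattern with dynamic field names.

```matlab
% Hypothetical stand-in for tempStruct with two of the timestamp pairs
tempStruct = struct( ...
    'trackTimeSec',  {0, 86400}, 'trackTimeUsec',  {0, 500000}, ...
    'updateTimeSec', {10, 20},   'updateTimeUsec', {0, 0}, ...
    'trackQuality',  {0.9, 0.7});

% Base names of the timestamp pairs (hypothetical naming convention)
timeNames = {'trackTime', 'updateTime'};

converted = struct();
for k = 1:numel(timeNames)
    secField  = [timeNames{k} 'Sec'];
    usecField = [timeNames{k} 'Usec'];
    % Pull the field out of every struct element as a column vector
    secs  = [tempStruct.(secField)]';
    usecs = [tempStruct.(usecField)]';
    % Vectorized conversion to serial date numbers (days)
    converted.(timeNames{k}) = datenum(1970,1,1) + (secs + usecs/1e6)/86400;
    % Drop the raw fields so struct2table does not convert them
    tempStruct = rmfield(tempStruct, {secField, usecField});
end

% Convert what is left, then attach the converted timestamp columns
trackReportTable = struct2table(tempStruct);
for k = 1:numel(timeNames)
    trackReportTable.(timeNames{k}) = converted.(timeNames{k});
end
```

The conversion inside the loop can be swapped for any vectorized function, for example datetime for text timestamps such as logTime.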
Roger Pierson, 2 Jan 2020
Breakthrough on the issue with struct2table outputting results to the command window, and taking a significant amount of time to do so. Like most problems in computing, this one came down to user error.
The output wasn't coming from struct2table at all. It was coming from the result of the very function I was running.
I was using the 'run' command in the editor with this command string:
trackTable=parseDi20_01('/MATLAB/InputFiles/TRACKS.json')
OF COURSE the command window received a bunch of data - from assigning the output of the function to the variable trackTable. There is no ; at the end!
In short - I didn't even think about needing a semicolon at the end of the command string tucked away up in that little run button on the menu. Out of sight, out of mind, I guess. Sure enough,
trackTable=parseDi20_01('/MATLAB/InputFiles/TRACKS.json');
results in no unintended output to the command window.
I just didn't realize what was going on because struct2table is literally the last line in the function, so it took the blame.
D'oh.

Answers (0)