Reading Parquet Key/Value Metadata

23 ビュー (過去 30 日間)
Matthew Clause
Matthew Clause 2021 年 9 月 22 日
Is there a recommended method for reading metadata key/value pairs in the footer for a parquet file?
I'm using MATLAB to extract JSON metadata added from python's pyarrow. The footer layout is documented here https://github.com/apache/parquet-format#file-format. I have the following code which works, but would prefer a more robust solution in the future.
function metadata = parquetmeta(filename)
fid = fopen(filename);
fseek(fid, -8, 'eof');
footer_bytes = fread(fid, 1, 'uint32', 'ieee-le');
fseek(fid, -footer_bytes, 'eof');
footer = fread(fid, [1, footer_bytes], '*char');
fclose(fid);
start_idx = find(footer == '{', 1, 'first');
end_idx = find(footer == '}', 1, 'last');
metadata = struct;
if ~isempty(start_idx) && ~isempty(end_idx)
try %#ok<TRYNC>
metadata = jsondecode(footer(start_idx, end_idx));
end
end
end
I'm hoping MathWorks will add support for reading Parquet file metadata in a future release. Interestingly, it appears this use to be included as a feature before MATLAB 2019, but cannot verify on my version of MATLAB.
jobj = com.mathworks.bigdata.parquet.Reader;
md = jobj.getParquetFileReader.getFileMetaData.getKeyValueMetaData;
Thanks!

回答 (0 件)

カテゴリ

Help Center および File ExchangeLogical についてさらに検索

製品


リリース

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by