How to extract text from .json files and combine them?

Susan on 28 Mar 2020
Commented: Ameer Hamza on 31 Mar 2020
Hello everyone,
I have a few questions, and any input would be greatly appreciated. I have a bunch of .json files, say 1000. To read each file, I run the following code:
fname = 'C:\Users\...\d90f3c62681e.json';
val = jsondecode(fileread(fname));
The output is shown below. From file to file, the paper_id changes, and so do the sizes of abstract and body_text. I am interested in the text data in "abstract" and "body_text". How can I extract the text in abstract and body_text, and combine all these .json files into one file?
val =
  struct with fields:
    paper_id: 'd90f3c62681e'
    metadata: [1×1 struct]
    abstract: [1×1 struct]
    body_text: [4×1 struct]
    bib_entries: [1×1 struct]
    ref_entries: [1×1 struct]
    back_matter: []
val.abstract =
  struct with fields:
    text: '300 words)
    cite_spans: []
    ref_spans: []
    section: 'Abstract'
val.body_text =
  4×1 struct array with fields:
    text
    cite_spans
    ref_spans
    section
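For a single decoded file, the text can be pulled out of these struct arrays with comma-separated list expansion ({val.body_text.text}) and joined into one character vector. This is a minimal sketch, assuming abstract and body_text decode to struct arrays with a text field, or to [] when empty. The sample JSON string and the names parts and paperText are hypothetical:

```matlab
% Hypothetical sample with the same shape as one decoded paper
val = jsondecode(['{"paper_id":"demo",' ...
    '"abstract":[{"text":"Abstract text."}],' ...
    '"body_text":[{"text":"Body paragraph 1."},{"text":"Body paragraph 2."}]}']);

% Gather the 'text' field of every abstract and body paragraph; [] means "none"
parts = {};
if ~isempty(val.abstract)
    parts = [parts, {val.abstract.text}];   % comma-separated list -> 1-by-N cell
end
if ~isempty(val.body_text)
    parts = [parts, {val.body_text.text}];
end

% One paragraph per line, in a single character vector
paperText = strjoin(parts, sprintf('\n'));
```

The isempty checks matter because, as noted in the comments below, some papers have no abstract at all.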
Walter Roberson on 28 Mar 2020
Which release are you using? When I try it in R2020a, I get:
paper_id: '0a43046c154d0e521a6c425df215d90f3c62681e'
>> val.abstract
struct with fields:
text: '300 words) 33 Quantification of aerosolized influenza virus [and a bunch more]
Susan on 29 Mar 2020
Hi Walter,
Thanks for your reply. I am using R2019a and get the same results as you. My main question is this: some of these .json files don't have any abstract text, i.e., val.abstract = []. Could you please tell me how I can put all the available val.abstract.text and val.body_text.text into one file? Do I need a for loop to go through all the papers (paper_id) and extract the text from each one? If so, how?
Many thanks in advance!

Accepted Answer

Ameer Hamza on 31 Mar 2020
Edited: Ameer Hamza on 31 Mar 2020
As I answered in the comment on your other question, the following code creates a struct by combining the fields from the individual files, skipping files whose abstract or body_text is empty. It then writes the result to a single combined JSON file:
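% Collect the abstract and body text of every .json file in the folder 'JSON files'
files = dir('JSON files/*.json');
s = struct('abstract', [], 'body_text', []);
for i = 1:numel(files)
    filename = fullfile(files(i).folder, files(i).name);
    data = jsondecode(fileread(filename));
    % Papers without an abstract decode to [], so skip them
    if ~isempty(data.abstract)
        % Keep only the 'text' field: an N-by-1 struct array with one field
        s.abstract = [s.abstract; cell2struct({data.abstract.text}, 'text', 1)];
    end
    if ~isempty(data.body_text)
        s.body_text = [s.body_text; cell2struct({data.body_text.text}, 'text', 1)];
    end
end
% Encode the combined struct and write it to one JSON file
str = jsonencode(s);
f = fopen('filename.json', 'w');
if f < 0
    error('Could not open filename.json for writing.');
end
fprintf(f, '%s', str);
fclose(f);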
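If the goal is one plain-text file rather than a combined JSON file, the same loop can write each paper's text directly with fprintf, which repeats its format for every element of a comma-separated list. This is a minimal sketch, assuming the same field layout as above. The demo folder, the two sample papers and the output name combined.txt are hypothetical stand-ins for your own data:

```matlab
% --- demo setup: two hypothetical papers in a temporary folder (use your own folder instead) ---
folder = fullfile(tempdir, 'json_demo');
if ~exist(folder, 'dir')
    mkdir(folder);
end
demo = {'{"paper_id":"p1","abstract":[{"text":"Abstract one."}],"body_text":[{"text":"Body one."}]}', ...
        '{"paper_id":"p2","abstract":[],"body_text":[{"text":"Body two, part 1."},{"text":"Body two, part 2."}]}'};
for k = 1:numel(demo)
    fid = fopen(fullfile(folder, sprintf('paper%d.json', k)), 'w');
    fprintf(fid, '%s', demo{k});
    fclose(fid);
end

% --- write every paper's text into one plain-text file ---
files = dir(fullfile(folder, '*.json'));
outName = fullfile(folder, 'combined.txt');   % hypothetical output name
out = fopen(outName, 'w');
if out < 0
    error('Could not open %s for writing.', outName);
end
for k = 1:numel(files)
    data = jsondecode(fileread(fullfile(files(k).folder, files(k).name)));
    fprintf(out, '%s\n', data.paper_id);            % header line for each paper
    if ~isempty(data.abstract)
        fprintf(out, '%s\n', data.abstract.text);   % one line per abstract paragraph
    end
    if ~isempty(data.body_text)
        fprintf(out, '%s\n', data.body_text.text);  % one line per body paragraph
    end
    fprintf(out, '\n');                             % blank line between papers
end
fclose(out);
combined = fileread(outName);
```

Writing inside the loop avoids growing a large struct in memory, which may matter with around 1000 papers.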
Susan on 31 Mar 2020
Thank you so much! Your answer completely solved my issue. Thanks again!
Ameer Hamza on 31 Mar 2020
Glad to be of help.

More Answers (1)

Mohammad Sami on 30 Mar 2020
You can import your data into cell arrays, recording for each file whether it has an abstract and a body:
% Build the list of files to read (assumed: the .json files are in the current folder)
files = dir('*.json');
filelist = arrayfun(@(f) fullfile(f.folder, f.name), files, 'UniformOutput', false);
vals = cell(length(filelist),1);
haveabstract = false(length(filelist),1);
havebody = false(length(filelist),1);
data = cell(length(filelist),3);
% first column: paper_id, second column: abstract, third column: body_text
for i = 1:length(filelist)
    vals{i} = jsondecode(fileread(filelist{i}));
    haveabstract(i) = ~isempty(vals{i}.abstract);
    havebody(i) = ~isempty(vals{i}.body_text);
    data{i,1} = vals{i}.paper_id;
    if haveabstract(i)
        data{i,2} = vals{i}.abstract;
    end
    if havebody(i)
        data{i,3} = vals{i}.body_text;
    end
end
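Once data, haveabstract and havebody are filled in, the logical flags can index the cell array directly, for example to join each paper's paragraphs into one string and to list the papers that have an abstract. This is a sketch with hypothetical sample values standing in for the loop's output; the names joinText, abstracts, bodies and idsWithAbstract are illustrative:

```matlab
% Hypothetical stand-in for the loop's output: {paper_id, abstract, body_text} per row
data = {'p1', struct('text', {'Abstract one.'}), struct('text', {'Body one.'; 'Body two.'}); ...
        'p2', [], struct('text', {'Only body.'})};
haveabstract = [true; false];
havebody = [true; true];

% Join the paragraphs of each section with spaces; papers without a section keep ''
joinText = @(s) strjoin({s.text}, ' ');
abstracts = repmat({''}, size(data, 1), 1);
bodies = repmat({''}, size(data, 1), 1);
abstracts(haveabstract) = cellfun(joinText, data(haveabstract, 2), 'UniformOutput', false);
bodies(havebody) = cellfun(joinText, data(havebody, 3), 'UniformOutput', false);

% paper_id of every paper that has an abstract
idsWithAbstract = data(haveabstract, 1);
```

Applying cellfun only to the flagged rows keeps the empty [] entries away from joinText, which would otherwise fail on them.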
Susan on 30 Mar 2020
Thank you so much!
