Removing commas between columns in text data

Question

0 votes

I have a txt file which is the output of a lemmatizer, in the form

Sometimes, ,, I, use, commas, .
I, like, writing, ,, I, like, reading

How can I read it into a tokenizedDocument while deleting the unnecessary commas between tokens? A simple approach would be

test=readlines('/path/to/file.txt')
test=strrep(test,',','')
test=tokenizedDocument(test)

but it would also remove the commas already present in the original text, whereas I'd like to preserve punctuation.


Answer 1 (accepted answer)

Walter Roberson, 16 October 2021


2 votes

test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};
test = regexprep(test, {'(?<=[^,]),\s', '\s*,,', '\s+\.'}, {' ', ',', '.'})
test = 2×1 cell array
    {'Sometimes, I use commas.'      }
    {'I like writing, I like reading'}

Notice that periods need a special rule. 'use, commas' should almost certainly become 'use commas' (comma-space becomes a space), but applying the same rule to 'commas, .' gives 'commas .', with an unwanted space before the period. The third pattern removes that space.

To put it another way, we cannot simply delete every comma-space pair. That would work for the pair between 'commas' and the period, but not for the pair between 'use' and 'commas': deleting it would merge the two words into 'usecommas'.
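The reasoning above can be checked by applying the three patterns one at a time instead of in a single call. This is only a sketch on the first sample line; the variable names (txt, naive, step1 and so on) are illustrative and not part of the original answer.

```matlab
txt = 'Sometimes, ,, I, use, commas, .';

% Naive rule: delete every comma-space pair. Adjacent words run together.
naive = strrep(txt, ', ', '');

% Step 1: a comma-space pair that follows a non-comma character is a
% column separator, so it becomes a single space.
step1 = regexprep(txt, '(?<=[^,]),\s', ' ');

% Step 2: a doubled comma (with any whitespace before it) is a literal
% comma from the original text, so it collapses to one comma.
step2 = regexprep(step1, '\s*,,', ',');

% Step 3: remove the whitespace left in front of a period.
step3 = regexprep(step2, '\s+\.', '.');

disp(naive)   % Sometimes,Iusecommas.
disp(step1)   % Sometimes ,, I use commas .
disp(step2)   % Sometimes, I use commas .
disp(step3)   % Sometimes, I use commas.
```

Passing the patterns to regexprep as cell arrays, as in the answer, applies them in the same order, which is why the combined call gives the same final string.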

1 comment

Kim Maria Damiani, 16 October 2021

Thank you!


Answer 2

Chunru, 16 October 2021


0 votes

test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};
test = regexprep(test, ',\s', ' ')
test = 2×1 cell array
    {'Sometimes , I use commas .'     }
    {'I like writing , I like reading'}
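A different route, not taken from either answer in this thread, is to treat ', ' as the token delimiter and skip re-tokenization altogether. The sketch below assumes that no token itself contains a comma followed by a space, and that the Text Analytics Toolbox is installed for the final step. The names rawLines, splitLine, tokens, words and docs are illustrative.

```matlab
% Sample lemmatizer output, one document per cell (same data as above).
rawLines = {'Sometimes, ,, I, use, commas, .'
            'I, like, writing, ,, I, like, reading'};

% Split each line on the literal separator ', '. A comma from the original
% text sits between two separators (', ,, '), so it survives as the token ','.
splitLine = @(s) strsplit(s, ', ', 'CollapseDelimiters', false);
tokens = cellfun(splitLine, rawLines, 'UniformOutput', false);

% Assumption: Text Analytics Toolbox is available. 'TokenizeMethod','none'
% tells tokenizedDocument that the input is already split into words.
if ~isempty(which('tokenizedDocument'))
    words = cellfun(@string, tokens, 'UniformOutput', false);
    docs = tokenizedDocument(words, 'TokenizeMethod', 'none');
end
```

With this approach each token stays exactly as the lemmatizer produced it, so punctuation is preserved as separate tokens rather than reattached to the neighbouring word.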

