how to extract a list of unique words from a set of one row strings

Question

Harrison on 14 Nov 2024

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings

Commented: Harrison on 15 Nov 2024

Basically I have a set of 11 strings of words, and each string has no repeating words, but I need a list of every unique word in all 11 strings.

I've found that this works for one string at a time, but I can't get a list for all 11 strings this way.

A{1} = updatedDocuments(1,1)

B{1} = strjoin(unique(strtrim(strsplit(A{1}, ',')))', '')

Is it possible to index A{1} as updatedDocuments(1:11,1) or do something similar?


Answer 1

Madheswaran on 14 Nov 2024

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings#answer_1545194

Edited: Madheswaran on 15 Nov 2024

Hi @Harrison,

I am assuming the following:

'updatedDocuments' is an array of 'tokenizedDocument'.
Each document contains comma-separated text that does not end with a comma.

To get the unique words from the entire set of strings, you can use the following approach:

% Remove the commas from the documents if you don't want a comma to be
% included in 'uniqueWords'
updatedDocuments = removeWords(updatedDocuments, ",");
uniqueWords = updatedDocuments.Vocabulary;

If 'updatedDocuments' is instead a cell array of character vectors, you can use the following approach:

% Join all documents with a comma; appending a comma to every cell instead
% would leave a trailing delimiter and add an empty entry to the result
allWords = strjoin(updatedDocuments(1:11,1)', ','); % Join all words into a single char vector
allWords = strtrim(strsplit(allWords, ',')); % Split with comma as delimiter and trim
uniqueWords = unique(allWords); % unique words (1 x n cell where n is the number of unique words)
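
The cell-array approach can be tried on a small example. The following is a minimal sketch, assuming hypothetical sample data (three short documents instead of the 11 in the question) and the hypothetical variable names 'joined' and 'uniqueWordsStable'. It joins the documents with ',' so that no trailing delimiter, and hence no empty entry, is produced, and it also shows the 'stable' option of unique for cases where the order of first appearance matters.

```matlab
% Hypothetical sample data: a 3-by-1 cell array of comma-separated char vectors.
updatedDocuments = {'cherry, apple'; ...
                    'banana, apple'; ...
                    'date, cherry'};

% Reshape to a row and join every document with a comma.
joined = strjoin(reshape(updatedDocuments, 1, []), ',');

% Split on commas and trim the whitespace around each word.
allWords = strtrim(strsplit(joined, ','));

% Unique words in sorted order.
uniqueWords = unique(allWords);

% Unique words in order of first appearance.
uniqueWordsStable = unique(allWords, 'stable');

disp(uniqueWords)
disp(uniqueWordsStable)
```

With this sample data, the sorted list is apple, banana, cherry, date, while the 'stable' list starts with cherry because that word appears first.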

For more information, refer to the MATLAB documentation for the functions used above.

Hope this helps!

3 comments (1 older comment not shown)

Madheswaran on 15 Nov 2024

That is because I assumed 'updatedDocuments' to be a cell array of character vectors. If 'updatedDocuments' is an array of 'tokenizedDocument', resolving this issue is straightforward. I have updated the answer to include a solution for the 'tokenizedDocument' case, in addition to the existing explanation.

Let me know if that helps!

Harrison on 15 Nov 2024

That's exactly right! Thank you!!

Answer 2

Paul on 14 Nov 2024

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings#answer_1544974

If UpdatedDocuments is a 1D cell array of chars ...

UpdatedDocuments{1} = 'one,two,three,one';
UpdatedDocuments{2} = 'one,two,three,two';
UpdatedDocuments{3} = 'one,two,three,three';
result = cellfun(@(S) strjoin(unique(strtrim(strsplit(S, ','))), ','), UpdatedDocuments, 'UniformOutput', false)
result = 1x3 cell array
    {'one,three,two'}    {'one,three,two'}    {'one,three,two'}
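
The call above removes duplicates within each document separately. If a single list across all documents is wanted, as the question describes, the per-document word lists can be concatenated before calling unique. This is a minimal sketch reusing the sample data from this answer; 'wordsPerDoc', 'allWords', and 'uniqueWords' are hypothetical names chosen for illustration.

```matlab
% Same sample data as in the answer above (1-by-3 cell array of char vectors).
UpdatedDocuments = {'one,two,three,one', 'one,two,three,two', 'one,two,three,three'};

% Split each document into a 1-by-k cell array of trimmed words.
wordsPerDoc = cellfun(@(S) strtrim(strsplit(S, ',')), UpdatedDocuments, ...
    'UniformOutput', false);

% Concatenate the per-document word lists into one row and deduplicate once.
allWords = [wordsPerDoc{:}];
uniqueWords = unique(allWords);

disp(uniqueWords)
```

Because strsplit returns a row cell array for each document, the comma-separated expansion `[wordsPerDoc{:}]` concatenates them horizontally into one long row.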

1 comment

Paul on 15 Nov 2024

The Vocabulary property of tokenizedDocument returns the unique words in the array:

documents = tokenizedDocument([
    "an example of a short sentence  an example of a short sentence " 
    "a second short sentence a second short sentence"]);
documents
documents = 
  2x1 tokenizedDocument:

    12 tokens: an example of a short sentence an example of a short sentence
     8 tokens: a second short sentence a second short sentence
documents.Vocabulary
ans = 1x7 string array
    "an"    "example"    "of"    "a"    "short"    "sentence"    "second"
