Text Analytics Toolbox: removing entity details from documents

Question

0 votes

I have a very large set of documents that I am preprocessing for use in a BERT classification model.

I have tokenized the documents and added the entity details.

Now I want to remove all of the tokens within the documents that have been tagged as "organization".

I have the following variables:

documents: tokenized documents

tdetails: a table of tokens with the document number, sentence number, line number, Type, Language, PartOfSpeech and Entity.

Token                   DocumentNumber  SentenceNumber  LineNumber  Type       Language  PartOfSpeech   Entity
"Astoria"               1               2               3           'letters'  'en'      'proper-noun'  'person'
"Federal Savings Bank"  1               2               3           'other'    'en'      'proper-noun'  'organization'
"settled"               1               2               3           'letters'  'en'      'verb'         'non-entity'

How do I remove from the variable documents all of the tokens whose Entity is organization?

E.g. in documents(1,1).Vocabulary(7) I can find "Federal Savings Bank", which is in row 7 of the example above. I could loop through all of the documents and the rows where tdetails.Entity is organization, but that would take quite a while.

I can't seem to figure out how to do this more simply.


Answer 1

Cris LaPierre, 18 Nov 2023

2 votes

I would use removeWords.

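documents = tokenizedDocument(Text(:));
documents = addEntityDetails(documents);   % skip if the entity details are already added
tdetails = tokenDetails(documents);
% Tokens tagged as organizations (the Entity category is spelled "organization")
orgTokens = unique(tdetails.Token(tdetails.Entity == "organization"));
documents2 = removeWords(documents, orgTokens);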
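For anyone who wants to try this end to end, here is a minimal self-contained sketch of the removeWords approach. The sample sentences and the variable names Text, isOrg, orgTokens and cleaned are made up for illustration, and which tokens actually get tagged as organizations depends on the entity model in your release, so treat it as a sketch rather than a definitive recipe.

```matlab
% Hypothetical sample text; replace with your own data.
Text = [
    "Astoria Federal Savings Bank settled the claim."
    "The bank settled quickly."];

% Tokenize and tag entities (requires Text Analytics Toolbox).
documents = tokenizedDocument(Text);
documents = addEntityDetails(documents);

% One row per token, including the categorical Entity column.
tdetails = tokenDetails(documents);

% Logical mask over the table rows tagged as organizations.
isOrg = tdetails.Entity == "organization";

% Unique token strings to remove. Multi-word entities are single
% tokens after addEntityDetails, so they are matched as a whole.
orgTokens = unique(tdetails.Token(isOrg));

% Remove those tokens from every document in one vectorized call.
cleaned = removeWords(documents, orgTokens);

% Sanity check: none of the removed tokens remain in the vocabulary.
stillThere = ismember(orgTokens, cleaned.Vocabulary);
disp(any(stillThere))
```

One thing to be aware of: removeWords matches on the token text, so a token tagged as an organization in one place is removed everywhere it occurs, even where it was tagged differently. If that matters for your data, you would need to filter per occurrence instead.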


david cowan, 19 Nov 2023

Moved: Cris LaPierre, 19 Nov 2023

Really appreciate that.

removeWords!!

I'll not forget that now. I knew there had to be a simple approach that I was just missing.
