Text Analytics Toolbox seems to make lots of mistakes recognizing language and part of speech

Hi,
My input is a list of VERY BASIC ENGLISH words, shown below. I would like to find the part of speech of each one.
kid
killer
kind
king
kiss
kitchen
knee
knife
knowledge
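% Each word becomes its own single-token document.
words = {'kid','killer','kind','king','kiss','kitchen','knee','knife','knowledge'};
words = string(words);
documents = tokenizedDocument(words);
% Tag each token with a part of speech, then list the per-token details.
documents = addPartOfSpeechDetails(documents);
tdetails = tokenDetails(documents);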
The mistakes show up when I check 'tdetails' (see below).
Why does MATLAB think these words are German (the language should be 'en' for English) and that they are all adjectives (most of them should be nouns)?
tdetails =
  9×7 table
    Token          DocumentNumber    SentenceNumber    LineNumber    Type       Language    PartOfSpeech
    ___________    ______________    ______________    __________    _______    ________    ____________
    "kid"          1                 1                 1             letters    de          adjective
    "killer"       2                 1                 1             letters    de          adjective
    "kind"         3                 1                 1             letters    de          adjective
    "king"         4                 1                 1             letters    de          adjective
    "kiss"         5                 1                 1             letters    de          adjective
    "kitchen"      6                 1                 1             letters    de          adjective
    "knee"         7                 1                 1             letters    de          adjective
    "knife"        8                 1                 1             letters    de          adjective
    "knowledge"    9                 1                 1             letters    de          adjective

Answers (1)

Christopher Creutzig, 9 March 2020
Language detection works much better on longer text. It does not do a dictionary lookup (and several of your words are valid German anyway); it uses statistical information about the distribution of letters.
Part of speech detection relies heavily on the context in a sentence.
documents = tokenizedDocument("My kid is a king");
documents = addPartOfSpeechDetails(documents);
tokenDetails(documents)
ans =
  5×7 table
    Token     DocumentNumber    SentenceNumber    LineNumber    Type       Language    PartOfSpeech
    ______    ______________    ______________    __________    _______    ________    ______________
    "My"      1                 1                 1             letters    en          pronoun
    "kid"     1                 1                 1             letters    en          noun
    "is"      1                 1                 1             letters    en          auxiliary-verb
    "a"       1                 1                 1             letters    en          determiner
    "king"    1                 1                 1             letters    en          noun

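If the input really has to be isolated words, one thing that may help is to tell the tokenizer the language instead of letting it guess from a handful of letters. The sketch below assumes the 'Language' name-value option of tokenizedDocument is available in your release; the column selection at the end is only for readability. Part-of-speech tags for single words may still be unreliable, since there is no sentence context to work from.

% Sketch: set the language explicitly rather than relying on detection.
words = ["kid","killer","kind","king","kiss","kitchen","knee","knife","knowledge"];

% 'Language','en' bypasses automatic language detection for these short inputs.
documents = tokenizedDocument(words,'Language','en');
documents = addPartOfSpeechDetails(documents);

% Show only the columns relevant to the question.
tdetails = tokenDetails(documents);
tdetails(:,{'Token','Language','PartOfSpeech'})

This only addresses the language column. For dependable part-of-speech tags, putting each word into a short sentence, as in the example above, is likely the safer route.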