Handling out-of-vocabulary words in word embedding
I'm using FastText and my own word embedding on a set of documents. The embeddings are used to detect abbreviations (Y/N) for each word token.
When testing, words that do not have vectors (out-of-vocabulary, or OOV, words) are discarded and not included in the performance measures (precision, recall, etc.), which gives a misleading result. How do you handle this?
Should all words with NaN values be included in the performance measure? Can the NaN values be replaced with a vector? How would you decide which vector?
Answers (1)
Prince Kumar
16 Aug 2021
From my understanding, you want to handle OOV (out-of-vocabulary) words for your abbreviation detection task. For now, MATLAB fastTextWordEmbedding does not handle OOV words.
There are many ways to do this; the following are two popular ones:
- Use the context around the OOV word. You can use the word embeddings of the previous and next words to build a vector for the current OOV word.
- Use a synonym or a similar word and take its word embedding for your OOV word.
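The two approaches, averaging the vectors of the neighbouring words and borrowing the vector of a synonym, can be sketched roughly as follows. This is only an illustration in Python, not MATLAB code and not part of any toolbox: the toy `embedding` dictionary, the `synonyms` table, and all helper names are hypothetical, and trying the synonym before the context is just one possible ordering.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def context_vector(tokens: Sequence[str], index: int,
                   embedding: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of the previous and next tokens that are in vocabulary."""
    neighbours = []
    for j in (index - 1, index + 1):
        if 0 <= j < len(tokens) and tokens[j] in embedding:
            neighbours.append(embedding[tokens[j]])
    # No usable neighbour: report that no vector could be built.
    return mean_vector(neighbours) if neighbours else None


def synonym_vector(word: str, embedding: Dict[str, Vector],
                   synonyms: Dict[str, List[str]]) -> Optional[Vector]:
    """Return the vector of the first listed synonym that is in vocabulary."""
    for candidate in synonyms.get(word, []):
        if candidate in embedding:
            return embedding[candidate]
    return None


def embed_tokens(tokens: Sequence[str], embedding: Dict[str, Vector],
                 synonyms: Dict[str, List[str]]) -> List[Optional[Vector]]:
    """Look up every token; for OOV tokens try a synonym first, then the context."""
    vectors: List[Optional[Vector]] = []
    for i, token in enumerate(tokens):
        if token in embedding:
            vec: Optional[Vector] = embedding[token]
        else:
            vec = synonym_vector(token, embedding, synonyms)
            if vec is None:
                vec = context_vector(tokens, i, embedding)
        vectors.append(vec)
    return vectors


# Toy two-dimensional embedding and synonym table (made-up values).
embedding = {
    "the": [0.0, 1.0],
    "doctor": [1.0, 0.0],
    "arrived": [0.5, 0.5],
}
synonyms = {"dr": ["doctor"]}

# "dr" is OOV but has an in-vocabulary synonym; "asap" falls back to its neighbours.
with_synonym = embed_tokens(["the", "dr", "arrived"], embedding, synonyms)
with_context = embed_tokens(["the", "asap", "arrived"], embedding, synonyms)
```

With a fallback like this, an OOV token receives a vector whenever it has a known synonym or at least one in-vocabulary neighbour, so it can stay in the test set and count toward precision and recall instead of being dropped. Tokens for which the sketch still returns `None` need a separate decision, for example a fixed default vector.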