Creating the matrix of GloVe embedded vocabulary

1 回表示 (過去 30 日間)
Amos
Amos 2025 年 4 月 28 日
コメント済み: Amos 2025 年 4 月 30 日
I downloaded glove.6B.zip
Per the documentation, the file contains 400k vocabulary words, each of which is represented as a 300d vector.
I want, then, to create a matrix in Matlab, 400k X 300 that lists all the 400k embedded vectors of the vocabulary. I do not need to save the text-word equivalent of each vector.
What might be the simplest Matlab code to create such matrix from glove.6B.zip ?
Thanks for your anticipated help!

採用された回答

Shantanu Dixit
Shantanu Dixit 2025 年 4 月 30 日
編集済み: Shantanu Dixit 2025 年 4 月 30 日
Hi Amos,
You can create an embedding matrix for the 'GLoVE' embeddings by initializing a matrix of size 400K × 300 initialized with 'zeros': https://www.mathworks.com/help/matlab/ref/zeros.html Corresponsingly each line can be read and stored (only the numeric part) in the matrix, discarding the word. As the file is in the text format, for storing the word vectors 'str2double':https://www.mathworks.com/help/matlab/ref/str2double.html can be used to convert the text to numbers. Each line in the file looks like this:
the 0.04656 0.21318 -0.0074364 -0.45854 ...
Overall after reading each line the corresponding vector can be stored as follows:
fid = fopen('glove.6B.300d.txt', 'r');
embeddingMatrix = zeros(400000, 300);
for i = 1:400000
line = fgetl(fid);
tokens = strsplit(line);
embeddingMatrix(i, :) = str2double(tokens(2:end));
end
fclose(fid);
You can also refer to following other useful documentation pages by MathWorks:
Hope this helps!
  1 件のコメント
Amos
Amos 2025 年 4 月 30 日
Thanks much! I used it and it worked perfectly!!

サインインしてコメントする。

その他の回答 (0 件)

カテゴリ

Help Center および File ExchangeIntroduction to Installation and Licensing についてさらに検索

製品


リリース

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by