Creating the matrix of GloVe embedded vocabulary
Per the documentation, the file contains 400k vocabulary words, each of which is represented as a 300d vector.
I then want to create a 400k × 300 matrix in MATLAB that holds all 400k embedding vectors of the vocabulary. I do not need to keep the word that corresponds to each vector.
What might be the simplest MATLAB code to create such a matrix from glove.6B.zip?
Thanks for your anticipated help!
Accepted Answer
Shantanu Dixit
30 April 2025
Edited: Shantanu Dixit
30 April 2025
Hi Amos,
You can build the embedding matrix for the GloVe embeddings by preallocating a 400,000 × 300 matrix with 'zeros': https://www.mathworks.com/help/matlab/ref/zeros.html Then read the file line by line and store only the numeric part of each line in the matrix, discarding the word. Because the file is plain text, 'str2double': https://www.mathworks.com/help/matlab/ref/str2double.html can be used to convert the text to numbers. Each line in the file looks like this:
the 0.04656 0.21318 -0.0074364 -0.45854 ...
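To make the conversion concrete, here is a small sketch on a shortened line. It assumes only the first four numbers shown above; a real line carries 300 of them, and the variable names are illustrative.

```matlab
% Shortened sample line: one word followed by its numbers
% (assumption: only 4 values here, the real file has 300 per line)
sampleLine = 'the 0.04656 0.21318 -0.0074364 -0.45854';

tokens = strsplit(sampleLine);      % 1x5 cell array; tokens{1} is the word
vec = str2double(tokens(2:end));    % 1x4 numeric row vector, word discarded
```

Passing the cell array `tokens(2:end)` to `str2double` converts all the numeric tokens in one call, so no inner loop over the tokens is needed.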
After reading each line, the corresponding vector can be stored as follows:
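% Open the extracted text file for reading
fid = fopen('glove.6B.300d.txt', 'r');
assert(fid ~= -1, 'Could not open glove.6B.300d.txt');
embeddingMatrix = zeros(400000, 300);   % preallocate: one row per word
for i = 1:400000
    textLine = fgetl(fid);              % e.g. 'the 0.04656 0.21318 ...'
    tokens = strsplit(textLine);        % tokens{1} is the word
    embeddingMatrix(i, :) = str2double(tokens(2:end));   % keep only the numbers
end
fclose(fid);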
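As a possible alternative to the line-by-line loop, 'textscan' can read the whole file in one call. The sketch below is only an illustration under stated assumptions: it writes a tiny made-up GloVe-style file (3 words, 4 dimensions) to a temporary location so that it runs without the real download, and the names `demoFile`, `dim` and `cols` are hypothetical.

```matlab
% Build a tiny GloVe-style file so the sketch runs on its own
% (assumption: 3 made-up words with 4 values each, not the real data)
dim = 4;
demoFile = [tempname '.txt'];
fid = fopen(demoFile, 'w');
fprintf(fid, 'the 0.1 0.2 -0.3 0.4\n');
fprintf(fid, 'of 0.5 -0.6 0.7 0.8\n');
fprintf(fid, 'and -0.9 1.0 1.1 -1.2\n');
fclose(fid);

% Format: one text column (the word) followed by dim numeric columns
fmt = ['%s' repmat(' %f', 1, dim)];

fid = fopen(demoFile, 'r');
assert(fid ~= -1, 'Could not open the demo file');
% CollectOutput merges the consecutive numeric columns into one matrix
cols = textscan(fid, fmt, 'CollectOutput', true);
fclose(fid);
delete(demoFile);

embeddingMatrix = cols{2};   % numeric block only; cols{1} holds the words
```

For the real data, one would first extract the archive (for example with 'unzip'), point the file name at glove.6B.300d.txt and set `dim = 300`; the result should then be 400000 × 300, although this has not been timed or verified against the actual file here.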
You can also refer to the following useful documentation pages by MathWorks:
Hope this helps!