Matrix index is out of range for deletion

Question

oliver on 10 Apr 2023

Direct link to this question: https://jp.mathworks.com/matlabcentral/answers/1944759-matrix-index-is-out-of-range-for-deletion

Commented: Walter Roberson on 10 Apr 2023

Accepted Answer: Walter Roberson

IMBD_reviews_smol.csv

My project is sentiment analysis. I am trying to follow the tutorial "Create Simple Text Model for Classification".

My dataset is a list of reviews, each labelled with a sentiment (either 'positive' or 'negative').

I am trying to remove any documents containing no words from the bag-of-words model, and to remove the corresponding entries in the labels.

My code is:

filename = "IMBD_reviews_smol.csv"; 
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
 
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
YTrain = dataTrain.sentiment;
YTest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
Ytrain(idx) = []; % produces an error:
% Deletion requires an existing variable.
Xtrain = bag.Counts;
mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end
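
The message "Deletion requires an existing variable." is consistent with a capitalization mismatch: the labels are stored in YTrain, while the deletion targets Ytrain, and MATLAB variable names are case-sensitive. A minimal sketch of the check, using made-up stand-in labels rather than the real data:

```matlab
% Hypothetical stand-in labels; the real ones come from dataTrain.sentiment.
YTrain = categorical(["positive"; "negative"; "positive"]);
idx = 2;                             % pretend document 2 was removed as empty

hasLower = exist('Ytrain','var');    % 0: no variable spelled with a lowercase t
hasUpper = exist('YTrain','var');    % 1: the variable that was actually created

YTrain(idx) = [];                    % deleting from the existing variable works
disp(size(YTrain))
```

Whichever spelling is chosen, it needs to be the same in the assignment, the deletion, and the later call to fitcecoc.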

7 Comments (5 older comments not shown)

oliver on 10 Apr 2023

With the code below I receive the error message

"Error using classreg.learning.classif.FullClassificationModel.prepareData
No class names are found in input labels."

about line 25, "mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");".

filename = "IMBD_reviews_smol.csv"; 
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
YTrain = dataTrain.sentiment;
YTest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
YTrain = [];
XTrain = bag.Counts;
mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");
documentsTest = preprocessText(textDataTest);
XTest = encode(bag,documentsTest);
YPred = predict(mdl,XTest);
acc = sum(YPred == YTest)/numel(YTest);
str = [
    "i hated this movie."
    "this was really good"
    "sometimes slow movies work out in the way you want and thats how this movie went"];
documentsNew = preprocessText(str);
XNew = encode(bag,documentsNew);
labelsNew = predict(mdl,XNew);
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end

Walter Roberson on 10 Apr 2023

Yes, as I indicated, you are removing all documents from the bag, so your training information becomes empty.
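
A guard placed just before the call to fitcecoc can make this situation easier to spot. The sketch below is only an illustration: the stand-in values for Xtrain and Ytrain are assumptions chosen to mimic an emptied training set, and the messages are made up.

```matlab
% Stand-in values mimicking an emptied training set (assumed for illustration).
Xtrain = sparse(0,0);
Ytrain = categorical(["positive"; "negative"]);
Ytrain([1 2]) = [];                  % leaves a 0x1 categorical

% Check that there is something to train on before calling fitcecoc.
if isempty(Xtrain) || isempty(Ytrain)
    msg = "Training data is empty: every document was removed during preprocessing.";
elseif size(Xtrain,1) ~= numel(Ytrain)
    msg = "Xtrain has a different number of rows than Ytrain has labels.";
else
    msg = "Training data looks usable.";
end
disp(msg)
```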


Answer 1

Walter Roberson on 10 Apr 2023

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/1944759-matrix-index-is-out-of-range-for-deletion#answer_1213124

Moved: Walter Roberson on 10 Apr 2023

IMBD_reviews_smol.csv

filename = "IMBD_reviews_smol.csv"; 
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
 
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
Ytrain = dataTrain.sentiment;
Ytest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
whos Ytrain idx
  Name          Size             Bytes  Class          Attributes

  Ytrain      181x1                423  categorical              
  idx           1x181             1448  double                   
Ytrain(idx) = []; %produces an error 
Xtrain = bag.Counts;
whos
  Name                 Size              Bytes  Class                Attributes

  Xtrain               0x0                  24  double               sparse    
  Ytest               20x1                 262  categorical                    
  Ytrain               0x1                 242  categorical                    
  ans                  1x46                 92  char                           
  bag                  1x1                 640  bagOfWords                     
  cmdout               1x33                 66  char                           
  cvp                  1x1                3278  cvpartition                    
  data               201x2              543470  table                          
  dataTest            20x2               66077  table                          
  dataTrain          181x2              478944  table                          
  documents          181x1               43321  tokenizedDocument              
  filename             1x1                 178  string                         
  idx                  1x181              1448  double                         
  textDataTest        20x1               64602  string                         
  textDataTrain      181x1              477308  string                         
mdl = fitcecoc(Xtrain, Ytrain, "Learners", "linear");
Error using classreg.learning.classif.FullClassificationModel.prepareData
No class names are found in input labels.

Error in ClassificationECOC.prepareData (line 128)
                classreg.learning.classif.FullClassificationModel.prepareData(X,Y,varargin{:});

Error in classreg.learning.FitTemplate/fit (line 246)
                    this.PrepareData(X,Y,this.BaseFitObjectArgs{:});

Error in ClassificationECOC.fit (line 119)
            this = fit(temp,X,Y);

Error in fitcecoc (line 357)
    obj = ClassificationECOC.fit(X,Y,varargin{:});
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end

You are removing all of the documents. The bag is left empty.
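
One way to see where the documents disappear is to print the bag's size after each step. The following is a rough sketch on made-up text (not the IMDB file) that repeats the doubled removeShortWords call; it assumes Text Analytics Toolbox is available.

```matlab
% Hypothetical toy text; only one word is longer than 15 characters.
textData = [
    "i hated this uncharacteristically slow movie"
    "this was really good"
    "slow movies sometimes work out"];

documents = tokenizedDocument(textData);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);   % the problematic second call

bag = bagOfWords(documents);
fprintf("after bagOfWords:            %d documents, %d words\n", bag.NumDocuments, bag.NumWords);

bag = removeInfrequentWords(bag,2);           % drops words seen at most 2 times
fprintf("after removeInfrequentWords: %d documents, %d words\n", bag.NumDocuments, bag.NumWords);

[bag,idx] = removeEmptyDocuments(bag);        % idx lists the removed documents
fprintf("after removeEmptyDocuments:  %d documents, %d removed\n", bag.NumDocuments, numel(idx));
```

If the word count is already close to zero right after bagOfWords, the preprocessing function is the place to look rather than removeEmptyDocuments.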

2 Comments

oliver on 10 Apr 2023

Edited: Walter Roberson on 10 Apr 2023

I am trying to follow this MATLAB example, https://uk.mathworks.com/help/textanalytics/ug/create-simple-text-model-for-classification.html, but using my own dataset. Can you help with what I need to change?

Walter Roberson on 10 Apr 2023

IMBD_reviews_smol.csv

You were calling removeShortWords twice, so every word of 15 characters or fewer was being removed. The remaining "words" all happened to be unique, so removing infrequent words resulted in an empty bag. The corrected preprocessText below calls removeLongWords(documents,15) instead.

filename = "IMBD_reviews_smol.csv";
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);

textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
Ytrain = dataTrain.sentiment;
Ytest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
Ytrain(idx) = [];
Xtrain = bag.Counts;
mdl = fitcecoc(Xtrain, Ytrain, "Learners", "linear");
mdl
mdl = 
  CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [negative    positive]
    ScoreTransform: 'none'
    BinaryLearners: {[1×1 ClassificationLinear]}
      CodingMatrix: [2×1 double]

  Properties, Methods

function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeLongWords(documents,15);
end
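
The difference between the two calls can be seen on a single made-up document. This is a small sketch, assuming Text Analytics Toolbox; the sample sentence is invented and contains words of 1, 2, 5, and 20 characters.

```matlab
% Hypothetical one-line document with words of 1, 2, 5, and 20 characters.
doc = tokenizedDocument("a an movie uncharacteristically");

shortRemoved = removeShortWords(doc,2);            % drops words of 2 characters or fewer
bothShort    = removeShortWords(shortRemoved,15);  % original mistake: also drops words of 15 or fewer
fixed        = removeLongWords(shortRemoved,15);   % corrected: drops words of 15 characters or more

disp(bothShort.Vocabulary)   % only the 20-character word survives
disp(fixed.Vocabulary)       % only "movie" survives
```

With the doubled removeShortWords call, ordinary review vocabulary is discarded and only unusually long tokens remain, which matches the behaviour described in the comment above.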
