Low weighted cross-entropy loss values

I am building a network for semantic segmentation with a weighted cross-entropy loss. The crossentropy() function makes it possible to add weights for my 8 classes (inverse-frequency, normalized weights for each class). My issue is that the loss values calculated during training seem lower than what I should expect: they are between 0 and 1, but I would have expected them to be between 2 and 3.
My class weights vector is
norm_weights = [0.0011 0.4426 0.0023 0.0037 0.0212 0.0022 0.0065 1.0000];
And this is how I implement my loss function:
lossFcn = @(Y,T) crossentropy(Y,T,norm_weights,WeightsFormat="UC",...
NormalizationFactor="all-elements",ClassificationMode="multilabel")*n_class_labels;
[netTrained2, info] = trainnet(augmented_ds,net2,lossFcn,options);
If anyone would have a clue about the issue, that would be helpful!

3 Comments

Matt J on 19 Aug 2025
values are between 0 and 1 but I would have expected them to be between 2-3.
Why?
Ève on 19 Aug 2025
I am reproducing a network from a research paper. My network architecture & training options are the same. My data is also from the same database. In their loss graphs, the initial loss values during training are between 2 and 3 so I assumed that this should also be the case for my network. When I use the crossentropy function without weights, such as:
[netTrained1, info] = trainnet(augmented_ds,net1,'crossentropy',options);
I do get higher loss values than when I 'personalize' my cross-entropy loss function so that it has weights.
Ève on 19 Aug 2025
I reproduced the methodology from this research article as closely as I could, including how they format their network input. I am questioning whether there is a problem with my loss function, because the loss values that I obtain are actually very small. I said that they were between 0 and 1, but I should have specified that they currently gravitate around 0.026502. I know that the goal is for the loss to tend towards zero, but my network isn't trained yet (I reproduced a SegNet architecture) and my training accuracy is around 20%, so the loss values seem very low to me.


Accepted Answer

Matt J on 19 Aug 2025


There are a few possible reasons for the discrepancy that I can think of:
(1) Your norm_weights do not add up to 1.
(2) You have selected NormalizationFactor="all-elements" in crossentropy(). According to the doc, though, trainnet does not normalize over all elements; it ignores the channel dimensions.
(3) Other hidden normalization factors may be buried in the black box that is trainnet(). I don't know if it is possible, or worthwhile, to try to dig them out.
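Point (1) is easy to check on dummy data. The sketch below uses a made-up 4x4x8x2 score array (not the real network output); it only demonstrates that rescaling the weight vector rescales the reported loss by the same factor.

```matlab
% Dummy data: 4x4 spatial, 8 classes, batch of 2 (made up for illustration)
raw = rand(4,4,8,2);
Y = softmax(dlarray(raw,"SSCB"));        % fake softmax scores
T = double(raw == max(raw,[],3));        % fake one-hot targets
w = [0.0011 0.4426 0.0023 0.0037 0.0212 0.0022 0.0065 1.0000];

L1 = crossentropy(Y,T,w,       WeightsFormat="UC",NormalizationFactor="all-elements");
L2 = crossentropy(Y,T,w/sum(w),WeightsFormat="UC",NormalizationFactor="all-elements");
% The loss is linear in the weights, so L1 equals sum(w)*L2 (about 1.48x here):
% the overall scale of the weight vector passes straight through to the loss.
```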

7 Comments

Ève on 19 Aug 2025
Edited: Ève on 19 Aug 2025
For (1), I didn't think it was necessary to make sure they add up to 1. It seems like in various examples (https://www.mathworks.com/help/vision/ug/semantic-segmentation-using-deep-learning.html), they don't, from my understanding.
For (2), I get what you are saying, but I think that only applies if we specify 'crossentropy' directly as the loss in trainnet. The way I did it, calling crossentropy() myself, there seem to be multiple normalization options, from my understanding. I tried to train with no NormalizationFactor and my loss values are now in the hundreds, so that seems odd once again. I'm trying to familiarize myself with the equations (algorithms) provided by the doc.
I also removed ClassificationMode="multilabel", because even though I thought semantic segmentation was multilabel classification, this input argument isn't specified in any of the semantic segmentation MATLAB examples I see.
Matt J on 20 Aug 2025
Edited: Matt J on 20 Aug 2025
For (1), I didn't think it was necessary to make sure they add up to 1.
I don't say that it is necessary, but it should affect the scale of the loss function.
For (2), I get what you are saying, but I think that only applies if we specify 'crossentropy' as the function in trainnet.
What do you mean "only"? What other usage of trainnet() are we comparing with?
Ève on 20 Aug 2025
Edited: Ève on 20 Aug 2025
That's a good point (for (1))! I'll check that. Thanks for clarifying; the weights are indeed a multiplying factor in the loss.
For (2), what I meant was that, from my understanding, if you only specify your loss function this way,
[netTrained1, info] = trainnet(augmented_ds,net1,'crossentropy',options);
the loss will be 'normalized by dividing by the number of non-channel elements of the network output', as the doc says. But if you instead implement it using additional options from the crossentropy function, such as
[netTrained2, info] = trainnet(augmented_ds,net2,@(Y,T) crossentropy(Y,T,norm_weights,...
    WeightsFormat="UC",NormalizationFactor="all-elements"),options);
you have to choose between 4 NormalizationFactor options ("batch-size", "all-elements", "mask-included", "none"), which are different ways to normalize the loss. Am I getting this right?
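The scale differences between these options can be seen directly on dummy data. The sketch below (made-up 4x4x8x2 scores, not the real network output) divides the same cross-entropy sum by different counts:

```matlab
% Dummy data: 4x4 spatial, 8 classes, batch of 2 (made up for illustration)
raw = rand(4,4,8,2);
Y = softmax(dlarray(raw,"SSCB"));
T = double(raw == max(raw,[],3));        % fake one-hot targets

Lnone = crossentropy(Y,T,NormalizationFactor="none");          % raw sum
Lbat  = crossentropy(Y,T,NormalizationFactor="batch-size");    % sum / 2
Lall  = crossentropy(Y,T,NormalizationFactor="all-elements");  % sum / (4*4*8*2)
% By contrast, per the doc, trainnet's built-in "crossentropy" divides by the
% number of NON-channel elements (here 4*4*2), which matches none of the above,
% so the reported scales will differ even for identical predictions.
```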
Matt J on 20 Aug 2025
you have to choose between 4 NormalizationFactor options ("batch-size", "all-elements", "mask-included", "none"), which are different ways to normalize the loss. Am I getting this right?
Yes, but that was my entire point in (2). You cannot expect agreement between trainnet and your personalized loss function, because their normalization strategies do not seem to coincide.
Matt J on 20 Aug 2025
Edited: Matt J on 20 Aug 2025
If trainnet is successfully training your network with lossFcn='crossentropy', but with larger (by a factor of approximately K) loss values, then with your personalized loss function, you could try increasing your learning rates and decreasing your regularization weights by K. Or, just scale your custom lossFcn by K.
The point is, if the loss function computation in each case only differs by a global scale factor, it shouldn't impact training much, as long as the learning rates and regularization weights are kept in the same general ratio with the loss function values.
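One way to apply this suggestion is to fold the scale factor into the custom loss itself. In this sketch, K is an assumed placeholder value to be measured empirically (e.g. the ratio of the built-in and custom initial loss values); the other names come from the original code above.

```matlab
% K is a placeholder: measure it as the ratio between the built-in
% 'crossentropy' loss and the custom weighted loss at the start of training.
K = 100;   % assumed value, not from the thread

lossFcn = @(Y,T) K * crossentropy(Y,T,norm_weights, ...
    WeightsFormat="UC", NormalizationFactor="all-elements");
[netTrained2, info] = trainnet(augmented_ds, net2, lossFcn, options);
```

Equivalently, one could leave the loss untouched and multiply the learning rate by K while dividing the regularization weight by K; only the ratios matter.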
Ève on 21 Aug 2025
I understand, I'll try your suggestions. Thanks a lot for the feedback, I really appreciate it.
Ève on 21 Aug 2025
I'll accept your answer because, as you suggested, my loss values are low simply because they reflect the scale of my weights, which are for the most part very small values. I may revise the way I calculate them. I'll also add, for anyone reading this, that I was wrong about the ClassificationMode in my lossFcn; for my type of classification problem it should be set to "single-label" (the default). I left the rest of the function the same.
