Should I use a sequence input layer or an image input layer for a combined CNN/LSTM neural network?

Question

Jade on 6 Nov 2024

Direct link to this question

https://jp.mathworks.com/matlabcentral/answers/2164400-should-i-use-a-sequence-input-layer-or-an-image-input-layer-for-a-combined-cnn-lstm-neural-network

Answered: Adarsh on 26 Mar 2025

I am attempting to use a CNN/LSTM to take in a series of frames from a video of two liquids mixing together to predict their viscosities.

My initial layout is shown in the attached image, and I planned on separating a cell array of frames into stacks of sequences to use as inputs.

I was told that this would not work and that an alternative approach is to use 2D or 3D (not sure which) image input layers and then use time as a separate input for the LSTM portion. I'm not sure I understand what this means or why my approach was said to be wrong.

Which approach, if either, is best? And if neither is, is there a better method?

4 Comments

Matt J on 7 Nov 2024

OK, well it doesn't look like network analyzer is showing any errors. Is there something that's not working?

Jade on 7 Nov 2024

Matt,

It seems to run with 1 video so far, and I'm in the process of scaling it up now. Training loss returned NaN at first, but adjusting the learning rate seems to have solved that issue.

Just wanted to make sure this was the correct approach to the problem before going too far in the wrong direction. There are a lot of different network structures and I'm still learning. Really appreciate the help!


Answer 1

Adarsh on 26 Mar 2025

Direct link to this answer

https://jp.mathworks.com/matlabcentral/answers/2164400-should-i-use-a-sequence-input-layer-or-an-image-input-layer-for-a-combined-cnn-lstm-neural-network#answer_1562560

Hi @Jade,

I understand that you are trying to train a deep learning model to predict the viscosities of two liquids from a series of video frames showing them mixing.

To do this, the network needs to capture both spatial features (within each frame) and temporal features (across frames), and how they relate.

This can be done with either of two approaches:

3D CNN
CNN + LSTM

In a 3D CNN, features are extracted from a series of frames using higher-dimensional (3-D) kernels that span both space and time. This captures integrated spatio-temporal features and results in a simpler architecture.

On the downside, this can be computationally expensive because of the high dimensionality of video data, and it may require a larger dataset to learn useful features. A 3D CNN may also fail to capture long-term temporal dependencies, which some tasks need.
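As a rough illustration of the 3D CNN option, here is a minimal sketch in which a fixed-length clip is fed through an image3dInputLayer with time treated as a third spatial dimension. The clip size, filter counts, and the two-output regression head (one viscosity per liquid) are my assumptions, not details from your network, and the sketch assumes a recent release in which dlnetwork can be built directly from a layer array.

```matlab
% Hypothetical sizes, chosen only for illustration.
clipSize     = [64 64 16 1];   % height, width, frames per clip, channels
numResponses = 2;              % assumed: one viscosity value per liquid

layers3d = [
    image3dInputLayer(clipSize, Normalization="none")
    convolution3dLayer(3, 8, Padding="same")    % 3x3x3 kernels span space and time
    reluLayer
    maxPooling3dLayer(2, Stride=2)
    convolution3dLayer(3, 16, Padding="same")
    reluLayer
    globalAveragePooling3dLayer                 % collapse height, width, and frames
    fullyConnectedLayer(numResponses)];

net3d = dlnetwork(layers3d);

% Forward pass on random data to check the shapes: a batch of 4 clips.
X = dlarray(rand([clipSize 4], "single"), "SSSCB");
Y = predict(net3d, X);
disp(size(Y))
```

Note that every clip must contain the same number of frames here, which is one practical difference from the sequence-based approach.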

By contrast, a CNN + LSTM can capture long-term temporal dependencies from the extracted spatial features, and it may require less data because the CNN reduces each frame to a much smaller feature vector before the LSTM processes it.
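For the CNN + LSTM option, a sequence input layer sized for a single frame is one way to set it up. The sketch below assumes a recent release (R2024a or later, as far as I know) in which 2-D convolution layers inside a dlnetwork are applied to each time step of an image sequence; older releases may need sequence folding and unfolding layers around the convolutional part. The frame size, layer widths, and two-output head are again hypothetical.

```matlab
% Hypothetical sizes, chosen only for illustration.
frameSize      = [64 64 1];   % height, width, channels of one frame
numHiddenUnits = 64;
numResponses   = 2;           % assumed: one viscosity value per liquid

layers = [
    sequenceInputLayer(frameSize)               % each time step is one frame
    convolution2dLayer(3, 16, Padding="same")   % applied to every frame
    reluLayer
    maxPooling2dLayer(2, Stride=2)
    convolution2dLayer(3, 32, Padding="same")
    reluLayer
    globalAveragePooling2dLayer                 % one feature vector per frame
    flattenLayer                                % drop the singleton spatial dims
    lstmLayer(numHiddenUnits, OutputMode="last")  % summarize the whole sequence
    fullyConnectedLayer(numResponses)];

net = dlnetwork(layers);

% Forward pass on random data to check the shapes:
% a batch of 4 sequences, each 10 frames long.
X = dlarray(rand([frameSize 4 10], "single"), "SSCBT");
Y = predict(net, X);
disp(size(Y))
```

With OutputMode="last" the LSTM returns one output per video rather than one per frame, which matches predicting a single pair of viscosities from each sequence.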

The answer to “which model would be better” depends on the task, the dataset size, and other factors.

If you have a smaller dataset and the task depends on long-term temporal features, then CNN + LSTM might be the better choice.

Experimenting with both models on your dataset will guide the choice.

I hope this helps.
