Input shape to the LSTM network when doing inference for VAD tasks

YUKAI SHEN on 7 Mar 2023
Answered: Brian Hemmat on 7 Mar 2023
Hi, I am following this article to train an LSTM network for VAD tasks: https://www.mathworks.com/help/deeplearning/ug/voice-activity-detection-in-noise-using-deep-learning.html
My question is about testing the trained LSTM network. In the article, the test input is not shaped like the training input, (#frames, #time_steps, #features). Does this mean that, during inference, the trained LSTM network takes each frame as an independent input and classifies it as noise or voice, so that basically no hidden state is used during inference? Am I right?
Thank you in advance!

Accepted Answer

Brian Hemmat on 7 Mar 2023
I did not look at the dimensions you're discussing, but I can say that you are correct that the "streaming" code in the example classifies chunks independently. Note that it is calling classify and not classifyAndUpdateState, so the LSTM state is not carried over from one chunk to the next.
Stay tuned for the R2023a release, where we have updated the example to maintain state (should be coming in the next few weeks).
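For illustration, here is a minimal sketch of what chunk-by-chunk inference that keeps the LSTM state between calls could look like. It is not the code from the example or from the R2023a update. The helper name classifyInChunksWithState and the argument chunkSize are hypothetical, and the sketch assumes that net is a trained SeriesNetwork or DAGNetwork with an LSTM layer for sequence-to-sequence classification and that features is a numFeatures-by-numTimeSteps matrix.

```matlab
function labels = classifyInChunksWithState(net, features, chunkSize)
% Hypothetical helper, not part of the original example.
% net       : trained recurrent network (SeriesNetwork or DAGNetwork) for
%             sequence-to-sequence classification (assumption)
% features  : numFeatures-by-numTimeSteps matrix (assumption)
% chunkSize : number of time steps passed to the network per call

net = resetState(net);                  % start from the initial state
numSteps = size(features, 2);
numChunks = ceil(numSteps / chunkSize);
chunks = cell(1, numChunks);

for k = 1:numChunks
    first = (k - 1) * chunkSize + 1;
    last = min(k * chunkSize, numSteps);
    % Classify the chunk and keep the updated hidden and cell state,
    % so the next chunk continues where this one ended.
    [net, y] = classifyAndUpdateState(net, features(:, first:last));
    chunks{k} = reshape(y, 1, []);      % force a row of labels
end

labels = [chunks{:}];                   % one label per time step
end
```

A call such as labels = classifyInChunksWithState(net, features, 10) would return one label per time step. Replacing classifyAndUpdateState with classify inside the loop gives the stateless behavior described above, where every chunk starts again from the initial state.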

Release

R2022b
