I am working on classifying the different tasks happening on a construction site from ambient sound. I have extracted time-domain, frequency-domain, and cepstral-domain (e.g., MFCC, PLPCC) features from the audio, and MFCC and PLPCC have been working well so far. Now I would like to try features derived from the wavelet transform. I have never used wavelets before, and I am finding it hard to decide which wavelet family and decomposition level to use. I am also unsure how the final feature space should be designed from the wavelet coefficients: which time-varying features are worth computing from the detail and approximation coefficients? Any expert opinion would be highly appreciated.
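To make the question concrete, the kind of feature design being asked about can be sketched as follows. This is only a minimal illustration under stated assumptions, not a recommendation: the Haar wavelet is used purely because it is short enough to write by hand with the standard library, the decomposition level of 3 and the three per-subband statistics (log energy, standard deviation, mean absolute value) are placeholder choices, and the function names and the synthetic test frame are hypothetical. In practice a library routine such as `pywt.wavedec` from PyWavelets would probably replace the hand-written transform, and the wavelet family and level are exactly the open choices in question.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        # Assumption: pad odd-length input by repeating the last sample.
        signal.append(signal[-1])
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, level):
    """Multi-level decomposition, ordered as [cA_n, cD_n, ..., cD_1]."""
    approx = list(signal)
    details = []
    for _ in range(level):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def subband_features(coeffs):
    """Three illustrative statistics per subband, concatenated into one vector."""
    feats = []
    for band in coeffs:
        n = len(band)
        energy = sum(c * c for c in band)
        mean = sum(band) / n
        std = math.sqrt(sum((c - mean) ** 2 for c in band) / n)
        mean_abs = sum(abs(c) for c in band) / n
        # Small constant avoids log(0) on silent subbands.
        feats.extend([math.log(energy + 1e-12), std, mean_abs])
    return feats


def frame_features(frame, level=3):
    """Feature vector for one audio frame: (level + 1) subbands x 3 statistics."""
    return subband_features(haar_wavedec(frame, level))


# Synthetic stand-in for one frame of audio (hypothetical 16 kHz sample rate).
sample_rate = 16000
frame = [
    math.sin(2 * math.pi * 200 * t / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 3000 * t / sample_rate)
    for t in range(1024)
]
features = frame_features(frame, level=3)
```

Computing such a vector for each short frame gives a time-varying sequence of subband features, which could then be used alongside the existing MFCC and PLPCC features.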
I am attaching the waveform and spectrogram of a sample 20-second audio clip for reference, in case it helps.