Let me explain briefly.
I have written code for a deep neural network that performs regression on numerical data to predict the formation of clusters.
When I run the code, four hidden layers give the lowest mean squared error compared with 2, 3, 5, and 6 hidden layers.
So I can say that four hidden layers are optimal in my case.
But I would like to know whether there is any reason other than the mean squared error to justify why four hidden layers are optimal.
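The depth comparison described above can be sketched as follows. This is a minimal illustration, not the original code: it assumes scikit-learn's MLPRegressor, a hypothetical fixed width of 32 units per hidden layer, and synthetic stand-in data. It reports the mean and the spread of the MSE across k-fold cross-validation folds instead of a single score, so the differences between depths can be judged against fold-to-fold variation.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_depths(X, y, depths=(2, 3, 4, 5, 6), width=32, n_splits=5,
                   max_iter=500, seed=0):
    """Return {depth: (mean_mse, std_mse)} from k-fold cross-validation.

    `width` (units per hidden layer) is a hypothetical choice for illustration.
    """
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for depth in depths:
        # Scale inputs, then fit an MLP with `depth` hidden layers of `width` units
        model = make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(width,) * depth,
                         max_iter=max_iter, random_state=seed),
        )
        # scikit-learn maximizes scores, so MSE is returned negated; flip the sign
        scores = -cross_val_score(model, X, y, cv=cv,
                                  scoring="neg_mean_squared_error")
        results[depth] = (scores.mean(), scores.std())
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for the real numerical features and target
    X = rng.normal(size=(300, 5))
    y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * rng.normal(size=300)
    for depth, (mean, std) in compare_depths(X, y).items():
        print(f"{depth} hidden layers: MSE = {mean:.4f} +/- {std:.4f}")
```

If the fold-to-fold spreads overlap heavily between depths, a lower average MSE for one depth may not be a strong reason on its own to prefer it.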
Also, for images, I understand that low-level features, high-level features, and so on can be found from the pixels.
But for numerical data, what do low-level and high-level features represent?
Could anyone please clarify this for me?