step 1: analyze the signal to determine which frequencies should be kept and which should be discarded. FFTs, and possibly short-time Fourier transforms, will be helpful here. You will also need some a priori knowledge of the problem domain (e.g. what frequencies are typically contained in human speech). Your instructor has been kind enough to give you versions with and without the noise, so comparing the spectra of the two should show you where the noise sits and therefore where to put the frequency cutoff.
step 2: design a filter. Start with a first-order Butterworth low-pass filter, with a cutoff frequency that lies between the highest frequency you think should be kept and the lowest frequency you think should be discarded.
step 3: analyze the filtered signal the same way you analyzed the original signal to see what changed. Also play it back to hear the difference.
step 4: try some other filters to see if you can get something better than a first-order Butterworth. For example, try higher orders, different cutoff frequencies, and different filter types.
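
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not your assignment's solution: the test signal is synthetic (a 300 Hz tone standing in for speech and a 6000 Hz tone standing in for noise), and the 16 kHz sample rate, the 1500 Hz cutoff, and the helper names `butter1_lowpass` and `tone_amplitude` are all assumptions made for the example. The filter is a first-order Butterworth low-pass obtained with the bilinear transform, written out by hand so the sketch needs only the standard library.

```python
import cmath
import math


def butter1_lowpass(x, cutoff_hz, fs_hz):
    """First-order Butterworth low-pass (bilinear transform, pre-warped cutoff)."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    b = k / (k + 1.0)            # feed-forward coefficients, b0 == b1
    a1 = (k - 1.0) / (k + 1.0)   # feedback coefficient
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        # Difference equation: y[n] = b*(x[n] + x[n-1]) - a1*y[n-1]
        out = b * (sample + x_prev) - a1 * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y


def tone_amplitude(x, freq_hz, fs_hz):
    """Amplitude of the sinusoidal component at freq_hz (one DFT bin)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * i / fs_hz)
              for i, s in enumerate(x))
    return 2.0 * abs(acc) / len(x)


# Assumed values for illustration only.
FS = 16000                       # sample rate in Hz
N = 16000                        # one second of samples
SPEECH_HZ, NOISE_HZ = 300.0, 6000.0

# Step 1 stand-in: a "wanted" tone plus a weaker high-frequency "noise" tone.
signal = [math.sin(2 * math.pi * SPEECH_HZ * i / FS)
          + 0.5 * math.sin(2 * math.pi * NOISE_HZ * i / FS)
          for i in range(N)]

# Step 2: filter with a cutoff between the two frequencies.
filtered = butter1_lowpass(signal, cutoff_hz=1500.0, fs_hz=FS)

# Step 3: analyze the filtered signal the same way as the original.
for name, f in (("speech", SPEECH_HZ), ("noise", NOISE_HZ)):
    before = tone_amplitude(signal, f, FS)
    after = tone_amplitude(filtered, f, FS)
    change_db = 20 * math.log10(after / before)
    print(f"{name} {f:.0f} Hz: {before:.3f} -> {after:.3f} ({change_db:.1f} dB)")
```

For step 4, rerunning this with a different `cutoff_hz` shows the trade-off between removing noise and preserving the wanted band. A first-order filter rolls off gently, so higher orders or other filter types need a proper design routine; with real audio you would more likely use a library such as SciPy (for example `scipy.signal.butter`) than hand-written coefficients.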