Processing wav files in MATLAB: wavread(), downsampling and signed vs unsigned PCM

Question

Hassan Iqbal, 28 April 2016


Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/281556-processing-wav-files-in-matlab-wavread-downsampling-and-signed-vs-unsigned-pcm

Commented: Walter Roberson, 6 May 2016

I have used wavread and wavwrite functions to work with wav files. I have some questions about it.

1) Why does reading a wav file generate floating-point values for the samples? Certainly the actual samples are stored as 8- or 16-bit PCM, are they not?

2) Why does the wav file header contain a mix of little-endian and big-endian data? Certainly the data must be in a single format, either only little-endian or only big-endian.

3) Why does MATLAB ignore almost everything in the wav file header and store only the sampling rate and the actual samples in memory variables?

4) How does one carry out upsampling or downsampling of an audio file?

5) What is the difference between storing audio as signed or unsigned samples in a wav file, i.e., can the same sound be expressed in both formats?


Answer 1

Walter Roberson, 28 April 2016


Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/281556-processing-wav-files-in-matlab-wavread-downsampling-and-signed-vs-unsigned-pcm#answer_219909

1) For convenience, so that people do not need to know the internal format. You can add the 'native' option if you want the data type stored in the file.
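To illustrate point 1, here is a minimal sketch comparing the default read with the 'native' read. It assumes audioread and audiowrite (the newer replacements for wavread and wavwrite); the test tone, sample rate and temporary file name are made up for the example.

```matlab
% Assumption: audioread/audiowrite stand in for wavread/wavwrite.
fs = 8000;                               % sample rate in Hz (arbitrary choice)
t  = (0:fs-1).' / fs;                    % one second of sample times
x  = 0.5 * sin(2*pi*440*t);              % 440 Hz test tone, amplitude 0.5

fname = [tempname '.wav'];               % throwaway file name
audiowrite(fname, x, fs, 'BitsPerSample', 16);

yDouble = audioread(fname);              % default: double, scaled to roughly [-1, 1)
yNative = audioread(fname, 'native');    % data type stored in the file

disp(class(yDouble));                    % double
disp(class(yNative));                    % int16

% For 16-bit PCM the default output is the stored integer divided by 2^15.
scaleErr = max(abs(double(yNative) / 32768 - yDouble));

delete(fname);                           % clean up the temporary file
```

The two reads carry the same information; the default only rescales the stored integers so that code does not have to care about the bit depth of the file.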

2) For the "why" of WAV format you would need to ask Microsoft and IBM, which created the format together in 1991. Certainly you do not get to impose your notions of what the format "must" be. The "why" of WAV format is not a question about MATLAB.

Scanning some documentation, it appears to me that all of the fields that are formally defined as binary fields are in little-endian format, but that all of the fields that are formally defined as character strings are stored as consecutive bytes; if you were to read those as if they were a 32-bit integer then they would look big-endian, but that would be due to a misunderstanding of their fundamental type.
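The point about character fields versus numeric fields can be checked by reading the first 12 bytes of a wav file by hand. This is only a sketch: the file is generated on the spot with audiowrite, and the variable names are invented for the example.

```matlab
fs = 8000;
fname = [tempname '.wav'];
audiowrite(fname, zeros(fs, 1), fs);          % one second of silence, just to get a valid file

fid = fopen(fname, 'r', 'ieee-le');           % numeric fields are little-endian
chunkId   = fread(fid, 4, 'uint8=>char').';   % 'RIFF': four consecutive characters
chunkSize = fread(fid, 1, 'uint32');          % little-endian 32-bit integer
waveId    = fread(fid, 4, 'uint8=>char').';   % 'WAVE': four consecutive characters
fclose(fid);

% Misreading the tag as a number: the bytes only spell 'RIFF' when taken
% most-significant-byte first, which is why the tag looks "big-endian".
tagAsBigEndian = sum(double(chunkId) .* 256.^(3:-1:0));

fprintf('%s %d %s 0x%08X\n', chunkId, chunkSize, waveId, tagAsBigEndian);
delete(fname);
```

The tag is text, so it has no byte order of its own; only the size field is a number, and it is little-endian.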

3) Those are all that people typically need. If you want to extract additional information, see http://www.mathworks.com/help/matlab/ref/audioinfo.html
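As a small sketch of what audioinfo returns, assuming a file written on the spot with audiowrite (the two-second stereo test signal is made up for the example):

```matlab
fs = 8000;
fname = [tempname '.wav'];
audiowrite(fname, zeros(2*fs, 2), fs, 'BitsPerSample', 16);   % 2 s of stereo silence

info = audioinfo(fname);                 % struct describing the file
fprintf('%d Hz, %d channel(s), %d bits, %d samples, %.2f s\n', ...
        info.SampleRate, info.NumChannels, info.BitsPerSample, ...
        info.TotalSamples, info.Duration);
delete(fname);
```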

4) Read the audio into memory and do the upsampling or downsampling there. There are a number of different ways to do that.

http://www.mathworks.com/help/signal/ref/upsample.html

http://www.mathworks.com/help/signal/ref/upfirdn.html

http://www.mathworks.com/help/signal/ug/filtering-after-upsampling-interpolation.html

http://www.mathworks.com/help/signal/ref/interp.html

and probably others.
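One of those other options is resample, which changes the rate by a rational factor and applies an anti-aliasing filter along the way. The sketch below assumes Signal Processing Toolbox is available (in GNU Octave, the signal package); the rates and the test tone are example values only.

```matlab
% Requires Signal Processing Toolbox (in GNU Octave: pkg load signal).
fsIn  = 44100;                           % original sample rate in Hz
fsOut = 16000;                           % target sample rate in Hz
t = (0:fsIn-1).' / fsIn;                 % one second of sample times
x = sin(2*pi*440*t);                     % 440 Hz test tone

g = gcd(fsOut, fsIn);
p = fsOut / g;                           % 160
q = fsIn  / g;                           % 441
y = resample(x, p, q);                   % filter and change the rate by p/q

fprintf('%d samples at %d Hz -> %d samples at %d Hz\n', ...
        numel(x), fsIn, numel(y), fsOut);
```

Plain upsample and downsample only insert zeros or drop samples, so they need a separate filtering step; resample, interp and decimate bundle the filter in.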

5) Some audio container formats and some audio representation formats only support signed or unsigned; data will be automatically converted if necessary. Beyond that it is a matter of convenience and interpretation.

Traditionally, DAQs only returned unsigned numbers that were to be interpreted with a "bias" when the DAQ was configured for reading both positive and negative; the bias was not always at the half-way point since there are applications where the permitted negative range is only a portion as large as the permitted positive range (e.g., -2V to +9V). Thus internal representation might be in unsigned format, but it is more convenient to do computations on signed format, and more convenient to write out in signed format, letting the interface software make any necessary adjustments. For efficiency this led to situations where unsigned format data was collected in real time, processed by a program, and then since the program had more time available, written out in the signed format that corresponds more directly to how humans think about the data.
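In the WAV case specifically, 8-bit PCM is stored unsigned (silence at 128) while 16-bit PCM is stored signed (silence at 0). The two views differ only by a fixed offset, which this small sketch with made-up sample values shows:

```matlab
u = uint8([0 64 128 192 255]);           % unsigned 8-bit samples, silence at 128
s = int8(double(u) - 128);               % same samples as signed: -128 -64 0 64 127
uBack = uint8(double(s) + 128);          % and back again, nothing lost

disp(s);
disp(isequal(u, uBack));                 % 1
```

So the same sound can be expressed either way; the bias is a convention of the storage format, not a property of the audio.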

2 Comments

Hassan Iqbal, 6 May 2016

This answer has been very helpful. I am trying to do an FFT of the audio files. I have seen some examples of FFT usage and have done it myself too. It seems that the FFT function in MATLAB does not return an amplitude-frequency plot. It returns something else, which needs manipulation to obtain the amplitude-frequency curve.

Can you tell me what we get when we run just the MATLAB FFT function on a chunk of data of N samples? What does the term "bin" refer to with regard to FFT?

Do I need to use a "window function" if I am doing FFT on the whole audio file sample data? I think not.

Walter Roberson, 6 May 2016

In the time domain, the data is a sequence of amplitudes that progresses by time. The first entry in the amplitude array corresponds to the first time, the second entry corresponds to the second time, and so on. The N'th entry is stored at array index N. Given the sampling frequency, FS, in Hz, you can calculate the time between adjacent samples, delta_time = 1/FS. You can then map between an array index and a corresponding time as t = (N-1) * delta_time. This array index, N, can be thought of as the bin number in which the N'th amplitude sample is stored, and it is associated with time (N-1)*delta_time. The individual amplitude storage locations can be referred to as "bins". The N'th bin might store the integral of the energy received between time (N-1)*delta_time and N*delta_time (as would be the case for pictures) or it might just store one of the energy values passed through during that time.
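The index-to-time mapping can be written out directly. A short sketch, with an example sampling frequency and sample count that are not from the thread:

```matlab
FS = 8000;                       % sampling frequency in Hz (example value)
numSamples = 5;                  % length of the signal (example value)
delta_time = 1 / FS;             % time between adjacent samples
idx = 1:numSamples;              % MATLAB array indices
t = (idx - 1) * delta_time;      % time of each sample in seconds

disp(t);
```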

When you change to the frequency domain using fft, you get out an array. The array can be referred to as a series of bins. The bins (array elements) are associated with frequencies rather than with times. When you know the delta_frequency then for the first half of the array the N'th bin (N'th array element) is associated with frequency (N-1)*delta_frequency, and represents roughly the contributions from a band of width delta_frequency around that frequency. The first bin, array index 1, is thus frequency 0, which is the "constant" frequency, and physically it corresponds to the "constant bias", which in turn can be calculated as the mean of the amplitudes (the raw fft value in that bin is the sum of the samples, so divide it by the number of samples to get the mean). [For real-valued signals, the second half of the array is the reverse order of the complex conjugate of the first half, leaving out the frequency-0 bin, which has no mirror image: for a row vector X of even length, X(end/2+2:end) equals conj(fliplr(X(2:end/2))).]
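Both properties, the frequency-0 bin and the mirrored second half, are easy to check numerically. A sketch on a short made-up real signal of even length:

```matlab
x = [3 1 4 1 5 9 2 6];                   % real-valued example signal, even length
N = numel(x);
X = fft(x);

dcErr = abs(X(1) - sum(x));              % first bin holds the sum, i.e. N * mean(x)
meanFromFft = real(X(1)) / N;            % so dividing by N recovers the mean

firstHalf  = X(2:N/2);                   % bins 1 .. N/2-1
secondHalf = X(N/2+2:end);               % bins N/2+1 .. N-1
symErr = max(abs(secondHalf - conj(fliplr(firstHalf))));

fprintf('mean %.3f, dc error %.1e, symmetry error %.1e\n', ...
        meanFromFft, dcErr, symErr);
```

The bin at index N/2+1 (the Nyquist bin) sits between the two halves and, like the frequency-0 bin, has no mirror partner.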

The array index of the fft output is not the frequency itself just as in the time domain the array index was not the time itself. The fft routine does not need to know what the sampling frequency is in order to work: it just needs position number and signal length (in number of elements) and you can then put an interpretation on the array indices (bin numbers) afterwards.

One small thing though: it is common to talk about the array location for the constant frequency, the first array location (index 1 in MATLAB), as being bin 0, so when you hear about bin numbers, they often correspond to MATLAB index numbers one higher.

The delta_frequency calculation depends upon the kind of data being transformed. fft is often used with angular frequency, so the conversion factor for that is a factor of 2*pi different from the one used for sound. I do not recall at the moment what the conversion formulas are, but you can have a look at http://www.mathworks.com/help/matlab/ref/fft.html#buuutyt-9 to see how they calculate the frequency labels.
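For sound sampled at FS Hz, the usual convention (not spelled out in this thread) is delta_frequency = FS/N, where N is the number of samples transformed. Under that assumption, a sketch of turning the raw fft output into a single-sided amplitude-frequency curve, using a made-up 50 Hz test tone:

```matlab
FS = 1000;                               % sampling frequency in Hz (example value)
N  = 1000;                               % number of samples (even)
t  = (0:N-1) / FS;
x  = 0.7 * sin(2*pi*50*t);               % 50 Hz tone with amplitude 0.7

X = fft(x);                              % complex spectrum, N bins
delta_frequency = FS / N;                % Hz per bin
f = (0:N/2) * delta_frequency;           % frequencies of bins 0 .. N/2

amp = abs(X(1:N/2+1)) / N;               % scale by the number of samples
amp(2:end-1) = 2 * amp(2:end-1);         % fold in the mirrored second half

[peakAmp, peakIdx] = max(amp);
fprintf('peak %.2f at %g Hz (bin %d, MATLAB index %d)\n', ...
        peakAmp, f(peakIdx), peakIdx - 1, peakIdx);
% plot(f, amp) would draw the amplitude-frequency curve
```

The tone lands exactly on a bin here because 50 Hz is a whole multiple of delta_frequency; a tone between bins would spread over its neighbours.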
