Audio compression using DCT - but I get the same file size after inverse DCT
Mohamad
4 May 2018
Hi. I have a file (1.wav) and I'm trying to compress the first two seconds of this audio using the discrete cosine transform (DCT). I attached the code, but when I use the command whos on the original samples and on the reconstructed samples after the inverse DCT, I get the same size and number of bytes. Can anyone explain this, and how do I get the compression ratio?
Accepted Answer
Walter Roberson
4 May 2018
Edited: Walter Roberson, 4 May 2018
That is expected. You are writing out the re-expanded data as samples. There will be the same number of samples as before, so the output will (probably) be the same size.
See also my recent discussion at https://www.mathworks.com/matlabcentral/answers/398289-how-can-i-do-audio-compression-using-huffman-encoding#comment_563731 . For DCT you would not need to write out a dictionary, but you would not write out the coefficients you had zeroed out. You would, however, need to write out the original number of coefficients so when you read the values in, you knew how many zeros to pad with before reconstruction.
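That storage strategy can be sketched like this (a minimal illustration with hypothetical variable names; it assumes X already holds the DCT coefficients with the low-energy entries zeroed):

```matlab
% Sketch: store only the nonzero DCT coefficients plus enough
% bookkeeping (original length, positions) to rebuild the full vector.
N    = length(X);        % original number of coefficients -- must be stored
keep = find(X ~= 0);     % indices of the surviving coefficients
vals = X(keep);          % their values -- this is the compressed payload

% On reconstruction, pad with zeros back to the original length:
Xrec       = zeros(N, 1);
Xrec(keep) = vals;
xrec       = idct(Xrec); % inverse DCT recovers the (lossy) audio
```

The compression comes from vals and keep being much shorter than N when most coefficients were zeroed.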
28 Comments
Mohamad
4 May 2018
Edited: Mohamad, 4 May 2018
Hi. I took the DCT of the first 5 seconds of a WAV file. I will keep only the DCT coefficients that contain 99.9% of the energy and set all remaining coefficients to zero. Now I need to create a Huffman code dictionary for those coefficients. How do I do that? Do I need to quantize the DCT coefficients (i.e., to make symbols) for Huffman encoding, and if so, how?
Walter Roberson
4 May 2018
The code in https://www.mathworks.com/matlabcentral/fileexchange/34958-jpeg-compression--dct- shows construction of DCT coefficients as integer values. You would still set the extra coefficients to 0, and then you would use the set of integer values as the symbols while you follow the steps outlined in the post I linked to.
Mohamad
4 May 2018
Edited: Walter Roberson, 4 May 2018
Sorry for the inconvenience, but the link https://www.mathworks.com/matlabcentral/fileexchange/34958-jpeg-compression--dct- shows image compression and uses a normalization matrix. How do I apply this to an audio file (one column)?
How do I construct a stream of 0s and 1s that encodes the samples using Huffman encoding?
Walter Roberson
4 May 2018
For samples to bits:
Use huffmandict() on the samples to build the encoding tables, and then use huffmanenco() to perform the encoding to a stream of 0 and 1 values.
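A minimal illustration of those two calls on a toy symbol stream (the symbols and probabilities here are arbitrary, just to show the shapes of the inputs):

```matlab
% Toy example: Huffman-encode a small stream of symbols.
symbols = [0 1 2 3];                 % the distinct sample values
p       = [0.5 0.25 0.125 0.125];    % their estimated probabilities (sum to 1)
dict    = huffmandict(symbols, p);   % build the code table
sig     = [0 0 1 0 3 2 0 1];         % signal to compress
comp    = huffmanenco(sig, dict);    % stream of 0 and 1 values
sig2    = huffmandeco(comp, dict);   % decodes back to sig exactly
```

Every value appearing in sig must be present in symbols, or huffmanenco reports that the dictionary does not have codes for all the input signals.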
Mohamad
4 May 2018
Edited: Mohamad, 4 May 2018
I'm trying this, but I get the error: "The Huffman dictionary provided does not have the codes for all the input signals." I quantized the DCT coefficients onto 32 levels and used hist to get the probability vector. The DCT coefficient vector is 55125 x 1. How do I make the symbols for the Huffman dictionary?
Walter Roberson
4 May 2018
[filename, pathname] = uigetfile('*.wav', 'pick a file');
if ~ischar(filename); error('no file chosen'); end
[x1, Fs] = audioread(filename);
samples = [1, min(5*Fs, length(x1))];    % at most the first 5 seconds
[x1, Fs] = audioread(filename, samples);
L1 = length(x1)
X = dct(x1);
% Sort the coefficients from largest to smallest magnitude.
[XX, ind] = sort(abs(X), 'descend');
% Find how many of the largest coefficients hold 99.99% of the signal norm.
need = 1;
while norm(X(ind(1:need)))/norm(X) < 0.9999
    need = need + 1;
end
Coefficents_need = need
xpc = need/length(X)*100                 % percentage of coefficients kept
% Set to zero the coefficients that contain the remaining energy.
X(ind(need+1:end)) = 0;
% Quantize the surviving coefficients onto 32 levels.
partition = linspace(min(X), max(X), 32);
codebook = linspace(min(X)-1/32, max(X), 33); % length 33, one entry for each interval
[index, quantized] = quantiz(X, partition, codebook); % quantize
histogram(quantized, 33, 'Normalization', 'probability');
% Estimate the symbol probabilities from the quantizer output.
h2 = histc(index+1, 1:length(codebook));
p = h2/length(X);
dict = huffmandict(codebook, p);
comp = huffmanenco(quantized, dict);     % stream of 0 and 1 values
Mohamad
4 May 2018
I get "Warning: Data clipped when writing file". Also, the compressed file 2_cc.wav, created with dsig = huffmandeco(A,dict); filename = '2_cc.wav'; audiowrite(filename,dsig,Fs);, has the same size as the original file, and when I play it, it is very noisy.
Walter Roberson
4 May 2018
What you get back from huffmandeco is not the original sounds. What you get back is the DCT coefficients. You need to do inverse DCT.
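In other words, the decode path needs one more step before writing audio (sketch, assuming dsig is the output of huffmandeco and Fs is the sample rate from audioread):

```matlab
% dsig holds quantized DCT coefficients, not audio samples.
% Apply the inverse DCT before writing the .wav file.
xrec = idct(dsig(:));             % back to the time domain
audiowrite('2_cc.wav', xrec, Fs); % now this is playable audio
```

Note that audiowrite may still warn about clipping if the reconstructed values fall outside -1 to +1, as discussed below in the thread.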
Mohamad
5 May 2018
I closed the binary file after writing the encoded stream, then read the binary file back, used huffmandeco, applied the inverse DCT, and used audiowrite to make a .wav file. The reconstructed sound is OK, but again the original .wav and the reconstructed .wav are the same size, and the binary file is larger than both .wav files. So where is the compression?
Walter Roberson
5 May 2018
With regards to the file size: you did not write using ubit1 like I said was needed.
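For reference, writing the Huffman bit stream with the ubit1 precision looks like this (filename is a placeholder; comp is the 0/1 vector from huffmanenco):

```matlab
% Write one bit per value, so the file size reflects the compressed
% length rather than one full byte per bit.
fid = fopen('compressed.bin', 'w');
fwrite(fid, comp, 'ubit1');
fclose(fid);

% Read it back. Caution: the file is padded to a whole number of bytes,
% so up to 7 spurious 0 bits can appear at the end of A.
fid = fopen('compressed.bin', 'r');
A = fread(fid, inf, 'ubit1');
fclose(fid);
```

Those padding bits are why length(A) can exceed length(comp), which matters later in this thread.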
With regards to the "Warning: Data clipped when writing file.":
Once you have quantized the DCT coefficients, if you were to then immediately idct() the quantized coefficients, without having removed any coefficients and without having gone through the huffman and file and huffman decode -- just straight dct, quantize, idct of quantized coefficients -- then it turns out that the range of reconstructed values is not -1 to +1 and instead can be like -2.7 to +3.7. This is a pure effect of quantization with dct, and you are going to need to account for it.
My tests show that the idct of the quantized values can be a factor of 10^4 or more higher than the original signal. The parts that do especially poorly are the parts of the signal with near silence: the reconstructed values can end up fairly large there (I do not know why that is so.)
When you zero out the extra coefficients, the reconstructed values can be about -5 to +4.5. Remember that it is the places of near silence that are especially badly reconstructed (in relative terms), so this introduces noticeable noise into the reconstruction.
Mohamad
5 May 2018
Hi. 1. Do I need to normalize the audio samples to the range (-1 to 1) before taking the DCT? 2. Do I need to take the DCT on blocks of audio samples instead of the whole length of the signal? Thanks.
Walter Roberson
5 May 2018
The samples you get from audioread() are already in the range -1 to +1 before you dct(), and if you did not quantize you would recover the same data.
Testing with a sound sample I happened to have, I found that if I increased my dictionary size to 85 or larger that the reconstructed signal was within range.
You do need to ensure that your reconstructed signal is of the correct length: when you read with ubit1 format, you will always get a multiple of 8 samples (bits) back, and chances are that your huffman encoding was not an exact multiple of 8. Those extra bits will cause problems for decoding.
I experimented with adding an extra entry to the dictionary with value inf and with probability 1/(length(x1)+1), making sure that I normalized the other entries by (length(x1)+1) instead of length(x1) . Then on reconstruction I used isinf() to find the inf in the input stream, and I trim out everything from that point on. This turned out to work just fine.
Mohamad
5 May 2018
Hi. I write the binary file using ubit1 now; no more data-clipping warning. I am still quantizing the DCT coefficients, and I increased the dictionary to 100 entries. The binary file size on disk is around 24 KB. When I play the sound, there is still noise in the background. Why did you add an extra entry to the dictionary with value inf? How do I add this?
Mohamad
5 May 2018
I also tried dct, quantizing the DCT coefficients, then idct of the quantized coefficients. I get idct values in the range -0.4353 to 0.3361.
Walter Roberson
5 May 2018
160044/23522 is about 6.8 which is decent compression.
My tests show that the main way to reduce noise on playback is to use a higher number of dictionary entries.
A lot of the dictionary entries turn out to be unused or barely used, so the main effect of using more dictionary entries is to provide a higher resolution on the entries that are used.
Also, if you were properly handling the dictionary entries by writing them to the binary file and restoring them from it (the binary file should contain all of the information needed to recover the sound), then using more entries would raise the size of the compressed file -- a standard trade-off in lossy compression: the better the quality you want, the larger the file needs to be.
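The ratio quoted above compares sizes in bytes. As a sketch (assuming filename and comp from the earlier code; note this excludes the space needed to store the dictionary itself):

```matlab
% Compression ratio: original PCM payload size over compressed size.
info       = audioinfo(filename);                      % .wav metadata
orig_bytes = info.TotalSamples * info.BitsPerSample/8; % raw sample bytes
comp_bytes = ceil(length(comp)/8);  % ubit1 file rounds up to whole bytes
ratio      = orig_bytes / comp_bytes
```

A ratio above 1 means the bit stream is genuinely smaller than the original samples; the dictionary and any length header would reduce it somewhat.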
Mohamad
6 May 2018
Edited: Mohamad, 6 May 2018
Hi, please. I quantized using 512 levels, so now I have 512 entries in the dictionary, but I still have noise in the background. I noticed that using more quantization levels leads to a better compression ratio (i.e., an improvement). Is there any way to reduce the noise? Do I need even more quantization levels (though processing becomes slower)? I'm only writing the 0s and 1s from huffmanenco, so how do I write these 0s and 1s and add all the information needed to reconstruct the audio? I'm using huffmandeco to decode the 0s and 1s; does this decoding not have all the information needed to reconstruct the audio? And if I don't quantize the DCT coefficients, how do I make the dictionary table for Huffman?
Walter Roberson
6 May 2018
The only way to avoid any background noise is either to have perfect reconstruction, or else to filter out the high frequencies after reconstruction.
For perfect reconstruction you would not quantize and you would not zero out any coefficients. If you quantize or if you zero out coefficients (or both, as you do) then you are certain to get noise. The question becomes how much noise is acceptable. The more partition entries you use, the lower the noise.
Mohamad
6 May 2018
Hi, please. I again got the data-clipped warning, although I'm using 256 quantization levels and the inverse DCT values are in the range -1 to 1. How do I overcome this warning? If the warning is due to quantization, how do I make a Huffman dictionary with all of these DCT coefficients, which is a very large number of coefficients? Thanks.
Walter Roberson
6 May 2018
The greatest source of noise with that many coefficients is that you are doing the idct of the full dsig, which is the result of the huffmandeco on the data read in as ubit1 . As I described to you before, when you read using ubit1, a full byte is read at the end, leaving you with up to 7 extra 0 bits at the end. When you do the huffman decoding, those 7 extra 0 are likely to turn into one or more extra data samples in dsig. Those extra data samples affect the reconstruction audibly.
You need to figure out some way of ensuring that you extract the same length of signal from the huffman decoding as you put into the huffman encoding. I already described one method to you: add a distinct "end of stream" data element, and after decoding, detect that marker and remove from there onward. Another way to handle the situation is to write the length as part of the binary file.
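The second method (storing the length) can be sketched as a small header at the front of the binary file (one possible layout, with a placeholder filename):

```matlab
% Variant: prefix the bit stream with its exact length, so the decoder
% can discard the byte-padding bits that ubit1 reads at the end.
fid = fopen('compressed.bin', 'w');
fwrite(fid, length(comp), 'uint32'); % header: number of valid bits
fwrite(fid, comp, 'ubit1');          % the Huffman bit stream
fclose(fid);

fid = fopen('compressed.bin', 'r');
nbits = fread(fid, 1, 'uint32');
A = fread(fid, inf, 'ubit1');
fclose(fid);
A = A(1:nbits);                      % drop the padding bits
dsig = huffmandeco(A, dict);
```

Either this or the end-of-stream marker works; the header costs a fixed 4 bytes, while the marker costs one extra symbol in the dictionary.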
The second greatest source of noise is the zeroing of the low-energy coefficients.
It takes a lot of dictionary entries to counteract the effect of zeroing the low-energy coefficients. There seems to be an RMS limit of about 1.86 when the coefficients are zeroed, whereas with the coefficients not zeroed, you can get down to about 0.38 with 512 coefficients.
I am still testing what you can do with more coefficients. It turns out that the internal routines that validate the dictionary are inefficient, involving operations proportional to the square of the number of entries, so there are practical limits in how far out you can test.
Mohamad
7 May 2018
Hi. I get "Error using huffmandeco: The encoded signal contains a code which is not present in the dictionary." I'm using all the DCT coefficients without zeroing. I checked the lengths: length_dict = 200, length_comp = 115845, count1 = 115845, length_A = 231696. Why is the length of A not equal to the length of comp? I get around double the length. How do I modify the code to extract the same length of signal from the Huffman decoding as I put into the Huffman encoding? Thanks.
Walter Roberson
7 May 2018
I will look at this after I get up; it is my bedtime now (5 in the morning!)
Mohamad
7 May 2018
Hi, please, how do I add one distinct marker at the end of the stream and detect it? I tried to add inf to the codebook with probability 1/length(x+1), but I got the error: "sum of probability must equal to one". Thanks.
Mohamad
9 May 2018
Edited: Walter Roberson, 9 May 2018
Hi, please. I added inf to the dict, but when I use isinf(A) I get 0; I don't know why.
Also I get:
length(quantized) = 80000
length(comp) = 273938
length(A) = 273944
length(dsig) = 81935
So why is A not the same length as comp?
How do I make dsig length = 80000?
Sometimes I still get the data-clip warning.
Walter Roberson
9 May 2018
I do it like this:
p = h2/(L1+1);   % normalize by L1+1 to leave room for the end-of-file symbol
% code end of file as infinity
dict = huffmandict([codebook, inf], [p; 1/(L1+1)]);  % probabilities now sum to 1
comp = huffmanenco([quantized, inf], dict);
[...]
dsig = huffmandeco(A, dict);
eofpos = find(isinf(dsig), 1, 'first');
if ~isempty(eofpos); dsig(eofpos:end) = []; end