I want to know if I am calculating the results are right?
1 回表示 (過去 30 日間)
古いコメントを表示
I have applied run length encoding on a string or characters. Let it be
'ttttttPPttttttPPNNNtttttt'
which after Run length encoding turns to be t6P2t6P2N3t6
so I calculated the compression ratio to be (12/25)*100= 48%
Is it correct???
0 件のコメント
採用された回答
Walter Roberson
2015 年 5 月 6 日
編集済み: Walter Roberson
2015 年 5 月 6 日
Close. The original is length 25 not 26.
But how are you going to handle counts of greater than 9? And how are you going to handle characters that do not repeat?
8 件のコメント
Walter Roberson
2015 年 5 月 8 日
Your aaAA is a vector that contains a mix of original characters and binary counts. Because of the details of how you built it, the vector will be of type char(). The spaces or not that you see are the results of looking at char() of the binary counts. For example if the count happened to be 9 then the corresponding position would hold char(9) which is the tab character. If you want the character '9' to show up instead of binary 9, then you need to change your program, and you need to take into account that counts of more than 9 might occur so that multiple output character positions in the string might be required to do the encoding.
その他の回答 (1 件)
Thomas Koelen
2015 年 5 月 6 日
a='ttttttPPttttttPPNNNtttttt';
b='t6P2t6P2N3t6';
A=whos('a');
B=whos('b');
compression=B.bytes/A.bytes*100;
compression =
48
7 件のコメント
Walter Roberson
2015 年 5 月 12 日
What Thomas had written originally was the calculation for percentage of the original file size, but as I explained earlier the standard for compression ratio is number of original bytes to number of compressed bytes which is the inverse and has no percentage calculation. If the compressed file was 50% of the original size then it is half of the original size, so two bytes of original data are represented by 1 byte of compressed data for a ratio of 2:1. Better compression would have the first number increase. 4:1 would mean 4 bytes of original became 1 compressed, so the compressed file was 1/4 (25%) of the original size.
Walter Roberson
2015 年 5 月 12 日
In general, using the whos() byte count to define the compression ratio is wrong. whos() will report the number of bytes used to store the data in the data type you used for your storage, but if you did not pack that datatype completely full then you could create a byte array that was smaller and would be the "real" size.
For example it might be convenient (or sloppy programming) to use double precision values to store results that are integral from 0 to 2^24-1 . That's only 3 bytes worth of value stored in 64 bits (8 bytes). The 3 useful bytes from each value could be saved to an array or written to a file, producing a smaller result.
When you use the whos() byte count you need to understand how much information you have packed into those bytes and adjust accordingly.
For this reason, the calculations would more often be done not by asking the programming language for the number of bytes used for storage (which might include overheads), but instead by multiplying the number of elements stored by the number of used bytes per element. In some cases it instead becomes more useful to multiply the count of the number of elements stored by the number of useful bits per element, and divide the total by the bits-per-byte of 8, rounding up: the result is the number of bytes that the information could be written to in a file, after going through procedures to pack down the information. It isn't always worth going through that trouble of packing down the information just to prove that you could. For example you might run different algorithms, find the one with the smallest theoretical size, and only then bother to do the packing down, since packing down can be a sort-of-expensive operation when you can't access hardware bit-rotate instructions.
参考
カテゴリ
Help Center および File Exchange で Characters and Strings についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!