How to process as much data as possible

Hello,
I have a binary sequence, and my goal is to create substrings of various lengths from it, e.g. length 4:
Sequence
1(1), 0(2), 1(3), 1(4), 0(5), 0(6), 1(7), 0(8), 0(9), 1(10), 1(11), 1(12),
1(13), 0(14), 0(15), 0(16), 1(17), 1(18), 1(19), 0(20)
Substrings
01: 1(01), 0(02), 1(03), 1(04) -> [1,0,1,1],
02: 1(01), 1(03), 0(05), 1(07) -> [1,1,0,1],
03: 1(01), 1(04), 1(07), 1(10) -> [1,1,1,1],
04: 1(01), 0(05), 0(09), 1(13) -> [1,0,0,1],
05: 1(01), 0(06), 1(11), 0(16) -> [1,0,1,0],
06: 1(01), 1(07), 1(13), 1(19) -> [1,1,1,1],
07: 0(02), 1(03), 1(04), 0(05) -> [0,1,1,0],
08: 0(02), 1(04), 0(06), 0(08) -> [0,1,0,0],
09: 0(02), 0(05), 0(08), 1(11) -> [0,0,0,1],
10: 0(02), 0(06), 1(10), 0(14) -> [0,0,1,0],
11: 0(02), 1(07), 1(12), 1(17) -> [0,1,1,1],
12: 0(02), 0(08), 0(14), 0(20) -> [0,0,0,0],
13: 1(03), 1(04), 0(05), 0(06) -> [1,1,0,0],
14: 1(03), 0(05), 1(07), 0(09) -> [1,0,1,0],
15: 1(03), 0(06), 0(09), 1(12) -> [1,0,0,1],
16: 1(03), 1(07), 1(11), 0(15) -> [1,1,1,0],
17: 1(03), 0(08), 1(13), 1(18) -> [1,0,1,1],
18: 1(04), 0(05), 0(06), 1(07) -> [1,0,0,1],
19: 1(04), 0(06), 0(08), 1(10) -> [1,0,0,1],
20: 1(04), 1(07), 1(10), 1(13) -> [1,1,1,1],
21: 1(04), 0(08), 1(12), 0(16) -> [1,0,1,0],
22: 1(04), 0(09), 0(14), 1(19) -> [1,0,0,1],
23: 0(05), 0(06), 1(07), 0(08) -> [0,0,1,0],
24: 0(05), 1(07), 0(09), 1(11) -> [0,1,0,1],
25: 0(05), 0(08), 1(11), 0(14) -> [0,0,1,0],
26: 0(05), 0(09), 1(13), 1(17) -> [0,0,1,1],
27: 0(05), 1(10), 0(15), 0(20) -> [0,1,0,0],
28: 0(06), 1(07), 0(08), 0(09) -> [0,1,0,0],
29: 0(06), 0(08), 1(10), 1(12) -> [0,0,1,1],
30: 0(06), 0(09), 1(12), 0(15) -> [0,0,1,0],
31: 0(06), 1(10), 0(14), 1(18) -> [0,1,0,1],
32: 1(07), 0(08), 0(09), 1(10) -> [1,0,0,1],
33: 1(07), 0(09), 1(11), 1(13) -> [1,0,1,1],
34: 1(07), 1(10), 1(13), 0(16) -> [1,1,1,0],
35: 1(07), 1(11), 0(15), 1(19) -> [1,1,0,1],
36: 0(08), 0(09), 1(10), 1(11) -> [0,0,1,1],
37: 0(08), 1(10), 1(12), 0(14) -> [0,1,1,0],
38: 0(08), 1(11), 0(14), 1(17) -> [0,1,0,1],
39: 0(08), 1(12), 0(16), 0(20) -> [0,1,0,0],
40: 0(09), 1(10), 1(11), 1(12) -> [0,1,1,1],
41: 0(09), 1(11), 1(13), 0(15) -> [0,1,1,0],
42: 0(09), 1(12), 0(15), 1(18) -> [0,1,0,1],
43: 1(10), 1(11), 1(12), 1(13) -> [1,1,1,1],
44: 1(10), 1(12), 0(14), 0(16) -> [1,1,0,0],
45: 1(10), 1(13), 0(16), 1(19) -> [1,1,0,1],
46: 1(11), 1(12), 1(13), 0(14) -> [1,1,1,0],
47: 1(11), 1(13), 0(15), 1(17) -> [1,1,0,1],
48: 1(11), 0(14), 1(17), 0(20) -> [1,0,1,0],
49: 1(12), 1(13), 0(14), 0(15) -> [1,1,0,0],
50: 1(12), 0(14), 0(16), 1(18) -> [1,0,0,1],
51: 1(13), 0(14), 0(15), 0(16) -> [1,0,0,0],
52: 1(13), 0(15), 1(17), 1(19) -> [1,0,1,1],
53: 0(14), 0(15), 0(16), 1(17) -> [0,0,0,1],
54: 0(14), 0(16), 1(18), 0(20) -> [0,0,1,0],
55: 0(15), 0(16), 1(17), 1(18) -> [0,0,1,1],
56: 0(16), 1(17), 1(18), 1(19) -> [0,1,1,1],
57: 1(17), 1(18), 1(19), 0(20) -> [1,1,1,0],
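using the following code:
% s is the binary sequence (row vector of 0s and 1s of length N)
N = 20;                               % sequence length
n = 4;                                % substring length
A = hankel(1:N-n+1,N-n+1:N);          % row jj holds the indices jj:jj+n-1
k = 0:n-1;
c = ceil((N - A(:,end) + 1)/k(end));  % number of strides available per start position
i2 = cumsum(c);
i1 = i2 - c + 1;
idx = zeros(i2(end),n);
for jj = 1:N-n+1
    idx(i1(jj):i2(jj),:) = bsxfun(@plus,A(jj,:),(0:c(jj)-1)'*k);
end
[j1,~,j2] = unique(s(idx),'rows');    % unique patterns and the pattern index of each substring
out = [j1, histc(j2,1:max(j2))/i2(end)]; % This row corrected
and at the end I get, for each pattern, the number of times it occurs and its relative frequency (pattern, count, relative frequency):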
0 0 0 0------ 161697-- 0,0606515378844711
0 0 0 1------ 163593-- 0,0613627156789197
0 0 1 0------ 164201-- 0,0615907726931733
0 0 1 1------ 166680-- 0,0625206301575394
0 1 0 0------ 164105-- 0,0615547636909227
0 1 0 1------ 166501-- 0,0624534883720930
0 1 1 0------ 167099-- 0,0626777944486122
0 1 1 1------ 168835-- 0,0633289572393098
1 0 0 0------ 164086-- 0,0615476369092273
1 0 0 1------ 166963-- 0,0626267816954239
1 0 1 0------ 166931-- 0,0626147786946737
1 0 1 1------ 169470-- 0,0635671417854464
1 1 0 0------ 166622-- 0,0624988747186797
1 1 0 1------ 169326-- 0,0635131282820705
1 1 1 0------ 169251-- 0,0634849962490623
1 1 1 1------ 170640-- 0,0640060015003751
The problem is that, processed this way, I can only handle about 4000 data points, and I need to process many more. I have 4 GB of RAM and MATLAB 2012. My idea is this: assign each pattern an integer:
0 0 0 0------ 1
0 0 0 1-------2
0 0 1 0-------3
0 0 1 1-------4
0 1 0 0-------5
0 1 0 1-------6
0 1 1 0-------7
0 1 1 1-------8
1 0 0 0-------9
1 0 0 1-------10
1 0 1 0-------11
1 0 1 1-------12
1 1 0 0-------13
1 1 0 1-------14
1 1 1 0-------15
1 1 1 1-------16
and keep a counter for each integer that records how many times it occurs. That way I might be able to process as much data as I need. Thank you very much.

Answers (1)

Walter Roberson on 25 Oct 2013


If you are going to do that, consider using accumarray() to do the additions.
If B is the array of bits, such as
B = [0 0 0 0; 1 0 0 0; 0 1 0 0; 1 0 0 0]
then
counts = accumarray( B(:,1) * 8 + B(:,2) * 4 + B(:,3) * 2 + B(:,4) * 1 + 1, 1 );
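As a minimal sketch of the call above (the variable name idx and the explicit size argument [16 1] are additions for illustration): without a size argument, accumarray returns a vector only as long as the largest index that occurs, so fixing the size makes sure all 16 patterns get a slot even when some never appear.

```matlab
% Each row of B is one 4-bit pattern (example data from the answer above).
B = [0 0 0 0; 1 0 0 0; 0 1 0 0; 1 0 0 0];

% Convert each row to an index in 1..16 (binary value of the row, plus 1).
idx = B(:,1)*8 + B(:,2)*4 + B(:,3)*2 + B(:,4)*1 + 1;

% The third argument fixes the output size, so every pattern has an entry.
counts = accumarray(idx, 1, [16 1]);
```

Here counts(9) holds the count for [1 0 0 0], counts(5) for [0 1 0 0], and counts(1) for [0 0 0 0].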

16 Comments

FRANCISCO on 25 Oct 2013
I do not quite understand what you mean. I am trying to map each of the 16 patterns to an integer so that I can process more data, and from there compute the counts and relative frequencies. But I find it hard to create the substrings as integers while keeping the established order.
Walter Roberson on 26 Oct 2013
You have some existing logic that can figure out the 1 0 0 0 part of your
1 0 0 0------ 164086-- 0,0615476369092273
line, for each combination you are trying to process. Convert that existing logic slightly to produce a row-oriented matrix (Samples by 4) of these decoded values. The accumarray() call that I showed will then convert the 4 bits into an integer subscript and accumarray() will do the totaling for you.
The result will be a vector of (probably) 16 elements, one count per element. The bit patterns corresponding are the binary representations of (the index minus 1). So [0 0 0 0] for the first vector entry, [0 0 0 1] for the second vector entry, and so on.
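To go back from a position in the count vector to its bit pattern, one possible sketch uses dec2bin on (index minus 1). The names n, idx, bits, and patterns are chosen here only for illustration.

```matlab
n = 4;                               % pattern length
idx = 9;                             % position in the count vector

% Binary representation of (idx - 1), padded to n digits, as a numeric row.
bits = dec2bin(idx - 1, n) - '0';

% All 2^n patterns in the same order as the count vector, one per row.
patterns = dec2bin(0:2^n-1, n) - '0';
```

With this, [patterns, counts] would place each pattern next to its count, assuming counts is a 2^n-by-1 column.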
FRANCISCO on 27 Oct 2013
Edited: Walter Roberson on 27 Oct 2013
I tried it, but I want to verify that I understood correctly.
I have a long sequence of 1s and 0s, about 171000 values. Using the following code:
counts = accumarray(B(:,1)*8 + B(:,2)*4 + B(:,3)*2 + B(:,4)*1 + 1, 1);
I get the number of times each pattern occurs, where each pattern is represented by an integer.
If I wanted to build substrings of length 5, convert them to integers, and count how often each occurs, how would the expression above change?
Thank you very much.
Walter Roberson on 27 Oct 2013
B(:,1) * 16 + B(:,2) * 8 + B(:,3) * 4 + B(:,4) * 2 + B(:,5) * 1 + 1
Notice the pattern: [8 4 2 1] for length 4, [16 8 4 2 1] for length 5. You can calculate that pattern for substrings of length N, and do not need to write it out explicitly:
B * (2.^fliplr(0:N-1)).' + 1
Note: that is * and not .* because it is matrix multiplication.
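A small worked sketch for length 5, assuming the weights run from 2^(N-1) down to 2^0 so that they match the explicit length-5 expression given earlier in this reply. The matrix B and the names w and idx are made up for illustration.

```matlab
N = 5;                                   % substring length
B = [0 0 0 0 0; 0 0 0 0 1; 1 1 1 1 1; 0 0 0 0 1];   % one substring per row

w = 2.^(N-1:-1:0);                       % weights [16 8 4 2 1]
idx = B * w.' + 1;                       % matrix product: one index per row of B

counts = accumarray(idx, 1, [2^N 1]);    % one counter per possible pattern
```

The product B * w.' multiplies every row of B by the weight column at once, which is why * rather than .* is needed.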
FRANCISCO on 27 Oct 2013
OK, I'm beginning to understand. For example, I have the following binary sequence:
s = [1 0 1 1 0 0 1 0 1 0 0 0];
and I want to count how many times each pattern of length 4 occurs in that sequence, following the ordering established at the beginning of the thread. For this I use:
counts = accumarray(s(:,1)*8 + s(:,2)*4 + s(:,3)*2 + s(:,4)*1 + 1, 1);
Here I would count:
1-0000
2-0001
.
.
16-1111.
To get the counts for substrings of length 5 I apply:
s * (2.^fliplr(0:N-1)).' + 1
and to get the counts for length 6:
s * (2.^fliplr(0:N-1)).' + 1
Walter Roberson on 27 Oct 2013
The s * (2.^fliplr(0:N-1)).' + 1 form can be used for N = 4 as well.
FRANCISCO on 28 Oct 2013
I arrived at the solution. What should I do, starting from a binary sequence such as
s = [0 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 0 0 0];
to count the number of occurrences of substrings of length 4, 5, 6, ..., 20? Because of the amount of data, I want to count only the patterns and not store the substrings; otherwise I cannot process all 171000 values.
FRANCISCO on 28 Oct 2013
Is there a solution?
Walter Roberson on 28 Oct 2013
Edited: Walter Roberson on 28 Oct 2013
accumarray( (s(1:4:end) * 8 + s(2:4:end) * 4 + s(3:4:end) * 2 + s(4:4:end) * 1 + 1) .', 1)
FRANCISCO on 28 Oct 2013
I just applied it, but I don't get the same result as with the previous code. I think processing these 171000 values is impossible.
Walter Roberson on 28 Oct 2013
In your original code, how do you handle the boundary cases at the end, such as when there are only 3 bits left?
If you could upload a .txt file with your bit pattern, I will run it through a couple of different counting methods and see if I get agreement.
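One way to handle the boundary case for non-overlapping blocks is to drop the leftover bits before grouping. This is only a sketch under that assumption (discarding the tail is a choice, not something the thread settles), and the names m, blocks, and idx are illustrative.

```matlab
s = [0 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 0 0 0];   % 19 bits, from the thread
n = 4;                                   % block length

m = floor(numel(s)/n) * n;               % largest multiple of n (here 16)
blocks = reshape(s(1:m), n, []).';       % one non-overlapping block per row

idx = blocks * (2.^(n-1:-1:0)).' + 1;    % block value + 1
counts = accumarray(idx, 1, [2^n 1]);
```

Note that this counts only consecutive, non-overlapping blocks, whereas the code in the question counts substrings from every start position and with several strides, so the two approaches are expected to give different totals.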
FRANCISCO on 28 Oct 2013
I will send two files. The binary sequences have approximately 200,000 values each. I will post one in the form s = [0 1 0 0 1 ....] and the other in the form s = s'. From these I have to create substrings of length 4, 5, 6, ..., 20 and count the patterns. The problem is that not all of the data can be processed because of insufficient memory. If I could treat the data differently, I would not need to store the substrings themselves, only the pattern counts. Thank you very much.
FRANCISCO on 29 Oct 2013
Any solution? Many thanks.
Walter Roberson on 29 Oct 2013
Sorry, I have been busy, and now I need to go to sleep.
FRANCISCO on 29 Oct 2013
I tried several ways but it is impossible. Maybe I should use C#.
FRANCISCO on 29 Oct 2013
Walter, do you know C#? I have the code in C#, but I would like to write it in MATLAB; I don't know if that is possible.
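For reference, here is one possible MATLAB sketch of counting the patterns without storing the substrings or the index matrix. It is not code from the thread: it assumes the substrings are the strided selections shown in the question's listing (start p, stride d, elements p, p+d, ..., p+(n-1)*d), and the names counts, code, M, and relfreq are illustrative. Each stride is processed on its own, so the working memory is one vector of about numel(s) values plus the 2^n counters.

```matlab
s = [1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0];  % 20-bit example from the question
s = s(:).';                              % force a row vector
n = 4;                                   % substring length
N = numel(s);
counts = zeros(2^n, 1);                  % one counter per pattern

for d = 1:floor((N-1)/(n-1))             % every stride that still fits
    M = N - (n-1)*d;                     % number of valid start positions
    code = zeros(1, M);
    for k = 0:n-1
        code = 2*code + s((1:M) + k*d);  % append the next bit of every substring
    end
    counts = counts + accumarray(code.' + 1, 1, [2^n 1]);
end

relfreq = counts / sum(counts);          % relative frequency of each pattern
```

For the 20-bit example this visits the same 57 substrings as the listing in the question. The number of substrings grows roughly with the square of the sequence length, so for sequences of about 171000 values the running time, rather than memory, is likely to become the limiting factor.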


Asked: 25 Oct 2013
Commented: 29 Oct 2013
