Problem with binary code

Hello. My question has nothing to do with programming in MATLAB at the moment. It is about statistics, and I wonder if anyone can help.
My goal is to predict the next digit of a binary string. For this, I have the following sequence of binary digits (position in parentheses):
s= 1(1) 0(2) 1(3) 1(4) 0(5) 0(6) 1(7) 0(8) 0(9) 1(10) 1(11) 1(12) 1(13) 0(14) 0(15) 0(16) 1(17) 1(18) 1(19) 0(20).
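I then created substrings of four elements taken at equally spaced positions i, i+d, i+2d, i+3d (step d = 1 to 6, last position at most 20), which gives 57 substrings: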
1) 1 (1) 0 (2) 1 (3) 1 (4) --- [1 0 1 1]
2) 1 (1) 1 (3) 0 (5) 1 (7) --- [1 1 0 1]
3) 1 (1) 1 (4) 1 (7) 1 (10) --- [1 1 1 1]
4) 1 (1) 0 (5) 0 (9) 1 (13) --- [1 0 0 1]
5) 1 (1) 0 (6) 1 (11) 0 (16) --- [1 0 1 0]
6) 1 (1) 1 (7) 1 (13) 1 (19) --- [1 1 1 1]
7) 0 (2) 1 (3) 1 (4) 0 (5) --- [0 1 1 0]
8) 0 (2) 1 (4) 0 (6) 0 (8) --- [0 1 0 0]
9) 0 (2) 0 (5) 0 (8) 1 (11) --- [0 0 0 1]
10) 0 (2) 0 (6) 1 (10) 0 (14) --- [0 0 1 0]
11) 0 (2) 1 (7) 1 (12) 1 (17) --- [0 1 1 1]
12) 0 (2) 0 (8) 0 (14) 0 (20) --- [0 0 0 0]
13) 1 (3) 1 (4) 0 (5) 0 (6) --- [1 1 0 0]
14) 1 (3) 0 (5) 1 (7) 0 (9) --- [1 0 1 0]
15) 1 (3) 0 (6) 0 (9) 1 (12) --- [1 0 0 1]
16) 1 (3) 1 (7) 1 (11) 0 (15) --- [1 1 1 0]
17) 1 (3) 0 (8) 1 (13) 1 (18) --- [1 0 1 1]
18) 1 (4) 0 (5) 0 (6) 1 (7) --- [1 0 0 1]
19) 1 (4) 0 (6) 0 (8) 1 (10) --- [1 0 0 1]
20) 1 (4) 1 (7) 1 (10) 1 (13) --- [1 1 1 1]
21) 1 (4) 0 (8) 1 (12) 0 (16) --- [1 0 1 0]
22) 1 (4) 0 (9) 0 (14) 1 (19) --- [1 0 0 1]
23) 0 (5) 0 (6) 1 (7) 0 (8) --- [0 0 1 0]
24) 0 (5) 1 (7) 0 (9) 1 (11) --- [0 1 0 1]
25) 0 (5) 0 (8) 1 (11) 0 (14) --- [0 0 1 0]
26) 0 (5) 0 (9) 1 (13) 1 (17) --- [0 0 1 1]
27) 0 (5) 1 (10) 0 (15) 0 (20) --- [0 1 0 0]
28) 0 (6) 1 (7) 0 (8) 0 (9) --- [0 1 0 0]
29) 0 (6) 0 (8) 1 (10) 1 (12) --- [0 0 1 1]
30) 0 (6) 0 (9) 1 (12) 0 (15) --- [0 0 1 0]
31) 0 (6) 1 (10) 0 (14) 1 (18) --- [0 1 0 1]
32) 1 (7) 0 (8) 0 (9) 1 (10) --- [1 0 0 1]
33) 1 (7) 0 (9) 1 (11) 1 (13) --- [1 0 1 1]
34) 1 (7) 1 (10) 1 (13) 0 (16) --- [1 1 1 0]
35) 1 (7) 1 (11) 0 (15) 1 (19) --- [1 1 0 1]
36) 0 (8) 0 (9) 1 (10) 1 (11) --- [0 0 1 1]
37) 0 (8) 1 (10) 1 (12) 0 (14) --- [0 1 1 0]
38) 0 (8) 1 (11) 0 (14) 1 (17) --- [0 1 0 1]
39) 0 (8) 1 (12) 0 (16) 0 (20) --- [0 1 0 0]
40) 0 (9) 1 (10) 1 (11) 1 (12) --- [0 1 1 1]
41) 0 (9) 1 (11) 1 (13) 0 (15) --- [0 1 1 0]
42) 0 (9) 1 (12) 0 (15) 1 (18) --- [0 1 0 1]
43) 1 (10) 1 (11) 1 (12) 1 (13) --- [1 1 1 1]
44) 1 (10) 1 (12) 0 (14) 0 (16) --- [1 1 0 0]
45) 1 (10) 1 (13) 0 (16) 1 (19) --- [1 1 0 1]
46) 1 (11) 1 (12) 1 (13) 0 (14) --- [1 1 1 0]
47) 1 (11) 1 (13) 0 (15) 1 (17) --- [1 1 0 1]
48) 1 (11) 0 (14) 1 (17) 0 (20) --- [1 0 1 0]
49) 1 (12) 1 (13) 0 (14) 0 (15) --- [1 1 0 0]
50) 1 (12) 0 (14) 0 (16) 1 (18) --- [1 0 0 1]
51) 1 (13) 0 (14) 0 (15) 0 (16) --- [1 0 0 0]
52) 1 (13) 0 (15) 1 (17) 1 (19) --- [1 0 1 1]
53) 0 (14) 0 (15) 0 (16) 1 (17) --- [0 0 0 1]
54) 0 (14) 0 (16) 1 (18) 0 (20) --- [0 0 1 0]
55) 0 (15) 0 (16) 1 (17) 1 (18) --- [0 0 1 1]
56) 0 (16) 1 (17) 1 (18) 1 (19) --- [0 1 1 1]
57) 1 (17) 1 (18) 1 (19) 0 (20) --- [1 1 1 0]
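The listing above appears to follow one rule: take every set of four equally spaced positions i, i+d, i+2d, i+3d (step d >= 1) whose last position is at most 20. Assuming that is the rule, this short base-MATLAB sketch regenerates the same 57 substrings in the same order (by start position, then by step). The variable names `subs` and `pos` are illustrative.

```matlab
% The 20 observed bits; s(k) is the bit at position k
s = [1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0];
N = numel(s);
L = 4;                              % length of each substring

subs = zeros(0, L);                 % one substring (bit values) per row
pos  = zeros(0, L);                 % the positions each row was taken from
for i = 1:N                         % start position
    for d = 1:N                     % step between consecutive positions
        idx = i + (0:L-1)*d;        % i, i+d, i+2d, i+3d
        if idx(end) > N
            break;                  % any larger step overshoots as well
        end
        subs(end+1, :) = s(idx);    %#ok<AGROW>
        pos(end+1, :)  = idx;       %#ok<AGROW>
    end
end
fprintf('%d substrings\n', size(subs, 1));   % prints "57 substrings"
```

For example, `pos(35,:)` is `[7 11 15 19]` and `subs(35,:)` is `[1 1 0 1]`, matching entry 35 of the list.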
I have also calculated the relative frequency of each pattern among these 57 substrings:
0 0 0 0 ------ 0.0175438596491228
0 0 0 1 ------ 0.0350877192982456
0 0 1 0 ------ 0.0877192982456140
0 0 1 1 ------ 0.0701754385964912
0 1 0 0 ------ 0.0701754385964912
0 1 0 1 ------ 0.0701754385964912
0 1 1 0 ------ 0.0526315789473684
0 1 1 1 ------ 0.0526315789473684
1 0 0 0 ------ 0.0175438596491228
1 0 0 1 ------ 0.122807017543860
1 0 1 0 ------ 0.0701754385964912
1 0 1 1 ------ 0.0701754385964912
1 1 0 0 ------ 0.0526315789473684
1 1 0 1 ------ 0.0701754385964912
1 1 1 0 ------ 0.0701754385964912
1 1 1 1 ------ 0.0701754385964912
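The table matches what you get by counting how often each 4-bit pattern occurs among the 57 substrings and dividing by 57. For example, 1001 occurs 7 times (7/57 ≈ 0.1228) and 0000 once (1/57 ≈ 0.0175). The minimal sketch below recomputes the table under the same equally spaced construction. It encodes each substring as an integer from 0 to 15 (first position as the most significant bit) and counts the codes with `accumarray`.

```matlab
s = [1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0];
N = numel(s);
L = 4;

% Encode each equally spaced 4-bit substring as an integer 0..15
codes = [];
for d = 1:floor((N-1)/(L-1))          % largest step that still fits is 6
    for i = 1:N-(L-1)*d               % valid start positions for this step
        bits = s(i + (0:L-1)*d);
        codes(end+1) = bits * 2.^(L-1:-1:0)'; %#ok<AGROW>
    end
end

counts  = accumarray(codes(:) + 1, 1, [2^L 1]);   % occurrences of 0000..1111
relFreq = counts / numel(codes);                  % divide by the 57 substrings

for p = 0:2^L-1
    fprintf('%s  %d/%d = %.4f\n', dec2bin(p, L), counts(p+1), numel(codes), relFreq(p+1));
end
```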
Now suppose I want to know whether element 21 of the sequence will be "0" or "1". To do this, I do the following:
s= 1(1) 0(2) 1(3) 1(4) 0(5) 0(6) 1(7) 0(8) 0(9) 1(10) 1(11) 1(12) 1(13) 0(14) 0(15) 0(16) 1(17) 1(18) 1(19) 0(20) X(21)
and then build the substrings that end at X, using the same equally spaced positions (steps d = 1 to 6):
1: 1 (18), 1 (19), 0 (20), X (21) --- [1,1,0, X]
2: 0 (15), 1 (17), 1 (19), X (21) --- [0,1,1, X]
3: 1 (12), 0 (15), 1 (18), X (21) --- [1,0,1, X]
4: 0 (9), 1 (13), 1 (17), X (21) --- [0,1,1, X]
5: 0 (6), 1 (11), 0 (16), X (21) --- [0,1,0, X]
6: 1 (3), 0 (9), 0 (15), X (21) --- [1,0,0, X]
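The six substrings ending at X use the same equally spaced construction, run backwards from position 21. They take positions 21-3d, 21-2d and 21-d for d = 1, ..., 6; d = 7 would need position 0. The sketch below lists them, assuming that rule.

```matlab
s = [1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0];
N = numel(s);
L = 4;
target = N + 1;                        % position 21, the unknown X

% Every equally spaced substring whose last element is position 21:
% positions 21-3d, 21-2d, 21-d, 21 for each step d that keeps 21-3d >= 1
known = zeros(0, L-1);                 % the three observed bits of each
for d = 1:floor((target-1)/(L-1))
    idx = target - (L-1:-1:1)*d;       % e.g. d = 1 gives [18 19 20]
    known(end+1, :) = s(idx);          %#ok<AGROW>
    fprintf('%d: positions %s -> [%s X]\n', d, mat2str([idx target]), num2str(s(idx)));
end
```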
Replacing X, I get:
X = 1:
1: 1 (18), 1 (19), 0 (20), X (21) --- [1,1,0, 1 ]
2: 0 (15), 1 (17), 1 (19), X (21) --- [0,1,1, 1 ]
3: 1 (12), 0 (15), 1 (18), X (21) --- [1,0,1, 1 ]
4: 0 (9), 1 (13), 1 (17), X (21) --- [0,1,1, 1 ]
5: 0 (6), 1 (11), 0 (16), X (21) --- [0,1,0, 1 ]
6: 1 (3), 0 (9), 0 (15), X (21) --- [1,0,0, 1 ]
X = 0:
1: 1 (18), 1 (19), 0 (20), X (21) --- [1,1,0, 0 ]
2: 0 (15), 1 (17), 1 (19), X (21) --- [0,1,1, 0 ]
3: 1 (12), 0 (15), 1 (18), X (21) --- [1,0,1, 0 ]
4: 0 (9), 1 (13), 1 (17), X (21) --- [0,1,1, 0 ]
5: 0 (6), 1 (11), 0 (16), X (21) --- [0,1,0, 0 ]
6: 1 (3), 0 (9), 0 (15), X (21) --- [1,0,0, 0 ]
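One way to make this comparison concrete is to take each of the six substrings ending at X and look up, in the relative-frequency table, the pattern completed with X = 0 and the pattern completed with X = 1. The sketch then averages the six values for each choice of X. That averaging is a hypothetical scoring rule used only for illustration. The question does not define one, and nothing here shows that it predicts anything.

```matlab
s = [1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0];
N = numel(s);  L = 4;
w = 2.^(L-1:-1:0)';                    % bit weights, first position = MSB

% Relative frequency of each 4-bit pattern among the 57 equally spaced substrings
codes = [];
for d = 1:floor((N-1)/(L-1))
    for i = 1:N-(L-1)*d
        codes(end+1) = s(i + (0:L-1)*d) * w; %#ok<AGROW>
    end
end
relFreq = accumarray(codes(:) + 1, 1, [2^L 1]) / numel(codes);

% Complete each substring ending at position N+1 with X = 0 and with X = 1
K = floor(N/(L-1));                    % steps d = 1..6
f = zeros(K, 2);                       % column 1: X = 0, column 2: X = 1
for d = 1:K
    prefix = s(N+1 - (L-1:-1:1)*d);    % the three known bits
    for x = 0:1
        f(d, x+1) = relFreq([prefix x] * w + 1);
    end
end
disp(f)
fprintf('average over the %d substrings: X=0 -> %.4f, X=1 -> %.4f\n', K, mean(f));
```

With this data, the X = 1 completions sum to 25/57 and the X = 0 completions to 18/57. Whether such a difference means anything is exactly what the rest of the thread questions.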
From here, could someone tell me how to study the probabilities in order to predict the next number in the string?

10 comments

Walter Roberson, 26 Aug 2013
Relative frequency of the substrings compared to what? Compared to the first 20 entries?
FRANCISCO, 27 Aug 2013
I want to compare the relative frequencies of the substrings with the relative frequencies of the substrings that include X. Substituting "0" and "1" for X gives another 12 substrings, but I do not know how to calculate the probabilities. I'm a little confused. Many thanks.
Walter Roberson, 27 Aug 2013
Before you get there, you wrote "And I've also calculated the relative frequency of these substrings". What are the frequencies being compared relative to? To the other 15 length-4 binary substrings?
FRANCISCO, 27 Aug 2013
Basically, I intend to analyze how the relative frequencies change and then calculate the probability. So far I have only calculated the relative frequencies of the substrings produced. Starting from the relative frequencies of the different patterns in those substrings, I have to move on to the patterns in which X appears and study the probability from there. It is at this step that I am a bit lost.
dpb, 27 Aug 2013 (edited: 28 Aug 2013)
The first question is: how did you get the first 20? If the next value comes from the same process as the first 20, then that process is the way to generate it.
If it's purely empirical from the observations, see whether you can reject the hypothesis that they came from an i.i.d. (independent, identically distributed) process; if not, just set
x(n+1)=randi([0 1]);
If so, then select alternate hypotheses.
FRANCISCO, 29 Aug 2013
It is a random process. What I want to know is whether the next bit will be 0 or 1, using the above sequence and the subsequences produced. How can I perform a probability analysis?
dpb, 29 Aug 2013 (edited: 30 Aug 2013)
Again, what is the underlying process? Why is it any more or less likely the next is 0 or 1? If it's random, you can't know that a priori.
Sure, you can do all kinds of heuristic things; I'm trying to figure out the basis for why you would choose to do any particular one over any other.
As an example of the most basic idea, what if one assumes a binomial process w/ p=q=0.5? In your sequence of 20, the number of ones is Nobs=11.
For a binomial, P(T >= tobs) = 1 - P(T <= tobs-1), so with tobs = 11:
p-value = 1-binocdf(10,20,0.5)
>> 1-binocdf(10,20,.5)
ans =
    0.4119
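`binocdf` belongs to the Statistics and Machine Learning Toolbox. As a toolbox-free cross-check, the same one-sided tail P(T >= 11) for T ~ Binomial(20, 0.5) can be summed directly from the binomial formula. This is a sketch, and the variable names are illustrative.

```matlab
s = [1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0];
n    = numel(s);                       % 20 tosses
tobs = sum(s);                         % 11 ones observed

% P(T >= tobs) for T ~ Binomial(n, 0.5), summed term by term;
% this equals 1 - binocdf(tobs-1, n, 0.5)
k = tobs:n;
terms = arrayfun(@(j) nchoosek(n, j), k) * 0.5^n;
pValue = sum(terms);
fprintf('P(T >= %d) = %.4f\n', tobs, pValue);   % prints "P(T >= 11) = 0.4119"
```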
Certainly there's no reason to reject, although the sample size is of course quite small. That is doubly so when you try to use subsamples of the sequence. There are also runs tests, etc., for additional checks. You might look at the NIST battery to get a handle on the ways such sequences are tested when qualifying PRNGs, for example.
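The sketch below shows one of the runs tests mentioned: the Wald-Wolfowitz runs test applied to the 20 bits. It uses the standard mean and variance of the number of runs under independence, plus a normal approximation that is only rough at n = 20. It is written out by hand for illustration and does not call any toolbox function.

```matlab
s = [1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0];
n  = numel(s);
n1 = sum(s == 1);                      % number of ones
n0 = sum(s == 0);                      % number of zeros
R  = 1 + sum(diff(s) ~= 0);            % runs = maximal blocks of equal bits

% Mean and variance of the number of runs if the bits are independent,
% then a two-sided p-value from the normal approximation
mu     = 2*n1*n0/n + 1;
sigma2 = 2*n1*n0*(2*n1*n0 - n) / (n^2 * (n - 1));
z      = (R - mu) / sqrt(sigma2);
pTwoSided = erfc(abs(z) / sqrt(2));
fprintf('runs = %d, expected %.2f, z = %.3f, p = %.3f\n', R, mu, z, pTwoSided);
```

A small |z| (large p) means the number of runs is in line with independent tosses. Too few runs would suggest clustering; too many would suggest alternation.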
FRANCISCO, 30 Aug 2013
Sorry, English is not my language, so I may misunderstand or fail to express myself clearly. Imagine that we flip a coin 20 times (a binary sequence), where heads is "1" and tails is "0". What I want to know is the probability of getting heads or tails on the next toss (toss No. 21). So far, I have formed subsequences of length 4 and found the most repeated patterns. Then I found the subsequences that contain toss 21, hoping this would increase my probability of success. I do not know how to find the probabilities of those subsequences. For example, let's look at the first pattern with X:
1: 1 (18) 1 (19) 0 (20), X (21) --- [1,1,0, X]
Substituting X = 1 and X = 0 gives two possible combinations:
[1,1,0, 1]
[1,1,0, 0]
The table of relative frequencies above shows that:
1 1 0 1 ------ 0.0701754385964912
1 1 0 0 ------ 0.0526315789473684
So, across all the substrings, 1101 has occurred more often than 1100. What I want to find is the probability that 1101 will occur again, versus 1100.
dpb, 30 Aug 2013 (edited: 30 Aug 2013)
If it's a fair coin, then P(H) on the (N+1)th trial is still 0.5 whatever the preceding sequence, even if the preceding N were all T (or H).
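A quick simulation illustrates this for the first X-substring, assuming independent tosses of a fair coin. Among simulated 21-toss sequences whose tosses 18 to 20 are 1 1 0, the fraction with toss 21 equal to 1 should be close to 0.5. `nSim` is an arbitrary choice.

```matlab
nSim = 200000;                         % number of simulated sequences
tosses = rand(nSim, 21) < 0.5;         % each row: 21 independent fair tosses
% Keep the rows whose tosses 18, 19, 20 are 1, 1, 0 (the pattern before X)
match = tosses(:, 18) & tosses(:, 19) & ~tosses(:, 20);
pNext1 = mean(tosses(match, 21));      % estimated P(toss 21 = 1 | 1 1 0)
fprintf('%d matching sequences, estimated P(X = 1) = %.3f\n', sum(match), pNext1);
```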
If it's not fair, then estimate the actual bias p (or q). As noted above, the observed number of H gives insufficient evidence to conclude that p differs from 0.5, so p = 0.5 is as good a value as any.
What other information is there to use? The point is that one realization of a random process is subject to chance, so another realization from the same process could produce the opposite of what you have above, namely that
Pobs(1101) ~ 0.05
Pobs(1100) ~ 0.07
instead, or some other values entirely. I don't see any reason not to use the expected value of 1/16 = 0.0625 for either pattern. Note that your values are scattered on either side of that, which is additional evidence that the underlying assumption of a binomial w/ p=q=0.5 is as good a model as any.
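You can see how far these relative frequencies move by chance alone with a simulation. Generate many fair 20-toss sequences, build the same 57 equally spaced substrings for each, and record the relative frequency of one pattern (1101 here). This sketch assumes independent fair tosses; `nSim` and the chosen pattern are arbitrary.

```matlab
nSim = 20000;                          % number of simulated sequences
N = 20;  L = 4;  w = 2.^(L-1:-1:0);    % bit weights, first position = MSB

% Position sets of the 57 equally spaced substrings (one per row)
idx = zeros(0, L);
for d = 1:floor((N-1)/(L-1))
    for i = 1:N-(L-1)*d
        idx(end+1, :) = i + (0:L-1)*d; %#ok<AGROW>
    end
end

tosses = rand(nSim, N) < 0.5;          % each row: one fair realization
codes = zeros(nSim, size(idx, 1));     % pattern code of every substring
for b = 1:L
    codes = codes + w(b) * tosses(:, idx(:, b));
end
freq = mean(codes == bin2dec('1101'), 2);   % relative frequency per sequence

sorted = sort(freq);
fprintf('mean %.4f (1/16 = %.4f), 5%%-95%% range %.4f to %.4f\n', ...
    mean(freq), 1/16, sorted(round(0.05*nSim)), sorted(round(0.95*nSim)));
```

Compare the printed 5%-95% range with the gap between 4/57 and 3/57 (about 0.018). That shows how much weight the observed difference between 1101 and 1100 can carry.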
ADDENDUM: The observed values being roughly equal to 1/16 agrees with the other note just posted. Even though you selected samples other than consecutive ones in the order they arrived, with this limited sample the generating process looks pretty much like a fair binomial with p=0.5.
dpb, 30 Aug 2013
Another comment on the "probabilities" you've calculated from sequences. If the process is one of generating a sequence, then the only sequences it actually produces are the contiguous ones, n:m. Many of the subsequences you've listed above are selected arbitrarily, with different steps between samples. They are not actual sample sequences unless your earlier claim that there is serial dependence is false.
If you have reason to look at four consecutive samples, the only valid observations I can see are the 17 you can construct from 1:4, 2:5, ..., 17:20. Everything else is a valid sequence only if there is no correlation at all from one sample to the next, which contradicts the earlier assertion that the underlying process is not purely random.
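Restricting to consecutive windows as suggested leaves only 17 observations. This sketch counts the 4-bit patterns in windows 1:4 through 17:20 of the observed sequence.

```matlab
s = [1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0];
N = numel(s);  L = 4;
nWin = N - L + 1;                      % 17 windows: 1:4, 2:5, ..., 17:20
codes = zeros(nWin, 1);
for i = 1:nWin
    codes(i) = s(i:i+L-1) * 2.^(L-1:-1:0)';   % window as an integer 0..15
end
counts = accumarray(codes + 1, 1, [2^L 1]);
for p = find(counts)'                  % only patterns that actually occur
    fprintf('%s: %d\n', dec2bin(p-1, L), counts(p));
end
```

On these 17 windows, 1100 occurs twice and 1101 not at all. That is the reverse of their order in the 57-substring table, which shows how unstable such counts are with only 20 tosses.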


Answers (1)

David Sanchez, 30 Aug 2013

0 votes

In your case, the relative frequency of the pattern
0 0 1 1 is 0.0701754385964912.
This means that if you take your whole sequence as representative of what will happen in the future, that same relative frequency is the likelihood (probability) of the pattern occurring again. The likelihood of 0000 is 0.0175438596491228, the likelihood of 0001 is 0.0350877192982456, and so on.

3 comments

FRANCISCO, 30 Aug 2013
Yes, but I want to study the probability, i.e. the probability that a pattern repeats. This is because the process is not completely random but pseudorandom, and patterns keep repeating.
dpb, 30 Aug 2013
But unless the generator is particularly flawed in a way you can exploit, such as a very short period or a serial correlation of period N, there's not much to say beyond the fact that this particular realization gave you this particular case.
If you're just trying to study a given generator, as mentioned above look for the NIST battery of tests for randomness for ideas on how and what is tested for in general.
Perhaps if you outlined your final objective rather than focusing on the mechanics, it would lead to a better response. As it is, I just don't see what good it will do you to look at it this way, unless you are trying to qualify the PRNG.
Walter Roberson, 31 Aug 2013
Perhaps you should be checking the autocorrelation.
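Following the autocorrelation suggestion, here is a minimal sketch in base MATLAB. The toolbox functions, such as `xcorr` in the Signal Processing Toolbox, are avoided, so the sample autocorrelation is computed directly. The ±2/sqrt(n) band is the usual large-sample approximation for an uncorrelated series and is rough with n = 20. `maxLag = 6` is an arbitrary choice that matches the largest step used in the substrings.

```matlab
s = [1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0];
x = s - mean(s);                       % remove the mean
n = numel(x);
maxLag = 6;                            % same step range as the substrings
r = zeros(1, maxLag);
for k = 1:maxLag
    % sample autocorrelation at lag k
    r(k) = sum(x(1:n-k) .* x(1+k:n)) / sum(x.^2);
end
bound = 2 / sqrt(n);                   % rough 95% band for white noise
disp([(1:maxLag)' r'])
fprintf('approximate 95%% band: +/- %.3f\n', bound);
```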


Asked: 26 Aug 2013
