Efficient algorithm to compute only the most probable sequences of a random variable out of all possible sequences

Question

Bill Tubbs 2021 年 6 月 23 日

1
リンク

この質問への直接リンク

https://jp.mathworks.com/matlabcentral/answers/862870-efficient-algorithm-to-compute-only-the-most-probable-sequences-of-a-random-variable-out-of-all-poss

編集済み: Matt J 2021 年 6 月 24 日

採用された回答: Matt J

MATLAB Online で開く

I'm looking for a better way to compute the possible sequences of a random variable whos value at time k is given by.

x(k) = 1 with probability p(1), 2 with probability p(2), ... np with probability p(np).

However, since the number of possible sequences increases exponentially with n and np, I only want to compute the most probable sequences which have a combined probability less than a threshold p_cut.

I've devised a function to do this below but it has two major drawbacks.

It doesn't generalize to sequences longer than 3.
It is quite memory inefficient since it always computes all possible permutations when only a subset may be needed.

function S = most_probable_sequences(p, n, p_cut)
    assert(sum(p) == 1);
    np = numel(p);
    switch n
        case 1
            all_perms = (1:np)';
        case 2
            [X,Y] = ndgrid(1:np,1:np);
            all_perms = [X(:) Y(:)];
        case 3
            [X,Y,Z] = ndgrid(1:np,1:np,1:np);
            all_perms = [X(:) Y(:) Z(:)];
        otherwise
            error("n > 3 not implemented")
    end
    probs = prod(reshape(p(all_perms), [], n), 2);
    [probs_sorted, order] = sort(probs, 'descend');
    most_prob = cumsum(probs_sorted) <= p_cut;
    S = all_perms(order(most_prob), :);
end

Examples of correct output of this function:

>> S = most_probable_sequences([0.95 0.04 0.01], 1, 0.99)

S =

1

2

>> S = most_probable_sequences([0.01 0.04 0.95], 2, 0.99)

S =

3 3

3 2

2 3

3 1

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示

Bill Tubbs 2021 年 6 月 23 日

Thanks, yes I'm expecting some of the orderings to be arbitrary when p_cut < 1. A next step might be to think about what particular orderings I want but at the moment I'm just trying to cap the total number sequences. As long as the number of sequences is a minimum.

Matt J 2021 年 6 月 23 日

編集済み: Matt J 2021 年 6 月 23 日

MATLAB Online で開く

It is still a question though, whether you need the sequences that differ only by ordering listed explicitly. You could save memory and computation time if instead of listing the solution as,

S =[ 3     3
   2
   3
   1];

you instead listed the result in tabular form like,

C=cell2table({[3 3], 1; [3 2], 2; [3 1], 1},'VariableNames',{'Sequence','Permutations'})
C = 3×2 table
    Sequence    Permutations
    ________    ____________

     3    3          1      
     3    2          2      
     3    1          1      

サインインしてコメントする。

サインインしてこの質問に回答する。

Answer 1

Matt J 2021 年 6 月 23 日

2
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/862870-efficient-algorithm-to-compute-only-the-most-probable-sequences-of-a-random-variable-out-of-all-poss#answer_731520

編集済み: Matt J 2021 年 6 月 24 日

MATLAB Online で開く

This might work, but it requires uniqueperms from the file exchange:

p=[0.01 0.04 0.95];
S = most_probable_sequences(p, 2, 0.99)
S = 4×2
     3     3
     2     3
     3     2
     1     3
function S=most_probable_sequences(p, n, p_cut)
    assert( abs(sum(p)-1) <= 1e-12);
    S =  recursor(p, n, p_cut);
    
end
function S = recursor(p, n, p_cut,~)
  if n==1
    [p,S]=sort(p(:),'descend');   
    cut=cumsum(p)<=p_cut;
    S=S(cut);
    return
  end
  
   s=recursor(p, n-1, p_cut,[]);
   s=unique(sort(s,2),'rows');
   
   np=numel(p);
   ns=size(s,1);
   
   [I,J]=ndgrid(1:ns,1:np);
   
   s=unique(sort([s(I(:),:),J(:)],2),'rows');
   
   s=num2cell(s,2);
   for i=1:numel(s), s{i}=uniqueperms(  s{i} ); end
   s=cell2mat(s);
   
   sp=prod(p(s),2);
   [Probs,is]=sort(sp(:),'descend');
  
   cut=cumsum(Probs)<=p_cut;
   
   S=s(is(cut),:);
   
end

6 件のコメント
4 件の古いコメントを表示4 件の古いコメントを非表示

Bill Tubbs 2021 年 6 月 23 日

What is uniqueperms()? I see more than one thing on File Exchange with this name.

Matt J 2021 年 6 月 23 日

I used this one,

https://www.mathworks.com/matlabcentral/fileexchange/72552-uniqueperms

but I suppose it doesn't matter, if they all do the same thing and are equally fast.

サインインしてコメントする。

Answer 2

Matt J 2021 年 6 月 24 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/862870-efficient-algorithm-to-compute-only-the-most-probable-sequences-of-a-random-variable-out-of-all-poss#answer_732280

編集済み: Matt J 2021 年 6 月 24 日

MATLAB Online で開く

but it has two major drawbacks....It doesn't generalize to sequences longer than 3.

The first drawback at least is easy to fix.

function S = most_probable_sequences(p, n, p_cut)
    assert( abs(sum(p)-1) <= 1e-12);
    np = numel(p);
    
    [all_perms{1:n}] = ndgrid(1:np);
    all_perms=reshape( cat(n+1,all_perms{:})  , [],n);
    
    probs = prod(reshape(p(all_perms), [], n), 2);
    [probs_sorted, order] = sort(probs, 'descend');
    most_prob = cumsum(probs_sorted) <= p_cut;
    S = all_perms(order(most_prob), :);
end

This will be faster, but less memory efficient than my other answer.

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

Answer 3

Matt J 2021 年 6 月 24 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/862870-efficient-algorithm-to-compute-only-the-most-probable-sequences-of-a-random-variable-out-of-all-poss#answer_732470

編集済み: Matt J 2021 年 6 月 24 日

MATLAB Online で開く

Here is a far more efficient version of my original answer based on my recommendation above to avoid explicitly listing permutations of the same sequence. It requires no File Exchange downloads and, as demonstrated below, easily handles up to n=12 and np=10.

%Small example
S = most_probable_sequences([0.01 0.04 0.95], 2, 0.99),
S = 3×2 table
    Sequences    NumPerms
    _________    ________

     3    3         1    
     2    3         2    
     1    3         1    
%Large example
p=rand(1,10);p=p/sum(p);
tic
S = most_probable_sequences(p, 12, 0.99);
toc
Elapsed time is 1.789715 seconds.
function S=most_probable_sequences(p, n, p_cut)
    assert( abs(sum(p)-1) <= 1e-12)
    S =  recursor(p(:), n, p_cut);
    
end
function S = recursor(p, n, p_cut)
  if n==1
    [p,S]=sort(p(:),'descend');   
    ncut=find(cumsum(p)<=p_cut,1,'last');
    S=S(1:ncut);
    S=table(S,ones(size(S)),'VariableNames',{'Sequences','NumPerms'});
    return
  end
  
   s=recursor(p, n-1, p_cut);
   s=s.Sequences;
   
   np=numel(p);
   ns=size(s,1);
   
   [I,J]=ndgrid(1:ns,1:np);
   s=unique( sort( [s(I(:),:), J(:)] ,2) ,'rows','stable'); clear I J
   ns=size(s,1);
   
   
     X=repmat((1:ns).',1,n);
     K=histcounts2(X,s,1:ns+1,1:np+1);
     F = cumprod([1 1 2:n]) ; % trick to get all needed factorials (note that all K < N)                                
   w   = F(n+1) ./ prod(F(K+1),2) ; % number of unique permutations
     clear F K X
   
   sProbs=prod(p(s),2); %single sequence probabilities
   wProbs=w.*sProbs; %weighted probabilities
   [wProbs,order]=sort(wProbs(:),'descend');
     sProbs=sProbs(order);
   
   c=cumsum(wProbs);
   
   mcut=find( c<=p_cut,1,'last');
   kcut=floor( (p_cut-c(mcut))/sProbs(mcut+1));
   ncut=mcut+(kcut>0);
   order=order(1:ncut);
   
   S=s(order,:);
   w=w(order);
   if kcut>0, w(end)=kcut; end
   
   S=table(S,w,'VariableNames',{'Sequences','NumPerms'});
end

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

Efficient algorithm to compute only the most probable sequences of a random variable out of all possible sequences

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示

採用された回答

6 件のコメント
4 件の古いコメントを表示4 件の古いコメントを非表示

その他の回答 (2 件)

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

Efficient algorithm to compute only the most probable sequences of a random variable out of all possible sequences

4 件のコメント 2 件の古いコメントを表示2 件の古いコメントを非表示

採用された回答

6 件のコメント 4 件の古いコメントを表示4 件の古いコメントを非表示

その他の回答 (2 件)

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示

6 件のコメント
4 件の古いコメントを表示4 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示