Improving efficiency of a char array function
I have built a function to be run 100000 times in a loop. Its input, "motifs", is a full 20x15 char array containing only four characters: 'A', 'C', 'G', 'T'. I am wondering whether my implementation below can be improved, or whether it is already about as fast as it gets:
% Find consensus string and score of a motifs array
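function [score, count, consensus] = scoremotifs_2(motifs)
% count(k,j) = number of times the k-th letter of 'ACGT' appears in column j
count = [ sum(motifs == 'A',1) ;
sum(motifs == 'C',1) ;
sum(motifs == 'G',1) ;
sum(motifs == 'T',1) ];
% Most frequent letter per column (max returns the first row on ties)
[count_max, consensus_num] = max(count);
consensus(consensus_num == 1) = 'A';
consensus(consensus_num == 2) = 'C';
consensus(consensus_num == 3) = 'G';
consensus(consensus_num == 4) = 'T';
% Score = number of entries that differ from their column's consensus letter.
% Relies on implicit expansion (R2016b+): 20x15 compared with 1x15
score = sum(sum(motifs ~= consensus));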
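To make the three outputs concrete, here is a hand-checkable run of the same steps on a small hypothetical 3x4 array (the real input is 20x15). The lines are inlined as a script instead of calling scoremotifs_2, and consensus is preallocated with blanks (not in the original) so the sketch does not depend on earlier workspace state.

```matlab
% Hypothetical 3x4 toy input (rows = sequences, columns = positions)
motifs = ['ACGT'; 'ACGA'; 'TCGA'];

% Same counting step as scoremotifs_2: one row per letter A, C, G, T
count = [ sum(motifs == 'A',1) ;
          sum(motifs == 'C',1) ;
          sum(motifs == 'G',1) ;
          sum(motifs == 'T',1) ];
[count_max, consensus_num] = max(count);

% Preallocated here (an addition) so the result length never depends on old variables
consensus = blanks(size(motifs, 2));
consensus(consensus_num == 1) = 'A';
consensus(consensus_num == 2) = 'C';
consensus(consensus_num == 3) = 'G';
consensus(consensus_num == 4) = 'T';

score = sum(sum(motifs ~= consensus));

disp(consensus)  % ACGA
disp(score)      % 2 (one mismatch in row 1, one in row 3)
```

Here count is [2 0 0 2; 0 3 0 0; 0 0 3 0; 1 0 0 1], so the column-wise maxima sit in rows 1, 2, 3, 1.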
Comments
the cyclist
8 January 2017
Edited: the cyclist, 8 January 2017
One thing to be aware of ...
It looks to me like your algorithm will be biased toward overcounting 'A' relative to the other letters, because when counts are tied, max returns the first index, so consensus_num points to the earliest tied letter. (Similarly, 'C' will be favored over 'G' and 'T', and so on.)
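The tie behavior is easy to see on a single column of counts. If an unbiased pick matters, one option (an assumption here, not something proposed in the thread) is to choose uniformly among the tied letters; this is a minimal sketch, and the extra find/randi work would likely eat into the speed the question is after.

```matlab
% One column of counts for 'A','C','G','T' (20 rows total) where A and G tie at 7
colCount = [7; 3; 7; 3];
[~, firstIdx] = max(colCount);   % firstIdx is 1 ('A'): the first maximum wins

% Hypothetical tie-break: pick uniformly among all rows equal to the maximum
tied = find(colCount == max(colCount));   % [1; 3]
pick = tied(randi(numel(tied)));          % 1 or 3, each with probability 1/2

list = 'ACGT';
fprintf('first-index rule: %c, random tie-break: %c\n', list(firstIdx), list(pick));
```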
Accepted Answer
the cyclist
8 January 2017
Edited: the cyclist, 8 January 2017
This is starting to get a little obfuscated, but it's significantly faster:
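list = 'ACGT';
% 20x15 compared with 1x1x4 expands to 20x15x4; summing down the rows gives 1x15x4 counts
count = sum(motifs == reshape(list,[1 1 4]));
% Maximum along the third (letter) dimension: a 1x15 vector of indices into list
[count_max, consensus_num] = max(count,[],3);
% Map indices 1..4 back to letters
consensus = list(consensus_num);
% Count the entries that differ from their column's consensus letter
score = sum(sum(motifs ~= consensus));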
The key algorithmic difference here is that I am able to compare against all four elements 'ACGT' in parallel, by permuting them into a third dimension.
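Here is the third-dimension trick on a small hypothetical 3x4 array: the comparison yields a 3x4x4 logical array (one page per letter), summing down the rows gives 1x4x4 counts, and max along dimension 3 picks the letter. Note that count comes out as 1xNx4 rather than the 4xN layout returned by scoremotifs_2; if callers rely on that layout, permute restores it (the permute step is an addition, not part of the answer).

```matlab
motifs = ['ACGT'; 'ACGA'; 'TCGA'];   % hypothetical 3x4 toy input
list = 'ACGT';

hits = motifs == reshape(list, [1 1 4]);         % 3x4x4: hits(i,j,k) true if motifs(i,j) == list(k)
count = sum(hits, 1);                            % 1x4x4: column index in dim 2, letter in dim 3
[count_max, consensus_num] = max(count, [], 3);  % 1x4 indices into list
consensus = list(consensus_num);
score = sum(sum(motifs ~= consensus));

% Optional: back to one row per letter (A, C, G, T), one column per position
count4xN = permute(count, [3 2 1]);
disp(consensus)   % ACGA
```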
You need R2016b or later for MATLAB to apply implicit expansion when comparing the 20x15x1 array against the 1x1x4 array. With an older version, you will need to expand the arrays explicitly with repmat.
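A sketch of the explicit repmat version for pre-R2016b releases, on a small hypothetical 3x4 input. The final comparison against consensus (nRows x nCols vs 1 x nCols) also relies on implicit expansion, so it gets the same treatment; bsxfun, which predates R2016b, would be an alternative to repmat for both steps.

```matlab
motifs = ['ACGT'; 'ACGA'; 'TCGA'];   % hypothetical 3x4 toy input
list = 'ACGT';
[nRows, nCols] = size(motifs);

% Expand both operands to nRows x nCols x 4 instead of relying on implicit expansion
M       = repmat(motifs, [1 1 4]);
letters = repmat(reshape(list, [1 1 4]), [nRows nCols 1]);
count = sum(M == letters, 1);                    % 1 x nCols x 4
[count_max, consensus_num] = max(count, [], 3);
consensus = list(consensus_num);

% The score comparison needs explicit expansion too
score = sum(sum(motifs ~= repmat(consensus, nRows, 1)));
```

The repmat copies allocate the full 4x-larger arrays, so on old releases this may be slower than the implicit-expansion version is on new ones.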
Comments
the cyclist
8 January 2017
Also, because you are calling this function so many times, you might want to define
list = 'ACGT';
plist = reshape(list,[1 1 4]);
in the calling function and pass them in as arguments, rather than redoing that reshape on every call. But I think that will give only a tiny speedup, if any.
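A sketch of that idea, assuming the caller draws a fresh motifs array on each iteration. The random input and the iteration count are hypothetical placeholders, and the body is inlined rather than passed to a function so the example runs on its own.

```matlab
list  = 'ACGT';
plist = reshape(list, [1 1 4]);   % built once, outside the loop
nIter = 1000;                     % hypothetical; the question runs 100000 iterations
scores = zeros(nIter, 1);

for k = 1:nIter
    motifs = list(randi(4, 20, 15));          % hypothetical random 20x15 input
    count = sum(motifs == plist, 1);          % 1x15x4 counts
    [~, consensus_num] = max(count, [], 3);
    consensus = list(consensus_num);
    scores(k) = sum(sum(motifs ~= consensus));
end
```

Each column has 20 entries over 4 letters, so its most frequent letter appears at least 5 times; every score therefore lies between 0 and 15*15 = 225, which makes a cheap sanity check on the loop.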
Other Answers (1)
the cyclist
8 January 2017
It's somewhat faster to calculate consensus by indexing into a lookup string instead of using the four logical assignments:
list = 'ACGT';
consensus = list(consensus_num);
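This works because consensus_num holds values 1 to 4, and indexing a char vector with a numeric array returns the characters at those positions. A quick check, on hypothetical index values, that it matches the four logical assignments from the question:

```matlab
list = 'ACGT';
consensus_num = [1 2 3 1 4 4];   % hypothetical per-column argmax indices

% Lookup-string version from this answer
consensus_lookup = list(consensus_num);

% Original four logical assignments (preallocated so the result has the right length)
consensus_loop = blanks(numel(consensus_num));
consensus_loop(consensus_num == 1) = 'A';
consensus_loop(consensus_num == 2) = 'C';
consensus_loop(consensus_num == 3) = 'G';
consensus_loop(consensus_num == 4) = 'T';

disp(consensus_lookup)   % ACGATT
```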