Levenshtein distance: String scanning
古いコメントを表示
I'm in desperate need of help with creating a Levenshtein distance function with scanning behavior.
Basically there is a target string (s) which is supposed to scan a longer experimental string (s2). For example a target string (s) of length m = 3 should be compared to the first 3 entries of an experimental string (s2). Afterwards the target string (s) should be "moved" by 1 step to be compared to the entries 2-4 of the experimental string (s2) and so on.
Creating such a scanning behavior is not a problem, the real problems are insertions and deletions. After an insertion or deletion I still want to compare 2 strings of equal length. Therefore when inserting another entry, the last entry of the experimental string (s2) has to be removed, and when deleting an entry the next value of the experimental string should be added. Simply spoken the string lengths (m and n) should always be constant and have the same length.
Here a simple example:
For the strings s = [1 2 3] and s2 = [1 3 2 3 ...] I want to compare parts of the same length m = 3, therefore the first comparison should be between s = [1 2 3] and s2 = [1 3 2]. A deletion at position #2 should now change s2 to [entry1 entry3 entry4], in this case to s2 = [1 2 3], leading to an edit distance of 1 (with a deletion cost of 1). Here, an insertion of the value 2 at position #2 should lead to s2 = [entry1 insertion entry2] and therefore also s2 = [1 2 3].
Any ideas how I could achieve this?
I'm using an Levenshtein function similar to this:
% Variables
s = [1 2 3];
m=length(s);
s2 = [1 3 2 3 1 2 3];
t = s2(1:1:m);
n=length(t);
% Edit Matrix
mat=zeros(m+1,n+1);
for i=1:1:m
mat(i+1,1)=i;
end
for j=1:1:n
mat(1,j+1)=j;
end
for i=1:m
for j=1:n
if (s(i) == t(j))
mat(i+1,j+1)=mat(i,j);
else
mat(i+1,j+1)= min([mat(i+1,j)+1,mat(i,j+1)+1,mat(i,j)+1]);
end
end
end
%Edit distance
d = mat(m+1,n+1);
5 件のコメント
Stephen23
2018 年 12 月 13 日
@Marcel Dorer: your actual question is not very clear:
- Calculating the Levenshtein distance is not hard, so this is not the problem.
- Creating a sliding window is also not hard, a loop would do it easily.
- You mention making insertions and deletions so that the strings match, but do you not describe any restrictions on these... so without any restriction, you can insert/delete as you desire until the strings match. So far this seems to be equivalent to your current description:
S2 = repmat(S1,1,N)
but I doubt that that is actually what you want. What exactly do you want? Please show the complete expected output vector for your two example input vectors (or whatever output you expect to get - all of it please).
Marcel Dorer
2018 年 12 月 14 日
編集済み: Marcel Dorer
2018 年 12 月 14 日
"I now want to know if the long string (s2) contains the short string (s). "
Do you want the actual steps required to make this transformation from one string to the other? The code you gave (which seems to work, calculating the Levenshtein distance) does not actually need to make those changes to the input strings: it simply calculates the distance (which is usually what people are interested in). This means there are no strings with values being inserted or deleted, and nothing changes size. So it is not clear what your explanation relates to.
In any case, your goal "I now want to know if the long string (s2) contains the short string (s). " does not require Levenshtein distance at all: I can think of several simple ways to check this using MATLAB. What is the actual goal?
Marcel Dorer
2018 年 12 月 14 日
編集済み: Marcel Dorer
2018 年 12 月 14 日
per isakson
2018 年 12 月 14 日
回答 (0 件)
カテゴリ
ヘルプ センター および File Exchange で Get Started with MATLAB についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!