The Levenshtein distance is a character-based string metric that measures the difference between two strings. In this problem, you need to implement a word-based version of the Levenshtein distance.
Given two strings, compute the minimum number of word edits needed to transform one string into the other. The allowable edits are insertion, deletion, or substitution of a single word. Treat words as case-insensitive. Contractions and hyphenated words are allowed, but you may ignore other punctuation.
Example
If
s1 = 'I do not like MATLAB'
s2 = 'I love MATLAB a lot'
then
d = 4
because at least four edits are required to transform s1 into s2 (substitution on the last four words).
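One way to sketch the word-based distance described above is the standard dynamic-programming recurrence applied to word lists instead of characters. This is a minimal Python illustration, not the reference solution: the names `tokenize` and `word_levenshtein` are hypothetical, and the regular expression is only one assumed reading of "contractions and hyphenated words are allowed".

```python
import re

def tokenize(text):
    # Assumption: a word is a run of letters/digits that may contain
    # internal apostrophes or hyphens; all other punctuation is ignored.
    # Lowercasing makes the comparison case-insensitive.
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())

def word_levenshtein(s1, s2):
    a, b = tokenize(s1), tokenize(s2)
    # prev[j] holds the distance between the words of `a` processed so far
    # and the first j words of `b`.
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

d = word_levenshtein('I do not like MATLAB', 'I love MATLAB a lot')
print(d)  # 4
```

Keeping only two rows of the table is enough here because only the final distance is needed, not the edit sequence itself.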
Problem Comments
3 Comments
Question on Test Suite #2.
s1 = 'Which words need to be edited?';
s2 = 'Can you tell which words need to be edited?';
d_correct = 3;
I see this as 2: substitute 'w' for 'W' and insert 'Can you tell '. Where is the third change?
Richard - you do not have to substitute w for W (words are case-insensitive). And inserting 'Can you tell' is three edits, one for each word.
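To make the counting in this reply concrete, the sketch below fills the full distance table and then walks it backwards to list one optimal sequence of word edits. It is an illustrative Python sketch under assumptions: `words` and `word_edits` are hypothetical names, and stripping a fixed set of punctuation characters from each word is a simplification of the problem's punctuation rule.

```python
def words(text):
    # Assumption: split on whitespace, lowercase, and strip common
    # punctuation from the ends of each word.
    stripped = (w.strip('.,;:!?"()') for w in text.lower().split())
    return [w for w in stripped if w]

def word_edits(s1, s2):
    a, b = words(s1), words(s2)
    n, m = len(a), len(b)
    # d[i][j] = minimum edits turning the first i words of a
    # into the first j words of b.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = int(a[i - 1] != b[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # Walk back from the bottom-right corner, preferring keep/substitute.
    edits, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + int(a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                edits.append(('substitute', a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            edits.append(('insert', b[j - 1]))
            j -= 1
        else:
            edits.append(('delete', a[i - 1]))
            i -= 1
    edits.reverse()
    return edits

for edit in word_edits('Which words need to be edited?',
                       'Can you tell which words need to be edited?'):
    print(edit)
# ('insert', 'can')
# ('insert', 'you')
# ('insert', 'tell')
```

No edit is reported for 'Which' versus 'which' because the words are compared in lowercase, which leaves the three single-word insertions.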
I understand how this problem was designed, but I disagree. For instance, transforming s1 = 'I do not like MATLAB' into s2 = 'I love MATLAB a lot' should be a 2-word edit: the three words 'do not like' could be grouped into one unit and changed into 'love', and 'a lot' could be treated as a single insertion after the word MATLAB. That is probably closer to how the original Levenshtein distance would measure it, since the character-based algorithm returns 15 edits: transforming 'do not like' into 'love' requires 9 character edits, and inserting ' a lot' after 'MATLAB' requires 6.
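The per-piece character counts quoted in this comment can be checked with an ordinary character-level Levenshtein function. The Python sketch below is illustrative only: `char_levenshtein` is a hypothetical name, and the comparison is assumed to be case-sensitive, as in the classic definition.

```python
def char_levenshtein(s, t):
    # Classic two-row dynamic programme over characters (case-sensitive).
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute or keep
        prev = curr
    return prev[-1]

print(char_levenshtein('do not like', 'love'))  # 9
print(char_levenshtein('', ' a lot'))           # 6
# Whole-sentence distance, for comparison with the figure in the comment.
print(char_levenshtein('I do not like MATLAB', 'I love MATLAB a lot'))
```

The two pieces add up to 15, which is an upper bound on the whole-sentence distance because the remaining characters are identical in both strings.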