How to use Unicode numeric values in regexprep?
How can "Häagen-Dasz" be converted to "Haagen-Dasz" using Unicode numeric values? For example,
regexprep('Häagen-Dasz','ä','A')
works fine, but
regexprep('Häagen-Dasz','\x{C4}','a')
does not. Here, the hexadecimal \x{C4} stands for [latin capital letter a] with diaeresis, i.e. [ä].
1 Comment
VBBV
28 Mar 2024
I am not sure I understand your question correctly, but please read the answer below.
Accepted Answer
More Answers (2)
inp = 'Häagen-Dasz';
baz = @(v)char(v(1)); % only need the first decomposed character.
out = arrayfun(@(c)baz(py.unicodedata.normalize('NFKD',c)),inp) % remove diacritics.
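To see why keeping only the first decomposed character removes the diacritic, it can help to inspect what NFKD returns for a single character. This is a minimal sketch, assuming MATLAB is configured with a Python installation (which the snippet above already requires); the variable names `decomposed`, `codes`, and `base` are illustrative only.

```matlab
% Assumes MATLAB can call Python (check the configuration with pyenv).
decomposed = char(py.unicodedata.normalize('NFKD', 'ä'));  % py.str converted to MATLAB char
codes = double(decomposed);   % code points of the decomposed sequence
disp(codes)                   % base letter a (97), then combining diaeresis U+0308 (776)
base = decomposed(1);         % keep the base letter, drop the combining mark
disp(base)
```

Characters that have no decomposition, such as plain ASCII letters, come back unchanged, so taking the first element leaves them as they were.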
Read more:
https://docs.python.org/3/library/unicodedata.html
https://stackoverflow.com/questions/16467479/normalizing-unicode
regexprep('Häagen-Dasz','ä','A')
regexprep('Häagen-Dasz','ä','\x{C4}')
2 Comments
regexprep('Häagen-Dasz','\x{e4}','a')
VBBV
28 Mar 2024
The Unicode code point for the lowercase ä is \x{e4}; \x{C4} is the uppercase Ä.
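If you are unsure which hexadecimal value to put inside \x{...}, you can ask MATLAB for the code point of the character itself. The following is a small sketch rather than the only way to do it; the variable names (`target`, `codePoint`, `hexCode`, `pattern`) are made up for illustration, and it assumes the script file is saved in an encoding that preserves the ä literal.

```matlab
% Look up the code point of the character to be matched.
target = 'ä';
codePoint = double(target);      % decimal code point, 228 for lowercase ä
hexCode = dec2hex(codePoint);    % hexadecimal text, 'E4'

% Build the \x{...} pattern from the hex code and replace the match with a plain 'a'.
pattern = ['\x{' hexCode '}'];
out = regexprep('Häagen-Dasz', pattern, 'a');
disp(out)                        % Haagen-Dasz
```

Running the same lookup on 'Ä' gives C4, which is why the pattern \x{C4} finds nothing to replace in 'Häagen-Dasz'.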