Replacing special character 'É' with 'E'

Hi,
Is there a Matlab function to replace the special characters (like 'É') to the regular UTF-8 or ISO-8859-1?
Thanks,

1 Comment

Stephen23 on 28 Nov 2022
"regular UTF-8 or ISO-8859-1"
Both UTF-8 (encodes all Unicode characters) and ISO-8859-1 include "É"... Perhaps you meant to ask something like "how to remove diacritics from characters?", which would match your question title.


Accepted Answer

Jonas on 28 Nov 2022

0 votes

Looks like there are only manual solutions.
Stackoverflow is your friend ;-)

6 Comments

Stephen23 on 28 Nov 2022
"Stackoverflow is your friend"
Stackoverflow: Caecus caeco dux
Jonas on 28 Nov 2022
non cogito ergo sum
Pete sherer on 28 Nov 2022
Moved: Stephen23 on 29 Nov 2022
Thanks very much, guys. I used the function from Stack Overflow.
Jonas on 29 Nov 2022
For completeness, I repeat the code from the given site here.
It was written by Jim Goodall.
Please also note that the list is not necessarily complete; see e.g. this list on Wikipedia.
function [clean_s] = removediacritics(s)
%REMOVEDIACRITICS Removes diacritics from text.
% This function removes many common diacritics from strings, such as
% á - the acute accent
% à - the grave accent
% â - the circumflex accent
% ü - the diaeresis, or trema, or umlaut
% ñ - the tilde
% ç - the cedilla
% å - the ring, or bolle
% ø - the slash, or solidus, or virgule
% uppercase
s = regexprep(s,'(?:Á|À|Â|Ã|Ä|Å)','A');
s = regexprep(s,'(?:Æ)','AE');
s = regexprep(s,'(?:ß)','ss');
s = regexprep(s,'(?:Ç)','C');
s = regexprep(s,'(?:Ð)','D');
s = regexprep(s,'(?:É|È|Ê|Ë)','E');
s = regexprep(s,'(?:Í|Ì|Î|Ï)','I');
s = regexprep(s,'(?:Ñ)','N');
s = regexprep(s,'(?:Ó|Ò|Ô|Ö|Õ|Ø)','O');
s = regexprep(s,'(?:Œ)','OE');
s = regexprep(s,'(?:Ú|Ù|Û|Ü)','U');
s = regexprep(s,'(?:Ý|Ÿ)','Y');
% lowercase
s = regexprep(s,'(?:á|à|â|ä|ã|å)','a');
s = regexprep(s,'(?:æ)','ae');
s = regexprep(s,'(?:ç)','c');
s = regexprep(s,'(?:ð)','d');
s = regexprep(s,'(?:é|è|ê|ë)','e');
s = regexprep(s,'(?:í|ì|î|ï)','i');
s = regexprep(s,'(?:ñ)','n');
s = regexprep(s,'(?:ó|ò|ô|ö|õ|ø)','o');
s = regexprep(s,'(?:œ)','oe');
s = regexprep(s,'(?:ú|ù|ü|û)','u');
s = regexprep(s,'(?:ý|ÿ)','y');
% return cleaned string
clean_s = s;
end
Jonas on 29 Nov 2022
Also, it is questionable to do this at all, since changing letters can change the meaning of words. In German, for example, ä, ö and ü are changed to ae, oe and ue, but the same procedure does not make sense in other languages such as Turkish.
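To make the German convention concrete, here is a minimal sketch of a language-specific transliteration using replace with paired arrays. The variable names are hypothetical, and it assumes R2017a or later for the double-quoted string literals. It is a transliteration rule for German only, not a general way to remove diacritics.

```matlab
% Sketch only: German-specific transliteration (assumed mapping, R2017a+).
% Each element of old is replaced by the element of new at the same position.
old = ["Ä","Ö","Ü","ä","ö","ü","ß"];
new = ["Ae","Oe","Ue","ae","oe","ue","ss"];
name = 'Müller';
translit = replace(name, old, new)   % Müller becomes Mueller
```

Applying the same table to Turkish text would give spellings that are not conventional in that language, which is exactly the concern raised above.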
Stephen23 on 12 Dec 2023
Edited: Stephen23 on 14 Dec 2023
@Jonas: your concern is well-founded. That function confuses two related (yet distinct) aspects of languages:
Note that ligatures are not diacritics, so splitting the ligatures Æ, Œ, etc. is not removing diacritics. The Eszett character ß also has no diacritic, nor is it considered a ligature (although it does derive from one). The lexicographical sorting rules of some languages do require treating those ligatures and characters as equivalent to some other characters... but that is distinct from removing diacritics from characters.
The function also fails to remove diacritics from other (even Latin-based) characters, e.g. Ǣ.
The function also returns the wrong character in some cases, e.g. eth ð has no diacritic. That it is commonly transliterated into Latin script as d is irrelevant (and misleading: the digraph th would be better).
In short: the function is misnamed and does not really do what it claims.
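One way to act on this distinction is to decompose the text with Unicode normalization and then drop only the combining marks. The sketch below is not from the thread: it assumes the JVM bundled with MATLAB is available and uses Java's java.text.Normalizer, reaching the nested Form enum through javaMethod. Under NFD, É decomposes into E plus a combining acute accent and Ǣ into Æ plus a combining macron, whereas ø and ð have no decomposition, so they should pass through unchanged.

```matlab
% Sketch only: assumes MATLAB was started with its bundled JVM enabled.
in = 'Ǣ É ø ð';
% Nested Java enum java.text.Normalizer.Form, addressed by its binary name.
form = javaMethod('valueOf', 'java.text.Normalizer$Form', 'NFD');
% Canonical decomposition: base character followed by combining mark(s).
decomposed = char(java.text.Normalizer.normalize(java.lang.String(in), form));
% Drop code points in the Combining Diacritical Marks block (U+0300..U+036F).
isMark = decomposed >= 768 & decomposed <= 879;
out = decomposed(~isMark)
```

Only the block U+0300..U+036F is filtered here; combining marks from other Unicode blocks would need additional ranges.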


More Answers (2)

Stephen23 on 28 Nov 2022
Edited: Stephen23 on 28 Nov 2022

0 votes

"Is there a Matlab function to replace the special characters (like 'É')"
You can call Python from MATLAB, and it can do the heavy lifting:
inp = 'É';
baz = @(v)char(v(1)); % only need the first decomposed character.
out = baz(py.unicodedata.normalize('NFKD',inp)) % to remove diacritics.
out = 'E'
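For input longer than one character, keeping only the first decomposed character is not enough. A possible extension, sketched under the assumption that MATLAB is configured with a Python installation, is to convert the whole normalized string and filter out the combining marks by code point. The variable names are illustrative only.

```matlab
% Sketch only: assumes a working Python setup for MATLAB (see pyenv).
inp = 'Élève Ñandú';
% NFKD splits each precomposed character into base letter + combining mark(s).
nfkd = char(py.unicodedata.normalize('NFKD', inp));
% Remove code points in the Combining Diacritical Marks block (U+0300..U+036F).
isMark = nfkd >= 768 & nfkd <= 879;
out = nfkd(~isMark)
```

Note that NFKD also applies compatibility mappings (for example, the ligature ﬁ becomes fi), so it can change more than diacritics; NFD is the narrower choice if that matters.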
John D'Errico on 28 Nov 2022
Edited: John D'Errico on 28 Nov 2022

0 votes

Easy peasy.
str = 'ABCDEFGHIJKÉÉÀÀÄÄabcdefghijkl'
str = 'ABCDEFGHIJKÉÉÀÀÄÄabcdefghijkl'
strrep(str,'É','E')
ans = 'ABCDEFGHIJKEEÀÀÄÄabcdefghijkl'
If there are other special characters you want replaced, strrep will handle them too, but it looks like you would need to do them one at a time with strrep. But other tools would certainly work too. Certainly regexp, but I've never been very good at regular expressions. :) This will work though:
badchar = 'ÉÀÄ';
goodchar = 'EAA';
[u,v] = ismember(str,badchar);
str(u) = goodchar(v(u))
str = 'ABCDEFGHIJKEEAAAAabcdefghijkl'
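The ismember lookup above can be wrapped in a small function so the two lists are checked for matching length before use. This is only a sketch; the function name replacechars is hypothetical.

```matlab
function str = replacechars(str, badchar, goodchar)
%REPLACECHARS One-to-one character substitution (hypothetical helper name).
% badchar(k) is replaced by goodchar(k); all other characters are kept.
assert(numel(badchar) == numel(goodchar), ...
    'badchar and goodchar must have the same number of characters.');
[tf, idx] = ismember(str, badchar);  % tf: hits in str, idx: position in badchar
str(tf) = goodchar(idx(tf));
end
```

For example, replacechars('ÉÀÄ test','ÉÀÄ','EAA') returns 'EAA test'. Because the mapping is strictly one character to one character, it cannot express replacements such as 'Æ' to 'AE'; those need strrep, replace or regexprep.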

1 Comment

Robert Wagner on 12 Dec 2023
"but I've never been very good at regular expressions. :)" ---> I've never tried to be in the first place... :-)))


Release: R2022a

Asked: 28 Nov 2022

Edited: 14 Dec 2023
