Categorical data preprocessing for data mining

Hello friends,
I have been working on the Tanzania wells state problem, using the Taarifa data obtained from DrivenData, for my ML practice, and I am now trying to remove misspellings in the installer and funder columns. If anyone has tried this, please help me with how to go about it. If there is a faster way, that would be very helpful.
Thanks.
For the moment I am using regular expressions, but there is a lot of data and the cleaning is taking a long time.
For instance, when trying to correct the misspellings of world bank, I tried this expression, which is still failing:
pat11='wo(rd|rdl|uld|rld)?\s((b\w*|nk|divisio)$)?[^vd]';
newDataClean.installer=regexprep(newDataClean.installer,pat11,'world bank');
Here I was testing the expression in Atom, but it fails to correctly replace the selected words.
However, I am still wondering whether there is another, faster way of approaching the issue.
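One possible reason the pattern misbehaves is the trailing [^vd]: it has to consume one more character after the optional group, so the optional (...$)? part cannot be used at the very end of the text, and on a value such as 'world bank' the match can stop at 'world b'. The sketch below is one way to restructure the expression, not a definitive fix. It anchors the whole value with ^ and $ and uses the 'ignorecase' option of regexprep. The sample values and the names installer, pat and cleaned are made up for illustration, and the pattern assumes that every 'world b...' entry is meant to be 'world bank'.

```matlab
% Hypothetical sample values; the real installer column will differ.
installer = {'world bank'; 'Word Bank'; 'wordl bank'; ...
             'would bank'; 'world nk'; 'World Vision'};

% Anchor the whole value so that only complete entries are rewritten.
% First group: assumed misspellings of "world" (wo + rd/rdl/uld/rld).
% Second group: anything starting with "b", or the fragment "nk".
pat = '^wo(rd|rdl|uld|rld)\s+(b\w*|nk)$';

% 'ignorecase' lets the same pattern cover "Word Bank", "WORLD BANK", etc.
cleaned = regexprep(installer, pat, 'world bank', 'ignorecase');

disp(cleaned)
```

Because the second group only accepts words starting with "b" (or "nk"), entries such as 'World Vision' are left untouched without needing a negated character class.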
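One approach that may be faster, sketched here under assumptions rather than measured on the real data, is to normalise case and whitespace first and then run the regular expressions only on the unique values, mapping the cleaned values back to every row afterwards. A column like installer typically has far fewer distinct values than rows, so each pattern is applied to a much shorter list. The sample data and the names installer, normed, uniqVals, idx and cleaned are hypothetical; in practice installer would be newDataClean.installer.

```matlab
% Hypothetical stand-in for newDataClean.installer.
installer = {'World Bank'; 'world bank '; 'Wordl bank'; ...
             'DWE'; 'dwe'; 'World Bank'};

% Normalise case and surrounding whitespace so trivial variants collapse.
normed = lower(strtrim(installer));

% Keep only the distinct values; idx maps every row to its unique value.
[uniqVals, ~, idx] = unique(normed);

% Apply the (assumed) correction pattern to the short unique list only.
uniqVals = regexprep(uniqVals, '^wo(rd|rdl|uld|rld)\s+(b\w*|nk)$', 'world bank');

% Map the corrected values back onto all rows.
cleaned = uniqVals(idx);

% Optionally store the result as a categorical and inspect the categories.
cats = categories(categorical(cleaned));
disp(cats)
```

Listing the remaining categories after each pass is also a simple way to spot which misspellings are still left to handle.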

1 Comment

KSSV on 6 Oct 2021
Question is not clear. Can you elaborate with an example?

Answers (0)

Asked: 6 Oct 2021

Edited: 6 Oct 2021
