Categorical Data Preprocessing for Data Mining

Samuel Katongole

6 Oct 2021

Hello friends

I have been working on the Tanzania wells state problem, using the Taarifa data obtained from DrivenData, for my ML practice, and I am now trying to remove misspellings in the installer and funder columns. If anyone has tried this, please help me with how to go about it. A faster way would also be very helpful.

Thanks.

I am trying to clean out misspellings from the installer and funder columns. For the moment I am using regular expressions, but the dataset is large and this is taking a long time.
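Many apparent misspellings in columns like these differ only in case, stray punctuation or extra spaces, so normalising the text first can shrink the problem before any regex or fuzzy matching is needed. A minimal sketch, assuming the column is a cell array of character vectors and using only lower, regexprep and strtrim (all available in R2017b); the sample values are made up for illustration and are not taken from the Taarifa data:

```matlab
% Hypothetical sample of raw installer values (not from the real dataset)
installer = {'World Bank'; ' world  bank'; 'WORLD BANK.'; 'Government'; 'GOVERNMENT '};

% Normalise: lower-case, drop anything that is not a letter, digit or space,
% collapse runs of whitespace to one space, then trim both ends
clean = lower(installer);
clean = regexprep(clean, '[^a-z0-9 ]', '');
clean = regexprep(clean, '\s+', ' ');
clean = strtrim(clean);

% Five raw spellings collapse to two distinct values
disp(unique(clean))
```

If the column was imported as categorical rather than as text, convert it first with cellstr.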

For instance, when trying to correct the variants of "world bank", I tried the expression below, which still fails:

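% Match the whole cell value (^...$) so that only complete "world bank" variants are replaced
pat11 = '^wo(rd|rdl|uld|rld)?\s+(b\w*|nk|divisio)$';
newDataClean.installer = regexprep(newDataClean.installer, pat11, 'world bank', 'ignorecase');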
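Because each installer name repeats many times, one way to speed up a regexprep call like this is to run it on the distinct values only and then map the cleaned values back to every row using the third output of unique. A sketch assuming the column is a cell array of character vectors; the repeated sample data and the pattern/replacement pair are illustrative placeholders:

```matlab
% Hypothetical column with many repeated values (stand-in for the real installer column)
installer = repmat({'world bank'; 'word bank'; 'wordl bank'; 'world vision'; 'would bank'}, 2000, 1);

% vals holds the distinct values; idx maps every row back to its entry in vals
[vals, ~, idx] = unique(installer);

% regexprep applies each pattern/replacement pair in turn; extend both lists as needed
pats = {'^wo(rd|rdl|uld|rld)?\s+(b\w*|nk|divisio)$'};
reps = {'world bank'};
vals = regexprep(vals, pats, reps, 'ignorecase');

% Expand the cleaned distinct values back to the full column
installerClean = vals(idx);
disp(unique(installerClean))
```

Here the regex work is done on 5 distinct strings instead of 10,000 rows; on the real data the saving depends on how many distinct names the column contains.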

Here I was testing the expression in Atom, but it fails to replace the selected words correctly.
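Atom's find-and-replace uses JavaScript regular expressions, whose syntax differs from MATLAB's in places, so it is safer to test the pattern in MATLAB itself. The harness below runs the original pattern and an anchored variant (an assumption about what was intended, not a confirmed fix) on a few hypothetical values. In the original pattern the optional group ends in $, yet [^vd] still demands one more character after it, so the group can never match; only the leading "world b" is consumed and replaced, leaving "ank" behind:

```matlab
% Hypothetical sample values; the real column will contain many more variants
samples = {'world bank'; 'word bank'; 'wordl bank'; 'world vision'};

% Original pattern: the group ending in $ is always skipped, so only "wo... b" is matched
patOld = 'wo(rd|rdl|uld|rld)?\s((b\w*|nk|divisio)$)?[^vd]';

% Anchored variant: must match the entire cell value, and "vision" is not an alternative
patNew = '^wo(rd|rdl|uld|rld)?\s+(b\w*|nk|divisio)$';

oldOut = regexprep(samples, patOld, 'world bank');
newOut = regexprep(samples, patNew, 'world bank', 'ignorecase');

% Columns: raw value, result with the original pattern, result with the anchored pattern
disp([samples, oldOut, newOut])
```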

However, I am still wondering whether there is a faster way of approaching this problem.
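One alternative that avoids writing a separate regex for every name is fuzzy matching: choose a short list of canonical spellings (for example, the most frequent values in the column) and map each raw value to its closest canonical name by Levenshtein edit distance, but only when that distance is small. The sketch below computes the distance with plain loops so it runs in R2017b without extra toolboxes; the canonical list, the sample values and the threshold maxDist = 2 are all hypothetical and would need tuning on the real data. The cost grows with the number of values times the number of canonical names, so it is best applied to the distinct values of the column rather than to every row.

```matlab
% Hypothetical canonical names and raw values (not from the real dataset)
canonical = {'world bank'; 'world vision'; 'government'};
raw = {'world bank'; 'wordl bank'; 'world vission'; 'goverment'; 'unicef'};
maxDist = 2;   % assumed threshold: larger values merge more aggressively

mapped = raw;
for i = 1:numel(raw)
    s = raw{i};
    best = Inf;
    bestName = '';
    for j = 1:numel(canonical)
        t = canonical{j};
        % Levenshtein distance by dynamic programming: D(p+1,q+1) is the
        % distance between the first p chars of s and the first q chars of t
        D = zeros(numel(s) + 1, numel(t) + 1);
        D(:, 1) = (0:numel(s))';
        D(1, :) = 0:numel(t);
        for p = 1:numel(s)
            for q = 1:numel(t)
                cost = double(s(p) ~= t(q));
                D(p+1, q+1) = min([D(p, q+1) + 1, D(p+1, q) + 1, D(p, q) + cost]);
            end
        end
        % Keep the closest canonical name seen so far (first one wins on ties)
        if D(end, end) < best
            best = D(end, end);
            bestName = t;
        end
    end
    % Replace only when the closest canonical name is within the threshold
    if best <= maxDist
        mapped{i} = bestName;
    end
end

% Columns: raw value, mapped value ('unicef' has no close match and is left alone)
disp([raw, mapped])
```

A small threshold is deliberate: short but genuinely different names, such as two acronyms, can be only a couple of edits apart, so the proposed mapping is worth reviewing before it is applied. In later releases, Text Analytics Toolbox provides an editDistance function that can replace the inner loops.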

1 comment

KSSV on 6 Oct 2021

The question is not clear. Can you elaborate with an example?

Answers (0)

Release: R2017b
