Categorical Data preprocessing for Data mining
Hello friends
I have been working on the Tanzania wells state problem, using the Taarifa data obtained from DrivenData, for my ML practice, and I am now trying to remove misspellings in the installer and funder columns. If anyone has tried this, please help me with how to go about it. And if there is a faster way, that would be very helpful.
Oh, thanks
I am trying to clean out misspellings from the installer and funder columns. For the moment I am using regular expressions, but there is a lot of data and the cleaning is taking a long time.
For instance, when trying to correct the misspellings of world bank, I tried this expression, which is still failing:
pat11='wo(rd|rdl|uld|rld)?\s((b\w*|nk|divisio)$)?[^vd]';
newDataClean.installer=regexprep(newDataClean.installer,pat11,'world bank');
Here I was testing the expression in Atom, but it fails to correctly replace the selected words.
However, I am still wondering if there could be another "faster" way of approaching the issue!
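One possible direction, offered as a sketch rather than a definitive fix: in pat11 the `$` sits inside an optional group and is followed by `[^vd]`, which needs one more character, so that group can never take part in a match, and the character matched by `[^vd]` is consumed by the replacement (so 'world bank' would likely become 'world bankank'). The example below anchors the pattern to the whole value instead, and runs it only on the distinct spellings returned by `unique`, which is usually far fewer strings than there are rows. The sample data and the exact list of accepted misspellings are assumptions for illustration, not taken from the Taarifa data.

```matlab
% Hypothetical sample standing in for newDataClean.installer
installer = ["World Bank"; "wordl bank"; "would bank"; ...
             "word bank"; "World Vision"; "DWE"];

% Normalise case and surrounding whitespace so the pattern has less to cover
cleaned = lower(strtrim(installer));

% Work on the distinct spellings only; idx maps each row back to its spelling
[spellings, ~, idx] = unique(cleaned);

% Anchored pattern: a misspelt "world", optionally followed by a "bank"-like
% word. The alternatives are assumed from pat11 and should be adjusted.
pat = '^wo(rld|rdl|rd|uld)(\s+(b\w*|nk|divisio))?$';
spellings = regexprep(spellings, pat, 'world bank');

% Expand the corrected spellings back to one value per row
cleaned = spellings(idx);
```

Because the pattern is anchored with `^` and `$`, a value such as 'world vision' is left alone without needing `[^vd]`. The same steps could be applied to the funder column, assigning `spellings(idx)` back to the table variable.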