How to identify data set characteristics which influence the success of a model using those data sets as input.

Question

Wayne Martin, 22 April 2024

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/2110286-how-to-identify-data-set-characteristics-which-influence-the-success-of-a-model-using-those-data-set

Commented: Wayne Martin, 3 May 2024

I am studying the effect of hurricanes on coral reefs and have developed a damage prediction model which uses as inputs the fragility and distribution of different coral species at 150 post-storm survey sites. I can also create multiple simulated reefs by randomly assigning species, colonies and damage from the measured probability distribution functions of those parameters for each species. When I run 1000 simulated reef experiments, the results of my damage prediction are widely distributed, from terrible to great. I need to mine the 1000 simulated reefs to identify patterns which are influencing the success of the model. I expect this is a common scenario and would appreciate any guidance on which tools to use and how to proceed. I have the Statistics and Machine Learning Toolbox.

Answer 1

Yatharth, 3 May 2024

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/2110286-how-to-identify-data-set-characteristics-which-influence-the-success-of-a-model-using-those-data-set#answer_1451831

Hello Wayne,

Here is how you can identify the data characteristics that influence the success of a model.

You can perform some basic Exploratory Data Analysis (EDA) to understand the distributions of your parameters and outcomes, identify outliers, and see if there are any obvious patterns or correlations.

Use "histogram", "boxplot", or "scatter" functions to visualize the distributions of your parameters and outcomes.
Use "corrplot" (part of Econometrics Toolbox, as its documentation link indicates) to visualize correlations among the parameters and between the parameters and the outcomes.
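
As a rough sketch of the EDA step above: the data here are randomly generated placeholders, not real reef data, and the variable names (`X`, `score`) and the choice of three features are assumptions made only for illustration. Since "corrplot" needs Econometrics Toolbox, the sketch uses "corr" from Statistics and Machine Learning Toolbox to rank features by their correlation with the outcome.

```matlab
% Hypothetical stand-in data: 1000 simulated reefs, 3 made-up reef-level features.
rng(0);                                    % reproducible random numbers
nReefs = 1000;
X = randn(nReefs, 3);                      % placeholder features (e.g. mean fragility)
score = 0.8*X(:,1) + 0.1*randn(nReefs,1);  % placeholder model-success score

% Visualize distributions and one feature-outcome relationship.
fig = figure('Visible', 'off');            % set to 'on' to view interactively
subplot(1,3,1); histogram(score);           title('Model score');
subplot(1,3,2); scatter(X(:,1), score, 5);  title('Feature 1 vs score');
subplot(1,3,3); boxplot(X);                 title('Feature spread');

% Spearman rank correlation of each feature with the score.
rho = corr(X, score, 'Type', 'Spearman');   % 3-by-1 vector
[~, order] = sort(abs(rho), 'descend');     % features ranked by |rho|
close(fig);
```

In a real analysis, each row of `X` would be one simulated reef summarized by whatever characteristics you suspect matter, and `score` would be your own measure of how well the damage prediction did on that reef.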

With many input parameters, it's crucial to identify which ones significantly impact the model's outcome. Feature selection techniques can help reduce dimensionality and focus on the most influential variables.

Use "sequentialfs" (sequential feature selection) to identify the most important features. This function can help you find a subset of the input variables that most effectively predict the outcome.
Consider using principal component analysis (PCA) with "pca" to reduce dimensionality and possibly uncover underlying patterns in your data.
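
The two suggestions above could be tried along these lines. This is only a sketch on synthetic data: the five features, the response `y`, and the linear-regression criterion `sse` are assumptions for illustration, and your own model-success measure would take the place of `y`.

```matlab
% Hypothetical data: 5 reef-level features, outcome driven by features 1 and 2.
rng(1);
n = 1000;
X = randn(n, 5);
y = 2*X(:,1) - 1.5*X(:,2) + 0.3*randn(n,1);   % placeholder outcome

% Criterion for sequentialfs: sum of squared errors of a linear fit
% (with intercept) evaluated on the held-out fold.
sse = @(Xtr, ytr, Xte, yte) ...
    sum((yte - [ones(size(Xte,1),1) Xte] * ([ones(size(Xtr,1),1) Xtr] \ ytr)).^2);

cv = cvpartition(n, 'KFold', 5);               % 5-fold cross-validation
[inmodel, history] = sequentialfs(sse, X, y, 'cv', cv);
selected = find(inmodel);                      % indices of the chosen features

% PCA on standardized features; 'explained' is percent variance per component.
[coeff, pcScores, ~, ~, explained] = pca(zscore(X));
```

`selected` lists the features that forward selection kept, and `explained` shows how much variance each principal component carries. Note that PCA looks only at the features, not at the outcome, so a high-variance component is not necessarily the one that drives model success.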

Here are the links for some of the mentioned functions:

scatter: https://www.mathworks.com/help/matlab/ref/scatter.html
corrplot: https://www.mathworks.com/help/econ/corrplot.html
sequentialfs: https://www.mathworks.com/help/stats/sequentialfs.html
pca: https://in.mathworks.com/help/stats/pca.html

Here are some examples that might be useful in your case:

For feature selection: https://www.mathworks.com/help/stats/selecting-features-for-classifying-high-dimensional-data.html
For classification: https://www.mathworks.com/help/stats/classification-example.html
For cross validation: https://www.mathworks.com/help/stats/crossval.html
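
As one possible use of "crossval" for this kind of question, the sketch below compares the cross-validated error of a simple linear fit with and without one feature. The data, the prediction function `predfun`, and the idea of dropping a feature to gauge its influence are illustrative assumptions, not something taken from the linked examples.

```matlab
% Hypothetical data: outcome depends mainly on feature 1.
rng(2);
n = 1000;
X = randn(n, 3);
y = 2*X(:,1) + 0.3*randn(n,1);

% Prediction function in the form crossval expects: train on (Xtr, ytr),
% return predictions for Xte. Here: linear regression with intercept.
predfun = @(Xtr, ytr, Xte) ...
    [ones(size(Xte,1),1) Xte] * ([ones(size(Xtr,1),1) Xtr] \ ytr);

% 5-fold cross-validated mean squared error with all features,
% and again with feature 1 removed.
mseAll  = crossval('mse', X,        y, 'Predfun', predfun, 'KFold', 5);
mseNoF1 = crossval('mse', X(:,2:3), y, 'Predfun', predfun, 'KFold', 5);
```

A large increase from `mseAll` to `mseNoF1` suggests that the removed feature carries much of the predictive information.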

1 Comment
Wayne Martin, 3 May 2024

Thank you very much! Wayne
