- random: Selects data points randomly for the active set. This is the fastest option because it doesn't involve any optimization or criterion-based selection.
- sgma (sparse greedy matrix approximation): Uses a greedy approach to select points so that the active set gives a good low-rank approximation of the full kernel matrix. This method is more computationally intensive than random selection but aims to choose a more informative subset.
- entropy: Selects points based on maximizing the differential entropy of the predictive distribution. This method tries to choose the most informative points and is computationally expensive, which explains the longer runtime compared to random selection.
- likelihood: Selects points based on the subset of regressors log likelihood, adding the points that most improve it. This method is also computationally intensive because the likelihood criterion has to be evaluated for candidate points as the active set grows.
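To see how the four options above compare on runtime, a small timing loop can help. The following is a minimal sketch, not a benchmark: the synthetic data, the variable names (`setMethods`, `fitTime`, `models`) and the reduced sizes (5000 rows, 500 active points) are assumptions chosen so that it runs quickly, and the timings will differ from those on a real data set.

```matlab
% Synthetic data for illustration only (not the poster's data set).
rng(0);                                   % reproducible draw
n = 5000;
X = rand(n, 3);
y = sin(4*X(:,1)) + X(:,2).^2 + 0.1*randn(n, 1);

setMethods = {'random', 'sgma', 'entropy', 'likelihood'};
fitTime = zeros(numel(setMethods), 1);
models  = cell(numel(setMethods), 1);

for k = 1:numel(setMethods)
    tic;
    % Subset-of-data fit; only the active set selection rule changes.
    models{k} = fitrgp(X, y, ...
        'FitMethod', 'sd', ...
        'ActiveSetSize', 500, ...
        'ActiveSetMethod', setMethods{k});
    fitTime(k) = toc;
    fprintf('%-10s fit time: %.1f s\n', setMethods{k}, fitTime(k));
end
```

If the criterion-based methods turn out to be much slower on your data, comparing the fitted models on a held-out set is one way to judge whether the extra time is worth it.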
ActiveSetMethod: entropy | GPR
Hello,
I was wondering which option to select for ActiveSetMethod when fitting a Gaussian process model. Since I have too much data, I use the subset-of-data option ('FitMethod','sd') together with 'ActiveSetSize',2000 to select only two thousand points. As far as I understand, fitrgp randomly selects 2000 points from the data set. Some questions arise:
- Does GPR use the other points in the data set (for training)? Where? I saw that the RegressionGP object stores all the data, and some matrices have the size of the full data set (for example, the matrices W, Alpha, ...).
- Besides choosing the points randomly, MATLAB offers the option 'ActiveSetMethod' with four possible values: random (default), sgma, entropy, likelihood. Is there any documentation of what each option does specifically? When I choose entropy, fitrgp takes much longer than with random (21 min vs. less than 5). Why is it so different?
Accepted Answer
Aditya
4 Feb 2025, 4:52
Hi Marius,
When using Gaussian Process Regression (GPR) with a large dataset in MATLAB, you can employ the 'FitMethod', 'sd' option to fit the model using a subset of data points, known as the active set. This approach helps manage computational complexity by reducing the number of data points used in training. Here's a breakdown of your questions and the options available:
ActiveSetMethod Options