Possible Incorrect Documentation on ksdensity

Question

David Gillcrist on 8 Oct 2024

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/2158420-possible-incorrect-documentation-on-ksdensity

Answered: Umar on 9 Oct 2024

I'm trying to implement a custom version of ksdensity. In the documentation the default way of calculating the bandwidth is said to be via Silverman's Rule-of-Thumb, i.e. for a bandwidth h this rule would give

This is according to the Wikipedia article on Kernel Density Estimation. However, after rooting about in the MATLAB files, I found that the default bandwidth is calculated in the function matlab.internal.math.validateOrEstimateBW (run open matlab.internal.math.validateOrEstimateBW if you want to view it in its entirety). The relevant lines, 64–68, are shown below:

if isequal(bw, 'normal-approx')
    if all(sigma>0)
        % Default window parameter is optimal for normal distribution
        % Scott's rule
        bw = sigma * (4/((d+2)*N))^(1/(d+4));
    else
        ... % Unimportant
    end
else
    ... % Unimportant
end

'normal-approx' is the default setting for bandwidth estimation, so it should be the rule presented above. However, the formula in the code is clearly different and is referred to as "Scott's Rule". It could be that Wikipedia references the wrong bandwidth calculation and that Scott's Rule is, in fact, the same as Silverman's Rule-of-Thumb, but it has been hard to find proper confirmation of this. For example, this presentation from UBC has a different rule labelled as Silverman's Rule-of-Thumb, and I cannot find Silverman's original paper where he purportedly first introduced the rule. If someone could confirm whether this is an error in the code or an error in my understanding of the bandwidth calculation, I would be very grateful.

2 Comments

Torsten on 8 Oct 2024

You should address this question to the MATLAB development team, not to the forum members as poor end users.

the cyclist on 9 Oct 2024

This question triggered a distant memory. I searched and found this question and answer from 8 years ago.

Spoiler: It's not going to help.


Answer 1

Umar on 9 Oct 2024

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/2158420-possible-incorrect-documentation-on-ksdensity#answer_1529265

Hi @David Gillcrist,

I went through your comments and studied the documentation at the link below:

https://www.mathworks.com/help/stats/ksdensity.html?s_tid=doc_ta#btpl6_1-1

To clarify your question about bandwidth estimation for kernel density estimation (KDE) in MATLAB versus the traditional statistical rules, let me go through each component:

Understanding Silverman’s and Scott’s Rules

Both formulas aim to optimize density estimation under different distributional assumptions.

MATLAB's Bandwidth Calculation

In the snippet you provided from matlab.internal.math.validateOrEstimateBW, MATLAB's default bandwidth method is labeled 'normal-approx', and the code comment identifies it as Scott's Rule rather than Silverman's:

bw = sigma * (4/((d+2)*N))^(1/(d+4));

The comments in the code describe this formula as Scott's rule and as the window parameter that is optimal for a normal distribution, so its constant is derived under a normality assumption.
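As a worked check, and assuming one-dimensional data (d = 1, which the snippet itself does not state), the quoted line reduces to the familiar normal-reference constant:

```latex
\[
\mathrm{bw}
= \sigma\left(\frac{4}{(d+2)\,N}\right)^{\frac{1}{d+4}}
\;\xrightarrow{\;d=1\;}\;
\sigma\left(\frac{4}{3N}\right)^{1/5}
= \left(\tfrac{4}{3}\right)^{1/5}\sigma\,N^{-1/5}
\approx 1.06\,\sigma\,N^{-1/5}
\]
```

The rule of thumb usually quoted under Silverman's name, h = 0.9 min(σ, IQR/1.34) N^(-1/5), uses the smaller constant 0.9 and a spread estimate that is capped by the interquartile range, so it cannot coincide with the expression above.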

Clarification on Literature References

The confusion often arises because the rules attributed to Silverman and Scott are based on similar principles but differ slightly in their constants due to their different derivations. For instance, Silverman adjusts his constants to achieve optimality across various distributions, while Scott focuses specifically on normal distributions. The UBC reference you mentioned likely conflates these methods or may be presenting them in a different context.
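The difference in constants can be checked numerically. The following is a minimal sketch, not the ksdensity implementation: it assumes synthetic standard-normal data, takes sigma to be the plain sample standard deviation, and uses illustrative variable names (bwNormalApprox, bwRuleOfThumb) that do not appear in the MATLAB source.

```matlab
% Sketch: compare two rule-of-thumb bandwidths on a 1-D sample.
% Assumptions (not taken from the MATLAB source): sigma is the plain sample
% standard deviation, and the sample is synthetic standard-normal data.
rng(0);                          % reproducible sample
x = randn(500, 1);
N = numel(x);
d = 1;                           % data dimension
sigma = std(x);

% Formula quoted from matlab.internal.math.validateOrEstimateBW
bwNormalApprox = sigma * (4/((d+2)*N))^(1/(d+4));

% Closed form of the same expression for d = 1
bwClosedForm = (4/3)^(1/5) * sigma * N^(-1/5);

% Rule of thumb commonly attributed to Silverman:
% 0.9 * min(sigma, IQR/1.34) * N^(-1/5)
q = quantile(x, [0.25 0.75]);    % quartiles, used to form the IQR
spread = min(sigma, (q(2) - q(1)) / 1.34);
bwRuleOfThumb = 0.9 * spread * N^(-1/5);

fprintf('normal-approx formula : %.4f\n', bwNormalApprox);
fprintf('0.9*min(...) rule     : %.4f\n', bwRuleOfThumb);
fprintf('ratio                 : %.4f\n', bwRuleOfThumb / bwNormalApprox);
```

With the same sigma in both formulas, the ratio can never exceed 0.9/(4/3)^(1/5), roughly 0.85, which is why the two rules give visibly different defaults. How ksdensity itself estimates sigma is not shown in the quoted snippet, so sigma here is only a stand-in.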

Practical Implications

Your personal experience resonates with common practice among statisticians. Many practitioners prefer adjusting bandwidth downwards (e.g., using factors like 0.5 or lower) to avoid over-smoothing, especially with smaller sample sizes where finer details are crucial.

Here are some additional insights I would like to share with you.

Depending on your data distribution characteristics (e.g., skewness or presence of outliers), you might want to explore robust bandwidth selectors beyond Silverman’s or Scott’s rules. For instance, adaptive methods can provide better performance in heterogeneous data contexts. Also, bear in mind that different statistical software packages may implement these rules with slight variations, leading to discrepancies in output. Therefore, when comparing results across platforms (e.g., R vs MATLAB), it's essential to understand these underlying implementations.

I agree with @Torsten's comment: “You should address this question to the MATLAB development team, not to the forum members as poor end users.”

Hope this helps.
