List of built-in data sets, please

Adam Danz on 13 Nov 2019
Edited: Josh Meyer on 23 Sep 2021
It would be great if the Matlab documentation included a list of built-in data sets. This question has been asked before (2012, 2013, 2014, on Stack Overflow), with at least 2200 views in the past 30 days, but I still cannot find such a list.
The built-in mat files can be found within the matlabroot (see line of code below) but a list provided in the documentation would be helpful because...
  1. The built-in data sets aren't just in 1 folder; they are distributed across many folders - some folders that we may not think to look in
  2. It's best for users to avoid poking around in the root directories anyway
  3. Not all mat files in the root directories are example data sets so a function that scrapes mat files from root directories isn't efficient
  4. A documented list makes it easy to quickly recall the file name of a data set we are already familiar with
  5. Quickly accessing built-in data makes it easy to test ideas rather than having to generate a fake data set
  6. With a documented list, we can become more familiar with data sets provided by Matlab instead of discovering them after years of daily Matlab use
  7. In the future, we can reference archived documentation to determine when a data set became available
winopen(fullfile(matlabroot,'toolbox','matlab','demos'))
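Since winopen is Windows-only, here is a rough command-line sketch of the same idea: list the MAT-files in that one demos folder and peek at the variables in the first file without loading it. It only covers this single folder, and not every MAT-file it finds is necessarily an example data set. The variable names (demoDir, files, names, info) are placeholders chosen for illustration.

```matlab
% List the MAT-files in the base MATLAB demos folder (one folder only).
demoDir = fullfile(matlabroot, 'toolbox', 'matlab', 'demos');
files = dir(fullfile(demoDir, '*.mat'));   % struct array, one element per MAT-file
names = {files.name}';                     % column cell array of file names
disp(names)

% Summarize the variables in the first file without loading them.
if ~isempty(files)
    info = whos('-file', fullfile(demoDir, files(1).name));
    for k = 1:numel(info)
        fprintf('%s: %s [%s]\n', info(k).name, info(k).class, num2str(info(k).size));
    end
end
```

Using whos with the '-file' option reads only the variable metadata, so large files do not have to be loaded just to see what they contain.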
Who do we need to lobby to get this list in the official documentation?
  2 Comments
Adam Danz on 13 Nov 2019
Moved to answers ;)
I started with base Matlab files but toolbox files are good, too. I know the fisheriris data set is often used in the Stats and ML toolbox.


Accepted Answer

Josh Meyer on 23 Sep 2021
Edited: Josh Meyer on 23 Sep 2021
In R2021b, there is now a documentation page with a selection of useful data sets that are in MATLAB:

More Answers (3)

Wendy Fullam on 25 Nov 2019
Just a side note that this request has also been shared with our documentation team for consideration on how to make this easier, going forward.
  1 Comment
Adam Danz on 25 Nov 2019
Thanks, Wendy & the documentation team!



Adam Danz on 13 Nov 2019
Edited: Adam Danz on 17 Sep 2021
I started a list of built-in mat files with a brief description of their variables (R2019b). Since numeric variables are fairly easy to create on the fly, I only listed files that contain other classes of variables for now. Anyone with edit privileges should feel free to add to this list.
Filename            Variable     Class       Size     Ref   Description
-----------------------------------------------------------------------------------
airlineResults.mat  result       table       29x2     [1]   Col1: cell array of airline IDs; Col2: cell array of number of flights per day
fatalities.mat      fatalities   table       49x8     [2]   Numeric table of highway fatalities and other statistics by US state
mapredout.mat       Key          cell        29x1     [3]   Cell array of airline initials, some with additional chars
mapredout.mat       Value        cell        29x1     [3]   Cell array of numbers
patients.mat        10-vars      (mix)       100x1    [4]   Column variables (cells, double, logical) containing information on 100 patients
usastates.mat       usastates    struct      49x1     [5]   Fields: Lon, Lat, Name containing vectors of longitude, latitude, and state name
fisheriris.mat      species      cell        150x1    [6]   3 species of iris specimens
fisheriris.mat      meas         double      150x4    [6]   Sepal length, sepal width, petal length, and petal width for 150 iris specimens
hospital.mat        hospital     dataset     100x7    [7]   100 patients and 5 biometrics
census.mat          cdate & pop  double      21x1     [8]   U.S. population data for the years 1790 through 1990 at 10 year intervals
outages.csv         N/A          timetable   1468x5   [9]   T = table2timetable(readtable('outages.csv')); electric utility outages in the United States
carsbig.mat         13-vars      dbl, char   406x1+   [?]   Car make & model, engine, MPG, weight, etc.
Data_Canada         5-vars       table, mat  41x5     [11]  Canadian inflation & interest rates 1954-94
Also see a related thread on built-in images.
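As a quick illustration of how entries in the list can be pulled in, here is a minimal sketch using two base MATLAB files from the table (patients.mat and outages.csv). It assumes both files are on the MATLAB path, as they are in a default installation; the variable names info, S, T, and TT are placeholders.

```matlab
% Peek at the variables stored in patients.mat without loading them.
info = whos('-file', 'patients.mat');
disp({info.name}')

% Load into a struct (keeps the workspace clean), then gather the
% equal-length column variables into a single table.
S = load('patients.mat');
T = struct2table(S);
disp(size(T))

% outages.csv is a text file, so it is read rather than loaded.
TT = table2timetable(readtable('outages.csv'));
disp(size(TT))
```

Loading into a struct first avoids overwriting existing workspace variables that happen to share a name with one in the file.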
References

Steven Lord on 13 Nov 2019
What do you consider a data set?
The census MAT-file in toolbox/matlab/demos (used here among other places) seems like it's obviously a data set. So does outages.csv (used here.)
Are peppers.png and ngc6543a.jpg (used here) also data sets? They're image data, "peppers" appearing on 38 pages in the documentation for MATLAB and 61 times in Image Processing Toolbox. "ngc6543a" occurs less frequently, but still on 19 pages in the documentation for MATLAB.
What about the video file shuttle.avi (used here?)
Does the peaks function (often used in examples to create a simple piece of data to visualize as a surface plot) count as data or code? What about the functions that generate the predefined colormaps or the gallery function?
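To make the distinction concrete, here is a small sketch contrasting data read from a shipped file with data produced by a function call. It assumes peppers.png is on the path, as in a default MATLAB installation; the 'minij' gallery matrix and the sizes used are arbitrary choices for illustration.

```matlab
% Stored data: image pixels read from a file that ships with MATLAB.
RGB = imread('peppers.png');       % rows-by-columns-by-3 color image

% Generated data: the same call always returns the same values,
% but nothing is stored on disk as a data file.
Z = peaks(25);                     % 25-by-25 double matrix
A = gallery('minij', 4);           % 4-by-4 test matrix with A(i,j) = min(i,j)

fprintf('peppers.png: %s, class %s\n', mat2str(size(RGB)), class(RGB));
fprintf('peaks(25):   %s, class %s\n', mat2str(size(Z)), class(Z));
disp(A)
```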
You can submit an enhancement request to Technical Support asking for such a list to be added to the documentation for one or more products. It would be useful to include in that enhancement request what constitutes a data set in your mind and what information about each data set you'd expect to see in such a list.
  1 Comment
Adam Danz on 13 Nov 2019
That's a good point. I realize the 2012 link in my question asks about image data sets (i.e., image files such as jpg and png), but what I'm really looking for is a list of mat-file data sets such as the ones listed in the Statistics and Machine Learning Toolbox, Econometrics Toolbox, and Deep Learning Toolbox. I've used the gallery function and find it very useful, but it would be great to have instant access to tables, timetables, cell arrays, structures, etc. Maybe it's just a problem with my ability to remember the mat file names.
Let's see if there are any bites from Tech Support.
