Requires MATLAB R2016b or later.
Use this code as a framework for your own big data analysis.
Contains all MATLAB files needed to replicate the demos featured in the fast-paced "Using Tall Arrays with Big Data" video [ http://www.mathworks.com/videos/matlab-tall-arrays-in-action-122883.html ]. Watching the video first is highly recommended for context. The package includes:
1. Pickups demo [.mlx - MATLAB live script] - requires Mapping Toolbox and Distributed Computing Toolbox
2. Averages demo [.mlx - MATLAB live script] - requires Statistics Toolbox and Distributed Computing Toolbox
3. wms.mat [needed for Pickups demo]
4. load_settings.m [needed for Pickups demo]
This zip file does NOT contain datasets. Datasets can be downloaded from http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml. Only one dataset is needed to run the scripts.
This zip file DOES contain the following additional files, generated by running the Pickups demo on ALL 2015 Yellow cab datasets:
5. .gif of all 2015 pickups by hour ("raw" version)
6. .gif of all 2015 pickups by hour ("cleaned" version)
7. .fig of all 2015 pickups summarized in a 2D histogram. This can be opened (and manipulated) in MATLAB.
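Because the zip file ships without data, the first step is pointing MATLAB at a downloaded trip-record CSV and wrapping it in a tall array. The sketch below is an illustration under stated assumptions, not the demos' own loading code: the file name is hypothetical, and a tiny stand-in CSV is written to tempdir so the sketch runs without the real dataset. `datastore` on a CSV file returns a tabular text datastore, and `tall` builds a tall table over it. Depending on your setup, the first `gather` may start a parallel pool.

```matlab
% Hypothetical file name; substitute the path of the CSV you downloaded.
csvFile = fullfile(tempdir, 'yellow_tripdata_sample.csv');

% Write a tiny stand-in file so this sketch runs without the real dataset.
fid = fopen(csvFile, 'w');
fprintf(fid, 'VendorID,tpep_pickup_datetime,trip_distance\n');
fprintf(fid, '1,2015-01-15 19:05:39,1.59\n');
fprintf(fid, '2,2015-01-15 07:30:00,3.20\n');
fclose(fid);

ds = datastore(csvFile);   % tabular text datastore over the CSV
tt = tall(ds);             % tall table; operations are deferred

% gather triggers evaluation and returns an in-memory result.
n = gather(height(tt));
fprintf('Rows in file: %d\n', n);

delete(csvFile);           % remove the stand-in file
```

With the real data, only the file path changes; a wildcard path such as `'yellow_tripdata_2015-*.csv'` (hypothetical naming) would let one datastore span several monthly files.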
Gabriel Ha (2019). Using Tall Arrays with Big Data - NYC Taxi Demos (https://www.mathworks.com/matlabcentral/fileexchange/59353-using-tall-arrays-with-big-data-nyc-taxi-demos), MATLAB Central File Exchange. Retrieved .
Thank you for this useful demo. Could you please point me to a beginner tutorial on how to start up a Spark instance to process the data on a cloud service (ECS or Google)? Thanks!
Happy to help with your questions and issues! .mlx files are MATLAB Live Scripts, introduced in R2016a as part of the Live Editor feature. MATLAB opens .mlx files directly, just like a normal .m file, and you can run them with F5. We encourage users to try out the Live Editor. Since the code provided requires R2016b or later (when tall arrays were introduced), any release that can run it can also open .mlx files.
What version of MATLAB are you currently on? Also, when the code failed, were you working with a tall array or some other data structure? That is, did you modify the variable tt so that it is no longer a tall array object? How is tt initialized in your code?
'2015-01-15 19:05:39' is one of the records in tt.tpep_pickup_datetime. I am not able to call the hour() function with this value as the input, so the code fails at this line:
% Derive new values with simple syntax.
tt.HourOfPickup = hour(tt.tpep_pickup_datetime);
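One plausible cause of this failure, offered as an assumption rather than a confirmed diagnosis, is that the pickup timestamps were read as text instead of datetime values; `hour` as used in the demo works on datetime inputs. The sketch below converts the column with `datetime` and an explicit `'InputFormat'` before deriving the hour. The two-row table is hypothetical sample data wrapped in `tall` so the example runs without the TLC files.

```matlab
% Hypothetical sample: pickup timestamps stored as text (cellstr), the way
% they can arrive when the CSV column is not parsed as datetime.
t = table({'2015-01-15 19:05:39'; '2015-01-15 07:30:00'}, ...
    'VariableNames', {'tpep_pickup_datetime'});
tt = tall(t);   % tall array backed by in-memory data, for illustration only

% Convert text to datetime first; specifying the format explicitly
% is the safe choice for tall inputs.
tt.tpep_pickup_datetime = datetime(tt.tpep_pickup_datetime, ...
    'InputFormat', 'yyyy-MM-dd HH:mm:ss');

% Derive new values with simple syntax.
tt.HourOfPickup = hour(tt.tpep_pickup_datetime);

% Tall operations are deferred until gather evaluates them.
pickupHours = gather(tt.HourOfPickup);
disp(pickupHours)
```

Converting once, right after loading, keeps every later step (hour, day, month) working on a proper datetime column.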
Thanks for the package. Once I open the Pickups demo .mlx file, what should I do next? I have never used .mlx files before. Can we run a .mlx file directly, or do we need to copy and paste your code? I would just like to know how to run it.
Fixed a critical syntax bug in the NYC Averages demo that caused the final tall array to contain only the data outliers instead of excluding them (the fix was inserting a single ~ character; amazing how that makes all the difference).
updated required products
added hyperlink to video
added MathWorks copyright to .m file in zip file.
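The Averages demo fix in the version history comes down to logical indexing: indexing rows with a logical mask keeps the rows where it is true, and `~` negates the mask. The sketch below illustrates that difference only; it is not the demo's code, and the distances and the outlier rule (negative or over 100 miles) are hypothetical.

```matlab
% Hypothetical trip distances in miles; 250 and -3 stand in for bad records.
t = table([1.2; 250; 3.4; -3; 0.8], 'VariableNames', {'trip_distance'});
tt = tall(t);

% Hypothetical outlier rule: negative or implausibly long trips.
isOutlier = tt.trip_distance < 0 | tt.trip_distance > 100;

kept    = tt(~isOutlier, :);   % excludes outliers (intended result)
flagged = tt(isOutlier, :);    % keeps ONLY outliers (the missing-~ bug)

% Evaluate the deferred tall expressions.
keptDistances    = gather(kept.trip_distance);
flaggedDistances = gather(flagged.trip_distance);
disp(keptDistances)
disp(flaggedDistances)
```

Because both versions run without error, a missing `~` silently produces the wrong subset, which is why it is easy to overlook.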