File Exchange

Using Tall Arrays with Big Data - NYC Taxi Demos

version (16.5 MB) by Gabriel Ha
Simple coding techniques to access and process big data, using NYC taxi datasets as an example


Updated 01 Nov 2016

Requires MATLAB R2016b or later.
Use this code to provide a framework for your own big data analysis.
Contains all MATLAB files needed to replicate the demos featured in the fast-paced "Using Tall Arrays with Big Data" video. Watching the video first is highly recommended for context. The files are:
1. Pickups demo [.mlx - MATLAB live script] - requires Mapping Toolbox and Distributed Computing Toolbox
2. Averages demo [.mlx - MATLAB live script] - requires Statistics Toolbox and Distributed Computing Toolbox
3. wms.mat [needed for Pickups demo]
4. load_settings.m [needed for Pickups demo]
This zip file does NOT contain datasets; these must be downloaded separately. Only one dataset is needed to run the scripts.
This zip file DOES contain the following additional files, which are generated from running the Pickups demo on ALL 2015 Yellow cab datasets:
5. .gif of all 2015 pickups by hour ("raw" version)
6. .gif of all 2015 pickups by hour ("cleaned" version)
7. .fig of all 2015 pickups summarized in a 2D histogram. This can be opened (and manipulated) in MATLAB.
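
The general tall-array workflow that the demos build on can be sketched as follows. This is a minimal illustration, not the code shipped in the zip file: the three-row CSV file, its file name, and the trip_distance column are made up so that the example runs on its own. Only the tpep_pickup_datetime column name and the hour() call come from the demo code quoted in the comments below. It assumes R2016b or later.

```matlab
% Sketch only: a three-row stand-in for a real NYC taxi CSV file.
% The file name and the trip_distance values are hypothetical.
csvFile = fullfile(tempdir, 'taxi_sample.csv');
sample = table( ...
    {'2015-01-15 19:05:39'; '2015-01-15 07:30:00'; '2015-01-16 19:45:10'}, ...
    [1.2; 3.4; 0.8], ...
    'VariableNames', {'tpep_pickup_datetime', 'trip_distance'});
writetable(sample, csvFile);

% Evaluate tall arrays in the local MATLAB session (no parallel pool needed).
mapreducer(0);

% Point a datastore at the file and import the timestamp text as datetime.
ds = datastore(csvFile);
ds.SelectedVariableNames = {'tpep_pickup_datetime', 'trip_distance'};
ds.SelectedFormats = {'%{yyyy-MM-dd HH:mm:ss}D', '%f'};

% Create the tall table; operations on it are deferred until gather is called.
tt = tall(ds);
tt.HourOfPickup = hour(tt.tpep_pickup_datetime);

% Bring the (small) result into memory.
pickupHours = gather(tt.HourOfPickup);
```

With a real dataset, csvFile would instead be the path to a downloaded file (or a wildcard pattern covering several files), and the rest of the sketch stays the same. Setting SelectedFormats explicitly is one way to make sure the pickup timestamps arrive as datetime values rather than text, which hour() needs.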

Comments and Ratings (9)

Thank you for this useful demo. Could you please point me to a tutorial for beginners on how to start up a spark instance to be able to process the data in some cloud service (ECS or Google)? Thanks!

tai nguyen

Chosen Zhou

Gabriel Ha

Hi Hsiang-Yu,

Happy to help out with your questions/issues! .mlx files are MATLAB Live Scripts, which were introduced in R2016a as part of the Live Editor feature. .mlx files can be opened directly in MATLAB, just like a normal .m file (and run with F5). We encourage users to try out the Live Editor. Since the code provided is intended to work on R2016b or later (when tall arrays were introduced), any release that can run the code also supports opening .mlx files.

What version of MATLAB are you currently on? Also, in the code failure, were you attempting to work with a tall array, or was it some other data structure? (I.e. did you modify the variable tt to not be a tall array object/How is tt initialized in your code?)

One of the records in tt.tpep_pickup_datetime is '2015-01-15 19:05:39'. I am not able to call the hour() function with this as the input parameter, so the code fails at this line:
% Derive new values with simple syntax.
tt.HourOfPickup = hour(tt.tpep_pickup_datetime);

Thanks for the package. Could you let me know what I should do once I open the Pickups demo .mlx? I have never used .mlx before. Can we run the .mlx directly, or do we need to copy and paste your code? I would just like to know how to run this code.

Sara Egidi


Fixed a critical syntax bug in the NYC Averages demo that was causing the final tall array to contain only data outliers instead of excluding them (involving inserting a single ~ character...amazing how that makes all the difference)
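
To illustrate how a single ~ can flip the result, here is a minimal sketch. The column name, the values, and the threshold are hypothetical and are not taken from the Averages demo; the point is only that indexing with a logical mask keeps the flagged rows, while indexing with its negation removes them.

```matlab
% Hypothetical data: trip distances with one obvious outlier (250).
mapreducer(0);  % evaluate tall arrays in the local MATLAB session
tt = tall(table([1.2; 3.4; 250; 0.8], 'VariableNames', {'trip_distance'}));

% Logical mask that is true for outliers (threshold chosen for illustration).
isOutlier = tt.trip_distance > 100;

onlyOutliers = tt(isOutlier, :);    % without ~: keeps ONLY the outliers
cleaned      = tt(~isOutlier, :);   % with ~: excludes the outliers

% Evaluate both deferred results in one pass and bring them into memory.
[onlyOutliers, cleaned] = gather(onlyOutliers, cleaned);
```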

updated required products

added hyperlink to video

added MathWorks copyright to .m file in zip file.

MATLAB Release Compatibility
Created with R2016b
Compatible with any release
Platform Compatibility
Windows macOS Linux