How do I extract the contents of an HTML table on a web page into a MATLAB table?

65 ビュー (過去 30 日間)
Pat Canny
Pat Canny 2020 年 6 月 23 日
コメント済み: Bucks Lin 2023 年 4 月 14 日
I'd like to plot and analyze the TSA traveler data from this website: https://www.tsa.gov/coronavirus/passenger-throughput
The data is embedded on the page as an HTML table element.
How do I extract the table content into a MATLAB table?

採用された回答

Pat Canny
Pat Canny 2020 年 6 月 23 日
You can extract the <table> content, which is all stored in a set of <td> tags, as a string array and go from there.
You first need to use findElement and extractHTMLText on an htmlTree object.
You then can use reshape to arrange the data, then use array2table to convert to a table.
Here is one approach:
travel_data = webread('https://www.tsa.gov/coronavirus/passenger-throughput');
travel_data_tree = htmlTree(travel_data);
selector = "td";
subtrees = findElement(travel_data_tree,selector);
str = extractHTMLText(subtrees);
table_data = str(4:end); % first three elements are just the column names
reshape_ncols = 3;
reshape_nrows = length(table_data)/reshape_ncols;
table_data_reshaped = reshape(table_data,reshape_ncols,reshape_nrows)';
% Convert to table
traveler_data_table = array2table(table_data_reshaped,'VariableNames',["Date" "Travelers_Today" "Travelers_Last_Year"]); % I got lazy with VariableNames, I know.
% Convert data types from strings to appropriate types
traveler_data_table.Date = datetime(traveler_data_table.Date);
traveler_data_table.Travelers_Today = str2double(traveler_data_table.Travelers_Today);
traveler_data_table.Travelers_Last_Year = str2double(traveler_data_table.Travelers_Last_Year);
traveler_data_table.Traveler_Ratio = traveler_data_table.Travelers_Today ./ traveler_data_table.Travelers_Last_Year;
% Plot the results
figure
plot(traveler_data_table.Date,traveler_data_table.Traveler_Ratio)
title("TSA Traveler Ratio by Date (2020 vs. 2019)")
grid on
% Some more fun analysis
% When did it bottom out?
[min_ratio,idx] = min(traveler_data_table.Traveler_Ratio);
min_ratio_pct = 100*min_ratio;
min_date = traveler_data_table.Date(idx);
disp("The minimum traveler ratio of " + min_ratio_pct + "% occurred on " + string(min_date))
latest_pct = 100*traveler_data_table.Traveler_Ratio(1);
disp("The current ratio is " + latest_pct + "%")
  2 件のコメント
Yesid Fonseca Vargas
Yesid Fonseca Vargas 2021 年 3 月 16 日
Thank u, It works perfectly to solve my problem
andrea ggdsd
andrea ggdsd 2021 年 10 月 26 日
thanks!

サインインしてコメントする。

その他の回答 (1 件)

Christopher Creutzig
Christopher Creutzig 2022 年 6 月 7 日
Starting in R2021b, you can directly use readtable for HTML tables:
readtable("https://www.tsa.gov/coronavirus/passenger-throughput",...
FileType="html",ReadVariableNames=true,ThousandsSeparator=",")
ans = 364×5 table
Date 2022 2021 2020 2019 __________ __________ __________ __________ __________ 06/05/2022 2.3872e+06 1.9847e+06 4.4126e+05 2.6699e+06 06/04/2022 1.9814e+06 1.6812e+06 3.5302e+05 2.226e+06 06/03/2022 2.3326e+06 1.8799e+06 4.1968e+05 2.6498e+06 06/02/2022 2.2132e+06 1.8159e+06 3.9188e+05 2.6239e+06 06/01/2022 1.9991e+06 1.5879e+06 3.0444e+05 2.3702e+06 05/31/2022 2.1081e+06 1.6828e+06 2.6774e+05 2.2474e+06 05/30/2022 2.3122e+06 1.9002e+06 3.5326e+05 2.499e+06 05/29/2022 2.0965e+06 1.6505e+06 3.5295e+05 2.5556e+06 05/28/2022 1.9942e+06 1.6058e+06 2.6887e+05 2.1172e+06 05/27/2022 2.3847e+06 1.9596e+06 3.2713e+05 2.5706e+06 05/26/2022 2.3799e+06 1.8545e+06 3.2178e+05 2.4858e+06 05/25/2022 2.1477e+06 1.6182e+06 2.6117e+05 2.269e+06 05/24/2022 2.0207e+06 1.4708e+06 2.6484e+05 2.4536e+06 05/23/2022 2.329e+06 1.7474e+06 3.4077e+05 2.5122e+06 05/22/2022 2.3509e+06 1.8637e+06 2.6745e+05 2.0707e+06 05/21/2022 1.9888e+06 1.55e+06 2.5319e+05 2.1248e+06

カテゴリ

Help Center および File ExchangeGraph and Network Algorithms についてさらに検索

タグ

製品


リリース

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by