Importing table/array from website to Matlab

2 ビュー (過去 30 日間)
Erik Börgesson
Erik Börgesson 2019 年 7 月 3 日
回答済み: Erik Börgesson 2019 年 7 月 3 日
Hello. I am trying to import the central table from this web url 'https://tennisabstract.com/reports/atp_elo_ratings.html', from the players name to the "grass" column into MatLab, as an array or table. I want to do it this way because it keeps updating every week, but I do not know how to approach this problem which makes me need your help.
Thank you, Erik

採用された回答

Guillaume
Guillaume 2019 年 7 月 3 日
First, note that importing data from html is always going to be very iffy. html is a presentation format designed to display things to humans, it's not design for data transfer and you're going to have to remove all the presentation cruft to get at your data.
So, the first thing you should try is contacting the website to see if they have a direct interface to the underlying database.
Bearing this in mind, the following will import your data with the current format of the website. Any change, even minor to the format of that page may break the code.
%definition of patterns used to locate required information with a table row:
intpattern = '<td[^>]*>(\d+)</td>';
linkpattern = '<td[^>]*><a[^>]+>([^<]+)</a></td>';
numberpattern = '<td[^>]*>(\d+(\.\d+)?)</td>';
anypattern = '<td[^>]*>([^<]+)</td>';
emptypattern = '<td[^>]*></td>';
%read and parse html
html = webread('https://tennisabstract.com/reports/atp_elo_ratings.html');
tabledata = regexp(html, ['<tr[^>]*>', ...
intpattern, ...
linkpattern, ...
numberpattern, ...
numberpattern, ...
emptypattern, ...
numberpattern, ...
numberpattern, ...
numberpattern, ...
emptypattern, ...
anypattern, ...
numberpattern, ...
numberpattern], 'tokens');
assert(~isempty(tabledata), 'Failed to parse html according to pattern. The format of the page may have changed');
tabledata = cell2table(vertcat(tabledata{:}), 'VariableNames', {'Rank', 'Player', 'Age', 'ELO', 'Hard', 'Clay', 'Grass', 'Peak_Match', 'Peak_Age', 'Peak_ELO'});
tabledata = convertvars(tabledata, [1, 3:7, 9, 10], @str2double);
tabledata.Player = strrep(tabledata.Player, '&nbsp;', ' ')
  5 件のコメント
Guillaume
Guillaume 2019 年 7 月 3 日
Don't use c as a variable name. It's meaningless and doesn't say anything about what it contains.
Assuming, you've imported the data as tabledata:
>> elo(tabledata, 'Ivan Nedelko', 'Kevin King')
ans =
0.230209216637309
0.769790783362691
Guillaume
Guillaume 2019 年 7 月 3 日
Note: you could calculate the odds for matches of every player against any player with:
Q = 10 .^ (tabledata.ELO / 400);
odds = Q ./ (Q + Q.');
odds(r, c) is then the odds of tabledata.Player{r} winning against tabledata.Player{c}

サインインしてコメントする。

その他の回答 (1 件)

Erik Börgesson
Erik Börgesson 2019 年 7 月 3 日
Thank you so much. I understand and it works just like I wanted it to.
Erik

カテゴリ

Help Center および File ExchangeData Import and Export についてさらに検索

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by