Reading content from a web URL

Question

0 votes

I know how to read a URL and save its content for further analysis.

The issue I am facing is that I want to read specific content from a URL in a specific way.

For example, from this URL: https://www.gem.wiki/Almaty-2_power_station. I would like to read Table 2 in table format, or read only the tables that contain specific words.

From searching online, I found that tables can be read directly from URLs, but I am not sure whether the table I want is an actual HTML table or just text content.

Any help would be appreciated.

2 comments

Mario Malic on 29 Aug 2024

There is no content on this page.

Voss on 29 Aug 2024

Try without the period at the end:

https://www.gem.wiki/Almaty-2_power_station


Answer 1

Rahul on 30 Aug 2024

1 vote

Hi @PS,

I understand that you are trying to read the content of 'Table 2' from the URL https://www.gem.wiki/Almaty-2_power_station

You can achieve this with the following code. Note that htmlTree, findElement and extractHTMLText require Text Analytics Toolbox:

url = 'https://www.gem.wiki/Almaty-2_power_station';  
htmlContent = webread(url);  % Reading the content from the url
tree = htmlTree(htmlContent);
tables = findElement(tree, "table"); % Finding the tables from the DOM tree
secondTableElement = tables(4); % Index 4 is used because some other elements of the HTML page are also matched as tables
% Find all rows in the second table
rows = findElement(secondTableElement, "tr");
% Initialize a cell array to store table data
tableData = {};
columnNames = {};
headerCells = findElement(rows(1), "th");
% Extract header text
for j = 1:numel(headerCells)
    columnNames{j} = strtrim(extractHTMLText(headerCells(j)));
end
% Extract data rows
for i = 2:numel(rows)  
    
    cells = findElement(rows(i), "td");
    
    % Extract text from each cell
    rowData = cell(1, numel(cells));
    for j = 1:numel(cells)
        rowData{j} = strtrim(extractHTMLText(cells(j)));
    end
    tableData = [tableData; rowData];
end
% The following part is just to get a string cell array for the header
headerCellstring = cell(size(columnNames));
for i = 1:numel(columnNames)
    headerCellstring{i} = columnNames{i}{1};
end
% Obtain the table using 'cell2table' function
secondTable = cell2table(tableData, 'VariableNames', headerCellstring);
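The question also asked about reading only the tables that contain specific words, and about checking whether the content is an actual table. Selecting by keyword avoids hard-coding an index such as 4. The following is only a minimal sketch, assuming Text Analytics Toolbox is available. The inline HTML string and the keyword "Capacity" are hypothetical stand-ins; in practice htmlContent would come from webread(url) and the keyword from the table you are looking for.

```matlab
% Hypothetical stand-in for htmlContent = webread(url); the real page differs.
htmlContent = "<html><body>" + ...
    "<table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>" + ...
    "<table><tr><th>Unit</th><th>Capacity (MW)</th></tr>" + ...
    "<tr><td>Unit 1</td><td>80</td></tr></table>" + ...
    "</body></html>";

tree = htmlTree(htmlContent);
tables = findElement(tree, "table");   % every <table> element, layout tables included

% If this is zero, the content is not marked up as an HTML table at all.
fprintf("Number of <table> elements: %d\n", numel(tables));

keyword = "Capacity";                  % hypothetical search term

% extractHTMLText returns one string per <table> element
tableText = extractHTMLText(tables);

% Indices of the tables whose text contains the keyword (case-insensitive)
matchIdx = find(contains(tableText, keyword, "IgnoreCase", true));
fprintf("Keyword found in table index: %s\n", mat2str(matchIdx));
```

Any index in matchIdx can then be used in place of 4 in tables(4) above, so the rest of the row and cell extraction stays the same.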