How to read files from a particular website?

Question

Pouya on 3 Mar 2022

Direct link to this question

https://jp.mathworks.com/matlabcentral/answers/1662330-how-to-read-files-from-a-particular-website

Answered: VINAYAK LUHA on 4 Jan 2024

Hello,

I'm having a problem with MATLAB not recognizing the files at this link ( https://swarm-diss.eo.esa.int/#swarm/Level1b/Entire_mission_data/MAGx_HR/Sat_A ).

There should be multiple files, each about 300 MB, with names starting with "SW_OPER_MAGA_HR". Instead, MATLAB reads something else: a "1x136910 char" array.

Please see the code below:

clc
clear
web='https://swarm-diss.eo.esa.int/#swarm/Level1b/Entire_mission_data/MAGx_HR/Sat_A';
str=webread(web); 
fn=regexpi(str,'SW[A-Z_0-9]+.zip','match');
for k=1:size(fn,2)
 file=fn{k};
 unzip([web file(8:9)]);
end

Thank you in advance.

1 Comment

Ive J on 3 Mar 2022

Your URL is protected by cookies, so I guess your best chance is to try Python. MATLAB is quite immature for web scraping.

Answer 1

VINAYAK LUHA on 4 Jan 2024

Direct link to this answer

https://jp.mathworks.com/matlabcentral/answers/1662330-how-to-read-files-from-a-particular-website#answer_1383086

htmlFile.txt

Hello Pouya,

I understand that you want to use MATLAB to download the files listed in the table on that website, and that your attempt with the "webread" function returned a character array instead.

webread did work as designed: the character array it returned is the HTML source of the page.

Note, however, that the file table on that page is generated dynamically, so the HTML that webread returns does not contain the file links, and webread alone is not the right tool for this task. Instead, save the page from your browser as an HTML file and then use htmlTree (part of Text Analytics Toolbox) to extract the download links from the saved HTML source.
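
One likely reason the file names never show up in the webread output is visible in the URL itself: everything after the "#" is a fragment, which HTTP clients generally do not send to the server, so the request only reaches the site's base page. A minimal sketch of that split, using only the URL from the question (the variable names requestPart and fragmentPart are illustrative):

```matlab
% URL from the question; the part after '#' is the fragment.
web = 'https://swarm-diss.eo.esa.int/#swarm/Level1b/Entire_mission_data/MAGx_HR/Sat_A';

% Split the URL at the '#' character.
requestPart  = extractBefore(web, '#');  % what an HTTP request is addressed to
fragmentPart = extractAfter(web, '#');   % left for the page's own script in the browser

fprintf('Requested from the server: %s\n', requestPart);
fprintf('Handled in the browser:    %s\n', fragmentPart);
```

Under this assumption, the Sat_A folder listing only exists after the browser has run the page's script, which is why saving the rendered page from the browser is a reasonable workaround.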

Here is some code, with explanatory comments, showing how to proceed:

% Read the HTML content from a saved file
html = fileread('htmlFile.html');
% Parse the HTML content to create a tree structure
tree = htmlTree(html);
% Locate all 'a' (anchor) elements within the parsed HTML tree
anchorElements = findElement(tree, "A");
% Retrieve the 'href' attributes from the identified anchor elements
hrefAttributes = getAttribute(anchorElements, "href");
% Identify the 'href' attributes that include the download keyword
downloadLinks = hrefAttributes(contains(hrefAttributes, "?do=download"));
% Iterate over the first 10 download links (or fewer if there are not as many)
for i = 1:min(10, numel(downloadLinks))
    % URL-decode each download link to get a human-readable format
    decodedText = urldecode(downloadLinks(i));
    
    % Split the decoded URL by '/' to isolate the file name
    parts = strsplit(decodedText, '/');
    
    % Extract the file name, which is the last segment after splitting
    lastPart = parts(end);
    
    % Formulate the full download URL by adding the base URL to the relative path
    modifiedLink = "https://swarm-diss.eo.esa.int/" + downloadLinks(i);
    
    % Download the file using websave and name it with the extracted file name
    websave(lastPart{1}, modifiedLink);
end
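
If only the "SW_OPER_MAGA_HR" archives from the question are wanted, the links can be filtered by file name before downloading. The sketch below is illustrative only: the strings in candidates are hypothetical stand-ins for URL-decoded links (the real href format on the site may differ), and the variable names are not from the code above. It uses startsWith and endsWith instead of the pattern 'SW[A-Z_0-9]+.zip' from the question, in which the unescaped "." matches any character rather than a literal dot.

```matlab
% Hypothetical decoded link texts, for illustration only.
candidates = [ ...
    "?do=download&path=/swarm/Level1b/Sat_A/SW_OPER_MAGA_HR_example1.ZIP"; ...
    "?do=download&path=/swarm/Level1b/Sat_A/readme.txt"; ...
    "?do=download&path=/swarm/Level1b/Sat_A/SW_OPER_MAGA_HR_example2.zip"];

% Keep only the text after the last '/' (the greedy '.*' runs to the last slash).
fileNames = regexprep(candidates, '^.*/', '');

% Keep names with the expected prefix and a .zip extension in any letter case.
isWanted = startsWith(fileNames, "SW_OPER_MAGA_HR") & ...
           endsWith(fileNames, ".zip", 'IgnoreCase', true);
wanted = fileNames(isWanted);

disp(wanted)
```

The same logical mask could be used to index the list of download links, so that the websave loop only visits the matching files.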

For more details, refer to the MATLAB documentation for the functions used above: htmlTree, findElement, getAttribute, and websave.

I hope this clarifies how to retrieve the files from that website. I have also attached the website's HTML source code as a text file for your reference.

Regards

Vinayak Luha
