How to download multiple files from a website

Chad Greene, November 21, 2023
Commented: Dyuman Joshi, November 22, 2023
This question has been asked many times in various ways on this forum, but I've never found a simple answer to this very simple question:
It seems like there should be a two-line solution along the lines of:
url_list = get_urls('','extension','.nc');
if get_urls were a function and websave were as easy to use as passing it a list of file URLs and having it save them all in the current directory.
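As far as I know there is no built-in get_urls (it is the hypothetical name from the question), but for pages whose links point straight at the files, a few lines get close. This is a minimal sketch that assumes the links are plain href attributes ending in the extension, either absolute or relative to the page URL. The sample html, base_url and ext values are made up for illustration; regexp, regexptranslate, startsWith and websave are standard MATLAB functions.

```matlab
% Hypothetical stand-in for get_urls(url,'extension','.nc').
% In practice: html = webread(page_url);  (page_url not given here)
html = ['<a href="data/a.nc">a</a> <a href="b.txt">b</a> ', ...
        '<a href="https://example.com/c.nc">c</a>'];
base_url = 'https://example.com/files/';   % assumed page URL, for illustration
ext = '.nc';

% Pull every href value that ends in the extension (the dot is escaped).
tok = regexp(html, ['href="([^"]*' regexptranslate('escape', ext) ')"'], 'tokens');
links = cellfun(@(t) t{1}, tok, 'UniformOutput', false);

% Prefix relative links with the page URL; leave absolute ones alone.
is_abs = startsWith(links, 'http');
url_list = links;
url_list(~is_abs) = strcat(base_url, links(~is_abs));

% Download each file into the current folder (needs network access):
% for k = 1:numel(url_list)
%     [~, name, e] = fileparts(url_list{k});
%     websave([name e], url_list{k});
% end
```

This only covers pages that link directly to the files; it does not follow intermediate pages.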
Chad Greene, November 21, 2023
Wow, thank you @Dyuman Joshi!
Dyuman Joshi, November 22, 2023
You are welcome!



Voss, November 21, 2023
url = '';  % URL of the main page

% webread() the main page and parse out the links to .nc files:
data = webread(url);
C = regexp(data,'<a href=".*?(\?[^"]*\.nc)">','tokens');
temp_urls = strcat(url,vertcat(C{:}));

% webread() each linked url:
data = cell(size(temp_urls));
for ii = 1:numel(temp_urls)
    data{ii} = webread(temp_urls{ii});
end

% get the download link in each of those pages:
C = regexp(data,'<a href="([^"]*)">\s*<b>HTTPServer','tokens','once');

% append them to the (sub-)domain of the main URL to get the actual URLs
% for downloading the .nc files:
idx = find(url == '/',3);
nc_urls = strcat(url(1:idx(end)-1),vertcat(C{:}));

% construct file names to save to locally:
[~,filenames,ext] = fileparts(nc_urls);
filenames = strcat(filenames,ext);

% download all the files:
for ii = 1:numel(nc_urls)
    websave(filenames{ii},nc_urls{ii});
end
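The find(url == '/',3) line keeps everything before the third slash, i.e. the scheme and host. Joining that with the extracted links implies the download links on the intermediate pages are root-relative (they start with '/'). A quick check with made-up URLs, since the real one isn't given:

```matlab
% Made-up URLs, for illustration only.
url  = 'https://data.example.org/catalog/index.html';
href = '/fileServer/run1/sample.nc';   % root-relative link, like the HTTPServer one

idx  = find(url == '/', 3);            % positions of the first three slashes
host = url(1:idx(end)-1);              % 'https://data.example.org'
nc_url = [host href];                  % 'https://data.example.org/fileServer/run1/sample.nc'

% Local file name: last path component of the download URL.
[~, name, ext] = fileparts(nc_url);
filename = [name ext];                 % 'sample.nc'
```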
Voss, November 21, 2023
You're welcome!
Each link on the main page goes to a separate intermediate page, which contains the link for downloading the actual .nc file.
The first webread/regexp gets the URLs of those intermediate pages. The loop then webreads each intermediate page, and a second regexp over all of their contents gets the download URLs. On each intermediate page, the download URL is the one immediately preceding 'HTTPServer'. Those pages contain several other URLs, and anchoring on 'HTTPServer' was the only way I could think of to be sure to get the right one.
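To see what the two patterns pick out, here they are run on small made-up snippets. The markup is only a guess at what such pages look like (the real URL isn't given here). The patterns are the ones from the answer, with the dot before nc escaped.

```matlab
% Made-up main-page markup: each link is a query string ending in .nc.
main_html = ['<a href="catalog.html?dataset=run1/a.nc">a.nc</a>' newline ...
             '<a href="catalog.html?dataset=run1/b.nc">b.nc</a>'];
C = regexp(main_html, '<a href=".*?(\?[^"]*\.nc)">', 'tokens');
queries = vertcat(C{:});   % {'?dataset=run1/a.nc'; '?dataset=run1/b.nc'}

% Made-up intermediate page: several links, only one followed by HTTPServer.
page_html = ['<a href="/other/link">' newline '<b>OPENDAP</b></a>' newline ...
             '<a href="/fileServer/run1/a.nc">' newline '<b>HTTPServer</b></a>'];
D = regexp(page_html, '<a href="([^"]*)">\s*<b>HTTPServer', 'tokens', 'once');
download_link = D{1};      % '/fileServer/run1/a.nc'
```

The OPENDAP link also matches the <a href="..."> part of the second pattern, but it isn't followed by <b>HTTPServer, so only the download link is captured.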
Chad Greene, November 22, 2023
Ooh, okay, that makes a lot of sense. Thanks @Voss!


