Pull a webpage from the MATLAB site using MATLAB (but with login)

Highphi, 22 July 2020
Answered: Pascal Geschwill, 30 April 2021
Hello there,
I have recently been working on code that pulls information from a webpage and stores it in a file.
webread() isn't very hard to use.
However, I have reached the point where I want to pull pages that can only be seen when logged in.
I am using a MATLAB webpage (only visible when logged in) to work on my solution, but I can't quite figure it out.
For example:
pageLink = 'https://www.mathworks.com/matlabcentral/cody/groups/345/problems/15-find-the-longest-sequence-of-1-s-in-a-binary-sequence/solutions/new';
options = weboptions;
options.Username = 'myEmail@email.com';
options.Password = 'myPassw0rd';
pageRead = webread(pageLink, options);
(obviously with real information)
This does not work; it always returns the 'You must log in' page.
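A likely reason this fails: the Username and Password fields of weboptions supply HTTP Basic authentication credentials, which only matter if the server asks for them. A site that signs you in through an HTML login form (which the MathWorks login page appears to be) ignores them and keeps serving the login page. The sketch below shows what Basic authentication actually transmits by building the Authorization header by hand with matlab.net.base64encode. The credentials ('user', 'pass') and the URL are placeholders, and the request itself is left commented out because it would only help against a server that accepts Basic authentication.

```matlab
% Placeholder credentials and a hypothetical endpoint; replace with your own.
userId   = 'user';
password = 'pass';
apiUrl   = 'https://example.com/protected/resource';

% HTTP Basic authentication sends "Basic " followed by base64("user:password").
token = matlab.net.base64encode(uint8([userId ':' password]));
authHeader = ['Basic ' char(token)];

% HeaderFields takes an m-by-2 cell array of header names and values.
options = weboptions('HeaderFields', {'Authorization', authHeader});

% Only succeeds if the server accepts Basic authentication;
% a form-based login page will still come back as HTML.
% data = webread(apiUrl, options);
```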
I have also tried sending my credentials with webwrite, as well as renaming them to match the parameter names the login page uses, such as...
userPage = 'https://www.mathworks.com/login?uri=https%3A%2F%2Fwww.mathworks.com%2Fproducts%2Fmatlab.html';
userId = 'myEmail@email.com';
password = 'myPassw0rd';
webwrite(userPage, 'userId', userId, 'password', password)
I have tried all sorts of combinations of webwrite, webread, weboptions, and named parameters,
but it won't return the page as if I were logged in.
Could someone point me in the right direction? Is it just the MATLAB site (should I have tried a different website), or can this be done?
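One plausible reason the webwrite attempt does not carry over to webread: a form login usually answers with a session cookie (a Set-Cookie header), and webwrite does not appear to keep that cookie for later webread calls. The lower-level matlab.net.http interface lets you hold on to the cookie yourself. The sketch below assumes a plain HTML login form at a hypothetical URL, with fields named userId and password (taken from the question). Real login forms often add hidden fields such as CSRF tokens, run JavaScript, or set the cookie on a redirect (in which case you would need to inspect the history output of send), so the MathWorks sign-in may well not work this way at all.

```matlab
import matlab.net.URI
import matlab.net.http.RequestMessage
import matlab.net.http.field.CookieField
import matlab.net.http.io.FormProvider

% Hypothetical login endpoint, protected page, and form field names.
loginUri = URI('https://example.com/login');
pageUri  = URI('https://example.com/members/page');
userId   = 'myEmail@email.com';
password = 'myPassw0rd';

% POST the credentials as an HTML form (application/x-www-form-urlencoded).
form = FormProvider('userId', userId, 'password', password);
loginResp = send(RequestMessage('POST', [], form), loginUri);

% Copy any session cookies from the login response into the next request.
pageReq = RequestMessage('GET');
setCookieFields = getFields(loginResp, 'Set-Cookie');
if ~isempty(setCookieFields)
    cookieInfos = convert(setCookieFields, loginUri);
    pageReq = addFields(pageReq, CookieField([cookieInfos.Cookie]));
end

% Request the protected page, sending the cookies back to the server.
pageResp = send(pageReq, pageUri);
pageHTML = pageResp.Body.Data;
```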
Highphi, 22 July 2020
I tried using...
system(['wget --auth-no-challenge --user=', userId, ' --password=', password, ' ', pageLink])
which started to feel like a step in the right direction, but I get:
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\Gow/etc/wgetrc
--2020-07-22 13:00:05-- https://www.mathworks.com/matlabcentral/cody/groups/345/problems/15-find-the-longest-sequence-of-1-s-in-a-binary-sequence/solutions/new
Resolving www.mathworks.com...
Connecting to www.mathworks.com||:443... connected.
ERROR: cannot verify www.mathworks.com's certificate, issued by `/C=US/O=DigiCert Inc/CN=DigiCert SHA2 Secure Server CA':
Unable to locally verify the issuer's authority.
To connect to www.mathworks.com insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.
where the empty || is (potentially) an IP address that I censored, since I'm not sure what its significance is.



Highphi, 22 July 2020
Figured it out... by myself.
No worries. Here's how I did it, for future reference:
1. Fix your default web browser preferences
A. Open the 'Preferences' window:
Option 1: MANUALLY
Under the 'Home' tab, click 'Preferences'.
Option 2: From the COMMAND WINDOW
preferences Web
B. In the 'Preferences' window, go to the 'Web' subsection and make sure the box next to "Use system browser when opening links to external sites (recommended)." is unchecked, so that pages open in the MATLAB web browser. Then click Apply.
2. Pull the page through the MATLAB web browser
A. Use the following code to open your window:
[a,h] = web(pageLink);
It will pop up a window with the link you told it to go to.
B. If prompted to log in to the desired page, do so, and try to check 'Remember Me' if it is an option.
Otherwise, do this step at the beginning of every script and leave one browser window open. I will explain in a second.
C. Use the following code to pull your HTML and then close the browser:
[a, h2] = web(pageLink);
pageHTML = get(h2, 'HtmlText');
close(h2);
Notice I used the handle 'h2' in the second part. This is so that you don't close 'h', the window you logged in with. Closing h2 will ONLY close h2, allowing you to remain logged in.
D. Rinse and repeat.
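Put together, steps A to D might look like the following script. It is a sketch that assumes the MATLAB web browser (not the system browser) is in use, as set up in step 1, and a release in which web still returns a browser handle (see the deprecation warning in the other answer). Two things are additions not in the original steps: the input call, which simply pauses the script while you log in by hand, and the '-new' option, which makes sure h2 is a separate window from h so that closing it leaves the logged-in window open.

```matlab
pageLink = 'https://www.mathworks.com/matlabcentral/cody/groups/345/problems/15-find-the-longest-sequence-of-1-s-in-a-binary-sequence/solutions/new';

% Step A: open the page in the MATLAB web browser and keep its handle.
[stat, h] = web(pageLink);
if stat ~= 0
    error('Could not open the MATLAB web browser (status %d).', stat);
end

% Step B: log in by hand in that window, then return to MATLAB.
input('Log in in the browser window, then press Enter to continue: ', 's');

% Step C: open a second window on the same page and read its HTML.
% The login session from the first window should still apply.
[~, h2] = web(pageLink, '-new');
pageHTML = get(h2, 'HtmlText');

% Close only the second window so the logged-in window h stays open.
close(h2);
```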
Diego Rodriguez, 10 January 2021
Thank you so much! This helps me a lot.


Other answers (1)

Pascal Geschwill, 30 April 2021
While this approach seems to work for now, it looks like it relies on deprecated functionality. At least with R2020a I get a warning:
Warning: [STAT,H] = WEB(___) does not return a handle for pages that open in the system browser. Use STAT = WEB(___) instead.
> In web>displayWarningMessage (line 432)
In web (line 96)
In my case, the solution described in this thread worked just as well. I am pulling build histories from our CI server via its REST API and then parsing them in MATLAB.
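For sources that offer a REST API, as with the CI server here, it is often simpler to skip the browser and authenticate each request with an API token. The sketch below assumes a hypothetical endpoint and a bearer-token scheme; CI servers differ (some expect Basic authentication with a user name and API token instead), so check your server's documentation. The webread call is commented out because the URL is a placeholder.

```matlab
% Hypothetical CI server endpoint and API token; replace with real values.
apiUrl = 'https://ci.example.com/api/builds';
token  = 'YOUR_API_TOKEN';

% Send the token in the Authorization header on every request.
% HeaderFields takes an m-by-2 cell array of header names and values.
options = weboptions( ...
    'HeaderFields', {'Authorization', ['Bearer ' token]}, ...
    'ContentType', 'json', ...  % decode JSON responses into MATLAB structs/arrays
    'Timeout', 30);             % seconds

% builds = webread(apiUrl, options);
```

Because the token travels with every request, there is no session cookie to keep alive and no browser window to leave open.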

