Is there a way to pull a specific link after using webread() to get the content from a page?

Essentially, I'm using webread() to obtain the contents of a Google search. If there's a Wikipedia link in the contents, I want to extract it. I've been using regexp(content,exp,'match'), but I'm confused about how to create an expression that will match the Wikipedia link. I know that doing something such as:
regexp(content,'https?://en\.?\w*\.?\w*','match')
will get me the 'https://en.wikipedia.org' portion of the link, but this expression already seems unnecessarily complicated just for that part. I could continue in the same way for the whole link, but the number of words in a Wikipedia link varies, so I'm unsure how to capture just the link without accidentally taking the text that follows it.
(e.g., https://en.wikipedia.org/wiki/List_of_landmark_court_decisions_in_the_United_States or https://en.wikipedia.org/wiki/Banana)
In the text that is read, the link appears to be followed by '&amp;'. Perhaps I can take all the characters from 'http' up to '&amp;', but it would be nice to get some tips on how to create an expression for that!
Thanks for the help!
1 Comment
Matthew Cao on 1 May 2018
Edited: Matthew Cao on 1 May 2018
OK, I could simply replace everything after 'https?://en' in the expression with \S+, which matches any run of consecutive non-white-space characters. This pulls the Wikipedia link, but also a lot of what follows it:
https://en.wikipedia.org/wiki/List_of_landmark_court_decisions_in_the_United_States&(there's the word 'amp' here but it is not shown on the forum);sa=U&.............
I need to stop it right at the '&amp;'!
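A minimal sketch of why \S+ overshoots and one way to stop at the first '&'. The content string below is a made-up fragment shaped like the text described above, not real webread() output, and the variable names are only illustrative. A negated character class is one option among several; it assumes the link itself never contains a literal '&'.

```matlab
% Hypothetical fragment shaped like the text described above (not real webread output).
content = 'q=https://en.wikipedia.org/wiki/Banana&amp;sa=U&amp;ved=0 more text';

% Greedy \S+ runs to the next white space, so it overshoots the link.
greedy = regexp(content, 'https?://en\.\S+', 'match', 'once');

% [^&\s]+ matches only characters that are neither '&' nor white space,
% so the match ends just before the first '&'.
link = regexp(content, 'https?://en\.[^&\s]+', 'match', 'once');

disp(greedy)
disp(link)
```

With the 'once' option, regexp returns the first match as a character vector rather than a cell array, which is convenient when only one link is wanted.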


Accepted Answer

Matthew Cao on 1 May 2018
I think I've solved it by putting '\S+' in the expression, followed by a lookahead, '(?=&amp;sa)'. That way the expression matches all the characters following 'https?://en.' but stops at the right point, because the lookahead must be satisfied without becoming part of the match.
regexp(content,'https?://en\.\S+(?=&amp;sa)','match')
This will find everything up to, but not including, the '&amp;sa'! If there's a more efficient way of doing this, let me know!
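As a possible refinement, here is a sketch that pins the expression to the Wikipedia domain and makes the quantifier lazy, so each match ends at the first '&amp;sa' rather than the last one in a run of non-white-space characters. The sample string is hypothetical (an assumed shape for the search results, not actual output), and expr and links are illustrative names.

```matlab
% Hypothetical sample with two result links (assumed shape, not real search output).
content = ['<a href="/url?q=https://en.wikipedia.org/wiki/Banana&amp;sa=U&amp;ved=1">' ...
           '<a href="/url?q=https://en.wikipedia.org/wiki/Musa_(genus)&amp;sa=U&amp;ved=2">'];

% \S+? is lazy: it stops at the first position where the lookahead
% (?=&amp;sa) succeeds, so each link is returned separately.
expr = 'https?://en\.wikipedia\.org/wiki/\S+?(?=&amp;sa)';
links = regexp(content, expr, 'match');

% Guard against pages with no Wikipedia result.
if isempty(links)
    disp('No Wikipedia link found.');
else
    disp(links{1});
end
```

Without 'once', regexp returns a cell array with one entry per match, so the isempty check covers searches that return no Wikipedia result.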
