HTML Page source info
Hello, we often come across a series of numbered webpages:
basePage.html?page=2
basePage.html?page=3
and so forth, wherein there are several fields identified by their labels:
<h2 class="category-heading">Name1</h2>
<label>Parameter1 : </label> <div class="category-related">textOfInterest</div>
<label>Parameter2 : </label> <div class="category-related">textOfInterest</div>
<label>Parameter3 : </label> <div class="category-related">textOfInterest</div>
<h2 class="category-heading">Name2</h2>
<label>Parameter1 : </label> <div class="category-related">textOfInterest</div>
<label>Parameter2 : </label> <div class="category-related">textOfInterest</div>
<label>Parameter3 : </label> <div class="category-related">textOfInterest</div>
<h2 class="category-heading">Name3</h2>
<label>Parameter1 : </label> <div class="category-related">textOfInterest</div>
<label>Parameter2 : </label> <div class="category-related">textOfInterest</div>
<label>Parameter3 : </label> <div class="category-related">textOfInterest</div>
and so on.
How can the "textOfInterest" of one particular parameter, say Parameter2, be extracted for every Name* on every page,
basePage.html?page=1toInf
and exported into one text file, say Parameter2.txt?
The "textOfInterest" is often alphanumeric and may also contain special characters such as !@#$%.
Thanks.
6 Comments
Rik
26 Nov 2020
Step by step. You want to parse several pages, so you will probably need a loop. You want to write something to a file, so you will first have to store it in MATLAB variables.
What have you tried?
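As a rough sketch of that outline (loop over the pages, keep the results in a variable, write the file at the end), something like the following might serve as a starting point. The address in `baseUrl`, the `maxPages` cap, and the rule "stop at the first page without a match" are assumptions made for illustration, not details from the question. The pattern assumes the markup looks exactly like the sample above.

```matlab
% Sketch: loop over numbered pages, collect values in a cell array, write one file.
% Assumptions (not from the thread): baseUrl, maxPages and the stop rule are placeholders.
baseUrl  = 'https://example.com/basePage.html';  % hypothetical address
maxPages = 1000;                                  % safety cap, the page count is unknown
label    = 'Parameter2';
values   = {};                                    % collected textOfInterest strings

% Matches: <label>Parameter2 : </label> <div class="category-related">...</div>
pattern = ['<label>\s*' label '\s*:\s*</label>\s*' ...
           '<div class="category-related">(.*?)</div>'];

for page = 1:maxPages
    try
        html = webread(baseUrl, 'page', page);    % requests baseUrl?page=<page>
    catch
        break                                     % stop on a failed request
    end
    tokens = regexp(html, pattern, 'tokens');     % one 1x1 cell per match
    if isempty(tokens)
        break                                     % assumed end of the series
    end
    values = [values, cellfun(@(t) t{1}, tokens, 'UniformOutput', false)]; %#ok<AGROW>
end

fid = fopen([label '.txt'], 'w', 'n', 'UTF-8');
if ~isempty(values)
    fprintf(fid, '%s\n', values{:});  % '%s' keeps characters such as % and \ literal
end
fclose(fid);
```

Passing the text as an argument to `'%s\n'`, rather than as the format string itself, is what keeps special characters like `%` from being interpreted by `fprintf`.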
b
26 Nov 2020
Rik
26 Nov 2020
Good start.
Now you need to think about how you can extract the text of interest from the webpage content. The strfind function is probably helpful in this context. That is the main thing I used when I had to parse a few thousand webpages for my Bible downloader.
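To illustrate the strfind idea on a single page, here is a minimal sketch. The sample string stands in for the page content you would get from the web, with made-up values in place of textOfInterest; the marker strings are copied from the markup in the question. Because strfind matches literally, the spacing inside the label has to be exactly the same as on the real page.

```matlab
% Sketch: pull every Parameter2 value out of one page's HTML with strfind.
% The sample values (abc, x!@#$%1, y2) are invented for the example.
html = ['<h2 class="category-heading">Name1</h2>' newline ...
        '<label>Parameter1 : </label> <div class="category-related">abc</div>' newline ...
        '<label>Parameter2 : </label> <div class="category-related">x!@#$%1</div>' newline ...
        '<h2 class="category-heading">Name2</h2>' newline ...
        '<label>Parameter2 : </label> <div class="category-related">y2</div>'];

labelTag = '<label>Parameter2 : </label>';
openTag  = '<div class="category-related">';
closeTag = '</div>';

labelPos = strfind(html, labelTag);           % start index of each matching label
values   = cell(1, numel(labelPos));
for k = 1:numel(labelPos)
    rest     = html(labelPos(k) + numel(labelTag):end);  % text after this label
    startIdx = strfind(rest, openTag);
    stopIdx  = strfind(rest, closeTag);
    first    = startIdx(1) + numel(openTag);             % first character of the value
    last     = stopIdx(find(stopIdx >= first, 1)) - 1;   % character before the closing tag
    values{k} = rest(first:last);
end
disp(values)
```

The loop takes the first div that follows each label, so it relies on every label being followed directly by its own div, as in the sample.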
b
27 Nov 2020
b
1 Dec 2020
The goal of Bible downloader is religious (although you can of course use the text of a Bible translation for non-religious purposes as well), but the code isn't.
Did you try adapting any of the code? I'll post some code as an answer.
Accepted Answer
More Answers (1)
b
3 Dec 2020
3 Comments
Rik
3 Dec 2020
Can you move this to the comment section (by posting a new comment and deleting this answer)? And please also add the code you're using to parse a single element.
b
3 Dec 2020
Rik
3 Dec 2020
You're welcome (and thanks for the limerick XD).
If you have a follow-up question, feel free to post a link to it here.