Convert an HTML table to CSV format
I need to convert the table at the following URL to CSV format. Since I have to convert many tables, I can't copy and paste them by hand. http://climate.weatheroffice.gc.ca/climate_normals/results_e.html?stnID=2046&lang=e&dCode=0&province=ALTA&provBut=Search&month1=0&month2=12
1 comment
Matt Kindig
April 10, 2013
Do you have to use MATLAB for this? I ask because languages more commonly used for web development have good HTML-parsing libraries, whereas MATLAB's support is more limited: you would basically have to resort to complex regexp statements.
I would recommend Python and the BeautifulSoup package to do this, actually.
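For comparison, here is roughly what the regexp route looks like in MATLAB. This is a minimal, class-agnostic sketch, not a real HTML parser: it assumes simple, non-nested tables, splits the HTML into `<tr>` rows, splits each row into `<td>`/`<th>` cells, and strips any tags left inside a cell. The `html` string is a made-up sample standing in for the downloaded page.

```matlab
% Made-up sample table; in practice this string would come from urlread/webread.
html = ['<table>', ...
        '<tr><th>Element</th><th>Jan</th><th>Feb</th></tr>', ...
        '<tr><td class="rowHeader">Mean</td><td>-10.1</td><td><b>-8.3</b></td></tr>', ...
        '</table>'];

% 1) Capture the inner HTML of every <tr>...</tr> (lazy match, case-insensitive).
rows = regexp(html, '<tr(?:\s[^>]*)?>(.*?)</tr>', 'tokens', 'ignorecase');

tbl = cell(numel(rows), 0);   % grows to the widest row; short rows are padded with []
for r = 1:numel(rows)
    % 2) Capture the inner HTML of each <td> or <th> cell in this row.
    cells = regexp(rows{r}{1}, '<t[dh](?:\s[^>]*)?>(.*?)</t[dh]>', 'tokens', 'ignorecase');
    cells = cellfun(@(c) c{1}, cells, 'UniformOutput', false);
    % 3) Remove tags nested inside a cell (e.g. <b>) and trim whitespace.
    cells = strtrim(regexprep(cells, '<[^>]*>', ''));
    tbl(r, 1:numel(cells)) = cells;
end
disp(tbl)
```

Entities such as `&deg;` or `&nbsp;` are left undecoded, which is one of the things a dedicated parser like BeautifulSoup handles for you.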
Answers (2)
Jan
April 11, 2013
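You can first import the HTML table into MATLAB with one of these File Exchange (FEX) submissions: htmltableToCell or get-html-table-data-into-matlab. How you then export it to CSV depends on the contents of the data (for example, purely numeric values versus a mix of text and numbers).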
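As an illustration of the mixed case, here is a minimal sketch that writes a 2-D cell array to CSV, assuming every entry is either a character vector or a numeric scalar. The array `C` and the file name `out.csv` are hypothetical. Numbers are written with `num2str`; text is always quoted, with embedded double quotes doubled (RFC 4180 style), so commas inside text stay in one field. It only needs `fopen`/`fprintf` and `strjoin` (R2013a+); in R2019a and later, `writecell(C, 'out.csv')` is a shorter alternative, with quoting controlled by its `QuoteStrings` option.

```matlab
% Hypothetical cell array as it might look after importing an HTML table.
C = {'Element', 'Jan', 'Feb'; ...
     'Mean temp, C', -10.1, -8.3};

fid = fopen('out.csv', 'w');
if fid < 0
    error('Could not open out.csv for writing.');
end
for r = 1:size(C, 1)
    fields = cell(1, size(C, 2));
    for k = 1:size(C, 2)
        v = C{r, k};
        if isnumeric(v)
            fields{k} = num2str(v);                    % e.g. -10.1 -> '-10.1'
        else
            fields{k} = ['"', strrep(v, '"', '""'), '"'];  % quote text, escape quotes
        end
    end
    fprintf(fid, '%s\n', strjoin(fields, ','));
end
fclose(fid);
```

If all cells are numeric, converting with `cell2mat` and writing the matrix with `dlmwrite` (or `writematrix` in newer releases) is simpler.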
Cedric
April 10, 2013
Edited: Cedric
April 10, 2013
As Matt mentions, Python with a package such as BeautifulSoup would be perfect for the parsing part. Here is one way to do it using REGEXP in MATLAB; it is not a complete solution, but it is enough to illustrate the approach.
% - Get HTML page (urlread is not recommended in recent releases; webread, R2014b+, replaces it).
url = 'http://climate.weatheroffice.gc.ca/climate_normals/results_e.html?stnID=2046&lang=e&dCode=0&province=ALTA&provBut=Search&month1=0&month2=12' ;
buffer = urlread(url) ;
% - Extract horizontal header.
p = '(?<=<td class="dataTableColHeader">).*?(?=</td>)' ;
hheader = regexp(buffer, p, 'match') ;
% - Extract vertical header.
p = '(?<=<td class="dataTableRowHeader">).*?(?=</td>)' ;
vheader = regexp(buffer, p, 'match') ;
% - Extract data cells (matched in document order, i.e. row by row) and reshape to one row per table row.
p = '(?<=<td class="dataTableRowData">).*?(?=</td>)' ;
data = regexp(buffer, p, 'match') ;
data = reshape(data, 12+2, []).' ;
% - Build the full cell array and export it.
%   Requires numel(hheader) == 12+2 and numel(vheader) == size(data,1) + 1.
content = [vheader.',[hheader; data]] ;
xlswrite('example.xlsx', content) ;
Let me know if you want to go this way and I can improve this code a little. There would still be quite a bit of work on your side, e.g., handling some inconsistencies in the way the site builds its HTML tables, detecting and handling processing failures, and exporting to CSV instead of XLSX.
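To see what the lookaround patterns return, and to catch a layout inconsistency before `reshape` fails, here is a minimal offline sketch. The HTML fragment is made up and only reuses the class names from the code above; `nCols` mirrors its `12+2`. The checks simply refuse to reshape when the matched cells do not fit the expected grid.

```matlab
nCols = 12 + 2;   % same column count as reshape(data, 12+2, []) in the code above

% Made-up fragment: one header row and two data rows, reusing the class names above.
hdr  = sprintf('<td class="dataTableColHeader">C%d</td>', 1:nCols);
row1 = sprintf('<td class="dataTableRowData">%d</td>', 1:nCols);
row2 = sprintf('<td class="dataTableRowData">%d</td>', 101:100+nCols);
buffer = ['<tr><td></td>', hdr, '</tr>', ...
          '<tr><td class="dataTableRowHeader">A</td>', row1, '</tr>', ...
          '<tr><td class="dataTableRowHeader">B</td>', row2, '</tr>'];

% (?<=...) and (?=...) require the surrounding tags without including them in the match;
% the lazy .*? stops at the first '</td>', so each cell is matched separately.
hheader = regexp(buffer, '(?<=<td class="dataTableColHeader">).*?(?=</td>)', 'match');
data    = regexp(buffer, '(?<=<td class="dataTableRowData">).*?(?=</td>)', 'match');

% Basic failure detection: stop before reshape() if the layout is not as expected.
if numel(hheader) ~= nCols
    error('Expected %d column headers, found %d.', nCols, numel(hheader));
end
if isempty(data) || mod(numel(data), nCols) ~= 0
    error('Found %d data cells, which is not a positive multiple of %d.', numel(data), nCols);
end
data = reshape(data, nCols, []).';   % cells arrive row by row, hence the transpose
fprintf('Parsed %d rows x %d columns.\n', size(data, 1), size(data, 2));
```

With the real page, the same checks could sit just before the `reshape` call above, and in R2019a or later the result could be written to CSV with `writecell(content, 'example.csv')` instead of `xlswrite`.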