Parsing HTML output from urlread with regexp

Philip Spratt, 24 June 2013
I need to extract the PubMed IDs from the HTML below, but I am not very fluent in regexp.
Can anyone show me how to extract the IDs from it and store them in a vector?
I'm guessing there is some way to say: store whatever is between '<Id>' and '</Id>' in...

Accepted Answer

Tom, 24 June 2013
str = 'version="1.0" ? eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd" eSearchResult880<IdList> Id>16123227</Id Id>9561342</Id Id>8429296</Id Id>1408722</Id Id>2152845</Id Id>2894889</Id Id>2860133</Id Id>6145799</Id /IdList<TranslationSet/><TranslationStack> TermSet Term"ulcerative colitis"[All Fields]</Term> Fields</Field Count>33249</Count Explode>N</Explode /TermSet TermSet Term"Clonidine"[All Fields]</Term> Fields</Field Count>16458</Count Explode>N</Explode /TermSet OP>AND</OP /TranslationStack"ulcerative colitis"[All Fields] AND "Clonidine"[All Fields]</eSearchResult>';
% Isolate the ID list: the text between 'IdList>' and '/IdList'
IDList = regexp(str,'(?<=IdList>).*(?=/IdList)','match');
disp(IDList{1})
% Read each integer that sits between 'Id>' and '</Id'; IDno{1} is an int32 column vector
IDno = textscan(IDList{1},'Id>%d</Id');
disp(IDno{1})
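If the raw eSearch response is available with its tags intact (rather than the tag-stripped text pasted above), a single regexp call with the 'tokens' option can pull the numbers out directly. This is a minimal sketch, not what Tom posted: it swaps his lookaround-plus-textscan pair for one capture group. The short sample string is an assumed, hand-trimmed reconstruction of a well-formed response built from IDs in this thread, not captured output; in practice it would come from something like urlread on an eSearch query URL.

```matlab
% Assumed sample: a well-formed eSearch response, trimmed by hand (not real captured output).
% In practice this would be the char array returned by urlread(...) on an eSearch query.
xml = ['<eSearchResult><Count>8</Count><IdList>' ...
       '<Id>16123227</Id><Id>9561342</Id><Id>8429296</Id>' ...
       '</IdList></eSearchResult>'];

% 'tokens' returns one cell per match, each holding the text captured by (\d+)
tok = regexp(xml, '<Id>(\d+)</Id>', 'tokens');

% Flatten the nested cells, convert the digit strings to doubles, make a column vector
ids = str2double([tok{:}]).';
disp(ids)
```

str2double returns NaN for anything it cannot parse, so a quick any(isnan(ids)) check will flag a malformed match.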
Philip Spratt, 25 June 2013
Very much appreciated, Tom.
Thanks!


More Answers (1)

Sean de Wolski, 24 June 2013
Philip Spratt, 24 June 2013
I must apologise: the output was HTML, which is why xml2struct didn't work.

