Web scraping with regular expressions: getting rid of HTML tags
Hi all,
I am writing some web-scraping code and am therefore using regular expressions. I need to isolate the words in a string; HTML tags, of course, should not be included. HTML tags are words enclosed in < > (e.g. <br>). Unfortunately, my code does not work and I am wondering why. Here is an example:
regexp('qu <qa>','(?!<)\w*(?!>)','match')
I expect the result to be 'qu', but instead I get 'qu' and 'q'. The code works as expected on the string 'qu q'. What can I do to solve this issue?
thanks
Regards,
Pietro
Accepted Answer
Guillaume
3 Jun 2017
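The first part of your expression, (?!<), is a negative look-ahead: it tests the character at the position where the match is about to start, not the one before it. You want a negative look-behind instead. Add a < before the !: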
regexp('qu <qa>', '(?<!<)\w*(?!>)', 'match')
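To make the difference concrete, here is a small sketch that runs both patterns on the string from the question. The last part is not from the answer above: it is an assumed alternative that strips every `<...>` span with `regexprep` before collecting words, which may be easier to reason about when tags are longer than two characters. Because `\w*` is free to backtrack, the look-arounds only guard the first and last character of each match, so a longer tag such as `<abc>` can still leak inner characters. The variable names and the second test string are made up for illustration.

```matlab
str = 'qu <qa>';

% Original pattern: (?!<) is a look-ahead, so it checks the character the
% match is about to start on, not the character before it.
withLookahead = regexp(str, '(?!<)\w*(?!>)', 'match');   % 'qu' and 'q', as reported in the question

% Fixed pattern: (?<!<) is a look-behind, so a match cannot start
% immediately after a '<'.
withLookbehind = regexp(str, '(?<!<)\w*(?!>)', 'match');

% Alternative sketch (an assumption, not part of the accepted answer):
% replace each <...> tag with a space, then collect the remaining words.
noTags = regexprep('go <abc> on', '<[^>]*>', ' ');
words  = regexp(noTags, '\w+', 'match');                 % {'go', 'on'}
```

Replacing each tag with a space rather than an empty string keeps words on either side of a tag from being glued together. This is still only a pattern-based approximation and does not handle cases such as a literal > inside an attribute value.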