How do I use regexp to extract text between numbers

Ean Hendrickson, 9 November 2019
Edited: per isakson, 9 November 2019
I have a string that I extracted from a PDF:
str = "↵↵↵1. Receptacles, general purpose. ↵2. Receptacles with integral GFCI. ↵3. USB Charger receptacles. ↵4. AFCI receptacles. ↵5. Twist-locking receptacles. ↵6. Isolated-ground receptacles. ↵7. Tamper-resistant receptacles. ↵8. Weather-resistant receptacles. ↵9. Pendant cord-connector devices. ↵10. Cord and plug sets. ↵11. Wall box dimmers. ↵12. Wall box dimmer/sensors. ↵13. Wall box occupancy/vacancy sensors. ↵14. Toggle Switches. ↵15. Floor service outlets. ↵16. Associated device plates. ↵↵"
How can I use regexp to extract each numbered description and put them into a 16x1 array? The end product I want is a 16x1 string array that looks like this:
  1. Receptacles, general purpose.
  2. Receptacles with integral GFCI.
  3. USB Charger receptacles.
  4. AFCI receptacles.
  5. Twist-locking receptacles.
  6. Isolated-ground receptacles.
  7. Tamper-resistant receptacles.
  8. Weather-resistant receptacles.
  9. Pendant cord-connector devices.
  10. Cord and plug sets.
  11. Wall box dimmers.
  12. Wall box dimmer/sensors.
  13. Wall box occupancy/vacancy sensors.
  14. Toggle Switches.
  15. Floor service outlets.
  16. Associated device plates.
I also have this line of code:
parts = regexp(str,'^\d*+.*$','dotexceptnewline','lineanchors');
which I expected to find the index of each number in the string. I think I could then use those index values in a for loop to extract the text between consecutive numbers.
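The index-then-loop idea in the question can be sketched as follows. Because the text contains no newline characters, `^` and `$` in the pattern above can only anchor at the start and end of the whole string, so that call returns a single index (1) rather than one per item. This minimal sketch instead searches for each `N. ` label with `regexp(..., 'start')`. It assumes items are separated by U+21B5 (`↵`) and that every label is digits, a period, then whitespace; the three-item `str` is a hypothetical excerpt so the block runs on its own.

```matlab
% Minimal sketch of the "find the indices, then loop" approach.
% Assumption: items are separated by U+21B5 ("↵"), not by newline characters.
arrow = char(8629);                          % U+21B5 as a MATLAB character
% Hypothetical three-item excerpt of the question's string
str = string([repmat(arrow, 1, 3), ...
    '1. Receptacles, general purpose. ', arrow, ...
    '2. Receptacles with integral GFCI. ', arrow, ...
    '3. USB Charger receptacles. ', arrow, arrow]);

s = char(str);                               % char vector, so s(a:b) slices characters
starts = regexp(s, '\d+\.\s', 'start');      % start index of each "N. " label
stops = [starts(2:end) - 1, numel(s)];       % each item ends just before the next label

out = strings(numel(starts), 1);             % preallocate an N-by-1 string array
for k = 1:numel(starts)
    item = s(starts(k):stops(k));            % raw text of item k, delimiters included
    item = strrep(item, arrow, '');          % drop the "↵" delimiters
    out(k) = strtrim(string(item));          % remove leading/trailing spaces
end
disp(out)
```

Run on the full string from the question, the same loop should give the 16x1 string array shown above. Converting to `char` first is a design choice: indexing a string scalar as `str(a:b)` indexes array elements, not characters, whereas `s(a:b)` on a char vector slices by character position.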
Rik, 9 November 2019
Is this the exact text of your char array, or does it actually contain some char(10) (newline) characters?
Ean Hendrickson, 9 November 2019
This is the exact text I extracted from a PDF. There should be no char(10) in it. I used extractFileText, strfind, and extractBetween to get the text above.
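Rik's question can be answered directly by inspecting the character codes in `str`. A minimal sketch, assuming `str` holds the text returned by the extraction steps; a hypothetical two-item excerpt built with `char(8629)` (U+21B5) stands in for it here.

```matlab
% Hypothetical excerpt standing in for the extracted text
str = string(['1. Receptacles, general purpose. ', char(8629), ...
    '2. Receptacles with integral GFCI. ']);

c = char(str);                          % work on the underlying characters
hasNewline = any(c == newline);         % true if any char(10) is present
hasCR = any(c == char(13));             % true if any carriage return is present
nonAscii = unique(c(double(c) > 127));  % distinct non-ASCII characters
hexCodes = dec2hex(double(nonAscii));   % one hex code per row, e.g. 21B5 for "↵"

fprintf('newline: %d, carriage return: %d\n', hasNewline, hasCR);
disp(hexCodes)
```

If `hasNewline` is true for the real text, newline-based tools such as `splitlines` or the `'lineanchors'` option would apply; if only `21B5` shows up, that delimiter has to be handled explicitly.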


Answers (2)

per isakson, 9 November 2019
Edited: per isakson, 9 November 2019
"So the end product I want will be a 16x1 string that looks like": I'm not sure exactly how to interpret this requirement.
The problem is the delimiter, which looks a bit like the symbol on my ENTER key (↵). After copying and pasting from your question, the hex code of that character is \x21B5 (Unicode U+21B5).
Try
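%%
% Split at every run of the U+21B5 ("↵") delimiter
z = regexp( str, "\x21B5+", 'split' );
z = strtrim( z );                 % remove leading/trailing spaces
z( strlength(z)==0 ) = [];        % drop the empty pieces left by the leading/trailing delimiters
%%
% z = regexp( z, "(?<=\d+\.\x20).+$", 'match', 'once' ); % removes the numbers
out = reshape( z, [],1 );         % 1-by-N row -> N-by-1 column
%%
fprintf( 1, '%s\n', out );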
This outputs the following in the Command Window:
1. Receptacles, general purpose.
2. Receptacles with integral GFCI.
3. USB Charger receptacles.
4. AFCI receptacles.
5. Twist-locking receptacles.
6. Isolated-ground receptacles.
....
and
>> out(1:4)
ans =
4×1 string array
"1. Receptacles, general purpose."
"2. Receptacles with integral GFCI."
"3. USB Charger receptacles."
"4. AFCI receptacles."

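If it is unclear whether the extracted text uses U+21B5, real newlines, or a mix of both, the same split-and-trim idea can accept either delimiter. This is a minimal variation, not per isakson's code: it swaps `regexp(..., 'split')` for `split` with a list of delimiters (which returns a column directly for a scalar input), and runs on a hypothetical excerpt that mixes both delimiters. The last line is one way to drop the item numbers with `regexprep`, similar in intent to the commented-out line in the answer.

```matlab
arrow = char(8629);                     % U+21B5 ("↵")
% Hypothetical excerpt that mixes "↵" and real newline delimiters
str = string([arrow, arrow, '1. Receptacles, general purpose. ', newline, ...
    '2. Receptacles with integral GFCI. ', arrow, ...
    '3. USB Charger receptacles. ', arrow, arrow]);

z = split(str, [string(newline), string(arrow)]);  % split at either delimiter; N-by-1 result
z = strtrim(z);                                    % remove surrounding spaces
z(strlength(z) == 0) = [];                         % drop empty pieces from repeated delimiters
out = z;                                           % N-by-1 string array

descriptions = regexprep(out, '^\d+\.\s*', '');    % optional: strip the leading "N. "
```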
JESUS DAVID ARIZA ROYETH, 9 November 2019
str = "↵↵↵1. Receptacles, general purpose. ↵2. Receptacles with integral GFCI. ↵3. USB Charger receptacles. ↵4. AFCI receptacles. ↵5. Twist-locking receptacles. ↵6. Isolated-ground receptacles. ↵7. Tamper-resistant receptacles. ↵8. Weather-resistant receptacles. ↵9. Pendant cord-connector devices. ↵10. Cord and plug sets. ↵11. Wall box dimmers. ↵12. Wall box dimmer/sensors. ↵13. Wall box occupancy/vacancy sensors. ↵14. Toggle Switches. ↵15. Floor service outlets. ↵16. Associated device plates. ↵↵"
parts = regexp(str,'\d+\. +[\w\s.,/-]+\.','match')'   % class: word chars, whitespace, and . , / -
parts =
16×1 string array
"1. Receptacles, general purpose."
"2. Receptacles with integral GFCI."
"3. USB Charger receptacles."
"4. AFCI receptacles."
"5. Twist-locking receptacles."
"6. Isolated-ground receptacles."
"7. Tamper-resistant receptacles."
"8. Weather-resistant receptacles."
"9. Pendant cord-connector devices."
"10. Cord and plug sets."
"11. Wall box dimmers."
"12. Wall box dimmer/sensors."
"13. Wall box occupancy/vacancy sensors."
"14. Toggle Switches."
"15. Floor service outlets."
"16. Associated device plates."

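The character class in this pattern admits only word characters, whitespace, and `.`, `,`, `/`, `-`, so a description containing anything else (for example parentheses or `&`) would be truncated or skipped; with the pattern above, item 2 of the excerpt below should not match at all. A minimal alternative, assuming every item starts with `N. ` and runs until the next U+21B5 delimiter, is to match any run of non-delimiter characters and trim the result. The excerpt is hypothetical.

```matlab
arrow = char(8629);                     % U+21B5 ("↵")
% Hypothetical excerpt; item 2 contains characters outside the original class
str = string([repmat(arrow, 1, 3), ...
    '1. Receptacles, general purpose. ', arrow, ...
    '2. Receptacles (duplex) & plates. ', arrow, arrow]);

% "N. " followed by any run of characters other than the delimiter
pattern = ['\d+\.\s[^', arrow, ']*'];
parts = strtrim(regexp(str, pattern, 'match')');  % transpose to a column, then trim
disp(parts)
```

The trade-off is that this version relies on the delimiter rather than on a final period, so it also keeps items that do not end in `.`.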