Looking for a regular expression to parse sparse data

Hi,
I have a sparse mass matrix exported from ANSYS, and the data looks as follows:
[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08 [ 1, 7]: 2.146e-08 [ 1, 10]: 5.835e-08 [ 1, 13]: 4.043e-08 [ 1, 16]: 1.011e-08 [ 1, 19]: 8.211e-09 [ 1, 22]: 2.590e-08 [ 1, 25]:-3.475e-08 [ 1, 28]:-2.854e-08 [ 1, 31]:-2.987e-08 [ 1, 34]:-8.897e-08 [ 1, 37]:-1.351e-08 [ 1, 40]:-8.564e-09 [ 1, 43]:-9.072e-09 [ 1, 46]:-3.556e-08 [ 1, 49]:-6.093e-08 [ 1, 52]:-1.343e-08 [ 1, 55]:-8.914e-09 [ 1, 58]:-3.609e-08 [ 1, 61]:-3.609e-08 [ 1, 64]:-6.093e-08 [ 1, 67]:-1.343e-08 [ 1, 70]:-8.914e-09 [ 1, 118]: 5.625e-08 [ 1, 121]: 2.883e-08 [ 1, 130]: 2.507e-08 [ 1, 133]: 1.102e-08 [ 1, 142]:-3.891e-08 [ 1, 154]:-1.175e-08 [ 1, 166]:-3.459e-08 [ 1, 169]:-1.171e-08 [ 1, 181]:-1.171e-08 [ 1, 184]:-3.459e-08 [ 1, 187]:-8.513e-08 [ 1, 190]:-3.947e-08 [ 1, 193]:-3.466e-08 [ 1, 196]:-1.196e-08 [ 1, 958]: 1.944e-08 [ 1, 964]: 7.516e-09 [ 1, 970]:-2.705e-08 [ 1, 979]:-8.340e-09 [ 1, 988]:-7.965e-09 [ 1, 994]:-7.965e-09 [ 1, 1021]: 2.166e-08 [ 1, 1024]: 9.467e-09 [ 1, 1027]:-2.557e-08 [ 1, 1030]:-3.156e-08 [ 1, 1033]:-7.830e-09 [ 1, 1036]:-1.295e-08 [ 1, 1039]:-1.246e-08 [ 1, 1042]:-1.246e-08
I'm looking to put this into a dense matrix, but it would be good enough to store all the items in an N-by-3 cell array (columns x, y, data), with the regular expression reading to the end of the file.
I would then search the cell array for the largest index (x, y), initialize an array of that size, and copy the data from the cell array into the matrix.
Is this possible?

Accepted Answer

Star Strider on 13 Nov 2020

1 vote

This uses one regexp call to split the data into cells that are then read with sscanf, and the reshape function in the ‘Out’ assignment arranges the result into three rows with one column per entry. It may not be exactly what you intended (I doubt that is possible), however it has the virtue of producing the desired result:
M = '[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08 [ 1, 7]: 2.146e-08 [ 1, 10]: 5.835e-08 [ 1, 13]: 4.043e-08 [ 1, 16]: 1.011e-08 [ 1, 19]: 8.211e-09 [ 1, 22]: 2.590e-08 [ 1, 25]:-3.475e-08 [ 1, 28]:-2.854e-08 [ 1, 31]:-2.987e-08 [ 1, 34]:-8.897e-08 [ 1, 37]:-1.351e-08 [ 1, 40]:-8.564e-09 [ 1, 43]:-9.072e-09 [ 1, 46]:-3.556e-08 [ 1, 49]:-6.093e-08 [ 1, 52]:-1.343e-08 [ 1, 55]:-8.914e-09 [ 1, 58]:-3.609e-08 [ 1, 61]:-3.609e-08 [ 1, 64]:-6.093e-08 [ 1, 67]:-1.343e-08 [ 1, 70]:-8.914e-09 [ 1, 118]: 5.625e-08 [ 1, 121]: 2.883e-08 [ 1, 130]: 2.507e-08 [ 1, 133]: 1.102e-08 [ 1, 142]:-3.891e-08 [ 1, 154]:-1.175e-08 [ 1, 166]:-3.459e-08 [ 1, 169]:-1.171e-08 [ 1, 181]:-1.171e-08 [ 1, 184]:-3.459e-08 [ 1, 187]:-8.513e-08 [ 1, 190]:-3.947e-08 [ 1, 193]:-3.466e-08 [ 1, 196]:-1.196e-08 [ 1, 958]: 1.944e-08 [ 1, 964]: 7.516e-09 [ 1, 970]:-2.705e-08 [ 1, 979]:-8.340e-09 [ 1, 988]:-7.965e-09 [ 1, 994]:-7.965e-09 [ 1, 1021]: 2.166e-08 [ 1, 1024]: 9.467e-09 [ 1, 1027]:-2.557e-08 [ 1, 1030]:-3.156e-08 [ 1, 1033]:-7.830e-09 [ 1, 1036]:-1.295e-08 [ 1, 1039]:-1.246e-08 [ 1, 1042]:-1.246e-08';
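V = regexp(M, '\[', 'split');          % split the text at every '['
R = sscanf([V{:}], '%d,%d]: %f');      % rejoin the pieces and read x, y, value repeatedly
Out = reshape(R, 3, []);               % 3 rows (x, y, value), one column per entry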
with:
FirstFiveColumns = Out(:,1:5)
producing:
FirstFiveColumns =
1 1 1 1 1
1 4 7 10 13
1.157e-07 2.332e-08 2.146e-08 5.835e-08 4.043e-08
with ‘x’ being the first row, ‘y’ being the second row, and the floating-point variables (I have no idea what they represent) the third row.
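As a sketch of the dense-matrix step described in the question (not part of Star Strider's answer), the 3-by-N Out array can be scattered into a preallocated matrix with sub2ind. A three-entry sample stands in for Out so the snippet runs on its own; the names iRow, iCol and vals are illustrative. If an (x, y) pair appeared twice, the later value would overwrite the earlier one.

```matlab
% Sketch: dense matrix from the 3-by-N array Out (rows: x, y, value).
% A short sample stands in for the Out produced above.
Out = [1 1 1; 1 4 7; 1.157e-07 2.332e-08 2.146e-08];

iRow = Out(1,:);    % 'x' (row) indices
iCol = Out(2,:);    % 'y' (column) indices
vals = Out(3,:);    % stored values

% Size the matrix from the largest indices found, as the question proposes
D = zeros(max(iRow), max(iCol));

% Copy every value to its (row, column) position in one vectorized step
D(sub2ind(size(D), iRow, iCol)) = vals;
```

For a square mass matrix you may prefer n = max([iRow iCol]) followed by zeros(n), assuming both index ranges refer to the same degrees of freedom.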

6 Comments

Stephen23 on 14 Nov 2020 (edited by Stephen23 on 14 Nov 2020)
Without regexp or reshape, sscanf can parse it directly:
format long
str = '[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08 [ 1, 7]: 2.146e-08 [ 1, 10]: 5.835e-08 [ 1, 13]: 4.043e-08 [ 1, 16]: 1.011e-08 [ 1, 19]: 8.211e-09 [ 1, 22]: 2.590e-08 [ 1, 25]:-3.475e-08 [ 1, 28]:-2.854e-08 [ 1, 31]:-2.987e-08 [ 1, 34]:-8.897e-08 [ 1, 37]:-1.351e-08 [ 1, 40]:-8.564e-09 [ 1, 43]:-9.072e-09 [ 1, 46]:-3.556e-08 [ 1, 49]:-6.093e-08 [ 1, 52]:-1.343e-08 [ 1, 55]:-8.914e-09 [ 1, 58]:-3.609e-08 [ 1, 61]:-3.609e-08 [ 1, 64]:-6.093e-08 [ 1, 67]:-1.343e-08 [ 1, 70]:-8.914e-09 [ 1, 118]: 5.625e-08 [ 1, 121]: 2.883e-08 [ 1, 130]: 2.507e-08 [ 1, 133]: 1.102e-08 [ 1, 142]:-3.891e-08 [ 1, 154]:-1.175e-08 [ 1, 166]:-3.459e-08 [ 1, 169]:-1.171e-08 [ 1, 181]:-1.171e-08 [ 1, 184]:-3.459e-08 [ 1, 187]:-8.513e-08 [ 1, 190]:-3.947e-08 [ 1, 193]:-3.466e-08 [ 1, 196]:-1.196e-08 [ 1, 958]: 1.944e-08 [ 1, 964]: 7.516e-09 [ 1, 970]:-2.705e-08 [ 1, 979]:-8.340e-09 [ 1, 988]:-7.965e-09 [ 1, 994]:-7.965e-09 [ 1, 1021]: 2.166e-08 [ 1, 1024]: 9.467e-09 [ 1, 1027]:-2.557e-08 [ 1, 1030]:-3.156e-08 [ 1, 1033]:-7.830e-09 [ 1, 1036]:-1.295e-08 [ 1, 1039]:-1.246e-08 [ 1, 1042]:-1.246e-08';
mat = sscanf(str,'[%d,%d]:%f ',[3,Inf]).' % size [3,Inf] gives 3 rows per entry; .' makes it N-by-3
mat = 52×3
1.000000000000000 1.000000000000000 0.000000115700000
1.000000000000000 4.000000000000000 0.000000023320000
1.000000000000000 7.000000000000000 0.000000021460000
1.000000000000000 10.000000000000000 0.000000058350000
1.000000000000000 13.000000000000000 0.000000040430000
1.000000000000000 16.000000000000000 0.000000010110000
1.000000000000000 19.000000000000000 0.000000008211000
1.000000000000000 22.000000000000000 0.000000025900000
1.000000000000000 25.000000000000000 -0.000000034750000
1.000000000000000 28.000000000000000 -0.000000028540000
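Since mat already holds (row, column, value) triplets, MATLAB's built-in sparse function can do the sizing and copying the question describes in a single call. This is a sketch using a three-entry sample string; note that sparse adds together any duplicate (row, column) entries instead of overwriting them.

```matlab
% Sketch: build the matrix with sparse() from the N-by-3 array mat
% (columns: row index, column index, value). A short sample stands in for str.
str = '[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08 [ 1, 7]: 2.146e-08';
mat = sscanf(str, '[%d,%d]:%f ', [3, Inf]).';

% sparse(i, j, v) sizes the result from the largest indices automatically
S = sparse(mat(:,1), mat(:,2), mat(:,3));

% Convert to a dense (full) matrix only if one is really needed
D = full(S);
```

To force a particular size, for example a square n-by-n matrix, sparse(i, j, v, n, n) can be used instead.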
Tyler on 3 Jan 2021
Hi, both of the answers above work if I have the data in a 'string'. However, if I import from a text file with Mfile = fileread('brg1_m.dat'); it comes in as a 1x270000000 character vector. I wasn't sure if the size of the vector was the issue, so I used just the first 1000 characters, and it still won't work.
Is there a way to convert a character vector into a string? I am using R2016b.
Thanks a lot!
Star Strider on 3 Jan 2021
It would be easier to attempt to solve this if ‘brg1_m.dat’ were uploaded so we could work with it. There may be better ways to import it.
With respect to compatibility, the detectImportOptions function could be important here, and since it was introduced in R2016b, you should have it.
Be sure to download and install any Updates if available (I don’t remember what version/release those began with) so that you have the most current version of R2016b.
Stephen23 on 3 Jan 2021
"both of the answers above work if i have the data in a 'string'. However... it comes in as a 1x270000000 character vector. ... it still wont work."
I very much doubt that it would make any difference.
The code in my comment already uses a character vector, not a string. Using the equivalent string gives exactly the same output, because sscanf accepts either a character vector or a string scalar; it makes zero difference. Let's try it:
Character vector:
str = '[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08'; % char vector
mat = sscanf(str,'[%d,%d]:%f ',[3,Inf]).'
mat = 2×3
1.0000 1.0000 0.0000
1.0000 4.0000 0.0000
String:
str = "[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08"; % string
mat = sscanf(str,'[%d,%d]:%f ',[3,Inf]).'
mat = 2×3
1.0000 1.0000 0.0000
1.0000 4.0000 0.0000
Most likely your character vector does not have the exact format that you showed us in your original question, e.g. it contains some leading characters or non-displaying characters, or some other difference. Both Star Strider's code and mine rely on the input having the exact format that you showed in your question.
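To illustrate the point about leading characters, here is a sketch with a hypothetical one-line header prepended to a short sample. The format's first literal '[' fails to match the header text, so sscanf returns nothing; trimming the text up to the first '[' restores the expected result, assuming the header itself contains no '['.

```matlab
% Sketch: a leading header stops sscanf at the very first character,
% because the format expects a literal '[' there. The header text is hypothetical.
data = '[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08';
txt  = ['Some header line', char(10), data];   % char(10) is a newline

fmt = '[%d,%d]:%f ';
bad = sscanf(txt, fmt, [3, Inf]).';   % nothing matches, so the result is empty

% Assuming the header contains no '[', skip everything before the first '['
k   = find(txt == '[', 1, 'first');
mat = sscanf(txt(k:end), fmt, [3, Inf]).';   % 2-by-3: row, column, value
```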
Tyler on 3 Jan 2021
Thank you, this is correct. There was one line of header in the file.
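Given the single header line, one option is to skip it with fgetl and read the rest with fscanf, which accepts the same format string as sscanf. This is a sketch that writes a small temporary file as a stand-in for brg1_m.dat (the header text is hypothetical); reading straight from the file also avoids creating the large character vector that fileread returns.

```matlab
% Sketch: skip one header line, then read all triplets directly from the file.
% A temporary file stands in for 'brg1_m.dat'.
fname = [tempname, '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'Header line (hypothetical)');
fprintf(fid, '%s\n', '[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08');
fprintf(fid, '%s\n', '[ 2, 2]: 9.000e-08');
fclose(fid);

fid = fopen(fname, 'r');
fgetl(fid);                                    % read and discard the header line
mat = fscanf(fid, '[%d,%d]:%f ', [3, Inf]).';  % N-by-3: row, column, value
fclose(fid);
delete(fname);                                 % remove the temporary file
```

The trailing space in the format matches any whitespace, including the newlines between lines of the file.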
Thanks so much
Star Strider on 4 Jan 2021
As always, my pleasure!


More Answers (0)


Release:

R2016b

Asked:

13 Nov 2020

Last commented:

4 Jan 2021
