textscan of mixed data type data file

Question

Ashraf Alfandi on 17 Feb 2022

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/1652235-textscan-of-mixed-data-type-data-file

Commented: Walter Roberson on 27 Oct 2022

DNP SYS test.txt

I'm trying to import the data from a column-based text file into a MATLAB matrix, in which each column of the matrix includes the header and its corresponding data column. The file consists of a header line followed by columns of data, as you can see in the attachment. I need MATLAB to read the first line (i.e. the header line, string/char data type) and detect how many headers there are, which corresponds to the number of variables in the file, then read the data that follow (double data type) column by column.

2 Comments

Walter Roberson on 17 Feb 2022

textscan(fid, '', 'HeaderLines', 1)

would tell textscan() to skip one line and then figure out by itself how many columns there are.

Do you need the variable names to be remembered, or were you just looking to figure out how many columns were there?

Ashraf Alfandi on 17 Feb 2022

I need both. The header line will help me retrieve each data column based on its title/header.
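
Both needs from the exchange above (the column names and the numeric columns) can be met with textscan alone. What follows is a minimal sketch, not the poster's actual file: it assumes the header is a single line of double-quoted names separated by spaces, and it writes a small hypothetical sample file (fname, with made-up names such as 'Temp A') so that it runs on its own.

```matlab
% Hypothetical sample file, written here so the sketch is self-contained.
sample = [0 20.5 101.3; 1 21.0 101.2; 2 21.5 101.1];
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '"Time" "Temp A" "Pressure"\n');   % assumed header layout
fprintf(fid, '%g %g %g\n', sample.');           % transpose so rows are written in order
fclose(fid);

fid = fopen(fname, 'r');
headerLine = fgetl(fid);                 % first line: the quoted column names
names = textscan(headerLine, '%q');      % %q strips the surrounding double quotes
names = names{1}.';                      % 1-by-N cell array of names
cols = textscan(fid, '');                % empty format: textscan detects the column count
fclose(fid);
delete(fname);

data = [cols{:}];                        % numeric matrix, one column per header
numVars = numel(names);                  % number of variables found in the header
tempA = data(:, strcmp(names, 'Temp A'));  % retrieve a column by its header name
```

Reading the header with fgetl first means no 'HeaderLines' option is needed for the numeric part, and the names stay available for lookups such as the strcmp call on the last line.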

Answer 1

Ashraf Alfandi on 17 Feb 2022

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/1652235-textscan-of-mixed-data-type-data-file#answer_898625

Edited: Ashraf Alfandi on 17 Feb 2022

Thanks, Mathieu, for your answer. It's definitely very comprehensive, but time-consuming for what I need to run. In the end, I came up with the following simple code, which takes ~0.0009 sec whereas yours takes 0.5 seconds:

FileName = "DNP MM SYS test.dat";
test = importdata(FileName);
Data = test.data;       % Extract the numeric data via importdata
N = size(Data, 2);      % Number of columns
fid = fopen(FileName);
Head = textscan(fid, '%q', N+1, 'HeaderLines', 1); % N tells textscan how many strings to expect
Head = [Head{:}]';      % Collect the strings into a row cell array
Head = Head(2:end);     % Drop the first string
fclose(fid);

1 Comment

Mathieu NOE on 17 Feb 2022

hello

no problem

For my code I got: Elapsed time is 0.027212 seconds.

For your code: Elapsed time is 0.047455 seconds.

Answer 2

Mathieu NOE on 17 Feb 2022

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/1652235-textscan-of-mixed-data-type-data-file#answer_898580

Hello,

try this.

readclm is an old but still valuable function (I don't even remember where it came from).

The variable names are stored in the cell array var.

[DATA,HEAD] = readclm('DNP SYS test.txt');
var = split(HEAD,' "');
var = var(2:end);
var = strrep(var,'"',''); %get rid of double quotes
function  [outdata,head] = readclm(filename,nclm,skip,formt)
% READCLM Reads numerical data from a text file into a matrix.
%	Text file can begin with a header or comment block.
%	[DATA,HEAD] = READCLM(FILENAME,NCLM,SKIP,FORMAT)
%	Opens file FILENAME, skips first several lines specified
%	by SKIP number or beginning with comment '%'.
%	Then reads next several lines into a string matrix HEAD
%	until the first line with numerical data is encountered
%	(that is until first non-empty output of SSCANF).
%	Then reads the rest of the file into a numerical matrix
%	DATA in a format FORMAT with number of columns equal
%	to number of columns of the text file or specified by
%	number NCLM. If data does not match the size of the
%	matrix DATA, it is padded with NaN at the end.
%
%	READCLM(FILENAME) reads data from a text file FILENAME,
%	skipping only commented lines. It determines number of
%	columns by the length of the first data line and uses
%	the floating point format '%g';
%
%	READCLM uses FGETS to read the first lines and 	FSCANF
%	for reading data.
 % Defaults and  parameters ..............................
formt_dflt = '%g';  % Default format for fscanf
addn = nan;         % Number to fill the end if necessary
 % Handle input ..........................................
if nargin<1, error('  File name is undefined'); end
if nargin<4, formt = formt_dflt; end
if nargin<3, skip = 0; end
if nargin<2, nclm = 0; end
if isempty(nclm), nclm = 0; end
if isempty(skip), skip = 0; end
 % Open file ............................
[fid,msg] = fopen(filename);
if fid<0, disp(msg), return, end
 % Find header and first  data line ......................
is_head = 1;
jl = 0;
head = ' ';
while is_head  % Add lines to header.....
  s = fgets(fid);           % Get next line
  jl = jl+1;
  % Skip the first SKIP lines and any line starting with '%'
  is_skip = jl<=skip || s(1)=='%';
  out1 = sscanf(s,formt);   % Try to read this line
   % If unreadable by SSCANF or skip, add to header
  is_head = isempty(out1) || is_skip;
  if is_head && ~is_skip
    % char replaces the obsolete str2mat; drop the trailing newline
    head = char(head,s(1:length(s)-1));
  end
end
head = head(2:size(head,1),:);
 % Determine number of columns if not specified
out1 = out1(:)';
l1 = length(out1);
if ~nclm, nclm = l1; end
 % Read the rest of the file ..............................
if l1~=nclm  % First line format is different from ncolumns
  outdata = fscanf(fid,formt);
  lout = length(outdata)+l1;
  ncu = ceil(lout/nclm);
  lz = nclm*ncu-lout;
  outdata = [out1'; outdata(:); ones(lz,1)*addn];
  outdata = reshape(outdata,nclm,ncu)';
else              % Regular case
  outdata = fscanf(fid,formt,[nclm inf]);
  outdata = [out1; outdata'];  % Add the first line
end
fclose (fid);     % Close file ..........
end
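
Once the header names and the numeric matrix are both in hand (from readclm above or any other reader), one way to retrieve a column by its title is to put them in a table. This is only a sketch under assumptions: names and data below are hypothetical stand-ins for the real header and data, and matlab.lang.makeValidName is used because headers with spaces are not valid MATLAB identifiers.

```matlab
% Hypothetical stand-ins for the header names and the numeric data.
names = {'Time', 'Temp A', 'Pressure'};
data = [0 20.5 101.3; 1 21.0 101.2; 2 21.5 101.1];

% Turn each header into a valid identifier (e.g. the space is removed).
validNames = matlab.lang.makeValidName(names);

% One table variable per data column, named after its header.
T = array2table(data, 'VariableNames', validNames);

pressure = T.Pressure;            % access by a name known in advance
second = T.(validNames{2});       % access by a name held in a variable
```

If two headers map to the same identifier, array2table will reject the duplicate names, so real files may need an extra de-duplication step.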

Answer 3

Wesser on 27 Oct 2022

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/1652235-textscan-of-mixed-data-type-data-file#answer_1085303

So I originally had the script as below. It works perfectly when all the Obs_Node.out files have the same number of rows. But when the Obs_Node.out files have different numbers of rows, I can't compile the columns from each for-loop iteration. For example,

THETA_ObsNode(:,i) = theta_ObsNode(:);

will result in an error like:

"Unable to perform assignment because the size of the left side is 200000-by-1 and the size of the right side is 117648-by-1.

Error in MC_Data_Compile (line 67)

THETA_ObsNode(:,i) = theta_ObsNode(:);"

I am ultimately trying to compile each column from each for-loop iteration into one file for that respective column, if that makes sense. My question, then, is: how do I compile the data when the lengths of the columns vary?

num_sim = 1000;      %1000 monte carlo simulations
Node_CONC=zeros(200000,num_sim); %200000 is an arbitrarily large number of rows 
%~~~~~~~~~~Coalesce data from Obs_Node.out files~~~~~~~~~~~~
for i=1:num_sim
    Obs_Node = fopen(["/Users/apple/Dropbox/My Mac (apple’s MacBook Pro)/Desktop/Simulations/MC_"+num2str(i)+'/Obs_Node.out']);   % Open monte carlo output file in Path (i) 
    
    skip_lines=11;   %skip all the lines until the output data of interest
    
    for k=1:(skip_lines)
        x=fgetl(Obs_Node);
    end
    temp1 = fscanf(Obs_Node,'%f',[5,Inf]);        %scan the matrix of data
    TEMP1 = temp1';       % transpose data
    
    theta_ObsNode = TEMP1(:,3);      % Hydraulic Conductivity
    THETA_ObsNode(:,i) = theta_ObsNode(:);  %%%% this line saves each iteration's data in a separate column    
    
    flux_ObsNode = TEMP1(:,4);      % Water Flux
    FLUX_ObsNode(:,i) = flux_ObsNode(:);  
    
    Conc_ObsNode = TEMP1(:,5);      % Concentration g/cm3
    CONC_ObsNode(:,i) = Conc_ObsNode(:);      
    
    fclose(Obs_Node);
end

1 Comment

Walter Roberson on 27 Oct 2022

Pad the arrays for the shorter data.

Here I use NaN to pad, as it is clear that NaN is not valid data. The code could be a bit shorter if it was acceptable to pad with zeros instead of some other value.

The below code does not assume that all files except the last are the same length: it dynamically grows the array any time it encounters a larger file, making sure to extend the padding for any existing data.

num_sim = 1000;      %1000 monte carlo simulations
Node_CONC=zeros(200000,num_sim); %200000 is an arbitrarily large number of rows 
%~~~~~~~~~~Coalesce data from Obs_Node.out files~~~~~~~~~~~~
for i=1:num_sim
    Obs_Node = fopen(["/Users/apple/Dropbox/My Mac (apple’s MacBook Pro)/Desktop/Simulations/MC_"+num2str(i)+'/Obs_Node.out']);   % Open monte carlo output file in Path (i) 
    
    skip_lines=11;   %skip all the lines until the output data of interest
    
    for k=1:(skip_lines)
        x=fgetl(Obs_Node);
    end
    temp1 = fscanf(Obs_Node,'%f',[5,Inf]);        %scan the matrix of data
    TEMP1 = temp1';       % transpose data
    
    theta_ObsNode = TEMP1(:,3);      % Hydraulic Conductivity
    flux_ObsNode = TEMP1(:,4);      % Water Flux
    conc_ObsNode = TEMP1(:,5);      % Concentration g/cm3
    num_obs_here = length(theta_ObsNode);
    
    if i == 1
        THETA_ObsNode = nan(num_obs_here,num_sim);
        FLUX_ObsNode = THETA_ObsNode;
        CONC_ObsNode = THETA_ObsNode;
        
        THETA_ObsNode(:,i) = theta_ObsNode;
        FLUX_ObsNode(:,i) = flux_ObsNode;
        CONC_ObsNode(:,i) = conc_ObsNode;
        
    elseif num_obs_here <= size(THETA_ObsNode,1)
        
        THETA_ObsNode(1:num_obs_here,i) = theta_ObsNode;
        FLUX_ObsNode(1:num_obs_here,i) = flux_ObsNode;
        CONC_ObsNode(1:num_obs_here,i) = conc_ObsNode;
        
    else
        
        THETA_ObsNode(end+1:num_obs_here,:) = NaN;
        FLUX_ObsNode(end+1:num_obs_here,:) = NaN;
        CONC_ObsNode(end+1:num_obs_here,:) = NaN;
        
        THETA_ObsNode(:,i) = theta_ObsNode;
        FLUX_ObsNode(:,i) = flux_ObsNode;
        CONC_ObsNode(:,i) = conc_ObsNode;
    end
    
    fclose(Obs_Node);
end
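
The grow-and-pad logic in the code above can be tried in isolation, without the simulation files. The following is a minimal sketch in which columns is a hypothetical stand-in for the per-file data vectors; it folds the three branches into a single size check, which is one possible simplification and not the only way to write it.

```matlab
% Hypothetical stand-ins for the column read from each file.
columns = {[1; 2; 3], [4; 5], [6; 7; 8; 9]};

padded = nan(0, numel(columns));     % start with zero rows, one column per file
for i = 1:numel(columns)
    v = columns{i}(:);
    n = numel(v);
    if n > size(padded, 1)
        % Longer than anything seen so far: add NaN rows to every column.
        padded(end+1:n, :) = NaN;
    end
    padded(1:n, i) = v;              % shorter columns keep their NaN tail
end
```

NaN padding also keeps later statistics simple, since the padded entries can be ignored with calls such as mean(padded, 'omitnan').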
