How would I create a script to read files line-by-line to save memory

Question

EL on 20 Aug 2019

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/476933-how-would-i-create-a-script-to-read-files-line-by-line-to-save-memory

Commented: Adam Danz on 21 Aug 2019

Hey guys,

I've done the MATLAB Onramp, but I still feel extremely confused about what the hell I'm doing and it's frustrating me. I don't even know how to google the right questions, and interpreting pages from this website is a task that alone is like learning another language. Learning German was easier than this it feels like. So I'm sorry if I'm asking stupid questions, but I feel like I've been thrown into the deep end.

I have a .txt file that is 1,000,000,000 lines long, give or take a few 100,000,000 (no two files are the same length)

It consists of only numbers, no headers that I'm aware of.

Because of the file size, I cannot load the whole file. It needs to be read in portions. I'd rather not split the file or

I'm looking to gather variance data every 100,000 data points, to be organized in a single column/multiple row format.

Ideally, I'd also like to have new columns generated every 360 variance data points, however this isn't as important as generating the variance data first.

Thanks for the help!

6 Comments

Adam Danz on 20 Aug 2019

Edited: Adam Danz on 20 Aug 2019

@ Eric, that level of frustration is normal at this stage! You're asking the right questions so I'm sure you're going to succeed.

"I don't even know how to google the right questions"

Reduce your question to key words and add "matlab" to the front of your search. Nine times out of ten you'll end up in this forum or within the MATLAB documentation. Sometimes it will lead you to other resources but they usually aren't as helpful.

Matlab change plot symbols
Matlab how to delete something on the plot
etc..

"...and interpreting pages from this website is a task that alone is like learning another language"

Yes, it is like that but you'll get the hang of it. I'd estimate that there are fewer than 50 critical terms to understand to be able to quickly read through the documentation. Just keep at it.

"Learning German was easier than this it feels like"

Nein! German has cases. Matlab has switch-cases which are much easier to understand.

EL on 20 Aug 2019

Edited: Adam Danz on 21 Aug 2019

x0.txt

I cut off a little section. This is the very top of a file I would use.

EDIT: Here's a script I'm currently using, and the errors I received

%% Loading Files for Input
% Currently, this can only do a single file at a time. Future editions intend to
% have multiple files loaded at once to save time. 
prompt = 'Enter the name of the .txt file to run (e.g. Organism_L/D_Media_Temp_mmddyyyy_Signal.txt).';
inputfile = input(prompt, 's');
%% Data Collection Rate
prompt = 'Enter the Data Collection rate(Hz). [20,000]';
Hz=input(prompt);
if isempty(Hz)
    Hz=20000;
end
%% Variance (n)
% This designated the amount of data to use for each datapoint generated.
% The standard amount is 5 seconds (100,000 datapoints). If left empty, 
% this is the value that will be used. Otherwise, this will be done in
% seconds. 
% Variables
%       vt = variance time. The time in seconds is the input, which is then
%       multiplied by 20,000. 
prompt = 'Enter the time length for variance calc in sec (20,000 points/sec) [5 seconds].';
vt=input(prompt);
if isempty(vt)
    vt=5;
end
%% Designating file for export
% This is the name of the .txt file that will contain the variance data
prompt = 'Enter the name for the output file (e.g. Organism_L/D_Media_Temp_mmddyyy_VarianceTime).';
outputfile=input(prompt,'s');
%% Initianting the code
% This is intended to be read line-by-line, then generating a single column
% text file of the variance data. 
infile=fopen(inputfile);
outfile=fopen(outputfile);
fline=fgetl(infile);
line_index=1;
variancewindow = Hz*vt;
data=zeros(1,variancewindow);
while ischar(dline);
    data(line_index) = str2double(dline)  ;  % str2double = Convert string to double precision value. What does that mean......?
    line_index=line_index+1;
    if line_index > variancewindow;
        line_index=1;
        variance_value=variance_function(data);
        fprintf(outfile,'%f\n',variance_value);
        data=zeros(1,variancewindow);
    end
    dline=fgetl(infile);
end
fclose(infile);
data=data(data~=0);
variance_value=variance_function(data);
fprintf(outfile,'%f/n',variance_value);
fclose(outfile);s

EDIT 2: The errors

Error using fgets
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in fgetl (line 32)
[tline,lt] = fgets(fid);
Error in NMDIII_Data (line 59)
fline=fgetl(infile);

Just to be clear, this is something I was working on while asking this question. That's why I didn't post it in the original question.
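A note on the error above: fgetl reports "Invalid file identifier" when it is handed -1, which is what fopen returns when it cannot open the file (often a name or path that does not resolve from the current folder). What follows is only a minimal sketch of checking for that before reading, assuming a small throwaway file in tempdir. The file names and the variables demoFile, badFid, and total are hypothetical and are not part of the script above.

```matlab
% Hypothetical demo file so the sketch runs on its own.
demoFile = fullfile(tempdir, 'fid_check_demo.txt');
fid = fopen(demoFile, 'w');          % 'w' is required to open a file for writing
fprintf(fid, '%d\n', 1:10);          % ten lines: 1, 2, ..., 10
fclose(fid);

% fopen returns -1 (and a message) when the file cannot be opened.
[badFid, msg] = fopen(fullfile(tempdir, 'no_such_file_here.txt'), 'r');
if badFid < 0
    fprintf('Could not open file: %s\n', msg);
end

% Checked open for reading, then a line-by-line loop.
[infile, msg] = fopen(demoFile, 'r');
if infile < 0
    error('Failed to open "%s": %s', demoFile, msg);
end
total = 0;
dline = fgetl(infile);               % fgetl returns numeric -1 at end of file
while ischar(dline)
    total = total + str2double(dline);
    dline = fgetl(infile);           % same variable name as in the loop test
end
fclose(infile);
delete(demoFile);                    % clean up the demo file
```

Two details in the posted script are worth comparing against this sketch: the first read is assigned to fline while the loop tests dline, and the output file is opened without a permission argument, so fopen falls back to read-only.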

Adam Danz on 21 Aug 2019

The methods proposed by myself and Walter involve reading in chunks of data rather than reading in line-by-line (as you're doing with fgets). I suggest you abandon that method and use textscan() instead.


Answer 1

Adam Danz on 21 Aug 2019

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/476933-how-would-i-create-a-script-to-read-files-line-by-line-to-save-memory#answer_388415

Edited: Adam Danz on 21 Aug 2019

Here's a demo that shows how to read in multiple lines of a file in chunks. I included lots of comments that explain what's going on. There's a section at the bottom where you can perform whatever operations you want on the data that is being read in. Walter's answer includes the variance calculations you described.

% Set parameters
file = 'x0.txt';  % The file you're reading; it's better to use a full path such as 'C:\Users\name\Documents\x0.txt'
nrows = 5; %number of rows to read in at a time (you can change this to 100000 or whatever)
% Initialize the file for reading 
fid = fopen(file); 
% Set some loop variables
ignore = 0; %number of rows to ignore at the beginning (headers etc)
done = false; % flag that detects when file is complete
% Loop through until you've read all lines of file.  When that 
% happens, "done" will be switched to true and the while-loop
% will end.
while ~done
    % Read the next 'nrows'; C will be a cell array of strings.  
    C = textscan(fid,'%s', nrows, 'delimiter', '\n', 'headerlines', ignore);
    % Header lines only need to be skipped on the first read, because
    % textscan continues from the current file position on later calls.
    ignore = 0; 
    % If C is completely empty, you've finished the file.  
    if isempty(C{1})
        % C has no data so the file is finished. 
        % Set the "done" flag to True so the while-loop ends
        done = true; 
        % Skip the rest of this iteration.
        continue
    end
    % Convert C from a cell array of strings to a numeric vector
    % This assumes the content of the strings are numbers.
    nVec = str2double(C{1}); 
    
    % % % % % % % % % % % % % % % % % % %
    %                                   %
    % HERE IS WHERE YOU'LL DO WHATEVER  %
    % OPERATIONS YOU NEED TO DO WITH    %
    % THE VALUES YOU JUST READ IN.      %
    %                                   %
    % % % % % % % % % % % % % % % % % % %
    
    
end
% Close file 
fclose(fid); 
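As one possible way to fill in the placeholder section of the demo, the sketch below computes one variance per chunk with var(). It is an illustration only, assuming a tiny throwaway file in tempdir with one number per line. The names demoFile and chunkVars are hypothetical, and nrows is kept at 5 so the result is easy to check by hand.

```matlab
% Hypothetical demo data: the numbers 1..12, one per line.
demoFile = fullfile(tempdir, 'chunk_var_demo.txt');
fid = fopen(demoFile, 'w');
fprintf(fid, '%d\n', 1:12);
fclose(fid);

nrows = 5;                      % lines per chunk (100000 in the real use case)
chunkVars = zeros(0, 1);        % one variance per chunk, stored as a column
fid = fopen(demoFile, 'r');
while true
    % Read up to nrows lines as strings, continuing from the current position.
    C = textscan(fid, '%s', nrows, 'Delimiter', '\n');
    if isempty(C{1}), break; end        % nothing left to read
    nVec = str2double(C{1});            % strings -> numeric column vector
    chunkVars(end+1, 1) = var(nVec);    % sample variance of this chunk
end
fclose(fid);
delete(demoFile);               % clean up the demo file
```

With this input the chunks are 1..5, 6..10, and 11..12. The last chunk is shorter than nrows, so its variance is based on fewer points. For a very large file, preallocating chunkVars from an estimated chunk count avoids growing the array on every iteration.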

2 Comments

Walter Roberson on 21 Aug 2019

I do not see the purpose of the frewind(). textscan() will continue from the current file position.

Adam Danz on 21 Aug 2019

Nice catch, Walter. I originally copied similar code that uses fgetl() and adapted it to this, but I guess I overlooked the frewind. I edited and fixed it. Thanks.

Answer 2

Walter Roberson on 20 Aug 2019

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/476933-how-would-i-create-a-script-to-read-files-line-by-line-to-save-memory#answer_388413


vary_every = 100000;        %data points per variance value
expected_buffers = 10000;   %1000000000 / 100000
group_every = 360;
variances = zeros(1, expected_buffers);
filename = 'YourFileNameHere';
[fid, msg] = fopen(filename, 'r');
if fid < 0
    error('Failed to open file "%s" because "%s"', filename, msg)
end
buffcount = 0;
while true
    this_buffer = cell2mat( textscan(fid, '%f', vary_every) );
    if isempty(this_buffer); break; end   %end of file
    buffcount = buffcount + 1;
    variances(buffcount) = var(this_buffer);   %var() is the built-in variance function
end
fclose(fid);
variances(buffcount+1:expected_buffers) = [];    %trim off any extra
leftover = mod(buffcount,group_every);
if leftover ~= 0
    variances(end+1:end+group_every-leftover) = nan;
end
variances = reshape(variances, group_every, []);
disp(variances)
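The padding and reshape step at the end of the code above can be checked on a small case. The sketch below is a toy illustration, assuming groups of 3 instead of 360 and seven made-up values. The name grouped is hypothetical.

```matlab
group_every = 3;                          % 360 in the code above; small here
variances = [10 20 30 40 50 60 70];       % made-up values standing in for real variances

% Pad with NaN so the length is a whole multiple of group_every.
leftover = mod(numel(variances), group_every);
if leftover ~= 0
    variances(end+1:end+group_every-leftover) = NaN;
end

% Each column now holds one group of group_every consecutive values.
grouped = reshape(variances, group_every, []);
disp(grouped)
```

Because reshape fills column by column, the first column holds values 1 to 3, the second holds 4 to 6, and the last column holds the remaining value followed by the NaN padding.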
