Export matched lines from two text files
I need to identify the lines that two text files, mwithrm21.txt and virgomrmdist.txt, have in common, based on column 7 of each file. The matching lines should then be exported to a new text file and removed from mwithrm21.txt.
I have attached the text files.
I drafted the code below:
content1 = fileread( 'mwithrm21.txt' ) ;
content2_rows = strsplit( fileread( 'virgomrmdist.txt' ), sprintf( '\n' )) ;
found = cellfun( @(s)~isempty(strfind(content1, s)), content2_rows ) ;
output_rows = content2_rows(found) ;
fId = fopen( 'similarvclf.txt', 'w' ) ;
fprintf( fId, '%s\n', output_rows{:} ) ;
fclose( fId ) ;
output_rows = content2_rows(~found) ;
fId = fopen( 'mwithrm21_new.txt', 'w' ) ; % Remove the '_new' for overwriting original.
fprintf( fId, '%s\n', output_rows{:} ) ;
fclose( fId ) ;
However, I do not know how to restrict the search to column 7 only and then export each entire matched line to a new text file.
6 Comments
Cedric
15 Aug 2015
jgillis16
15 Aug 2015
Cedric
15 Aug 2015
Actually, it works by finding the positions of the separators '|'. This lets us build an array of separator positions for each row of the file. We do this precisely because your question needed to target a specific column. Based on this array, we can then test any column we want, since we have the positions of all separators. In the example, we test whether column 4 contains '~' by checking, in every row, the character that directly follows the 3rd separator.
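A minimal sketch of the separator-position idea, applied to a single row held in memory. The sample row and the variable names (`row`, `sepPos`, `col4`, `col7`) are illustrative assumptions and not the code from the original answer.

```matlab
% Sample row in the same '|'-separated layout as the data files (illustrative).
row = '188.83785|27.56214|-14.4|18.931|0.398|~|SDSSJ123521.05+273343.6';

% Positions of all separators in this row.
sepPos = strfind(row, '|');

% Column 4 lies between the 3rd and 4th separators.
col4Start = sepPos(3) + 1;
col4 = row(col4Start : sepPos(4) - 1);

% Test column 4 for '~' by checking the character directly after the
% 3rd separator.
col4IsTilde = (row(col4Start) == '~');

% Column 7 is everything after the 6th (last) separator.
col7 = row(sepPos(6) + 1 : end);
```

For a whole file, the same indexing would be repeated per row; this sketch only shows how the separator positions give access to an arbitrary column.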
Cedric
15 Aug 2015
I'll answer your question in about 10 minutes, but I have been persisting with this thread because the file processing you are doing is becoming increasingly complicated, so at some point you will need to understand quite well all the approaches used in the answers to your questions so far.
Actually, you say that you want to match rows based on column 7 only, but when they match the other columns don't always match. What do you want to have in the output?
For example:
file1 : 188.83785|27.56214|-14.4|18.931|0.398|~|SDSSJ123521.05+273343.6
file2 : 188.83785|27.56214|18.931|0.398|-14.4|~|SDSSJ123521.05+273343.6
Should we export both?
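To make the ambiguity concrete, here is a small sketch that compares the two example rows above both by column 7 and as whole lines. The variable names are illustrative, and `strsplit` is used here only for brevity; it is not necessarily the method used elsewhere in this thread.

```matlab
% The two example rows quoted above.
line1 = '188.83785|27.56214|-14.4|18.931|0.398|~|SDSSJ123521.05+273343.6';
line2 = '188.83785|27.56214|18.931|0.398|-14.4|~|SDSSJ123521.05+273343.6';

% Split on '|' without collapsing consecutive delimiters, so that an
% empty field would not shift the column numbering.
fields1 = strsplit(line1, '|', 'CollapseDelimiters', false);
fields2 = strsplit(line2, '|', 'CollapseDelimiters', false);

% Compare column 7 only, then the complete lines.
sameKey  = strcmp(fields1{7}, fields2{7});
sameLine = strcmp(line1, line2);
```

Here `sameKey` is true while `sameLine` is false, which is why the choice of which row to export has to be made explicitly.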
jgillis16
15 Aug 2015
Accepted Answer
More Answers (2)
per isakson
16 Aug 2015
Edited: per isakson
16 Aug 2015
Here is an example of a different approach to the task. The two output files, mwithrm21_reduced.txt and matches.txt, are identical apart from the newline characters.
function et = cssm()
% et(1) = cssm_1();
et(2) = cssm_2();
end
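function et = cssm_2()
tic
% Read mwithrm21.txt twice: once as whole rows, once for column 7 only.
fid = fopen( 'mwithrm21.txt', 'rt' );
rows1 = textscan( fid, '%s', 'Delimiter','\n' );
fseek( fid, 0, 'bof' ); % rewind to the beginning of the file
codes1 = textscan( fid, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter','|' );
fclose( fid );
% Read column 7 of virgomrmdist.txt.
fid = fopen( 'virgomrmdist.txt', 'rt' );
codes2 = textscan( fid, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter','|' );
fclose( fid );
% Flag the rows of mwithrm21.txt whose column 7 also occurs in virgomrmdist.txt.
ism = ismember( codes1{1}, codes2{1} );
% Write the matched rows.
fid = fopen( 'matches.txt', 'wt' );
fprintf( fid, '%s\n', rows1{1}{ism} );
fclose( fid ) ;
% Write the remaining (unmatched) rows.
fid = fopen( 'mwithrm21_reduced.txt', 'wt' );
fprintf( fid, '%s\n', rows1{1}{not(ism)} );
fclose( fid );
et = toc; % elapsed time in seconds
end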
r r
11 May 2021
I have two files whose first columns contain numbers, some of which occur in both files. I want to print the lines whose first-column number matches between the two files, and the lines whose first-column number differs:
%%%%%%%%%%%%%%%%%%%%%%% File 1
fid1 = fopen( 'E1.txt', 'rt' );
T1 = textscan(fid1,'%s', 'delimiter', '\n');
%codes1 = textscan( fid1, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter','|' );
fclose( fid1 );
%%%%%%%%%%%%%%%%%%%%%%% File 2
fid2 = fopen( 'G1.txt', 'rt' );
T2 = textscan(fid2,'%s', 'delimiter', '\n');
%codes2 = textscan( fid2, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter','|' );
fclose( fid2 );
%%%%%%%%%%%%%%%%%%%%%%%
% Keep the lines as cell arrays of character vectors, one cell per line.
T1s = T1{1};
T2s = T2{1};
% Lines common to both files (whole lines are compared, not only column 1):
C = intersect(T1s, T2s);
% Lines that appear in only one of the two files:
B = setxor(T1s, T2s);
%%%%%%%%%%%%%%%%%%%%%%% Write output
fid = fopen( 'Similar.txt', 'wt' ); % all lines common to both files
fprintf( fid, '%s\n', C{:} );
fclose( fid );
fid = fopen( 'Different.txt', 'wt' ); % all lines found in only one file
fprintf( fid, '%s\n', B{:} );
fclose( fid );
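If the comparison should use only the first column, as described in the text of this post, one possible sketch is shown below. It assumes the columns are separated by whitespace, which the post does not state, and it uses hypothetical in-memory sample lines (`lines1`, `lines2`) in place of the contents of E1.txt and G1.txt.

```matlab
% Hypothetical sample lines standing in for the contents of E1.txt and G1.txt.
lines1 = {'101 alpha 1.5'; '102 beta 2.5'; '103 gamma 3.5'};
lines2 = {'102 delta 9.9'; '104 epsilon 8.8'};

% First whitespace-delimited token of each line (assumed column layout).
keys1 = cellfun(@strtok, lines1, 'UniformOutput', false);
keys2 = cellfun(@strtok, lines2, 'UniformOutput', false);

% Rows of the first file whose first-column value also occurs in the second.
inBoth = ismember(keys1, keys2);
similarLines   = lines1(inBoth);   % first column matches
differentLines = lines1(~inBoth);  % first column has no match
```

The keys are compared as text, so `101` and `101.0` would not be treated as equal; converting the keys with `str2double` before calling `ismember` would be one way to compare them numerically.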