mex code to read a large tab delimited file

Question

abc on 11 Jan 2014

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/112120-mex-code-to-read-a-large-tab-delimited-file

Edited: dpb on 12 Jan 2014

I am writing a mex file to read a tab-delimited file with 16 columns, each with about five lakh (500,000) rows. The first column is a date string. The second contains integers. The third contains a character, which may be empty. The rest of the columns are either integers or characters. I have attached a sample image. Here is the code I wrote; it does not seem to work. I also want to ignore the first 3 header lines. How can I do that?

    #include "mex.h"
    #include <stdio.h>
    void mexFunction(int nlhs,mxArray *plhs[],int nrhs,const mxArray *prhs[]) 
    {
        double *Load1,*Tamb1,*TOT1,*WindA1,*WindB1,*WindC1,*Tamb2;
        int M,i,loop;
        mxChar *filename,*QCLoad,*QCTamb,*QCTOT,*QCWindA1,*QCWindB1,*QCWindC1,*QCTamb2;
        filename = mxGetChars (prhs[0]);
        plhs[0]=mxCreateDoubleMatrix(M,1,mxREAL);
        Date1=mxGetChars(plhs[0]);
        plhs[1]=mxCreateDoubleMatrix(M,1,mxREAL);
        Load1=mxGetPr(plhs[1]);
        plhs[2]=mxCreateDoubleMatrix(M,1,mxREAL);
        QCLoad=mxGetChars(plhs[2]);
        plhs[3]=mxCreateDoubleMatrix(M,1,mxREAL);
        Tamb1=mxGetPr(plhs[3]);
        plhs[4]=mxCreateDoubleMatrix(M,1,mxREAL);
        QCTamb=mxGetChars(plhs[4]);
        plhs[5]=mxCreateDoubleMatrix(M,1,mxREAL);
        TOT1=mxGetPr(plhs[5]);
        plhs[6]=mxCreateDoubleMatrix(M,1,mxREAL);
        QCTOT=mxGetChars(plhs[6]);
        plhs[7]=mxCreateDoubleMatrix(M,1,mxREAL);
        WindA1=mxGetPr(plhs[7]);
        plhs[8]=mxCreateDoubleMatrix(M,1,mxREAL);
        QCWindA1=mxGetChars(plhs[8]);
        plhs[9]=mxCreateDoubleMatrix(M,1,mxREAL);
        WindB1=mxGetPr(plhs[9]);
        plhs[10]=mxCreateDoubleMatrix(M,1,mxREAL);
        QCWindB1=mxGetChars(plhs[10]);
        plhs[11]=mxCreateDoubleMatrix(M,1,mxREAL);
        WindC1=mxGetPr(plhs[11]);
        plhs[12]=mxCreateDoubleMatrix(M,1,mxREAL);
        QCWindC1=mxGetChars(plhs[12]);
        plhs[13]=mxCreateDoubleMatrix(M,1,mxREAL);
        Tamb2=mxGetPr(plhs[13]);
        plhs[14]=mxCreateDoubleMatrix(M,1,mxREAL);
        QCTamb2=mxGetChars(plhs[14]);
        FILE *ptr_file;
        char buf[1000000];
        ptr_file =fopen(filename,"r");
        fscanf(ptr_file,"%s %f %s %f %s %f %s %f %s %f %s %f %s %f %s",&Date1,&Load1,&QCLoad,&Tamb1,&QCTamb,&TOT1,&QCTOT,&WindA1,&QCWindA1,&WindB1,&QCWindA1,&WindC1,&QCWindA1,&Tamb2,&QCTamb2);
    }

Answer 1

dpb on 11 Jan 2014

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/112120-mex-code-to-read-a-large-tab-delimited-file#answer_120619

doc textscan % maybe

What's the point of mex for a builtin, anyway? Or, if you want less overhead than textscan, use fscanf directly.

dpb on 11 Jan 2014

>> which fscanf
built-in (C:\ML_R2012b\toolbox\matlab\iofun\fscanf)
>> which textscan
built-in (C:\ML_R2012b\toolbox\matlab\iofun\textscan)
>>

So, the answer is "yes". I'd expect it's unlikely you're going to beat TMW (The MathWorks), at least with a naive implementation.

I would expect that fscanf performs a little better than textscan, because it only handles arrays, whereas textscan uses cell arrays and there is some overhead associated with them. On the other hand, fscanf can only return an array of a single type, not mixed data types, so you may be stuck there.

You can also try textread, which is much like textscan except that it returns multiple variables of mixed types, although there are issues with mixing string and numeric data in some instances.

The way to speed up your application would be to get away from formatted (text) files entirely and go to unformatted stream or .mat files. That eliminates all the text-to-number i/o conversion and will also make the files considerably smaller.
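To illustrate what an unformatted stream means on the C side, here is a minimal sketch that writes a column of doubles as raw bytes and reads it back with no text conversion. The function names `write_column` and `read_column` and the count-then-values layout are assumptions made for illustration; this is not the .mat format, and the bytes are only portable between machines that share the same byte order and `size_t` width.

```c
#include <stdio.h>
#include <stddef.h>

/* Write n doubles as raw bytes: a count header followed by the values.
   The layout (native byte order, native size_t) is an assumption of this
   sketch, not a standard file format. Returns 0 on success, -1 on error. */
int write_column(FILE *fp, const double *values, size_t n)
{
    if (fwrite(&n, sizeof n, 1, fp) != 1)
        return -1;
    if (fwrite(values, sizeof values[0], n, fp) != n)
        return -1;
    return 0;
}

/* Read a column written by write_column() into out[0..capacity-1].
   Returns the number of values read, or -1 on error or if the stored
   count does not fit into the caller's buffer. */
long read_column(FILE *fp, double *out, size_t capacity)
{
    size_t n;

    if (fread(&n, sizeof n, 1, fp) != 1 || n > capacity)
        return -1;
    if (fread(out, sizeof out[0], n, fp) != n)
        return -1;
    return (long)n;
}
```

Each value occupies `sizeof(double)` bytes regardless of how many digits its text form would need, and reading it back is a single `fread` with no parsing step.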

You may also find that, instead of using the options to skip fields/columns, it's faster to read the whole file into memory and then delete the unwanted columns.

abc on 11 Jan 2014

Edited: abc on 11 Jan 2014

@dpb Thank you for your feedback. The data files are not in my hands; they are provided to me. Converting them to .mat files will take as much time as textscan. I believe all the other functions you mentioned will also take a lot of time, given the amount of data I am dealing with. So you are saying the mex file would not help with speed?

dpb on 12 Jan 2014

Edited: dpb on 12 Jan 2014

I wrote the following earlier but don't see it -- if it shows up later duplicated, sorry...

See if you can get the other party to provide the files unformatted, instead of or in addition to the formatted ones.

If that is not possible, you can test the idea of your mex-ing skills being better than TMW's by writing a standalone Fortran or C app that just reads the files, and timing it. The mex overhead will only add to that minimum.
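A minimal sketch of such a standalone C reader, for timing purposes only, might look like this. It skips a given number of header lines with `fgets` and splits each remaining line on tabs by hand, because `scanf`'s `%s` and `strtok` both skip over consecutive delimiters, so an empty field would shift every later column. The names `split_tabs` and `count_rows`, the 4096-byte line limit, and the use of `clock()` for timing are assumptions of this sketch, not anything from the original post.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_FIELDS 16      /* the question mentions 16 columns */
#define LINE_BUF   4096    /* assumed upper bound on one line's length */

/* Split one line in place on tab characters. Two tabs in a row yield an
   empty field rather than being merged. Fields beyond max_fields are
   ignored. Returns the number of fields stored in fields[]. */
int split_tabs(char *line, char *fields[], int max_fields)
{
    int n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';    /* drop the line terminator */
    while (n < max_fields) {
        fields[n++] = p;
        p = strchr(p, '\t');
        if (p == NULL)
            break;
        *p++ = '\0';                       /* end this field, step past tab */
    }
    return n;
}

/* Open the file, discard header_lines lines, split every remaining line,
   and return the number of data lines (or -1 if the file cannot be
   opened). Elapsed CPU time in seconds goes to *seconds if non-NULL. */
long count_rows(const char *path, int header_lines, double *seconds)
{
    char line[LINE_BUF];
    char *fields[MAX_FIELDS];
    long rows = 0;
    clock_t start = clock();
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    for (int i = 0; i < header_lines; i++)       /* skip the header */
        if (fgets(line, sizeof line, fp) == NULL)
            break;
    while (fgets(line, sizeof line, fp) != NULL) {
        split_tabs(line, fields, MAX_FIELDS);
        rows++;
        /* a real reader would convert here, e.g. strtod(fields[1], NULL) */
    }
    fclose(fp);
    if (seconds != NULL)
        *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return rows;
}
```

Calling `count_rows("data.txt", 3, &secs)` from a small `main` (the file name is hypothetical) gives a baseline: the line count and the CPU time for reading and splitting alone, before any numeric conversion or mex overhead is added.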

My thinking is that those formatted i/o calls will eventually be translated into the compiler's runtime i/o library and system calls anyway, just as those of the builtin Matlab routines are. The only reason I could see for that root level to be significantly slower in Matlab would be the overhead of cell handling and of the facilities they have incorporated to skip columns, etc. The former is why I suggested you might want to look at textread instead of textscan, to eliminate the cell handling; but if you try to implement the skipping at the reading level, you'll have the same problems that they do for that portion.

There is, of course, one other "trick": read the whole file as a binary stream of characters (via fread) and do all the translation internally.
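The same idea expressed in C could look roughly like the sketch below: one `fread` pulls the whole file into memory, then a plain pointer walk skips the header lines and converts one column with `strtod`. The helper names (`read_whole_file`, `field_to_double`, `parse_column`) and the choice to store empty or non-numeric fields as 0.0 are assumptions for illustration. Finding the file size with `fseek`/`ftell` works on common desktop platforms but is not guaranteed by the C standard for every stream.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read an entire file into one NUL-terminated heap buffer.
   Returns NULL on failure; the caller frees the result. */
char *read_whole_file(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;
    long size;

    *len = 0;
    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) == 0 && (size = ftell(fp)) >= 0) {
        rewind(fp);
        buf = malloc((size_t)size + 1);
        if (buf != NULL) {
            *len = fread(buf, 1, (size_t)size, fp);
            buf[*len] = '\0';
        }
    }
    fclose(fp);
    return buf;
}

/* Convert the field starting at p. The field is copied out first because
   strtod() skips leading whitespace, which would let an empty field
   silently pick up the next column's value. */
static double field_to_double(const char *p)
{
    char tmp[64];
    size_t len = strcspn(p, "\t\r\n");   /* field ends at tab or line end */

    if (len == 0 || len >= sizeof tmp)
        return 0.0;                      /* empty or oversized: sketch uses 0.0 */
    memcpy(tmp, p, len);
    tmp[len] = '\0';
    return strtod(tmp, NULL);
}

/* Skip header_lines lines, then convert zero-based column col of each
   remaining line. Returns the number of rows stored (at most capacity). */
size_t parse_column(const char *buf, int header_lines, int col,
                    double *out, size_t capacity)
{
    size_t rows = 0;
    const char *line = buf;

    while (*line != '\0' && rows < capacity) {
        if (header_lines > 0) {
            header_lines--;
        } else {
            const char *p = line;
            int c = 0;
            while (c < col) {            /* step over col tab characters */
                size_t flen = strcspn(p, "\t\n");
                if (p[flen] != '\t')
                    break;               /* line has fewer columns */
                p += flen + 1;
                c++;
            }
            out[rows++] = (c == col) ? field_to_double(p) : 0.0;
        }
        line += strcspn(line, "\n");     /* move to the start of next line */
        if (*line == '\n')
            line++;
    }
    return rows;
}
```

Because the parser works on a buffer rather than a `FILE *`, it can be timed separately from the disk read, which helps show whether i/o or conversion dominates for a given file.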

ADDENDUM:

Just how big is "big" for a typical file?
