Problem with converting dates to numbers
Hi,
I want to convert the 3rd column ("date") to numbers using the datenum function in a loop.
I have a few problems.
1. The datenum function does not read the 3rd column properly (see the attached csv file and the image for the error).
2. I need to perform this in a loop since I have a large number of files.

Please see my code below.
tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false); % Load Data
tr(1,:)=[];
%fn='01AA002_Daily_Flow_ts.csv';
dn = datenum(tr.Var3,'yyyy/mm/dd');
Accepted Answer
More Answers (2)
The problem isn't datenum, it's that you're trying to pass a cellstr array into it. If you're going to use readtable, use a specific format string to convert the dates on input; see doc for details and an example (albeit it converts a non-US date to US as well, but it does show the date formatting string).
Failing that, revert to the way I showed previously to parse the .csv file directly into numeric arrays, bypassing all the higher-level abstractions but leaving you with a set of double arrays that can be handled pretty simply for your needs.
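As a rough illustration of the "format string on input" suggestion, the sketch below reads a small made-up file with readtable and a %D specifier so the date column arrives as datetime rather than text. The file name, the column layout (station, code, date, flow) and the sample values are assumptions modelled on the textscan format used in this thread, and the %D specifier needs a release that has datetime (R2014b or later).

```matlab
% Sketch only: sample file, column layout and values are assumptions.
fn = fullfile(tempdir,'demo_Daily_Flow_ts.csv');   % hypothetical sample file
fid = fopen(fn,'w');
fprintf(fid,'ID,PARAM,Date,Flow\n');
fprintf(fid,'01AA002,1,1969/01/01,12.5\n');
fprintf(fid,'01AA002,1,1969/01/02,13.0\n');
fprintf(fid,'01AA002,1,1969/01/03,11.8\n');
fclose(fid);

% Convert the third column to datetime while reading; skip the header line
tr = readtable(fn,'Delimiter',',','ReadVariableNames',false, ...
               'HeaderLines',1,'Format','%s%d%{yyyy/MM/dd}D%f');
dn = datenum(tr.Var3);      % serial date numbers, if those are still wanted
delete(fn);                 % remove the temporary sample file
```

With the dates converted on input there is no need to delete the header row by hand or to pass a format string to datenum afterwards.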
ADDENDUM
Please cut 'n paste text instead of the images -- they're exceedingly difficult to read plus one can't select them to try to repeat anything you've done...
Anyway, I seem to have misspoken re: cellstring arrays and datenum; it actually accepts them just fine.
I used the import tool and retrieved both files (I have R2012b, so I don't have readtable and can't test it directly); they have different forms for the date string. However, each worked just fine with datenum, even though the time portion of the one file is ignored.
>> whos VarName5
Name Size Bytes Class Attributes
VarName5 32142x1 3535620 cell
>> VarName5(1:4)
ans =
'1920-01-01T07:00:00+07:00'
'1920-01-02T07:00:00+07:00'
'1920-01-03T07:00:00+07:00'
'1920-01-04T07:00:00+07:00'
>> datestr(datenum(VarName5(1:4),'yyyy-mm-dd'))
ans =
01-Jan-1920
02-Jan-1920
03-Jan-1920
04-Jan-1920
>> datestr(datenum(VarName5(1:4),'yyyy-mm-ddTHH:MM:SS'))
ans =
01-Jan-1920 07:00:00
02-Jan-1920 07:00:00
03-Jan-1920 07:00:00
04-Jan-1920 07:00:00
>>
With the new datetime '%D' format specifier, as noted, you can interpret the rest of the time string (the time-zone offset) as well, but datenum doesn't have that facility.
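For releases that do have datetime, a minimal sketch of parsing the full string, offset included, might look like the following. The two sample strings are copied from the listing above; the choice of '+07:00' as the display time zone is an assumption made so the wall-clock time matches the text.

```matlab
% Sketch only: needs datetime (R2014b or later); untested on R2012b.
s = {'1920-01-01T07:00:00+07:00'; '1920-01-02T07:00:00+07:00'};

% XXX matches an ISO 8601 offset such as +07:00; the literal T is quoted
t = datetime(s,'InputFormat','yyyy-MM-dd''T''HH:mm:ssXXX', ...
               'TimeZone','+07:00');
disp(t)
```

Because the input carries an offset, a 'TimeZone' value has to be supplied; picking a different zone shifts the displayed clock time accordingly.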
The other file is an "ordinary" YYYY-MM-DD string; should be no issues whatever with it.
Whatever problem you're having seems to be associated with the "how" of how you're reading the files, but can't see what Matlab actually complained about from the pictures without the full error text in context.
1 Comment
You seem to keep retrogressing past what we've already solved/shown solutions for. Why not build on the previously working solution in the previous thread remove-rows-text-at-the-bottom-of-a-csv-file? There I showed a simple way to return the values from the .csv file that mitigates the trailing disclaimer text essentially automagically. Instead you've returned to the previous case of holding all the file content in a cell array of cells which is exceedingly difficult to address owing to the need to get all the curlies and parens correct plus you can't do global addressing of cells with two-layer addressing to get subsets.
The previous file in the above thread used '-' as the date separator whereas this one uses '/', so that's one modification needed. If you choose to return the dates as y,m,d values rather than as a string you avoid the string handling, although otherwise you'd have to fix up the format string for datenum, so it's likely a wash in writing generic code; you'll have to deal with the specific format at some point, anyway.
All that aside, start by first reading a single file and returning the specific information needed; namely the max for complete years, then look at wrapping that functionality over the files...
% d is assumed to hold the dir() listing of the input files
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
for i=1:length(d)
  fid=fopen(d(i).name);
  c=cell2mat(textscan(fid,fmt,'headerlines',1, ...
                      'collectoutput',1, ...
                      'delimiter',','));
  fid=fclose(fid);                      % close input file
  c(all(isnan(c),2),:)=[];              % drop rows from the trailing disclaimer text
  yr=unique(c(:,1));                    % unique years in file
  n=histc(c(:,1),yr);                   % count entries by year
  yr=yr(n==(365+isleapyr(yr)));         % years that are complete
  i1=find(c(:,1)==yr(1),1);             % first complete year in dataset
  i2=find(c(:,1)==yr(end),1,'last');    % last of last complete year
  c=c(i1:i2,:);                         % save only those entries
  [~,~,iy]=unique(c(:,1));              % indices vector for grouping
  mx=accumarray(iy,c(:,end),[],@max);   % get maximum for each year
  stn=strtok(d(i).name,'_');            % parse station name from file
  % write out the results in other file (presume already open)
  fprintf(fido,'%s,%d\n',stn,length(yr));   % output station, # years
  fprintf(fido,'%4d,%.1f\n',[yr mx].');     % year, max for each
end
You'll have to put in the housekeeping to create and open the output file(s*) then close after done and such, but the basic processing should be taken care of in the above...
You'll note I didn't bother to parse the station name from the file; that's just a complication of a bunch of meaningless text; I just parsed it from the input file name. The output file format is
StationName,#years
year,max
year,max
...
(*) I basically presumed in the above that the idea is to consolidate all these into a single file; hence the station and number of entries in each section to aid reading. If you again want one per station, then as the sample in the other thread demonstrated, find some common name-generating pattern here as well.
Also note the utility function isleapyr is one of my little helpers...
function is=isleapyr(yr)
% returns T for input year being a leapyear
is=eomday(yr,2)==29;
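A quick way to see how the helper feeds the "complete year" test is to evaluate it on a few years. The sketch below uses an anonymous function with the same body so it can run as a plain script; the list of years is made up.

```matlab
% Same test as the isleapyr helper, written as an anonymous function
isleapyr = @(yr) eomday(yr,2)==29;

yrs   = [1900 1969 1972 2000];      % arbitrary sample years
nDays = 365 + isleapyr(yrs);        % days a complete year must contain
disp([yrs; nDays])
```

A year is treated as complete only when its count of daily records equals the matching entry of nDays.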
20 Comments
Damith
4 Dec 2015
In the snippet I posted c holds the data from each input file in succession and writes all to a single output file. Logic to name and open that file is left as "exercise for the student"; as the comments state the snippet presumes that is already done prior to beginning the loop.
You modified the fprintf lines however and used the same file handle variable as that used for the input file, and I see no code to open any different output file.
While again you don't post the full context of the error, in this case there is only one line that references yr(1), and since the message indicates that the array is empty, that indicates there were no full years found for that particular file. Set a breakpoint and use the debugger to see precisely what's happening; I ran the script on the sample file so I know the logic works for the case where there is at least one complete year. If it's possible there are no complete years, you need to add a test for that where the count is done or, possibly, use a try...catch block. It would likely be of interest to display [unique(c(:,1)) n] in the event this occurs so you can decide what to do with that file; or maybe you do know that is a possibility and don't care, in which case you can simply put a continue in the catch clause and go on.
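One possible shape for that test is sketched below on two synthetic datasets, one with a full 1969 and one with only part of 1970. The data, the cell array standing in for the files, and the nComplete bookkeeping are all made up for illustration; accumarray is used for the per-year count in place of histc.

```matlab
isleapyr = @(y) eomday(y,2)==29;

% Two synthetic "files" (hypothetical data): full 1969, partial 1970
dv1 = datevec((datenum(1969,1,1):datenum(1969,12,31)).');
dv2 = datevec((datenum(1970,3,1):datenum(1970,6,30)).');
files = {[dv1(:,1:3) rand(size(dv1,1),1)], ...
         [dv2(:,1:3) rand(size(dv2,1),1)]};

nComplete = zeros(1,numel(files));
for i = 1:numel(files)
    c = files{i};
    [yr,~,iy] = unique(c(:,1));        % unique years and grouping index
    n  = accumarray(iy,1);             % count entries by year
    ok = n==(365+isleapyr(yr));        % which years are complete
    if ~any(ok)
        fprintf('File %d: no complete years, skipping\n',i);
        disp([yr n])                   % show what was actually found
        continue
    end
    nComplete(i) = nnz(ok);
end
```

Testing before yr(1) is referenced avoids the empty-array error and leaves a record of which file was skipped and why.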
I did see a problem in a couple of the formatting strings for the output file; I had just looked at the [yr mx] array at the command line and typed those on the fly. I did edit them in the Answer so pick up those mods from there.
For the given file the script gives the following result--
>> dai
01AA002,8
1969,161.00
1970,280.00
1971,213.00
1972,168.00
1973,198.00
1974,255.00
1975,128.00
1976,281.00
>> type dai
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
for i=1:length(d)
  fid = fopen(d(i).name);
  c=cell2mat(textscan(fid,fmt,'headerlines',1, ...
                      'collectoutput',1, ...
                      'delimiter',','));
  fid=fclose(fid);                      % close input file
  c(all(isnan(c),2),:)=[];
  yr=unique(c(:,1));                    % unique years in file
  n=histc(c(:,1),yr);                   % count entries by year
  yr=yr(n==(365+isleapyr(yr)));         % years that are complete
  i1=find(c(:,1)==yr(1),1);             % first complete year in dataset
  i2=find(c(:,1)==yr(end),1,'last');    % last of last complete year
  c=c(i1:i2,:);                         % save only those entries
  [~,~,iy]=unique(c(:,1));              % indices vector for grouping
  mx=accumarray(iy,c(:,end),[],@max);   % get maximum for each year
  stn=strtok(d(i).name,'_');            % parse station name from file
  % write out the results in other file (presume already open)
  fprintf('%s,%d\n',stn,length(yr))     % output station, # years
  fprintf('%4d,%.2f\n', [yr mx].')      % year, max for each
end
>>
ADDENDUM
NB: The above was tweaked to simply output the results to the command line rather than write an output file; again you've got to open a new output file with a different handle than that used for the input files before starting the loop.
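The housekeeping being described could be sketched roughly as follows. The output file name, the station list and the yr/mx values are placeholders, and the per-file reading is replaced by a comment; the point is only that fido is opened once before the loop, is distinct from the input handle fid, and is closed once afterwards.

```matlab
% Sketch only: names and values below are hypothetical placeholders.
outName = fullfile(tempdir,'annual_max_demo.csv');
fido = fopen(outName,'w');                 % output handle, opened ONCE
if fido < 0, error('Could not open %s',outName); end

stations = {'01AA002','01AB003'};          % stand-in for the file list
yr = [1969;1970];  mx = [161;280];         % stand-in for per-file results
for i = 1:numel(stations)
    % fid = fopen(...); read and reduce the input; fid = fclose(fid);
    fprintf(fido,'%s,%d\n',stations{i},length(yr));   % station, # years
    fprintf(fido,'%4d,%.2f\n',[yr mx].');             % year, max
end
fclose(fido);                              % closed only after the loop

txt = fileread(outName);                   % read back to check the result
delete(outName);
```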
dpb
5 Dec 2015
"Noticed that it does not store each file into an array. "c" stores only the last file. How can I modify to store multiple files without assigning into a cell array?"
There's no reason to store more than one file at a time; you have no need for any data other than the one over which you're doing the processing at any one time. Ergo, don't make things more difficult than needs must be.
IF you were to need multiple years at one time, then it might be necessary to use cell arrays to hold disparate sizes/dates at one time, granted, but don't worry about solving a problem of that type until it's necessary. At that point I'd likely either
- Do the reduction as shown to minimal dataset for the file first, then assign that array to a cell array element, or, alternatively,
- Create a single 3D array that grows to hold each station by plane with empty placeholders for those locations without data at any given station/plane.
The choice would depend upon just what would be needed to be done with the data simultaneously and just how disparate those datasets might be so as to how much wasted space would be needed to do the second option.
Other ideas for storage would also likely present themselves for any specific case as well that might be better than either of the above.
That's exactly what my example does if you'll simply open the output file first, before the loop and not close it until everything is done excepting I'd not choose to build a text file with all that missing data as you've shown the one record; that's basically the option outlined above as #2. I'd likely only save the valid data initially, then build a (probably sparse) array from it for processing. There's not much chance one would look at such a file manually, anyway, there's too much stuff there to deal with by hand so why not be more concise?
What's the next step; that would likely again control what I'd think would be the more suitable file format.
But, if you're adamant (or somebody else has made the requirement to put the file out in that specific [silly imo :) ] format), then creating an array of nan(nSta,maxYr) and populating it by row in the loop, saving the station in a linear vector since it's string, not numeric, and then writing it at the end will leave you with the full dataset in memory for further analyses as well. Better might be sparse depending upon just how many stations (rows) there are; 165 * 2000 is "only" 2+ MB, however, which is not a terribly large dataset by today's standards to handle but the storage is quite inefficient. Again, it all depends upon where you're headed in the end.
Damith
7 Dec 2015
dpb
7 Dec 2015
I observe you took out the line to clean up the end of the file on reading after the disclaimer text altho don't think that should cause this problem specifically.
But, I notice for the given file in the variable editor that c is only [19,NaN,NaN,NaN] so clearly something is either wrong with the input data file not following the rules for the others or somesuch. Maybe you're back to another one of those tab-delimited instead of comma-delimited files, I don't know but that's the cause for the failure; '19' won't match any year.
Now, why that's the returned data is something else related to the input file. You do need to reinsert the cleanup line, however.
dpb
7 Dec 2015
Oh, I see you still haven't opened an output file to collect the results prior to the loop...don't know why this seems such a difficult concept to get across.
dpb
7 Dec 2015
And also didn't check before but I see that i is 20 so it's the last file in the list that's got "issues"...
dpb
8 Dec 2015
Well, you know you have length(d) files and length(year) years so the first part should be pretty obvious--
mxary=nan(length(d),length(year));
Then
[~,iy]=ismember(year,yr);
mxary(i,iy)=mx;
should put them in the right locations (NB: air-code, untested).
The stn variable could this way become a cellstring array to hold the station IDs, as I presume, as per the previous thread, they're not all the same length, so a character string array would require padding.
So what was wrong with the file???
dpb
8 Dec 2015
Btw, the above assumes the complete years are not necessarily contiguous in the dataset; if they're known to be, then all you need is the first location index and length to place in the proper location. This is simply an offset calculation based on the starting years.
Also you mention csvwrite above; you can't write mixed-content or nonnumeric data with it; you'll have to use fprintf to output the file.
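A rough sketch of writing such mixed content is given below: each line is built as a station ID followed by that station's row of values, then printed with fprintf. The station IDs and the values are invented, and the output goes to the command window; passing an open file handle as the first argument to fprintf would send it to a file instead.

```matlab
% Sketch only: station IDs and values are made up for illustration.
stn   = {'01AA002';'01AB0031'};           % IDs of differing length
mxary = [161 NaN 213; NaN 280 NaN];       % one row per station

lines = cell(size(stn));
for k = 1:numel(stn)
    % station ID, then each value preceded by a comma; NaN prints as NaN
    lines{k} = [stn{k} sprintf(',%.2f',mxary(k,:))];
end
fprintf('%s\n',lines{:});
```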
Damith
8 Dec 2015
dpb
8 Dec 2015
Debugger to the rescue...
dpb
8 Dec 2015
Oh, as said, "air code". It's the first return value from ismember that's the logical array instead of the second...
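With that correction applied, the placement step might look like the sketch below. The year range, the number of files and the yr/mx values are made up, and the overall year vector is called allYears here (a hypothetical name) so it cannot be confused with a function named year. It relies on yr being sorted ascending, as unique returns it, and on every entry of yr lying inside allYears.

```matlab
% Sketch only: all values below are hypothetical.
allYears = 1969:1976;                     % overall span of years
nFiles   = 2;                             % stand-in for length(d)
mxary    = nan(nFiles,numel(allYears));   % one row per station

yr = [1970;1971;1975];                    % complete years for file i
mx = [280;213;128];                       % their annual maxima
i  = 1;

tf = ismember(allYears,yr);               % FIRST output: logical mask
mxary(i,tf) = mx;                         % drop maxima into matching columns
```

Years with no complete record for a station simply stay NaN in that row.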
dpb
8 Dec 2015
You already know which are the complete years; just don't use yr(1):yr(end) but will have to locate those whose years match the remaining values in the data array. Use ismember instead with the yr vector on the year column in the data array. This shouldn't take long to do a sample case at the command line to understand the logic.
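A small command-line style sample of that logic, on toy data, could look like this. The array c and the list of "complete" years are invented; the first column is the year and the last column the flow, as in the script above.

```matlab
% Sketch only: toy data, with 1970 pretended to be incomplete.
c  = [1969 1 1 10; 1969 1 2 12; 1970 1 1 20; 1971 1 1 5; 1971 1 2 30];
yr = [1969;1971];                        % years known to be complete

keep = ismember(c(:,1),yr);              % rows whose year is complete
c    = c(keep,:);                        % non-contiguous years handled
[~,~,iy] = unique(c(:,1));               % grouping index by year
mx   = accumarray(iy,c(:,end),[],@max);  % annual maximum per kept year
disp([yr mx])
```

This replaces the i1:i2 slice, which keeps any incomplete years that fall between the first and last complete ones.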
Damith
10 Dec 2015
Damith
10 Dec 2015