Read CSV with row in quotes (string and numbers)

7 ビュー (過去 30 日間)
Benjamin Brodwolf
Benjamin Brodwolf 2020 年 3 月 10 日
コメント済み: Benjamin Brodwolf 2020 年 3 月 11 日
I have no idea to read the .CSV file correctly.
Here are the titles incl. the first three rows as example (in the File, there are over 22'000):
id,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
"7129300520,""20141013T000000"",221900,3,1,1180,5650,""1"",0,0,3,7,1180,0,1955,0,""98178"",47.5112,-122.257,1340,5650"
"6414100192,""20141209T000000"",538000,3,2.25,2570,7242,""2"",0,0,3,7,2170,400,1951,1991,""98125"",47.721,-122.319,1690,7639"
"5631500400,""20150225T000000"",180000,2,1,770,10000,""1"",0,0,3,6,770,0,1933,0,""98028"",47.7379,-122.233,2720,8062"
Every row is in quotes, and the data are strings (double quotes) and numbers. The demiliter is the comma ",".
I tried this:
opts = detectImportOptions('house_data.csv')
T = readtable('house_data.csv', opts);
But what I get is a nx1-table but it should be nx21-table (21 is the amount of the features [id, date, price, bedroms, ...])
  3 件のコメント
Benjamin Brodwolf
Benjamin Brodwolf 2020 年 3 月 10 日
編集済み: Benjamin Brodwolf 2020 年 3 月 10 日
Is it possible to remove this quotes via Matlab?
Josh Zagorski
Josh Zagorski 2020 年 3 月 10 日
I don't think so, I can't read it as a .csv into excel either because of the quotes. At least not easier than removing the quotes at the beginning and end of each line...

サインインしてコメントする。

採用された回答

Star Strider
Star Strider 2020 年 3 月 10 日
編集済み: Star Strider 2020 年 3 月 10 日
I have no idea to read the .CSV file correctly.
One approach:
C = {"7129300520,""20141013T000000"",221900,3,1,1180,5650,""1"",0,0,3,7,1180,0,1955,0,""98178"",47.5112,-122.257,1340,5650"
"6414100192,""20141209T000000"",538000,3,2.25,2570,7242,""2"",0,0,3,7,2170,400,1951,1991,""98125"",47.721,-122.319,1690,7639"
"5631500400,""20150225T000000"",180000,2,1,770,10000,""1"",0,0,3,6,770,0,1933,0,""98028"",47.7379,-122.233,2720,8062"};
Vn = {'id','date','price','bedrooms','bathrooms','sqft_living','sqft_lot','floors','waterfront','view','condition','grade','sqft_above','sqft_basement','yr_built','yr_renovated','zipcode','lat','long','sqft_living15','sqft_lot15'};
T1 = array2table(squeeze(split([C{:}], ',')),'VariableNames',Vn);
producing:
T1 =
3×21 table
id date price bedrooms bathrooms sqft_living sqft_lot floors waterfront view condition grade sqft_above sqft_basement yr_built yr_renovated zipcode lat long sqft_living15 sqft_lot15
____________ ___________________ ________ ________ _________ ___________ ________ ______ __________ ____ _________ _____ __________ _____________ ________ ____________ _________ _________ __________ _____________ __________
"7129300520" ""20141013T000000"" "221900" "3" "1" "1180" "5650" ""1"" "0" "0" "3" "7" "1180" "0" "1955" "0" ""98178"" "47.5112" "-122.257" "1340" "5650"
"6414100192" ""20141209T000000"" "538000" "3" "2.25" "2570" "7242" ""2"" "0" "0" "3" "7" "2170" "400" "1951" "1991" ""98125"" "47.721" "-122.319" "1690" "7639"
"5631500400" ""20150225T000000"" "180000" "2" "1" "770" "10000" ""1"" "0" "0" "3" "6" "770" "0" "1933" "0" ""98028"" "47.7379" "-122.233" "2720" "8062"
I will let you take it from there.
EDIT — (10 Mar 2020 at 22:03)
For example, extracting the numeric values is straightforward:
Price = str2double(T1.price)
produces:
Price =
221900
538000
180000
  4 件のコメント
Benjamin Brodwolf
Benjamin Brodwolf 2020 年 3 月 11 日
Thank you Star Strider and Josh Zagorski for your help.
I found a way to load the value as string, remove the qoutes, split it in string arrays and convert it to a table. :-)
Star Strider
Star Strider 2020 年 3 月 11 日
As always, my pleasure!

サインインしてコメントする。

その他の回答 (1 件)

Cris LaPierre
Cris LaPierre 2020 年 3 月 11 日
編集済み: Cris LaPierre 2020 年 3 月 11 日
I'd take advantage of some of the built in capabilities of readtable. By setting the appropriate options, MATLAB can handle the mix of quotes and commas for you automatically, bringing the data into a table with the desired data type already set.
% detect and set import options
opts = delimitedTextImportOptions("Delimiter",{'"',','},...
"ConsecutiveDelimitersRule","join",...
"LeadingDelimitersRule","ignore",...
"VariableNamesLine",1,...
"NumVariables",21,...
"ExtraColumnsRule","ignore",...
"DataLines",2);
% Set data types
opts = setvartype(opts,[1,3:8,13:21],'double');
opts = setvartype(opts,2,"datetime");
opts = setvartype(opts,[9,10],"logical");
opts = setvartype(opts,[11,12,15,16,17],"categorical");
% Define datetime format for input and display
opts = setvaropts(opts,2,"InputFormat",'yyyyMMdd''T''HHmmss',"DatetimeFormat",'d MMM yyyy');
% Import data
data = readtable("house_data.csv",opts)
To explain, I set quotes and commas as the delimiter. If there are more than one of these together, they are treated as a single delimeter. I had to tell it what line contained the variable names, as well as how many there were. I also had to tell it what row the data started on.
Hopefully the code is fairly straightforward, but let me know if you have any questions.
  1 件のコメント
Benjamin Brodwolf
Benjamin Brodwolf 2020 年 3 月 11 日
Wow! That's awesome. I never thougth it would be possible to create a table this way. Really nice to know. I learned a lot.

サインインしてコメントする。

カテゴリ

Help Center および File ExchangeTables についてさらに検索

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by