get rid of series that contain useless values

Christos Papagrigoriou on 9 Dec 2021
Edited: Adam Danz on 15 Dec 2021
Hello,
I have imported a table in a script and I want to create a loop that deletes all the series in the table that contain cells with 'Not collected' or 'Unknown' values. The code I currently use is:
data0= readtable('NSCLCR01Radiogenomic_DATA_LABEL.csv');
data1=data0(:,[3,5,7,18,25,26,27,28,29,30,31,32]);
data1(49,:) = [];
for i = 1: width(data1)
    for j = 1:12
        s1 = data1(i,j);
        s2 = 'Not collected';
        s3 = 'Unknown';
        tf = strcmp(s1,s2);
        tf2 = strcmp(s1,s3) ;
        if tf == 1 || tf2 == 1 ;
        else
            newdata(i,j) = data1(i,j)
        end
    end
end
newdata(~cellfun('isempty',R))
but it does not seem to give me the desired results.
cheersxxx
2 Comments
Adam Danz on 9 Dec 2021
Do you want to delete rows or columns that contain those key words?
BTW, the file you uploaded is xlsx (not csv) and contains only 17 columns, but your code indexes columns up to 32.
Christos Papagrigoriou on 9 Dec 2021
Hello, rows that contain those words.
P.S. Yes, my original csv was pretty big, which is why I made it a little smaller (refer to line 2) to keep only what I actually want.
cheers


Accepted Answer

Adam Danz on 9 Dec 2021
Edited: Adam Danz on 15 Dec 2021
  1. Load the data.
  2. Use varfun to determine which columns of the table are cell arrays of character vectors or string arrays.
  3. Use ismember to find entries that match a list of key words ("not collected", "unknown", etc.). This code ignores case.
  4. Use indexing and any to eliminate any row that contains a key word.
data0= readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/828830/NSCLCR01Radiogenomic_DATA_LABEL.xlsx');
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
isstr = varfun(@(c)iscellstr(c)||isstring(c), data0,'OutputFormat','uniform');
nullKeys = {'Not collected', 'Unknown'}; % not case sensitive; add more as needed
dataStr = data0{:, isstr};
isnull = ismember(lower(dataStr), lower(nullKeys));
% Remove rows of the table that contain a null indicator
rowContainsNull = any(isnull,2);
data0(rowContainsNull, :) = []
data0 = 150×17 table
CaseID PatientAffiliation AgeAtHistologicalDiagnosis Var4 Gender Var6 SmokingStatus Var8 Histology Var10 EGFRMutationStatus KRASMutationStatus ALKTranslocationStatus AdjuvantTreatment Chemotherapy Radiation Recurrence ___________ __________________ __________________________ ____ __________ ____ _____________ ____ __________________ _____ __________________ __________________ ______________________ _________________ ____________ _________ __________ {'AMC-001'} {'Stanford'} 34 NaN {'Male' } NaN {'Nonsmoker'} NaN {'Adenocarcinoma'} NaN {'Wildtype'} {'Mutant' } {'Wildtype'} {'No' } {'No' } {'No'} {'yes'} {'AMC-003'} {'Stanford'} 69 NaN {'Female'} NaN {'Nonsmoker'} NaN {'Adenocarcinoma'} NaN {'Mutant' } {'Wildtype'} {'Wildtype'} {'No' } {'No' } {'No'} {'no' } {'AMC-004'} {'Stanford'} 80 NaN {'Female'} NaN {'Nonsmoker'} NaN {'Adenocarcinoma'} NaN {'Wildtype'} {'Wildtype'} {'Wildtype'} {'No' } {'No' } {'No'} {'no' } {'AMC-005'} {'Stanford'} 76 NaN {'Male' } NaN {'Former' } NaN {'Adenocarcinoma'} NaN {'Mutant' } {'Wildtype'} {'Wildtype'} {'No' } {'No' } {'No'} {'yes'} {'AMC-009'} {'Stanford'} 61 NaN {'Male' } NaN {'Former' } NaN {'Adenocarcinoma'} NaN {'Wildtype'} {'Mutant' } {'Wildtype'} {'No' } {'No' } {'No'} {'no' } {'AMC-010'} {'Stanford'} 42 NaN {'Female'} NaN {'Nonsmoker'} NaN {'Adenocarcinoma'} NaN {'Mutant' } {'Wildtype'} {'Wildtype'} {'No' } {'No' } {'No'} {'no' } {'AMC-011'} {'Stanford'} 66 NaN {'Female'} NaN {'Former' } NaN {'Adenocarcinoma'} NaN {'Wildtype'} {'Mutant' } {'Wildtype'} {'Yes'} {'Yes'} {'No'} {'yes'} {'AMC-012'} {'Stanford'} 70 NaN {'Female'} NaN {'Nonsmoker'} NaN {'Adenocarcinoma'} NaN {'Mutant' } {'Wildtype'} {'Wildtype'} {'No' } {'No' } {'No'} {'yes'} {'AMC-013'} {'Stanford'} 67 NaN {'Female'} NaN {'Nonsmoker'} NaN {'Adenocarcinoma'} NaN {'Mutant' } {'Wildtype'} {'Wildtype'} {'No' } {'No' } {'No'} {'no' } {'AMC-014'} {'Stanford'} 78 NaN {'Female'} NaN {'Former' } NaN {'Adenocarcinoma'} NaN {'Wildtype'} {'Wildtype'} {'Wildtype'} {'No' } {'No' } 
{'No'} {'no' } {'AMC-016'} {'Stanford'} 65 NaN {'Male' } NaN {'Former' } NaN {'Adenocarcinoma'} NaN {'Wildtype'} {'Wildtype'} {'Wildtype'} {'No' } {'No' } {'No'} {'no' } {'AMC-018'} {'Stanford'} 69 NaN {'Female'} NaN {'Former' } NaN {'Adenocarcinoma'} NaN {'Wildtype'} {'Wildtype'} {'Wildtype'} {'No' } {'No' } {'No'} {'no' } {'AMC-020'} {'Stanford'} 61 NaN {'Female'} NaN {'Former' } NaN {'Adenocarcinoma'} NaN {'Wildtype'} {'Wildtype'} {'Wildtype'} {'Yes'} {'Yes'} {'No'} {'no' } {'AMC-021'} {'Stanford'} 78 NaN {'Female'} NaN {'Former' } NaN {'Adenocarcinoma'} NaN {'Wildtype'} {'Mutant' } {'Wildtype'} {'No' } {'No' } {'No'} {'no' } {'AMC-022'} {'Stanford'} 77 NaN {'Female'} NaN {'Former' } NaN {'Adenocarcinoma'} NaN {'Mutant' } {'Wildtype'} {'Wildtype'} {'No' } {'No' } {'No'} {'no' } {'AMC-023'} {'Stanford'} 76 NaN {'Female'} NaN {'Nonsmoker'} NaN {'Adenocarcinoma'} NaN {'Mutant' } {'Wildtype'} {'Wildtype'} {'No' } {'No' } {'No'} {'no' }
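To try the row-removal idea without downloading the file, here is a minimal sketch on a made-up four-row table. The table contents and the names T, isTextCol, txt, and Tclean are assumptions for illustration only; they are not part of the original data set.

```matlab
% Hypothetical toy table (NOT the original data); rows 2 and 3 hold key words
T = table({'A-1';'A-2';'A-3';'A-4'}, [34;69;80;76], ...
    {'Male';'Not Collected';'Female';'Male'}, ...
    {'Nonsmoker';'Former';'unknown';'Former'}, ...
    'VariableNames', {'CaseID','Age','Gender','SmokingStatus'});

% Logical index of the text columns (cell arrays of char vectors or strings)
isTextCol = varfun(@(c) iscellstr(c) || isstring(c), T, 'OutputFormat', 'uniform');

% Key words that mark a missing value; compared without regard to case
nullKeys = {'Not collected', 'Unknown'};

% Pull the text columns out as one cell array and flag exact matches
txt = T{:, isTextCol};
isNull = ismember(lower(txt), lower(nullKeys));

% Keep only the rows that contain no key word in any text column
Tclean = T(~any(isNull, 2), :);
disp(Tclean)
```

Indexing with the negated mask keeps the original table T untouched, which can be handy for checking which rows were dropped. Note that ismember only flags cells whose whole content equals a key word, so a cell such as 'Unknown value' would not be caught.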
Or, if you want to remove columns with key words,
% Remove columns of the table that contain a null indicator
colContainsNull = any(isnull,1);
tblColIdx = ismember(cumsum(isstr) .* isstr, find(colContainsNull));
data0(:, tblColIdx) = []
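The index arithmetic behind tblColIdx may be easier to follow on small hand-made vectors. In this sketch the two masks are hypothetical values chosen for illustration, and textColNumber, textColPos, and tblColIdx2 are names introduced here, not part of the answer's code.

```matlab
% Hypothetical masks: table columns 1, 3 and 4 hold text,
% and the 2nd and 3rd of those text columns contain a key word
isTextCol       = logical([1 0 1 1]);
colContainsNull = logical([0 1 1]);

% cumsum numbers the text columns 1, 2, 3, ...; multiplying by the mask
% resets the non-text columns to 0 so they can never match
textColNumber = cumsum(isTextCol) .* isTextCol;

% Table columns whose text-column number is flagged
tblColIdx = ismember(textColNumber, find(colContainsNull));

% An equivalent formulation that maps through the text-column positions
textColPos = find(isTextCol);
tblColIdx2 = false(size(isTextCol));
tblColIdx2(textColPos(colContainsNull)) = true;

disp(tblColIdx)
disp(isequal(tblColIdx, tblColIdx2))
```

The mapping is needed because colContainsNull is indexed over the text columns only, while the deletion has to address columns of the full table.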
