Calculate similarity between data columns in the data matrix

EK on 4 Dec 2024
Commented: Andrew Frane on 7 Dec 2024
Hello
I have an Excel file containing recording data. The first four columns are log data:
  • Column 1 logs the stimulus IDs over time. Values such as 1, 2, 3, 4, etc., represent stimulus IDs, while 0 indicates no stimulus was presented.
  • Column 2 logs the intervals of stimulus presentation:
      • 1 = pre-stimulus interval,
      • 2 = stimulus interval,
      • 3 = post-stimulus interval.
I need to calculate the correlation of data in each column (from column 6 to the last column) with the data in column 5 during the stimulus interval (i.e., where the value in column 2 equals 2) and do it only for specific stimuli in the analysis, such as Stimulus 1 and Stimulus 3 (Column 1). I would like to save the correlation coefficient for each column. Could you help me implement this in MATLAB?
I am attaching an example data file below.
Many Thanks in Advance!

Accepted Answer

Andrew Frane on 4 Dec 2024
Edited: Andrew Frane on 7 Dec 2024
First, a vocabulary note: "Similarity" and "correlation" don't mean quite the same thing. You can use the corrcoef function to get a correlation matrix, which gives the bivariate Pearson correlation coefficients for all the pairs of columns in the inputted matrix. If I'm understanding you correctly, what you want is something like this (I'm doing it for Stimulus 3, but you can straightforwardly adapt the code for any stimulus number just by replacing 3 with the desired stimulus number):
% import data from Excel file into matrix
% (column 1 is stimulus ID number, column 2 is interval ID number,
% additional columns are additional variables)
data = readmatrix('Test3.xlsx') ;
% reduced matrix, which only contains rows from data where the column 1 value is 3
% and the column 2 value is 2, i.e., only contains data for Stimulus 3
% during Interval 2 (Interval 2 is the interval during the stimulus itself)
dataDuringStimulus3 = data( data(:, 1) == 3 & data(:, 2) == 2, : ) ;
% correlation matrix for columns 5:end in that reduced matrix
% (e.g., row 1 or column 1 of this correlation matrix gives the correlations of
% variable 5 with variables 5:end; row 2 or column 2 of this correlation
% matrix gives the correlations of variable 6 with variables 5:end; etc., where
% "variable" number refers to the column number in the original data matrix)
correlationMatrixDuringStimulus3 = corrcoef( dataDuringStimulus3(:, 5:end) ) ;
% vector giving correlation of variable 5 with each subsequent variable
% (i.e., with variables 6, 7, 8, etc.) during Stimulus 3; we just extract the
% first row from the correlation matrix and omit the first value (i.e., we omit
% the correlation of variable 5 with itself, which is of course 1)
correlationOfVar5WithEachSubsequentVarDuringStimulus3 = ...
correlationMatrixDuringStimulus3(1, 2:end) ;
Here's the same thing, but for Stimulus 1 and 3 pooled together:
% import data from Excel file into matrix
% (column 1 is stimulus ID number, column 2 is interval ID number,
% additional columns are additional variables)
data = readmatrix('Test3.xlsx') ;
% reduced matrix, which only contains rows from data where the column 1 value is 1
% or 3 and the column 2 value is 2, i.e., only contains data for Stimulus 1 or 3
% during Interval 2 (Interval 2 is the interval during the stimulus itself)
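% (the parentheses around the OR condition are required because & takes
% precedence over | in MATLAB)
dataDuringStimulus1or3 = data( ( data(:, 1) == 1 | data(:, 1) == 3 ) & ...
    data(:, 2) == 2, : ) ;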
% correlation matrix for columns 5:end in that reduced matrix
% (e.g., row 1 or column 1 of this correlation matrix gives the correlations of
% variable 5 with variables 5:end; row 2 or column 2 of this correlation
% matrix gives the correlations of variable 6 with variables 5:end; etc., where
% "variable" number refers to the column number in the original data matrix)
correlationMatrixDuringStimulus1or3 = corrcoef( dataDuringStimulus1or3(:, 5:end) ) ;
% vector giving correlation of variable 5 with each subsequent variable
% (i.e., with variables 6, 7, 8, etc.) during Stimulus 1 or 3; we just extract the
% first row from the correlation matrix and omit the first value (i.e., we omit
% the correlation of variable 5 with itself, which is of course 1)
correlationOfVar5WithEachSubsequentVarDuringStimulus1or3 = ...
correlationMatrixDuringStimulus1or3(1, 2:end) ;
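If the set of stimuli to pool might change, one option is to build the row selection with ismember instead of chaining | and &. The following is only a minimal sketch on synthetic data (not Test3.xlsx); the column layout is assumed to match the description in the question, and the names stimuliOfInterest, rowMask, R, and r are illustrative, not part of the original code.

```matlab
% Synthetic stand-in for the Excel data (assumption: column 1 = stimulus ID,
% column 2 = interval ID, columns 5:end = recorded variables)
rng(0) ;                                   % reproducible random numbers
nRows = 200 ;
data = [ randi([0 4], nRows, 1), ...       % column 1: stimulus ID (0 = no stimulus)
         randi([1 3], nRows, 1), ...       % column 2: interval ID (1, 2, or 3)
         zeros(nRows, 2), ...              % columns 3-4: other log columns (unused here)
         randn(nRows, 4) ] ;               % columns 5-8: recorded variables

stimuliOfInterest = [1 3] ;                % hypothetical list of stimulus IDs to pool

% rows recorded during the stimulus interval for any of the chosen stimuli
rowMask = ismember( data(:, 1), stimuliOfInterest ) & data(:, 2) == 2 ;

% correlation matrix of variables 5:end on the selected rows, then the
% correlations of variable 5 with variables 6:end (first row, minus the diagonal)
R = corrcoef( data(rowMask, 5:end) ) ;
r = R(1, 2:end) ;
```

Changing stimuliOfInterest to, e.g., 3 or [1 2 3] selects a different pool without touching the rest of the code.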

More Answers (1)

Sameer on 5 Dec 2024
Hi @EK
From my understanding, you want to calculate the correlation between a specific column ("column 5") and other columns in your dataset during specific conditions: when a stimulus is presented (interval value 2) and for certain stimulus IDs (1 and 3).
1. Select columns for stimulus IDs, intervals, and the column of interest (column 5).
data = readtable('Test3.xlsx');  % import the Excel file as a table
dataArray = table2array(data);
stimulusID = dataArray(:, 1);
interval = dataArray(:, 2);
column5 = dataArray(:, 5);
2. Identify rows where the stimulus interval is 2 and the stimulus ID is either 1 or 3.
filterIdx = (interval == 2) & (stimulusID == 1 | stimulusID == 3);
3. Loop through each column from column 6 to the last column and calculate the correlation with column 5 for the filtered data.
numColumns = size(dataArray, 2);
correlationCoefficients = zeros(1, numColumns - 5);
for col = 6:numColumns
    columnData = dataArray(filterIdx, col);
    column5Data = column5(filterIdx);
    % corr requires the Statistics and Machine Learning Toolbox
    correlationCoefficients(col - 5) = corr(column5Data, columnData);
end
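Because corr is part of the Statistics and Machine Learning Toolbox, the loop above may fail on installations without it. A possible toolbox-free variant uses corrcoef, which ships with base MATLAB. This is a rough sketch on synthetic data standing in for dataArray (the layout is assumed to match the question), not a tested replacement for the real file:

```matlab
% Synthetic stand-in for dataArray (assumption: column 1 = stimulus ID,
% column 2 = interval ID, columns 5:end = recorded variables)
rng(1) ;
nRows = 300 ;
dataArray = [ randi([0 4], nRows, 1), randi([1 3], nRows, 1), ...
              zeros(nRows, 2), randn(nRows, 5) ] ;

% rows in the stimulus interval (2) for stimulus 1 or 3
filterIdx = (dataArray(:, 2) == 2) & ...
            (dataArray(:, 1) == 1 | dataArray(:, 1) == 3) ;
column5Data = dataArray(filterIdx, 5) ;

numColumns = size(dataArray, 2) ;
correlationCoefficients = zeros(1, numColumns - 5) ;
for col = 6:numColumns
    % corrcoef on two vectors returns a 2-by-2 matrix; the off-diagonal
    % element is the Pearson correlation between the two vectors
    R = corrcoef( column5Data, dataArray(filterIdx, col) ) ;
    correlationCoefficients(col - 5) = R(1, 2) ;
end
```

The result should match the first row (excluding the diagonal) of corrcoef applied to columns 5:end of the filtered rows.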
Hope this helps!
  3 Comments
EK on 7 Dec 2024
Sorry, I could not run your code. Pooling the Stimulus 1 and Stimulus 3 data together is correct.
Andrew Frane on 7 Dec 2024
I edited my accepted answer so it also includes a version for Stimulus 1 and 3 pooled together.

Release

R2022a
