How to catalogue a dataset during different stages of data filtering?

Asked by Carter Hoffman on 19 Oct 2022
Commented: William Rose on 8 Dec 2022
Hey all,
I run an application that generates high-throughput particle fluorescence data in 5 channels, meaning there are 5 values for each particle that I care about (in order: blue, green, orange, red, size). What's more, I use channel 5 as a normalizing signal, so for each particle there are actually 9 values I care about (1-5, as well as 1/5, 2/5, 3/5 and 4/5). The data come from a detector already in a matrix, and most of my operations involve first pulling individual column vectors and some binning values out of the matrix, like so:
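data1 = load('data_file');
blue1 = data1(:,1); % channel 1 (blue)
[n_blue1,E_blue1] = histcounts(blue1,nbins); % bin counts and bin edges
[num_blue1,x_blue1] = hist(blue1,nbins); % bin counts and bin centers
N_blue1 = n_blue1./sum(n_blue1); % counts normalized to fractions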
...............
size1 = data1(:,5); % channel 5 (size)
[n_size1,E_size1] = histcounts(size1,nbins);
...............
In total, there are 72 column vectors that I pull out of a single dataset for downstream analysis.
However, the data also go through several filtering steps to eliminate outliers and the like. Something like:
criterion1 = ________
data2 = all data1 that meets criterion1
criterion2 = _________
data3 = all data2 that also meets criterion2
and eventually I will have several data files, where length(data3) < length(data2) < length(data1) because I am eliminating data at each step.
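One way to sketch this chain without numbering each variable by hand is to keep every stage in a single cell array. The data below are random stand-ins and both criteria are made up for illustration; only the pattern (a logical mask per row, applied to whole rows) is the point.

```matlab
% Sketch: keep every filtering stage in one cell array instead of data1, data2, ...
rng(0);                                  % reproducible stand-in data
stages = {rand(1000,5)};                 % stage 1: placeholder for the raw detector matrix

% Hypothetical criteria: each function returns one logical value per row
criteria = {@(d) d(:,5) > 0.1, ...       % e.g. drop very small particles
            @(d) d(:,1)./d(:,5) < 5};    % e.g. drop extreme blue/size ratios

for k = 1:numel(criteria)
    keep = criteria{k}(stages{k});       % logical mask for the current stage
    stages{k+1} = stages{k}(keep,:);     % keep whole rows so the channels stay aligned
end

nRows = cellfun(@(d) size(d,1), stages)  % number of particles left at each stage
```

Here stages{1} plays the role of data1, stages{2} of data2, and so on, so adding another filtering step only means adding another entry to criteria.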
The problem is that I like to generate several plots between filtering steps so I can show the effect of the data filtering, which requires calling the column vectors and binning values from the data matrices again. I have been doing something like this:
criterion1 = ________
data2 = all data1 that meets criterion1
blue2 = data2(:,1)....
[n_blue2,E_blue2] = histcounts(blue2,nbins);
...............
size2 = data2(:,5)
But this obviously requires LOTS of manual typing of names, rather than automated storage and retrieval.
Plainly, I am looking for a better way of doing this. I have written an organization function that extracts all of the column vectors and binning values, but the only way I've been able to code it, it overwrites them on each call. It is something like
z = data2 or data3 or whatever
[blue, n_blue,....... other_outputs] = organizationfunction(z);
What would be better is if I could program the function to organize these column vectors and binning values into a cell or struct, and to automatically add a subscript to this struct/cell. For example
data{i} = load('data_file');
filtering criterion1 = ________
data{i+1} = filteringfunction( data{i} )
% filteringfunction eliminates unwanted data and stores the filtered data in a new cell
[many outputs] = organizationfunction( data{i+1} )
% organizationfunction extracts the column vectors and binning values of data{i+1}
The hope is that, rather than giving everything at every step its own name/subscript, I could instead get the blue vector after 2 filtering steps with:
data{i+2}.blue, or something of the like.
Thanks in advance! If someone reads this and thinks I should be pointed toward some reading, I am all ears.
Carter

Answers (1)

William Rose on 8 Dec 2022
Here is an example.
data1=randn(1000,5); %1000x5 array of random numbers
size1=data1(:,5); %size data is in column 5
partSize={size1}; %cell array with 1 element
sizeA=size1(abs(size1)<2.5); %filtered size data
partSize{end+1}=sizeA; %add sizeA to the cell
sizeA=sizeA(sizeA>-2); %filtered size data
partSize{end+1}=sizeA; %add latest sizeA to the cell
sizeA=sizeA(sizeA<2); %filtered size data
partSize{end+1}=sizeA; %add latest sizeA to the cell
cellfun(@size,partSize,'UniformOutput',false) %display sizes of columns
ans = 1×4 cell array
{[1000 1]} {[975 1]} {[960 1]} {[946 1]}
%Thanks to @the cyclist for the line above.
Try it. Good luck.
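The same idea can be extended to all five channels with a struct array, which gives the S(k).blue style of access the question asks for. This is only a sketch under assumptions: the data are random stand-ins, the single filter on size is hypothetical, nbins = 20 is arbitrary, and the field names (n_blue, E_blue, blue_norm, ...) are illustrative choices, not anything from the original code.

```matlab
% Sketch: one struct per filtering stage, with named fields for each channel
rng(0);
raw   = abs(randn(1000,5)) + 0.1;                 % stand-in data; the offset avoids division by zero
names = {'blue','green','orange','red','size'};   % column order given in the question
nbins = 20;                                       % assumed number of bins

stageData = {raw, raw(raw(:,5) < 2, :)};          % raw data plus one hypothetical filter on size
S = struct([]);                                   % empty struct array, one element added per stage
for k = 1:numel(stageData)
    d = stageData{k};
    for c = 1:numel(names)
        [n, E] = histcounts(d(:,c), nbins);       % bin counts and bin edges for this channel
        S(k).(names{c})          = d(:,c);        % column vector, e.g. S(k).blue
        S(k).(['n_' names{c}])   = n;             % e.g. S(k).n_blue
        S(k).(['E_' names{c}])   = E;             % e.g. S(k).E_blue
    end
    for c = 1:4
        S(k).([names{c} '_norm']) = d(:,c) ./ d(:,5);   % channel c divided by channel 5
    end
end

blueAfterFilter = S(2).blue;                      % blue channel after the first filtering step
```

Dynamic field names, S(k).(name), let one loop fill every channel, so nothing is overwritten between stages and no variable needs a hand-typed subscript.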
1 Comment
William Rose on 8 Dec 2022
@Carter Hoffman, as you probably know, you can access the data within partSize as follows:
a=partSize{2}; %extract second size vector
b=partSize{3}(10); %element 10 of third size vector
And so on.
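To show the effect of a filter in a plot, it may help to bin every stage on the same edges, because histcounts(x,nbins) picks its edges from the range of x, which can shift after filtering. A minimal sketch, assuming random stand-in data, one hypothetical filter, and 20 bins:

```matlab
% Sketch: compare one channel before and after filtering on common bin edges
rng(0);
partSize = {randn(1000,1)};                               % stand-in for the unfiltered size vector
partSize{end+1} = partSize{end}(abs(partSize{end}) < 2);  % hypothetical filter

[~, edges] = histcounts(partSize{1}, 20);                 % edges chosen from the unfiltered data
N = zeros(numel(partSize), numel(edges)-1);               % one row of bin fractions per stage
for k = 1:numel(partSize)
    n = histcounts(partSize{k}, edges);                   % same edges at every stage
    N(k,:) = n ./ sum(n);                                 % fraction of particles per bin
end
% plot(edges(1:end-1) + diff(edges)/2, N.') would overlay the stages
```

Each row of N is then directly comparable with the others, bin for bin.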

Release

R2022b
