Remove rows in an array containing a non-matching element

I have a datafile data.txt:
gene12 489 483 838
gene82 488 763 920
gene31 974 837 198
gene45 489 101 378
gene59 89 827 138
I have another data file genelist.txt that lists just genes I'm interested in for my study:
gene45
gene59
gene61
I want to modify the first dataset by removing all rows whose gene isn't found in the second list, so that I end up with this array:
gene45 489 101 378
gene59 89 827 138
How do I go about doing this?

Accepted Answer

Guillaume on 11 Apr 2017

2 votes

Probably the easiest:
geneswithdata = readtable('data.txt'); %load file as a table
geneswithdata.Properties.VariableNames{1} = 'genes'; %rename first column for clarity (optional).
%I would also rename all the other columns
genesonly = readtable('genelist.txt'); %load as a table
genesonly.Properties.VariableNames = {'genes'}; %rename columns. Common columns must have the same name
filteredgenes = innerjoin(genesonly, geneswithdata);
Done.
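Using ismember, that last line could be done as:
found = ismember(geneswithdata.genes, genesonly.genes); %compare the gene columns; the two tables have different variables so cannot be compared whole
filteredgenes = geneswithdata(found, :);
Using intersect (rather than setdiff), it could be done as:
[~, tokeep] = intersect(geneswithdata.genes, genesonly.genes); %tokeep indexes rows of geneswithdata
filteredgenes = geneswithdata(tokeep, :);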

3 Comments

astein on 12 Apr 2017
This almost got me there. The only issue is that I lose the first line/first gene of genelist.txt. This is an easy fix if I simply edit genelist.txt to:
genes
gene45
gene59
gene61
Is there another way I can prevent losing that first line if I'm loading files as tables?
Thank you!
Guillaume on 12 Apr 2017
By default, readtable considers the first line as a header line that is to be used to name the variables. To tell it to not do that:
readtable(___, 'ReadVariableNames', false)
readtable is extremely flexible. Look at its documentation to see all the options available.
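As a minimal sketch of the whole workflow with the header option turned off, the following recreates the two files from the question in a temporary folder and then joins them. The temporary folder, the column names v1 to v3, and the explicit space delimiter are assumptions made for illustration; they are not part of the original answer.

```matlab
% Assumption: recreate the question's two files in a temporary folder
workdir = tempname;
mkdir(workdir);
datafile = fullfile(workdir, 'data.txt');
listfile = fullfile(workdir, 'genelist.txt');

datalines = {'gene12 489 483 838', 'gene82 488 763 920', ...
             'gene31 974 837 198', 'gene45 489 101 378', ...
             'gene59 89 827 138'};
fid = fopen(datafile, 'w');
fprintf(fid, '%s\n', datalines{:});
fclose(fid);

listlines = {'gene45', 'gene59', 'gene61'};
fid = fopen(listfile, 'w');
fprintf(fid, '%s\n', listlines{:});
fclose(fid);

% 'ReadVariableNames', false treats the first line as data, not as a header
geneswithdata = readtable(datafile, 'ReadVariableNames', false, 'Delimiter', ' ');
geneswithdata.Properties.VariableNames = {'genes', 'v1', 'v2', 'v3'}; % hypothetical names
genesonly = readtable(listfile, 'ReadVariableNames', false, 'Delimiter', ' ');
genesonly.Properties.VariableNames = {'genes'};

% Keep only the rows whose gene appears in both tables
filteredgenes = innerjoin(genesonly, geneswithdata);
disp(filteredgenes)
```

Note that innerjoin orders the result by the key variable, so the row order may differ from the order in data.txt.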
astein on 16 Apr 2017
Thank you very much for the help!


More Answers (1)

Image Analyst on 11 Apr 2017

0 votes

Look into ismember() or setdiff()

1 Comment

astein on 11 Apr 2017
Edited: astein on 11 Apr 2017
I don't know how to use either for this purpose. Isn't setdiff() going to give me the genes they don't have in common? I want the genes they have in common. ismember() gives me a logical array, and I run into the same issue: how do I use that array to pull out only the rows that are "true"? I am also having difficulty manipulating the datasets (which format to load the txt files into: structure, table, etc.).
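For reference, a logical array returned by ismember can be used directly as a row index. The sketch below builds the question's data in memory rather than reading the files; the variable names (values, wanted, notwanted) are assumptions chosen for illustration.

```matlab
% Data from the question, built in memory (hypothetical variable names)
genes  = {'gene12'; 'gene82'; 'gene31'; 'gene45'; 'gene59'};
values = [489 483 838; 488 763 920; 974 837 198; 489 101 378; 89 827 138];
geneswithdata = table(genes, values);

wanted = {'gene45'; 'gene59'; 'gene61'};

% found(k) is true when the gene in row k appears in the wanted list
found = ismember(geneswithdata.genes, wanted);

% Logical indexing: keep the rows where found is true, and all columns
filteredgenes = geneswithdata(found, :);

% For contrast, setdiff returns the genes in the data that are NOT wanted
notwanted = setdiff(geneswithdata.genes, wanted);
```

Here gene61 is simply ignored because it has no row in the data, and the kept rows stay in their original order.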
