Reading in specific column and plotting bar chart

Jason on 14 Apr 2015
Commented: Jason on 15 Apr 2015
I have a text file as:
Heading A
------------------------
Heading B
GA008246-0_B_F_1852967891 X 7117
GA011810-0_B_F_1852968731 14 7380
GA017861-0_B_F_1852970072 22 7749
GA017864-0_T_R_1853027526 22 7751
GA017866-0_T_R_1853027527 22 7753
GA017875-0_B_R_1852970076 22 7755
I want to plot a histogram of the 2nd column under the title Heading B. Sometimes there are additional lines under Heading A.
This is what I have so far.
% Read in data file
fid = fopen('c:\myfile.txt','rt');
C = textscan(fid, '%s %s %s', 'delimiter', '\t', 'headerlines', 1)
while (strcmp(C{1}{1}, 'Heading B') == 0)
    C = textscan(fid, '%s %s %s', 'delimiter', '\t')
end
fclose(fid);
C{:,2}
But I'm picking up one item too early, i.e.
ans =
''
'X'
'14'
'22'
'22'
'22'
'22'
Once the extra '' item is removed, how can I plot a bar chart showing the number of occurrences of each of these in the list? In this example:
X = 1 repetition
14 = 1 repetition
22 = 4 repetitions
Thanks for any help. Jason

Accepted Answer

Guillaume on 14 Apr 2015
Edited: Guillaume on 14 Apr 2015
I would use fgetl instead of textscan to find the start of the heading B section, then use textscan to read it.
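fid = fopen('c:\myfile.txt','rt');
tline = fgetl(fid);
% fgetl returns -1 (numeric) at end of file, so stop on that or on the heading
while ~isnumeric(tline) && ~strcmp(tline, 'Heading B')
    tline = fgetl(fid);
end
if isnumeric(tline) % end of file reached before Heading B
    fclose(fid);
    error('End of file reached prematurely');
end
% the file position is now on the first data line after Heading B
C = textscan(fid, '%s %s %s', 'delimiter', '\t');
fclose(fid);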
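To try the header-skipping approach without the original file, here is a minimal sketch that first writes a small temporary file. The file contents, the extra line under Heading A, the shortened data set and the use of tempname are all assumptions made for illustration. It also assumes the real file is tab-delimited, as the textscan calls in this thread imply.

```matlab
% Build a small test file (contents are made up to mirror the question).
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Heading A\n');
fprintf(fid, 'an extra line that is sometimes present\n');
fprintf(fid, '------------------------\n');
fprintf(fid, 'Heading B\n');
fprintf(fid, 'GA008246-0_B_F_1852967891\tX\t7117\n');
fprintf(fid, 'GA011810-0_B_F_1852968731\t14\t7380\n');
fprintf(fid, 'GA017861-0_B_F_1852970072\t22\t7749\n');
fclose(fid);

% Skip lines until 'Heading B'; fgetl returns -1 (numeric) at end of file.
fid = fopen(fname, 'rt');
tline = fgetl(fid);
while ischar(tline) && ~strcmp(tline, 'Heading B')
    tline = fgetl(fid);
end
if ~ischar(tline)
    fclose(fid);
    error('End of file reached before ''Heading B''.');
end

% Read the remaining tab-delimited rows as three columns of strings.
C = textscan(fid, '%s %s %s', 'Delimiter', '\t');
fclose(fid);
delete(fname);

disp(C{2})   % second column as a cell array of strings
```

Because the loop stops on the heading text rather than on a fixed line count, the number of lines under Heading A does not matter.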
To find the number of repetitions in a column of C, use the third return value of unique together with histc:
[names, ~, position] = unique(C{2})
repetitions = histc(position, 1:numel(names))
%useful for seeing the result:
table(names, repetitions)
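As a self-contained check of the counting step, the sketch below uses the second column from the question typed in by hand (an assumption, so no file is needed). It swaps histc for accumarray, which gives the same counts here. The variable name col2 is hypothetical.

```matlab
% Second column from the question's sample data, typed in by hand.
col2 = {'X'; '14'; '22'; '22'; '22'; '22'};

% names: sorted unique strings; position: index into names for each entry.
[names, ~, position] = unique(col2);

% Count how often each index occurs (accumarray used here instead of histc).
repetitions = accumarray(position(:), 1);

% Print one "label = count" line per unique string.
for k = 1:numel(names)
    fprintf('%s = %d\n', names{k}, repetitions(k));
end
```

Note that unique sorts the strings by character code, not numerically, so the labels come out as '14', '22', 'X' here. With other data, '3' would sort after '22'.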
5 Comments
Guillaume on 14 Apr 2015
Oh, sorry, I misunderstood. You also need to set the positions and the number of ticks (the XTick property):
set(gca, 'XTickLabel', names, 'XTick', 1:numel(names))
should work.
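Putting the tick settings together with the bar chart, a minimal sketch could look like the following. The names and repetitions values are assumed from the sample data in the question, and the axis label texts are made up.

```matlab
names = {'14'; '22'; 'X'};      % unique labels (assumed, from the sample data)
repetitions = [1; 4; 1];        % matching counts

figure;
bar(repetitions);

% One tick per bar, labelled with the corresponding string.
set(gca, 'XTick', 1:numel(names), 'XTickLabel', names);
xlabel('Value in column 2');
ylabel('Number of occurrences');
```

Setting XTick explicitly matters once there are many bars: the default ticks may then skip positions, and the labels would no longer line up with the bars.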
Jason on 15 Apr 2015
Perfect, thank you.


More Answers (1)

Star Strider on 14 Apr 2015
I don’t have your file, but I would change the textscan call to:
C = textscan (fid, '%s %f %f', 'delimiter', '\t','headerlines', 3)
The initial ‘X’ in column #2 will then show up as either '' or NaN, so you can eliminate it by using isempty or isnan, as appropriate.
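One way to make the numeric route explicit is textscan's TreatAsEmpty option, which turns a named placeholder into the empty value (NaN by default) in numeric fields. The sketch below is an illustration under assumptions: the input is an in-memory string with shortened, made-up IDs rather than the real file, and it discards the X row, so it only suits cases where X is not needed.

```matlab
% Made-up, tab-delimited sample rows held in a string instead of a file.
str = sprintf('GA008246\tX\t7117\nGA011810\t14\t7380\nGA017861\t22\t7749\n');

% Read columns 2 and 3 as numbers; 'X' is treated as empty and becomes NaN.
C = textscan(str, '%s %f %f', 'Delimiter', '\t', 'TreatAsEmpty', 'X');

col2 = C{2};
valid = ~isnan(col2);          % rows whose second column is a real number
numericOnly = col2(valid);     % second column without the placeholder row
disp(numericOnly)
```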
2 Comments
Jason on 14 Apr 2015
Edited: Jason on 14 Apr 2015
The problem is that there are sometimes lines under "Heading A", so the number of lines until I find "Heading B" is variable.
I actually want the X as well as the numbers (it's to do with chromosomes). It's this mixture of text and numbers in the cell array that makes it hard for me to plot a bar chart showing the frequency of each string.
I've included the txt file. Thanks
Star Strider on 14 Apr 2015
Edited: Star Strider on 14 Apr 2015
This works for the current file:
fidi = fopen('test1.txt');
C = textscan(fidi, '%s %s %s', 'delimiter', '\t', 'headerlines', 2);
fclose(fidi);
C2 = C{2};
Ix = cellfun(@isempty, C2);          % flag empty entries
[C2u, ia, ic] = unique(C2(~Ix));     % unique labels and an index for each entry
cnts = hist(ic, length(C2u));        % count occurrences of each label
figure(1)
bar(cnts)
set(gca, 'XTick', 1:numel(C2u), 'XTickLabel', C2u)   % one labelled tick per bar
EDIT: Added plot ...
