How do I count and save Twitter hashtags?

Abim on 14 Dec 2012
I am writing a script that analyzes the hashtags in tweets that I saved in a text file. So far I have managed to count the number of hashtags in the file:
fid = fopen('Tweets.txt');
% Read the file line by line; fgetl returns the number -1 at end of file,
% so ischar is a safe loop test even when a line is empty
twitterStuff = {};
tline = fgetl(fid);
while ischar(tline)
    twitterStuff{end+1} = tline;
    tline = fgetl(fid);
end
fclose(fid);
numberOfTweets = numel(twitterStuff);
% Count every '#' character across all tweets
numberOfHash = 0;
for i = 1:numberOfTweets
    c = strfind(twitterStuff{i}, '#');
    numberOfHash = numberOfHash + numel(c);
end
Now, I want to find out what the specific hashtags are and save them into a cell array, but I don't really know how to do that.
2 Comments
Walter Roberson on 14 Dec 2012
Is # by itself a hashtag? Is #this#that with no spaces two hashtags? Is #35 a valid hashtag? Is #? a valid hashtag?
Abim on 14 Dec 2012
When I said I counted the hashtags, I just counted the number of '#' characters. But when I say I want to save the hashtags, I mean the words that follow each '#'. Technically, #this#that would be two hashtags, but for now I just want to focus on the basic #this hashtag.


Accepted Answer

Jonathan Epperl on 14 Dec 2012
Edited: Jonathan Epperl on 14 Dec 2012
You should use regular expressions for that; you can do pretty much anything with them. This should do what you want, and if not, it should point you in the right direction:
s = '#Matlab#2012b rocks my #sox # off!'
% Match a '#' with zero or more characters that aren't whitespace or '#' after it
T = regexp(s,'(#[^\s#]*)','tokens')
T{:}
% Match a '#' with 1 or more characters that aren't whitespace or '#' after it
T = regexp(s,'(#[^\s#]+)','tokens')
T{:}
% Match a '#' with 1 or more characters that aren't whitespace or '#' after
% it, but don't capture the '#'
T = regexp(s,'#([^\s#]+)','tokens')
T{:}
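
To gather the hashtags from many tweets into one flat cell array, a token-capturing pattern like the last one can be applied tweet by tweet. The following is only a minimal sketch: the `tweets` cell array is made-up sample data standing in for the lines read from Tweets.txt, and the variable names `hashtags` and `tok` are illustrative.

```matlab
% Hypothetical sample data standing in for the lines of Tweets.txt
tweets = {'#Matlab#2012b rocks my #sox # off!', ...
          'no tags in this one', ...
          'Learning #regexp with #Matlab'};

hashtags = {};   % flat cell array that will hold the hashtag words
for k = 1:numel(tweets)
    % 'tokens' returns one 1x1 cell per match, so tok is a cell array of cells
    tok = regexp(tweets{k}, '#([^\s#]+)', 'tokens');
    % tok{:} expands the inner cells; concatenating them appends the words
    hashtags = [hashtags, tok{:}];
end

% Number of hashtags that have at least one character after the '#'
numberOfHash = numel(hashtags);
disp(hashtags)
```

Because the pattern requires at least one character after the '#', a lone '#' is not counted, so `numberOfHash` can be smaller than a raw count of '#' characters.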

More Answers (2)

Sean de Wolski on 14 Dec 2012
Edited: Sean de Wolski on 14 Dec 2012
Using regular expressions:
str = '#MATLAB is an awesome product by #MathWorks';
% The fourth output of regexp is the matched text; note that \w* also matches a bare '#'
[matchstart,matchend,~,hashtag] = regexp(str,'(\#(\w*))')
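
Once the hashtags are extracted, tallying how often each one occurs is a short step with `unique` and `accumarray`. This is a sketch under two assumptions that are not from the thread: the sample string is made up, and hashtags are compared case-insensitively (drop the `lower` call if case should matter).

```matlab
% Hypothetical sample text with a repeated hashtag in different case
str = '#MATLAB is an awesome product by #MathWorks, and #matlab has #regexp';

% The 'match' option returns the matched text directly, '#' included;
% \w+ requires at least one word character after the '#'
tags = regexp(str, '#\w+', 'match');

% Assumption: '#MATLAB' and '#matlab' count as the same hashtag
[uniqueTags, ~, idx] = unique(lower(tags));

% idx maps each tag to its entry in uniqueTags; summing ones per index
% gives the number of occurrences of each distinct tag
counts = accumarray(idx(:), 1);

for k = 1:numel(uniqueTags)
    fprintf('%s: %d\n', uniqueTags{k}, counts(k));
end
```

The `uniqueTags` and `counts` variables stay aligned by position, so they can be saved together or sorted by count afterwards.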

Abim on 14 Dec 2012
Thanks
