How to recognize gender by name

Alexander Engman, 11 Jul 2018
Commented: Image Analyst, 13 Jul 2018
Hi!
I have a list (1 column, 601 rows) of the most popular male and female surnames and they are marked in another column as either M for male or F for female. I have another list of surnames of people from a statistical survey (which does not have the same dimensions as the list of names). I want to compare the names from the survey with the names in my list and mark them as either M or F if they are recognized. If they are not found in my list, I want to leave them blank. Does anyone know how I can do this?
Many thanks in advance.
  2 Comments
KSSV, 11 Jul 2018
This can be done with strcmp and ismember. Can you share your data?
Jan, 11 Jul 2018
What exactly is "a list"? Please post a short piece of MATLAB code that creates a representative data set. Suggesting code is then much easier.


Accepted Answer

Guillaume, 11 Jul 2018
Edited: Guillaume, 11 Jul 2018
Very easy to do:
%inputs:
%genderlist = Mx2 cell array, 1st column name, 2nd column gender
%namelist = Nx1 cell array, list of names that need gender
%output
%namelistwithgender = Nx2 cell array, 1st column from namelist, 2nd column corresponding gender if found in genderlist, empty otherwise
[isfound, where] = ismember(namelist, genderlist(:, 1));
namelistwithgender = namelist;
namelistwithgender(isfound, 2) = genderlist(where(isfound), 2);
Note that the search is case-sensitive. If you want to ignore case, convert both lists to lower case (with lower) inside the ismember call.
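As a minimal sketch of the case-insensitive variant described in the note: the sample names below are made up for illustration, and filling unmatched rows with an empty char (rather than leaving them as empty cells) is an assumption, not part of the original answer.

```matlab
% Hypothetical sample data, for illustration only.
genderlist = {'Anna', 'F'; 'Erik', 'M'; 'Maria', 'F'};
namelist = {'erik'; 'ANNA'; 'Zlatan'};

% Compare lower-case copies so the original spelling is kept in the output.
[isfound, where] = ismember(lower(namelist), lower(genderlist(:, 1)));
namelistwithgender = namelist;
namelistwithgender(:, 2) = {''};   % Assumption: blank char for names that are not found.
namelistwithgender(isfound, 2) = genderlist(where(isfound), 2);
disp(namelistwithgender)
```

Only the comparison uses the lower-case copies, so namelist and genderlist themselves are left unchanged.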
  6 Comments
Alexander Engman, 13 Jul 2018
Thank you!
A lot of the names are actually combinations, or "double names", connected with a hyphen. For example, a combination of the names "Anna" and "Maria" would be "Anna-Maria". Is there a way to give the name a gender if either or both of the names are recognized?
Also, how do I write the code so that it is not case-sensitive?
Thank you so much!
Image Analyst, 13 Jul 2018
Just use lower() and strrep():
namelist = lower(namelist); % Everything is lower case after this.
for n = 1 : numel(namelist)
    theName = strrep(namelist{n}, '-', ' '); % Replace dashes with spaces.
    % Get cell array with the parts of the (possibly double) name.
    ca = strsplit(theName);
    for k = 1 : numel(ca)
        thisName = ca{k}; % Extract the k-th word.
        % Check if thisName is in each gender namelist.
        % etc.
    end
end
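One possible way to complete the idea above is sketched here, reusing the Mx2 genderlist layout from the accepted answer. The sample names are made up, and the rule for conflicts (a double name whose parts map to different genders is left blank) is an assumption rather than something settled in this thread.

```matlab
% Hypothetical sample data, for illustration only.
genderlist = {'Anna', 'F'; 'Maria', 'F'; 'Karl', 'M'; 'Erik', 'M'};
namelist = {'Anna-Maria'; 'karl-johan'; 'Kim'; 'Erik-Anna'};

knownNames = lower(genderlist(:, 1));
gender = repmat({''}, numel(namelist), 1);   % Blank unless a part is recognized.
for n = 1 : numel(namelist)
    parts = strsplit(lower(namelist{n}), '-');     % 'anna-maria' -> {'anna', 'maria'}
    [isfound, where] = ismember(parts, knownNames);
    genders = genderlist(where(isfound), 2);       % Genders of the recognized parts.
    if ~isempty(genders) && all(strcmp(genders, genders{1}))
        gender{n} = genders{1};   % All recognized parts agree on one gender.
    end                           % Assumed policy: no match or a conflict stays blank.
end
namelistwithgender = [namelist, gender];
disp(namelistwithgender)
```

With this data, 'karl-johan' is marked M because one part is recognized, while 'Erik-Anna' stays blank because its parts disagree.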


More Answers (1)

Image Analyst, 11 Jul 2018
I'd get a distribution and then use k nearest neighbors. After all, there are several names with varying numbers of people in either gender, like chris, robin, ariel, sam, pat, etc.
  2 Comments
Alexander Engman, 11 Jul 2018
That is great input. These are, however, Swedish names, and we have very few gender-neutral ones.
Image Analyst, 11 Jul 2018
Then just use xlsread() to read in your reference name lists, and your "test/validation" set of names and use ismember(), something like (untested):
[numbers, names, raw] = xlsread(filename);
femaleNames = names(:, 1); % Female names in column 1.
maleNames = names(:, 2); % Male names in column 2.
testNames = names(:, 3); % Test names in column 3.
% Preallocate one logical flag per test name.
inFemaleList = false(length(testNames), 1);
inMaleList = false(length(testNames), 1);
for k = 1 : length(testNames)
inFemaleList(k) = ismember(testNames{k}, femaleNames);
inMaleList(k) = ismember(testNames{k}, maleNames);
end
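Since ismember also accepts a whole cell array of names, the loop is not strictly necessary. The sketch below uses small made-up lists in place of the spreadsheet columns and turns the two logical vectors into the M/F/blank labels the question asks for; the variable gender is introduced here for illustration.

```matlab
% Hypothetical sample lists standing in for the spreadsheet columns.
femaleNames = {'Anna'; 'Maria'; 'Karin'};
maleNames = {'Erik'; 'Karl'; 'Lars'};
testNames = {'Karl'; 'Anna'; 'Kim'};

inFemaleList = ismember(testNames, femaleNames);   % Nx1 logical
inMaleList = ismember(testNames, maleNames);       % Nx1 logical

% Build the labels: blank by default, then F or M where a list matches.
gender = repmat({''}, numel(testNames), 1);
gender(inFemaleList) = {'F'};
gender(inMaleList) = {'M'};   % A name present in both lists would end up as M here.
disp([testNames, gender])
```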
