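Access Data Using Categorical Arrays

Select Data by Category

It is often useful to select data based on its values. This kind of selection can involve creating a logical vector from the values of one variable, and then using that logical vector to select a subset of values in other variables. You can create the logical vector by finding values in a numeric array that fall within a certain range, or by finding specific discrete values. Categorical arrays make the following operations easier.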

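Select elements from particular categories. For categorical arrays, use the logical operators == or ~= to select data that is in, or not in, a particular category. To select data in a particular group of categories, use the ismember function.
For ordinal categorical arrays, use the inequalities >, >=, <, or <= to find data in categories that come before or after a particular category.
Delete data that is in a particular category. Use logical operators to include or exclude data from particular categories.
Find elements that are not in a defined category. Categorical arrays mark elements that do not belong to a defined category with <undefined>. Use the isundefined function to find observations without a defined value.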
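
The inequality comparisons for ordinal categorical arrays can be sketched as follows. This is a minimal illustration, not part of the patients example: the variable names ratings, health, and atLeastGood and the five sample values are hypothetical, and the category order (Poor < Fair < Good < Excellent) is an assumption chosen for the sketch.

```matlab
% Hypothetical data: five self-assessed health ratings.
ratings = ["Good" "Poor" "Excellent" "Fair" "Good"];

% Listing the categories in rank order and setting 'Ordinal' to true
% gives them a mathematical ordering (assumed: Poor < Fair < Good < Excellent).
health = categorical(ratings, ["Poor" "Fair" "Good" "Excellent"], 'Ordinal', true);

% Logical vector that is true where the rating is Good or better.
atLeastGood = health >= "Good";

% Use the complement to pick out the ratings that rank below Good.
belowGood = ratings(~atLeastGood);
```

Without the 'Ordinal' option the categories have no ordering, and inequality comparisons are not supported.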

Common Ways to Access Data Using Categorical Arrays

This example shows how to index and search using categorical arrays. You can access data in the same way using categorical arrays stored within a table.
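
As a rough sketch of the table case, a categorical table variable can serve as the row selector. The three-row table below is hypothetical (the names T and vaRows and all values are made up for illustration) and is separate from the patients data used in the rest of this example.

```matlab
% Hypothetical three-row table; names and values are illustrative only.
LastName = ["Smith"; "Johnson"; "Jones"];
Location = categorical(["VA Hospital"; "County General Hospital"; "VA Hospital"]);
Age = [38; 43; 40];
T = table(LastName, Location, Age);

% T.Location is a categorical array, so == yields a logical vector
% that selects the matching rows of the table.
vaRows = T(T.Location == "VA Hospital", :);
```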

Load Sample Data

Load data on 100 patients from the sample MAT-file patients.mat.

load patients.mat
whos

  Name                            Size            Bytes  Class      Attributes

  Age                           100x1               800  double               
  Diastolic                     100x1               800  double               
  Gender                        100x1             13012  cell                 
  Height                        100x1               800  double               
  LastName                      100x1             13216  cell                 
  Location                      100x1             15808  cell                 
  SelfAssessedHealthStatus      100x1             13140  cell                 
  Smoker                        100x1               100  logical              
  Systolic                      100x1               800  double               
  Weight                        100x1               800  double

Create Categorical Arrays

The arrays Location and SelfAssessedHealthStatus contain categorical data. Each array contains text drawn from a small set of unique values (three locations and four health statuses, respectively). To convert Location and SelfAssessedHealthStatus to categorical arrays, use the categorical function. The array LastName, by contrast, contains a list of last names, which are not categories, so convert LastName to a string array using the string function.

Location = categorical(Location);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
LastName = string(LastName);

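Search for Members of a Single Category

For categorical arrays, you can use the logical operators == and ~= to find data that is in, or not in, a particular category.

Determine whether any patients were observed at the location Rampart General Hospital.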

any(Location == "Rampart General Hospital")

ans = logical
   0

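No patients were observed at Rampart General Hospital.

Search for Members of a Group of Categories

You can use ismember to find data in a particular group of categories. For example, call ismember with Location as the input data. Create a logical vector that identifies the patients observed at County General Hospital or VA Hospital.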

Location

Location = 100×1 categorical
     County General Hospital 
     VA Hospital 
     St. Mary's Medical Center 
     VA Hospital 
     County General Hospital 
     St. Mary's Medical Center 
     VA Hospital 
     VA Hospital 
     St. Mary's Medical Center 
     County General Hospital 
     County General Hospital 
     St. Mary's Medical Center 
     VA Hospital 
     VA Hospital 
     St. Mary's Medical Center 
     VA Hospital 
     St. Mary's Medical Center 
     VA Hospital 
     County General Hospital 
     County General Hospital 
     VA Hospital 
     VA Hospital 
     VA Hospital 
     County General Hospital 
     County General Hospital 
     VA Hospital 
     VA Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
      ⋮

VA_CountyGenIndex = ...
    ismember(Location,["County General Hospital","VA Hospital"])

VA_CountyGenIndex = 100×1 logical array

   1
   1
   0
   1
   1
   0
   1
   1
   0
   1
   1
   0
   1
   1
   0
      ⋮

VA_CountyGenIndex is a 100-by-1 logical array containing logical true (1) for each element of Location that is a member of the category County General Hospital or VA Hospital. VA_CountyGenIndex contains 76 nonzero elements.

Use the logical vector VA_CountyGenIndex to select the LastName of the patients observed at either County General Hospital or VA Hospital.
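
One way to check a count like this is to compare nnz on the logical vector with the per-category counts from countcats. The sketch below uses a small hypothetical array (site, with made-up categories A, B, and C) rather than the patients data.

```matlab
% Hypothetical five-element categorical array with categories A, B, and C.
site = categorical(["A"; "B"; "C"; "A"; "B"]);

% Logical vector: true where the element belongs to category A or B.
inGroup = ismember(site, ["A", "B"]);

% Number of selected elements.
numSelected = nnz(inGroup);

% Per-category counts, in the order returned by categories(site).
counts = countcats(site);
```

Here numSelected equals the sum of the counts for A and B, in the same way that the 76 above is the combined count for the two selected hospitals.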

VA_CountyGenPatients = LastName(VA_CountyGenIndex)

VA_CountyGenPatients = 76×1 string
    "Smith"
    "Johnson"
    "Jones"
    "Brown"
    "Miller"
    "Wilson"
    "Taylor"
    "Anderson"
    "Jackson"
    "White"
    "Martin"
    "Garcia"
    "Martinez"
    "Robinson"
    "Clark"
    "Rodriguez"
    "Lewis"
    "Lee"
    "Walker"
    "Hall"
    "Allen"
    "Young"
    "Hernandez"
    "King"
    "Wright"
    "Lopez"
    "Green"
    "Adams"
    "Baker"
    "Mitchell"
      ⋮

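Select Elements in a Particular Category to Plot

Use the summary function to print a summary containing the category names and the number of elements in each category.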

summary(Location)

Location: 100×1 categorical

     County General Hospital       39 
     St. Mary's Medical Center      24 
     VA Hospital                   37 
     <undefined>                    0

Location is a 100-by-1 categorical array with three categories. County General Hospital occurs in 39 elements, St. Mary's Medical Center in 24 elements, and VA Hospital in 37 elements.

Use the summary function to print a summary of SelfAssessedHealthStatus.

summary(SelfAssessedHealthStatus)

SelfAssessedHealthStatus: 100×1 categorical

     Excellent        34 
     Fair             15 
     Good             40 
     Poor             11 
     <undefined>       0

SelfAssessedHealthStatus is a 100-by-1 categorical array with four categories.

Use the logical operator == to access the ages of only the patients who assessed their own health status as Good. Then plot a histogram of this data.

figure()
histogram(Age(SelfAssessedHealthStatus == "Good"))
title("Ages of Patients with Good Health Status")

[Figure: histogram titled "Ages of Patients with Good Health Status"]

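histogram(Age(SelfAssessedHealthStatus == "Good")) plots the age data for the 40 patients who reported their health status as Good.

Delete Data from a Particular Category

You can use logical operators to include or exclude data from particular categories. Delete all patients observed at VA Hospital from the workspace variables Age and Location.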

Age = Age(Location ~= "VA Hospital");
Location = Location(Location ~= "VA Hospital")

Location = 63×1 categorical
     County General Hospital 
     St. Mary's Medical Center 
     County General Hospital 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     County General Hospital 
     County General Hospital 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     County General Hospital 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     St. Mary's Medical Center 
      ⋮

Now, Age is a 63-by-1 numeric array and Location is a 63-by-1 categorical array.

List the categories of Location along with the number of elements in each category.

summary(Location)

Location: 63×1 categorical

     County General Hospital       39 
     St. Mary's Medical Center      24 
     VA Hospital                    0 
     <undefined>                    0

The data for patients observed at VA Hospital has been deleted from Location, but VA Hospital still remains as a category.

Use the removecats function to remove VA Hospital from the categories of Location.

Location = removecats(Location,"VA Hospital");

Verify that the category VA Hospital was removed.

categories(Location)

ans = 2×1 cell
    {'County General Hospital'  }
    {'St. Mary's Medical Center'}

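Location is a 63-by-1 categorical array with two categories.

Delete Elements

You can delete elements by indexing. For example, you can remove the first element of Location by selecting the remaining elements with Location(2:end). An easier way to delete elements is to use [].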
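
When the names of the empty categories are not known in advance, calling removecats with only the array removes every unused category. The sketch below illustrates this on a hypothetical array (loc, with made-up categories X, Y, and Z); it is not a step in the patients example.

```matlab
% Hypothetical array: three elements, but three declared categories (Z is unused).
loc = categorical(["X"; "Y"; "X"], ["X", "Y", "Z"]);

% Drop the elements in category Y; the category itself still exists.
loc = loc(loc ~= "Y");
catsBefore = categories(loc);

% With no category names given, removecats removes all unused categories.
loc = removecats(loc);
catsAfter = categories(loc);
```

After the call, only the categories that still have at least one element remain.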

Location(1) = [];
summary(Location)

Location: 62×1 categorical

     County General Hospital       38 
     St. Mary's Medical Center      24 
     <undefined>                    0

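Location is a 62-by-1 categorical array with two categories. Deleting the first element has no effect on other elements from the same category and does not delete the category itself.

Test for Undefined Elements

Remove the category County General Hospital from Location.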

Location = removecats(Location,"County General Hospital");

Display the first eight elements of the categorical array Location.

Location(1:8)

ans = 8×1 categorical
     St. Mary's Medical Center 
     <undefined> 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     <undefined> 
     <undefined> 
     St. Mary's Medical Center 
     St. Mary's Medical Center

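After you remove the category County General Hospital, the elements that belonged to it no longer belong to any category defined for Location. Elements of a categorical array that do not belong to any category are undefined and display the value <undefined>.

Use the isundefined function to find the elements of a categorical array that do not belong to any category.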

undefinedIndex = isundefined(Location);

undefinedIndex is a 62-by-1 logical array containing logical true (1) for every undefined element in Location.

Set Undefined Elements

Use the summary function to print the number of undefined elements in Location. Then display the first five elements of Location.

summary(Location)

Location: 62×1 categorical

     St. Mary's Medical Center      24 
     <undefined>                   38

Location(1:5)

ans = 5×1 categorical
     St. Mary's Medical Center 
     <undefined> 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     <undefined>

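The first element of Location belongs to the category St. Mary's Medical Center, as does the third. Set these elements to undefined values so that they no longer belong to any category. The recommended way is to create the undefined value with the missing function. Alternatively, assign '' or "" to the array element. When you assign such values to elements of a categorical array, they are converted to undefined values.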

Location(1) = missing;
Location(3) = '';
Location(1:5)

ans = 5×1 categorical
     <undefined> 
     <undefined> 
     <undefined> 
     St. Mary's Medical Center 
     <undefined>

The summary function shows that these assignments increased the number of undefined elements.

summary(Location)

Location: 62×1 categorical

     St. Mary's Medical Center      22 
     <undefined>                   40

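You can make selected elements undefined without removing any category or changing the categories of other elements. Set elements to undefined to represent elements whose values are unknown.

Preallocate Categorical Arrays with Undefined Elements

For better performance, you can preallocate the size of a categorical array using undefined elements. Create a categorical array that has only the elements whose locations are known.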

definedIndex = ~isundefined(Location);
newLocation = Location(definedIndex);
summary(newLocation)

newLocation: 22×1 categorical

     St. Mary's Medical Center      22 
     <undefined>                    0

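Expand the size of newLocation so that it is a 200-by-1 categorical array. Set the last new element to undefined. All of the other new elements are also assigned undefined values. The 22 original elements keep their values.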

newLocation(200) = missing;
summary(newLocation)

newLocation: 200×1 categorical

     St. Mary's Medical Center       22 
     <undefined>                   178

newLocation has room for values that you plan to store in the array later.
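
A minimal sketch of the same preallocate-then-fill pattern on a small hypothetical array (region, with made-up categories North and South) follows. It assumes the values stored later belong to categories that already exist in the array.

```matlab
% Hypothetical two-element categorical array.
region = categorical(["North"; "South"]);

% Grow the array to six elements; elements 3 through 6 become undefined.
region(6) = missing;
numUndefinedBefore = nnz(isundefined(region));

% Later, store a value from an existing category in a preallocated slot.
region(3) = "North";
numUndefinedAfter = nnz(isundefined(region));
```

Each assignment into a preallocated slot reduces the number of undefined elements by one without changing the size of the array.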