splitParagraphs

Split text into paragraphs

Since R2023a

Syntax

newStr = splitParagraphs(str)

newDocuments = splitParagraphs(document)

Description

newStr = splitParagraphs(str) splits str into an array of paragraphs.

newDocuments = splitParagraphs(document) splits a single tokenizedDocument object into a tokenizedDocument array of paragraphs.
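The two syntaxes above can be sketched side by side. This is a minimal illustration, not taken from the original page: the input text is hypothetical, and it assumes that paragraphs are separated by a blank line, as in the example file used below. It requires Text Analytics Toolbox.

```matlab
% Hypothetical input: two paragraphs separated by a blank line
% (assumption: a blank line marks a paragraph break).
str = "First paragraph. It has two sentences." + newline + newline + ...
    "Second paragraph.";

% Syntax 1: split a string scalar into paragraphs.
newStr = splitParagraphs(str);

% Syntax 2: tokenize the same text, then split the scalar
% tokenizedDocument into one document per paragraph.
document = tokenizedDocument(str);
newDocuments = splitParagraphs(document);
```

Because the input is a string, newStr is a string array; newDocuments is a tokenizedDocument array.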

Examples

Split String into Paragraphs

Extract the text from the file exampleParagraphs.txt.

str = extractFileText("exampleParagraphs.txt")

str = 
    "This example file contains three paragraphs. The first paragraph contains three sentences. The third sentence is short.
     
     The second paragraph contains one sentence only.
     
     The third (and final) paragraph has seventeen words in total. The final sentence concludes the example file.
     "

Split the text into paragraphs.

paragraphs = splitParagraphs(str)

paragraphs = 3×1 string
    "This example file contains three paragraphs. The first paragraph contains three sentences. The third sentence is short."
    "The second paragraph contains one sentence only."
    "The third (and final) paragraph has seventeen words in total. The final sentence concludes the example file.↵"
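As a possible follow-up, each paragraph can be passed to splitSentences (listed under See Also) to count its sentences. This is a sketch rather than part of the original example; it assumes that exampleParagraphs.txt is on the path, and the loop variable names are illustrative.

```matlab
% Read the example file and split it into paragraphs.
str = extractFileText("exampleParagraphs.txt");
paragraphs = splitParagraphs(str);

% splitSentences accepts a single piece of text at a time,
% so process the paragraphs one by one.
for i = 1:numel(paragraphs)
    sentences = splitSentences(paragraphs(i));
    fprintf("Paragraph %d: %d sentence(s)\n", i, numel(sentences));
end
```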

Split Document into Paragraphs

Extract the text from the file exampleParagraphs.txt and tokenize it.

str = extractFileText("exampleParagraphs.txt");
document = tokenizedDocument(str)

document = 
  tokenizedDocument:

   49 tokens: This example file contains three paragraphs . The first paragraph contains three sentences . The third sentence is short . The second paragraph contains one sentence only . The third ( and final ) paragraph has seventeen words in total . The final sentence concludes the example file .

Split the document into paragraphs.

paragraphs = splitParagraphs(document)

paragraphs = 
  3×1 tokenizedDocument:

    20 tokens: This example file contains three paragraphs . The first paragraph contains three sentences . The third sentence is short .
     8 tokens: The second paragraph contains one sentence only .
    21 tokens: The third ( and final ) paragraph has seventeen words in total . The final sentence concludes the example file .
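The per-paragraph token counts shown in the display can also be obtained programmatically. The sketch below uses doclength from Text Analytics Toolbox and assumes that exampleParagraphs.txt is on the path; the variable names tokensPerParagraph and numParagraphs are illustrative.

```matlab
% Tokenize the example file and split it into paragraph documents.
str = extractFileText("exampleParagraphs.txt");
document = tokenizedDocument(str);
paragraphs = splitParagraphs(document);

% Number of paragraph documents in the array.
numParagraphs = numel(paragraphs);

% Number of tokens in each paragraph document.
tokensPerParagraph = doclength(paragraphs);
```

For the output displayed above, the counts should correspond to the 20, 8, and 21 tokens listed for the three paragraphs.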

Input Arguments

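`str` — Input text
string scalar | character vector | scalar cell array containing a character vector

Input text, specified as a string scalar, a character vector, or a scalar cell array containing a character vector.

Data Types: string | char | cell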

`document` — Input document
scalar `tokenizedDocument` object

Input document, specified as a scalar tokenizedDocument object.

Output Arguments

`newStr` — Output text
string array | cell array of character vectors

Output text, returned as a string array or a cell array of character vectors.

If str is a string, then newStr is a string array. Otherwise, newStr is a cell array of character vectors.

Data Types: string | cell
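The relationship between the input type and the output type can be checked with a short sketch. The input text here is hypothetical, and the variable names are illustrative; the expected results follow from the description above rather than from a recorded run.

```matlab
% Hypothetical one-paragraph input as a character vector.
txt = 'A single short paragraph.';

% String input: the output should be a string array.
outFromString = splitParagraphs(string(txt));

% Character vector input and scalar cell array input:
% the output should be a cell array of character vectors.
outFromChar = splitParagraphs(txt);
outFromCell = splitParagraphs({txt});

% Each check should be true according to the description above.
tfString = isstring(outFromString);
tfChar = iscellstr(outFromChar);
tfCell = iscellstr(outFromCell);
```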

`newDocuments` — Output documents
`tokenizedDocument` array

Output documents, returned as a tokenizedDocument array.

Version History

Introduced in R2023a

See Also

splitSentences | addSentenceDetails | tokenizedDocument
