seqsplitpe

Split merged paired-end sequences into separate files

Description

seqsplitpe(fastqFile) splits merged paired-end sequences from fastqFile into two separate files. Each sequence is split in the middle: the first half is saved in the first output file and the second half in the second output file. By default, each output file name is the input file name with the suffix '_1' or '_2' inserted before the file extension.

seqsplitpe(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments.

[outFiles,N] = seqsplitpe(___) returns the names of output files in a cell array outFiles. N represents a vector containing the numbers of sequences saved in each output file.

Examples

Split each of the paired-end sequences in half, and store each half in separate output files.

[outFiles, N] = seqsplitpe('SXX123456_merged.fastq');

Check the number of sequences in each output file.

N
N = 2×1

    50
    50
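
The midpoint split that the function performs on each record can be pictured with plain string indexing. The following is only an illustrative sketch, not the function's implementation: the sequence and quality strings are made-up values, an even read length is assumed, and splitting the quality string alongside the bases is an assumption based on FASTQ files carrying one quality character per base.

```matlab
% Illustrative only: split one merged record down the middle.
% The read and quality strings are made-up values, not from a real file.
mergedSeq  = 'ACGTACGTTTGACCAA';   % 16 bases (even length assumed)
mergedQual = 'IIIIHHHHGGGGFFFF';   % one quality character per base

half = numel(mergedSeq) / 2;       % midpoint of the merged read

seq1  = mergedSeq(1:half);         % first half  -> first output file
seq2  = mergedSeq(half+1:end);     % second half -> second output file
qual1 = mergedQual(1:half);        % quality split the same way (assumption)
qual2 = mergedQual(half+1:end);

fprintf('%s  %s\n', seq1, qual1);
fprintf('%s  %s\n', seq2, qual2);
```

How seqsplitpe treats reads of odd length is not stated here, so the sketch deliberately avoids that case.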

Input Arguments

fastqFile: Names of FASTQ files with sequence and quality information, specified as a character vector, string, string vector, or cell array of character vectors.

Example: 'SRR005164_1_50.fastq'

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'OutputSuffix','PairedEnd_split' specifies to use the custom suffix in the output file names.

OutputDir: Relative or absolute path to the output file directory, specified as a character vector or string. The default is the current directory.

Example: 'OutputDir','F:\results'

OutputSuffix: Custom suffix to use in the output file names, specified as a character vector or string. The suffix is inserted after the input file name and before the suffix '_1' or '_2'. The default is ''.

Example: 'OutputSuffix','_MisMatches2'
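
As a rough illustration of the naming rule described for this argument, the sketch below assembles the file names one would expect from the stated order (input name, custom suffix, then '_1' or '_2', then the extension). It only manipulates strings and does not call seqsplitpe; the variable names are hypothetical, and the actual names should be confirmed from the outFiles output.

```matlab
% Sketch of the documented naming rule; no files are read or written.
inputFile = 'SXX123456_merged.fastq';   % input name from the Examples section
suffix    = '_MisMatches2';             % value passed as 'OutputSuffix'

[~, base, ext] = fileparts(inputFile);  % base = 'SXX123456_merged', ext = '.fastq'

expected1 = [base suffix '_1' ext];     % expected name of the first output file
expected2 = [base suffix '_2' ext];     % expected name of the second output file

disp(expected1)
disp(expected2)
```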

UseParallel: Option to perform computations in parallel using a parallel pool of workers, specified as one of these values:

  • "off" — Run in serial on the MATLAB® client.

  • "auto" — Use a parallel pool if one is open or if MATLAB can automatically create one. If a parallel pool is not available, run in serial on the MATLAB client.

  • "on" — Use a parallel pool if one is open or if MATLAB can automatically create one. If a parallel pool is not available, throw an error.

If you do not have a parallel pool open and automatic pool creation is enabled, MATLAB opens a pool using the default cluster profile. To use a parallel pool to run computations in MATLAB, you must have Parallel Computing Toolbox™.

Before R2026a: You can specify this argument as true or false only. The default value is false. To run computations in parallel, set this argument to true.

Note

There is a cost associated with sharing large input files across workers in a distributed environment. In some cases, running in parallel may not be beneficial in terms of performance.

Example: 'UseParallel',true
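
The name-value arguments above can be combined in a single call. The following is a minimal sketch under stated assumptions: the output folder name and the suffix are hypothetical, the input file from the Examples section must exist locally, and the folder is created beforehand because it is not stated whether seqsplitpe creates it. The call runs in serial, so no parallel pool is required.

```matlab
% Sketch: one call that sets all three name-value arguments.
inFile = 'SXX123456_merged.fastq';          % file name from the Examples section
outDir = fullfile(pwd, 'split_results');    % hypothetical output folder

nvArgs = {'OutputDir', outDir, ...
          'OutputSuffix', '_split', ...     % hypothetical suffix
          'UseParallel', false};            % serial run on the MATLAB client

if isfile(inFile)
    if ~isfolder(outDir)
        mkdir(outDir);                      % assumption: the folder may need to exist first
    end
    [outFiles, N] = seqsplitpe(inFile, nvArgs{:});
    disp(outFiles)                          % names of the two output files
    disp(N)                                 % number of sequences in each file
else
    fprintf('Input file %s not found; skipping the call.\n', inFile);
end
```

Collecting the pairs in a cell array and expanding it with nvArgs{:} keeps the call readable when several options are set.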

Output Arguments

outFiles: Output file names, returned as a cell array of character vectors. By default, the name of each output file is the input file name with the suffix '_1' or '_2' inserted before the file extension.

N: Number of sequences saved in each output file, returned as an n-by-1 vector, where n is the number of output files. If there are multiple output files, the order within N corresponds to the order of the output files.

Extended Capabilities

Version History

Introduced in R2016b
