getSubset

Retrieve subset of elements from object

Description

subset = getSubset(object,subset) returns a new object that contains only the elements of object specified by subset.

subset = getSubset(object,subset,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, you can specify whether to keep the data in memory.

Examples

Store read data from a SAM-formatted file in a BioRead object. By default, the data remains in the source file, and BioRead uses an index file to access the data, making the process more memory efficient.

br = BioRead('ex1.sam')
br = 
  BioRead with properties:

     Quality: [1501x1 File indexed property]
    Sequence: [1501x1 File indexed property]
      Header: [1501x1 File indexed property]
       NSeqs: 1501
        Name: ''


Set the 'InMemory' name-value pair argument to true to store the data in memory, enabling you to access the data faster and edit the properties of the object.

brInMemory = BioRead('ex1.sam','InMemory',true)
brInMemory = 
  BioRead with properties:

     Quality: {1501x1 cell}
    Sequence: {1501x1 cell}
      Header: {1501x1 cell}
       NSeqs: 1501
        Name: ''

Retrieve the second and third elements from the object br. By default, the resulting object subset is not placed in memory if the parent object br is not in memory. If br is already in memory, the resulting subset is placed in memory.

subset = getSubset(br,[2 3])
subset = 
  BioRead with properties:

     Quality: [2x1 File indexed property]
    Sequence: [2x1 File indexed property]
      Header: [2x1 File indexed property]
       NSeqs: 2
        Name: ''


Alternatively, you can keep the parent object br in the source file and load only the resulting subset in memory, provided the subset is small enough. You can then access the subset faster and update it as needed.

subsetInMemory = getSubset(br,[2 3],'InMemory',true)
subsetInMemory = 
  BioRead with properties:

     Quality: {2x1 cell}
    Sequence: {2x1 cell}
      Header: {2x1 cell}
       NSeqs: 2
        Name: ''

Update the header information of the first element.

subsetInMemory.Header(1)
ans = 1x1 cell array
    {'EAS54_65:7:152:368:113'}

subsetInMemory.Header(1) = {'NewHeader'};
subsetInMemory.Header(1)
ans = 1x1 cell array
    {'NewHeader'}

You can also select elements by header. If multiple elements have the same header, the function returns all of those elements.

Get all the elements with the header 'B7_591:4:96:693:509' from brInMemory, the object stored in memory.

subset2 = getSubset(brInMemory,{'B7_591:4:96:693:509'})
subset2 = 
  BioRead with properties:

     Quality: {'<<<<<<<<<<<<<<<;<<<<<<<<<5<<<<<;:<;7'}
    Sequence: {'CACTAGTGGCTCATTGTAAATGTGTGGTTTAACTCG'}
      Header: {'B7_591:4:96:693:509'}
       NSeqs: 1
        Name: ''
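
The examples above select elements by position and by header. The subset argument can also be a logical vector, so a minimal sketch of that form follows. It assumes the ex1.sam file used above is on the MATLAB path; the variable names mask and subsetByMask are illustrative only.

```matlab
% Sketch: select elements with a logical vector instead of numeric indices.
br = BioRead('ex1.sam');          % indexed (not in memory) by default

% Build a mask with one entry per read, true for the reads to keep.
mask = false(br.NSeqs,1);
mask([2 3]) = true;

% Should select the same elements as getSubset(br,[2 3]).
subsetByMask = getSubset(br,mask);
disp(subsetByMask.NSeqs)
```

A logical mask is convenient when the selection comes from a test applied to every element, for example a filter on read length or quality.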

Input Arguments

object

Object containing the read data, specified as a BioRead or BioMap object.

Example: bioreadObj

subset

Subset of elements in the object, specified as a vector of positive integers, logical vector, string vector, or cell array of character vectors containing valid sequence headers.

Example: [1 3]

Tip

When you use a sequence header (or a cell array of headers) for subset, a repeated header specifies all elements with that header.
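
The following sketch illustrates the tip. It assumes ex1.sam is on the MATLAB path and deliberately overwrites one header to create a duplicate, which is done purely for illustration. The variable names firstThree and sameHeader are hypothetical.

```matlab
% Sketch: a repeated header selects every element that carries it.
brMem = BioRead('ex1.sam','InMemory',true);   % in memory, so properties are editable
firstThree = getSubset(brMem,1:3);            % in memory because the parent is

% Create a duplicate header on purpose (illustration only).
firstThree.Header(3) = firstThree.Header(1);

% Select by header. Per the tip, this should return both elements
% that now share the header of the first element.
sameHeader = getSubset(firstThree,firstThree.Header(1));
disp(sameHeader.NSeqs)
```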

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'InMemory',true specifies to save the output object (subset) in memory.

Name

Name of the object, specified as the comma-separated pair consisting of 'Name' and a character vector or string. The default is an empty character vector '' (no name).

Example: 'Name','newData'
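
As a minimal sketch of this argument, the following assumes ex1.sam is on the MATLAB path and labels the returned subset while extracting it. The variable name named is illustrative only.

```matlab
% Sketch: name the resulting subset when you create it.
br = BioRead('ex1.sam');
named = getSubset(br,[2 3],'Name','newData');

% The Name property of the returned object should hold the supplied value.
disp(named.Name)
```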

InMemory

Logical flag to keep data in memory, specified as the comma-separated pair consisting of 'InMemory' and true or false. Keeping the data in memory lets you access the resulting object subset faster and update its properties. If the data specified for subset is still large and does not fit in memory, set this name-value pair to false to use indexed access, which is more memory efficient but does not enable you to modify the properties.

If the parent object is already in memory, the resulting object subset is automatically placed in memory, and the function ignores this argument.

Example: 'InMemory',true

SelectReference

References used to create the subset of data with only the reads mapped to those references, specified as the comma-separated pair consisting of 'SelectReference' and a cell array of character vectors, string vector, or vector of positive integers.

Note

This argument applies to BioMap objects only.

Example: 'SelectReference',{'RefSeq1'}

Output Arguments

subset

Subset of elements from the object, returned as a BioRead or BioMap object. If object is in memory, then subset is placed in memory. If object is indexed, then subset is indexed unless you set 'InMemory' to true.

Version History

Introduced in R2010a