bamread

Read data from BAM file

Description

BAMStruct = bamread(File,RefSeq,Range) reads the alignment records in File, a BAM-formatted file, that align to RefSeq, a reference sequence, in the range specified by Range. The BAMStruct output contains the alignment data.

BAMStruct = bamread(File,nomap) returns reads that are not mapped to any reference.

[BAMStruct,HeaderStruct] = bamread(File,RefSeq,Range) also returns the header information in HeaderStruct.

___ = bamread(File,RefSeq,Range,Name,Value), for any output arguments, specifies additional options using one or more name-value arguments. For example, BAMStruct = bamread(File,RefSeq,Range,ToFile="alignmentrecords.sam") saves the alignment records to a file named alignmentrecords.sam in the current folder.

Examples

Read multiple alignment records from the ex1.bam file that align to two different reference sequences.

data1 = bamread('ex1.bam', 'seq1', [100 200])
data1=59×1 struct array with fields:
    QueryName
    Flag
    Position
    MappingQuality
    CigarString
    MatePosition
    InsertSize
    Sequence
    Quality
    Tags
    ReferenceIndex
    MateReferenceIndex

data2 = bamread('ex1.bam', 'seq2', [100 200])
data2=79×1 struct array with fields:
    QueryName
    Flag
    Position
    MappingQuality
    CigarString
    MatePosition
    InsertSize
    Sequence
    Quality
    Tags
    ReferenceIndex
    MateReferenceIndex

Read alignments from the ex1.bam file that are fully contained in the 100 to 200 bp range of the seq1 reference sequence.

data3 = bamread('ex1.bam', 'seq1', [100 200], 'full', true)
data3=30×1 struct array with fields:
    QueryName
    Flag
    Position
    MappingQuality
    CigarString
    MatePosition
    InsertSize
    Sequence
    Quality
    Tags
    ReferenceIndex
    MateReferenceIndex

Read alignments from the ex1.bam file that align to the 100 to 300 bp range of the seq1 reference sequence. Read the same alignments using zero-based indexing. Compare the position of the 27th record in the two outputs.

data_one = bamread('ex1.bam','seq1', [100 300]);
data_zero = bamread('ex1.bam','seq1', [100 300], 'zerobased', true);
data_one(27).Position
ans = uint32
    135
data_zero(27).Position
ans = uint32
    134
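
The single-record comparison above can be extended to every record. The following is a minimal sketch, assuming ex1.bam is available on the MATLAB path as in the examples above; the variable names pos_one, pos_zero, and allShifted are illustrative only.

```matlab
% Read the same range with one-based (default) and zero-based positions.
data_one  = bamread('ex1.bam','seq1',[100 300]);
data_zero = bamread('ex1.bam','seq1',[100 300],'ZeroBased',true);

% Collect the Position field of every record into numeric row vectors.
pos_one  = double([data_one.Position]);
pos_zero = double([data_zero.Position]);

% Each one-based position is expected to exceed its zero-based
% counterpart by exactly 1.
allShifted = isequal(pos_one - 1, pos_zero)
```

Converting to double before subtracting avoids any saturation that unsigned integer arithmetic could introduce.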

Input Arguments

File

Path to a BAM-formatted file, specified as a string or character vector. If the file is on the MATLAB® search path or is in the current folder, File can be the file name alone without the path information.

Note

bamread requires the BAM file to be ordered, except when returning reads that are not mapped to any reference.

Example: "C:\Documents\myfile.bam"

Data Types: char | string

RefSeq

Reference sequence in the BAM file, specified as one of the following:

  • Name of the reference sequence, specified as a string or character vector.

  • Index of the reference sequence, specified as a positive integer. This number is also the index of the reference sequence in the Reference field of the InfoStruct structure returned by baminfo.

Data Types: single | double | char | string

Range of references, specified as one of the following:

  • Character vector or string specifying the name of a reference sequence in the BAM file.

  • Positive integer specifying the index of a reference sequence in the BAM file. This number is also the index of the reference sequence in the Reference field of the InfoStruct structure returned by baminfo.

Data Types: double | char | string

Range

Range of positions in the reference sequence RefSeq, specified as a two-element vector of positive numbers with the second element being greater than or equal to the first.

Example: [17,22]

Data Types: single | double

nomap

Indication that the returned information is not mapped to any reference, specified as 0 or 'Unmapped'.

Example: 'Unmapped'

Data Types: single | double | char | string
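
As a brief illustration of the nomap syntax, the sketch below requests the reads that are not mapped to any reference. It assumes ex1.bam is on the MATLAB path; whether that file actually contains such reads is not stated here, so the count is printed rather than assumed.

```matlab
% Request the reads that are not mapped to any reference sequence.
unmapped = bamread('ex1.bam','Unmapped');

% Report how many records came back (possibly zero).
fprintf('%d reads are not mapped to any reference\n', numel(unmapped));
```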

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: BAMStruct = bamread(File,RefSeq,Range,ToFile="alignmentrecords.sam") saves the alignment records to a file named alignmentrecords.sam in the current folder.

Full

Indication to return only alignment records that are fully contained within the range specified by Range, specified as false or true.

Example: true

Data Types: logical

Tags

Indication to read optional tags in addition to the first 11 fields for each alignment in the BAM-formatted file, specified as true or false.

Example: false

Data Types: logical

ToFile

Path to a new file for saving alignment records in the specified range of a specific reference sequence, specified as a string or character vector. If you specify only a file name, the file is saved to the current folder. The resulting file is SAM-formatted.

The SAM-formatted file is always one-based, even if you set the ZeroBased name-value pair argument to true. You can use the SAM-formatted file as input when creating a BioMap object.

Example: "C:\Documents\alignment.sam"

Data Types: char | string
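
The following sketch shows one way the ToFile option might be used to export a range to a SAM file and then build a BioMap object from it. It assumes ex1.bam is on the MATLAB path; the output location in tempdir and the file name seq1_100_200.sam are hypothetical choices.

```matlab
% Hypothetical output path; remove any earlier copy so the file is new.
outFile = fullfile(tempdir,'seq1_100_200.sam');
if exist(outFile,'file')
    delete(outFile);
end

% Write the records for seq1 in the range [100 200] to a SAM file.
bamread('ex1.bam','seq1',[100 200],'ToFile',outFile);

% Use the SAM file as input when creating a BioMap object.
bm = BioMap(outFile);
disp(bm.NSeqs)   % number of read sequences stored in the object
```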

ZeroBased

Indication to use zero-based indexing when reading a file, specified as false or true. The ZeroBased argument controls the return of zero-based or one-based positions in the Position and MatePosition fields in BAMStruct.

ZeroBased does not affect the Range input argument or the SAM file created when using the ToFile name-value pair argument. SAM files are always one-based.

Caution

If you plan to use the BAMStruct output argument to construct a BioMap object, make sure the ZeroBased name-value pair argument is false.

Example: true

Data Types: logical

Output Arguments

BAMStruct

Alignment and mapping information from a BAM-formatted file, returned as an N-by-1 array of structures, where N is the number of alignment records stored in the specified range. Each structure contains the following fields.

Field | Description

QueryName

Name of the read sequence (if unpaired) or the name of sequence pair (if paired).

Flag

Integer indicating the bit-wise information that specifies the status of each of 11 flags described by the SAM format specification.

Tip

You can use the bitget function to determine the status of a specific SAM flag.

ReferenceIndex

Index of the reference sequence.

Tip

To convert this index to a reference name, see the Reference field in the HeaderStruct output argument.

Position

Position of the forward reference sequence where the leftmost base of the alignment of the read sequence starts. This position is zero-based or one-based, depending on the ZeroBased name-value pair argument.

MappingQuality

Integer specifying the mapping quality score for the read sequence.

CigarString

CIGAR-formatted string representing how the read sequence aligns with the reference sequence.

MateReferenceIndex

Index of the reference sequence associated with the mate. If there is no mate, then this value is 0.

MatePosition

Position of the forward reference sequence where the leftmost base of the alignment of the mate of the read sequence starts. This position is zero-based or one-based, depending on the ZeroBased name-value pair argument.

InsertSize

The number of base positions between the read sequence and its mate, when both are mapped to the same reference sequence. Otherwise, this value is 0.

Sequence

Character vector containing the letter representations of the read sequence. It is the reverse complement if the read sequence aligns to the reverse strand of the reference sequence.

Quality

Character vector containing the ASCII representation of the per-base quality score for the read sequence. The quality score is reversed if the read sequence aligns to the reverse strand of the reference sequence.

Tags

List of applicable SAM tags and their values.
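
To illustrate how the Flag and CigarString fields can be interpreted, here is a minimal sketch that decodes a few flag bits with bitget and computes how many reference positions an alignment covers. The record values are hypothetical and are not read from a file; the bit meanings and the set of reference-consuming CIGAR operations follow the SAM format specification.

```matlab
% Hypothetical field values for one alignment record (not read from a file).
rec.Flag        = uint16(99);
rec.Position    = uint32(135);      % one-based leftmost position
rec.CigarString = '20M2I10M3D5M';

% Decode selected SAM flag bits (bit 1 is the least significant bit).
isPaired     = bitget(rec.Flag,1) == 1;   % 0x1:  read is paired
isProperPair = bitget(rec.Flag,2) == 1;   % 0x2:  mapped in a proper pair
isUnmapped   = bitget(rec.Flag,3) == 1;   % 0x4:  read is unmapped
isReverse    = bitget(rec.Flag,5) == 1;   % 0x10: read maps to reverse strand

% Split the CIGAR string into (length, operation) pairs.
tok = regexp(rec.CigarString,'(\d+)([MIDNSHP=X])','tokens');
len = cellfun(@(t) str2double(t{1}), tok);   % numeric lengths
op  = cellfun(@(t) t{2}, tok);               % one operation character each

% Operations M, D, N, = and X consume reference positions.
refSpan = sum(len(ismember(op,'MDN=X')));

% Rightmost reference position covered by the alignment (one-based).
endPos = double(rec.Position) + refSpan - 1;
```

For these values the flag marks a paired, properly paired, forward-strand read, and the alignment covers 38 reference positions, ending at position 172.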

HeaderStruct

Header information for the BAM-formatted file, returned as a structure. The structure contains the following fields.

Field | Description

NRefs

Number of reference sequences in the BAM-formatted file.

Reference

1-by-NRefs array of structures containing these fields:

  • Name — Name of the reference sequence.

  • Length — Length of the reference sequence.

Header*

Structure containing the file format version, sort order, and group order.

SequenceDictionary*

Structure containing the:

  • Sequence name

  • Sequence length

  • Genome assembly identifier

  • MD5 checksum of sequence

  • URI of sequence

  • Species

ReadGroup*

Structure containing the:

  • Read group identifier

  • Sample

  • Library

  • Description

  • Platform unit

  • Predicted median insert size

  • Sequencing center

  • Date

  • Platform

Program*

Structure containing the:

  • Program name

  • Version

  • Command line
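
The sketch below shows one way to inspect the header fields described above and to look up a reference by name. It assumes ex1.bam is on the MATLAB path, and it assumes that the order of HeaderStruct.Reference matches the reference index accepted by the RefSeq input; the variable names are illustrative.

```matlab
% Request the header along with the alignment records.
[data, header] = bamread('ex1.bam','seq1',[100 200]);

% List the reference names and lengths stored in the header.
refNames   = {header.Reference.Name};
refLengths = [header.Reference.Length];
fprintf('%d reference(s) in file\n', header.NRefs);

% Find the index of a reference by name (assumed to be usable as RefSeq).
refIdx = find(strcmp(refNames,'seq2'));

% Read the same range from that reference, addressing it by index.
data2 = bamread('ex1.bam',refIdx,[100 200]);
```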

Tips

  • Use the baminfo function to investigate the size and content, including reference sequence names, of a BAM-formatted file before using the bamread function to read the file contents into a MATLAB array of structures.

  • If your BAM-formatted file is too large to read using available memory, try either of the following:

    • Use a smaller range.

    • Call bamread without output arguments and use the ToFile name-value argument to create a SAM-formatted file. You can then use samread with the BlockRead name-value argument to read the SAM-formatted file in blocks. Alternatively, pass the SAM-formatted file to the BioIndexedFile constructor function to construct a BioIndexedFile object, which you can use to create a BioMap object.

  • Use the BAMStruct output argument to construct a BioMap object, which lets you explore, access, filter, and manipulate all or a subset of the data, before doing subsequent analyses or viewing the data.
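
The second memory-saving tip can be sketched as follows. This is an illustration under assumptions: ex1.bam is on the MATLAB path, the output path in tempdir and the file name seq1_subset.sam are hypothetical, and the block size of 10 records is arbitrary.

```matlab
% Hypothetical output path; remove any earlier copy so the file is new.
samFile = fullfile(tempdir,'seq1_subset.sam');
if exist(samFile,'file')
    delete(samFile);
end

% Export the records without loading them into the workspace.
bamread('ex1.bam','seq1',[1 500],'ToFile',samFile);

% Read only the first block of 10 records from the SAM file.
block = samread(samFile,'BlockRead',[1 10]);
fprintf('Read %d records in this block\n', numel(block));
```

Subsequent blocks can be read by advancing the BlockRead range, so that only one block is held in memory at a time.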

References

[1] Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 16, 2078–2079.

Version History

Introduced in R2010b