excludedata

Exclude data from fit

Syntax

tf = excludedata(x,y,'box',box)

tf = excludedata(x,y,'domain',domain)

tf = excludedata(x,y,'range',range)

tf = excludedata(x,y,'indices',indices)

Description

tf = excludedata(x,y,'box',box) returns a logical array that indicates which elements are outside the box in the xy-plane specified by box. The elements of tf equal 1 for data points outside the box and 0 for data points inside the box. To exclude data when fitting a curve using fit, specify tf as the 'Exclude' value.

tf = excludedata(x,y,'domain',domain) identifies data points that have x-values outside the interval domain.

tf = excludedata(x,y,'range',range) identifies the data points with y-values outside the interval range.

tf = excludedata(x,y,'indices',indices) identifies the data points with indices equal to indices.
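
The four syntaxes can be compared against plain logical expressions on a handful of points. The following is a minimal sketch, not a statement of how excludedata is implemented: the sample points and the `manual*` variable names are illustrative, and the points are chosen to lie well away from every boundary because the treatment of points exactly on a boundary is not described here.

```matlab
% Four sample points, none of which lies on a box, domain, or range boundary.
x = [0 2 0.5 -3];
y = [0 0 3 0.5];

% Exclusion rules computed by excludedata (requires Curve Fitting Toolbox).
tfBox     = excludedata(x,y,'box',[-1 1 -1 1]);
tfDomain  = excludedata(x,y,'domain',[-1 1]);
tfRange   = excludedata(x,y,'range',[-1 1]);
tfIndices = excludedata(x,y,'indices',[2 4]);

% Hand-written equivalents for these points, using logical expressions.
% Assumption: boundary points are deliberately not exercised.
manualBox     = x < -1 | x > 1 | y < -1 | y > 1;   % outside the box
manualDomain  = x < -1 | x > 1;                    % x outside [-1 1]
manualRange   = y < -1 | y > 1;                    % y outside [-1 1]
manualIndices = false(size(x));                    % only points 2 and 4 flagged
manualIndices([2 4]) = true;
```

For these points, each result from excludedata should mark the same elements as its hand-written counterpart. Only the first point lies inside the box, so it is the only one that 'box' does not flag.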

Examples

Visualize Exclusion Rules

Visualize exclusion rules using random data.

Generate random x and y data.

xdata = -3 + 6*rand(1,1e4);
ydata = -3 + 6*rand(1,1e4);

As an example, exclude data that is either inside the box [-1 1 -1 1] or outside the domain [-2 2].

outliers1 = ~excludedata(xdata,ydata,'box',[-1 1 -1 1]);
outliers2 = excludedata(xdata,ydata,'domain',[-2 2]);
outliers = outliers1|outliers2;

Plot the data that is not excluded. The white area corresponds to regions that are excluded.

plot(xdata(~outliers),ydata(~outliers),'.')
axis([-3 3 -3 3])
axis square

[Figure: marker-only plot of the data that is not excluded]
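
The combined rule can also be passed to fit as the 'Exclude' value, as noted in the description. The following is a minimal sketch under stated assumptions: the data is regenerated as column vectors, the seed is fixed with rng so the run is repeatable, and the library model 'poly1' (a straight line) is chosen only for illustration. The names `insideBox`, `outsideDomain`, `linefit`, and `numUsed` are not part of the original example.

```matlab
% Regenerate the random data as column vectors for use with fit.
rng(0)                                   % fixed seed so the sketch is repeatable
xdata = -3 + 6*rand(1e4,1);
ydata = -3 + 6*rand(1e4,1);

% Same exclusion rules as above: inside the box, or outside the domain.
insideBox     = ~excludedata(xdata,ydata,'box',[-1 1 -1 1]);
outsideDomain = excludedata(xdata,ydata,'domain',[-2 2]);
outliers      = insideBox | outsideDomain;

% Fit a straight line to the points that are not excluded.
linefit = fit(xdata,ydata,'poly1','Exclude',outliers);

% Number of points that remain after applying the exclusion rules.
numUsed = nnz(~outliers);
```

Because the rules are ordinary logical arrays, they can be combined with |, &, and ~ before being passed to fit.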

Exclude Data from Curve Fit

Load the vote counts and county names for the state of Florida from the 2000 U.S. presidential election.

load flvote2k

Use the vote counts for the two major-party candidates, Bush and Gore, as predictors for the vote counts for the third-party candidate Buchanan, and plot both data sets as scatter plots.

plot(bush,buchanan,'rs')
hold on
plot(gore,buchanan,'bo')
legend('Bush data','Gore data')

[Figure: marker-only plot with two series, labeled Bush data and Gore data]

Assume a model where a fixed proportion of Bush or Gore voters choose to vote for Buchanan.

f = fittype({'x'})

f = 
     Linear model:
     f(a,x) = a*x

Exclude the data from absentee voters, who did not use the controversial “butterfly” ballot.

nobutterfly = strcmp(counties,'Absentee Ballots');

Perform a robust fit with bisquare weights of the model to the two data sets, excluding absentee voters.

bushfit = fit(bush,buchanan,f,'Exclude',nobutterfly,'Robust','on');
gorefit = fit(gore,buchanan,f,'Exclude',nobutterfly,'Robust','on');

Robust fits give outliers a low weight, so large residuals from a robust fit can be used to identify the outliers.

figure
plot(bushfit,bush,buchanan,'rs','residuals')
hold on
plot(gorefit,gore,buchanan,'bo','residuals')

[Figure: residuals plot with x-axis label x, showing Data and Zero line]

Calculate the residuals.

bushres = buchanan - feval(bushfit,bush);
goreres = buchanan - feval(gorefit,gore);

Identify large residuals as those outside the range [-500 500].

bushoutliers = excludedata(bush,bushres,'range',[-500 500]);
goreoutliers = excludedata(gore,goreres,'range',[-500 500]);

Display the counties corresponding to the outliers. Miami-Dade and Broward counties correspond to the largest predictor values. Palm Beach county, the only county in the state to use the “butterfly” ballot, corresponds to the largest residual values.

counties(bushoutliers)

ans = 2x1 cell
    {'Miami-Dade'}
    {'Palm Beach'}

counties(goreoutliers)

ans = 3x1 cell
    {'Broward'   }
    {'Miami-Dade'}
    {'Palm Beach'}
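
One possible follow-up, which is not part of the original example, is to refit the model with the flagged counties excluded as well. The sketch below assumes the flvote2k data set is available and repeats the earlier steps for the Bush data so that it runs on its own. The names `excludeAll` and `bushrefit` are illustrative.

```matlab
% Repeat the first pass: robust fit excluding absentee ballots.
load flvote2k
f = fittype({'x'});
nobutterfly = strcmp(counties,'Absentee Ballots');
bushfit = fit(bush,buchanan,f,'Exclude',nobutterfly,'Robust','on');

% Flag counties whose residuals fall outside [-500 500].
bushres = buchanan - feval(bushfit,bush);
bushoutliers = excludedata(bush,bushres,'range',[-500 500]);

% Hypothetical second pass: exclude the absentee row and the flagged counties.
excludeAll = nobutterfly(:) | bushoutliers(:);
bushrefit = fit(bush,buchanan,f,'Exclude',excludeAll);
```

Combining the two logical arrays with | keeps the original exclusion in place while adding the counties identified from the residuals.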

Input Arguments

`x` — Data sites
numeric vector

Data sites of data values, specified as a numeric vector.

`y` — Data values
numeric vector

Data values, specified as a numeric vector.

`box` — Box to find data outside of
numeric vector with four elements

Box to find data outside of, specified as a numeric vector [xmin xmax ymin ymax] with four elements.

Example: [-1 1 0 2]

`domain` — Domain to find data outside of
numeric vector with two elements

Domain to find data outside of, specified as a numeric vector [xmin xmax] with two elements.

Example: [-1 1]

`range` — Range to find data outside of
numeric vector with two elements

Range to find data outside of, specified as a numeric vector [ymin ymax] with two elements.

Example: [3 4]

`indices` — Indices of data points to find
numeric vector

Indices of data points to find, specified as a numeric vector.

Example: [3 7 9]

Version History

Introduced before R2006a
