Software for the detection of outliers and influential points based on the HAT method

Date

2017-04-01

Authors

de Moraes, José Reinaldo da Silva Cabral [UNESP]
Rolim, Glauco de Souza [UNESP]
Aparecido, Lucas Eduardo de Oliveira [UNESP]

Abstract

We developed software in Visual Basic for use in Microsoft Excel that identifies outliers (OUTs) and influential data points (IPs) in scattered data using the HAT method (Hoaglin and Welsch). OUTs are commonly identified visually, an approach that is prone to error. Identifying IPs is not trivial and requires statistical tests. HAT is the most common statistical method for selecting OUTs and IPs in regression analyses and sorts data into four groups: 1) data within the standard range of variability, 2) OUTs, 3) IPs, and 4) data that are both OUTs and IPs (OUT+IPs). The decision to remove data from the database rests with the researcher, and the HAT method supports that decision. Removing an OUT usually improves the accuracy of models; removing IPs may or may not improve it. A small hypothetical data set of rainfall from automatic and conventional rain gauges was used to extensively test the software. The amount of data the software can handle is limited by the number of rows in the Excel spreadsheet (65 518). The first step in identifying OUTs and IPs is to analyse all the data, which in our example produced an R² of 0.11 for the raw data, indicating a weak relationship between the variables. The HAT test identified two OUTs, three IPs, and one OUT+IP in the data. If all OUTs were removed, R² would increase to 0.19. If the OUT+IP was removed, R² would increase to 0.86. If all IPs were also removed, R² would decrease to 0.45. The software is free and can be requested by email from reinaldojmoraes@gmail.com.
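The four-group classification described in the abstract can be illustrated with a minimal Python sketch for simple linear regression. This is not the authors' Visual Basic code. The function name `hat_classify`, the studentized-residual cutoff of 2 for OUTs, and the leverage cutoff of 2p/n for IPs (a rule of thumb commonly attributed to Hoaglin and Welsch) are assumptions made for illustration; the abstract does not state which thresholds the software applies.

```python
from math import sqrt


def hat_classify(x, y, resid_cut=2.0):
    """Label each (x, y) point as 'normal', 'OUT', 'IP', or 'OUT+IP'.

    Hypothetical helper: the cutoffs are assumed, not taken from the paper.
    Requires at least three points with non-identical x values.
    """
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    # Ordinary least-squares fit and its residuals.
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e * e for e in resid) / (n - p))  # residual std. error

    # Diagonal of the hat matrix (leverage) for simple regression.
    leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    lev_cut = 2.0 * p / n  # assumed leverage cutoff

    labels = []
    for e, h in zip(resid, leverage):
        denom = s * sqrt(max(1.0 - h, 0.0))
        # Internally studentized residual; 0 if the fit is exact.
        r = e / denom if denom > 0 else 0.0
        is_out = abs(r) > resid_cut
        is_ip = h > lev_cut
        if is_out and is_ip:
            labels.append("OUT+IP")
        elif is_out:
            labels.append("OUT")
        elif is_ip:
            labels.append("IP")
        else:
            labels.append("normal")
    return labels, leverage


# Made-up data (not the paper's rainfall set): the fifth point lies far
# above the line y = 2x, so it shows up as an outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 4, 6, 8, 30, 12, 14, 16, 18, 20]
labels, leverage = hat_classify(x, y)
print(labels)
```

Separating the two tests reflects the distinction the abstract draws: a large residual marks a point as unusual in y, while high leverage marks a point whose x value gives it a strong pull on the fitted line, whether or not its residual is large.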

Keywords

Accuracy, Dispersion, Model, Regression analysis

How to cite

Australian Journal of Crop Science, v. 11, n. 4, p. 459-463, 2017.