
Software for the detection of outliers and influential points based on the HAT method

dc.contributor.author: de Moraes, José Reinaldo da Silva Cabral [UNESP]
dc.contributor.author: Rolim, Glauco de Souza [UNESP]
dc.contributor.author: Aparecido, Lucas Eduardo de Oliveira [UNESP]
dc.contributor.institution: Universidade Estadual Paulista (Unesp)
dc.date.accessioned: 2018-12-11T17:32:15Z
dc.date.available: 2018-12-11T17:32:15Z
dc.date.issued: 2017-04-01
dc.description.abstract: We developed software in Visual Basic for Applications in Microsoft Excel that identifies outliers (OUTs) and influential data points (IPs) in scattered data using the HAT method (Hoaglin and Welsch). OUTs are commonly identified visually, a practice that is susceptible to error. Identifying IPs is not trivial and requires statistical tests. HAT is the most common statistical method for selecting OUTs and IPs in regression analyses, and it sorts the data into four groups: 1) data within the standard range of variability, 2) OUTs, 3) IPs, and 4) points that are both OUTs and IPs (OUT+IPs). The decision to remove data from the database rests with the researcher, and the HAT method supports that decision. Removing an OUT usually improves the accuracy of models. Removing IPs, however, may or may not improve accuracy. A small hypothetical data set of rainfall from automatic and conventional rain gauges was used to test the software extensively. The amount of data that the software can handle is limited by the number of rows in the Excel spreadsheet (65 518). The first step in identifying OUTs and IPs is to analyse all the data, which produced an R² of 0.11 for the raw data in our example, indicating a weak relationship between the variables. The HAT test identified two OUTs, three IPs, and one OUT+IP in the data. If all OUTs were removed, R² would increase to 0.19. If the OUT+IP was removed, R² would increase to 0.86. If all IPs were also removed, R² would decrease to 0.45. The software is free and can be requested by email from reinaldojmoraes@gmail.com. (en)
dc.description.affiliation: UNESP - São Paulo State University Department of Exact Sciences
dc.description.affiliationUnesp: UNESP - São Paulo State University Department of Exact Sciences
dc.format.extent: 459-463
dc.identifier: http://dx.doi.org/10.21475/ajcs.17.11.04.356
dc.identifier.citation: Australian Journal of Crop Science, v. 11, n. 4, p. 459-463, 2017.
dc.identifier.doi: 10.21475/ajcs.17.11.04.356
dc.identifier.issn: 1835-2707
dc.identifier.issn: 1835-2693
dc.identifier.scopus: 2-s2.0-85018362256
dc.identifier.uri: http://hdl.handle.net/11449/178824
dc.language.iso: eng
dc.relation.ispartof: Australian Journal of Crop Science
dc.relation.ispartofsjr: 0.354
dc.rights.accessRights: Restricted access
dc.source: Scopus
dc.subject: Accuracy
dc.subject: Dispersion
dc.subject: Model
dc.subject: Regression analysis
dc.title: Software for the detection of outliers and influential points based on the HAT method (en)
dc.type: Article
dspace.entity.type: Publication
unesp.department: Ciências Exatas - FCAV (pt)
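
The four-group classification described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Visual Basic software, and the record does not state which cutoffs that software applies. The sketch assumes a simple linear regression, treats a point as an IP when its hat-matrix diagonal (leverage) exceeds 2p/n, and treats it as an OUT when its externally studentized residual exceeds 2 in absolute value. The function name `hat_classify`, the parameter `resid_cutoff`, and the sample data are hypothetical.

```python
import numpy as np


def hat_classify(x, y, resid_cutoff=2.0):
    """Label each point as 'normal', 'OUT', 'IP', or 'OUT+IP'.

    Assumptions (not taken from the source record): simple linear
    regression y = a + b*x, leverage cutoff 2p/n, and a cutoff of
    `resid_cutoff` on the externally studentized residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])  # design matrix with intercept
    p = X.shape[1]

    # Diagonal of the hat matrix H = X (X'X)^-1 X', computed from a thin QR.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    # Ordinary least-squares fit and raw residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta

    # Externally studentized residuals (error variance estimated
    # with the point itself left out).
    sse = e @ e
    s2_loo = (sse - e**2 / (1.0 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_loo * (1.0 - h))

    high_leverage = h > 2.0 * p / n
    large_resid = np.abs(t) > resid_cutoff

    labels = []
    for lev, res in zip(high_leverage, large_resid):
        if lev and res:
            labels.append("OUT+IP")
        elif res:
            labels.append("OUT")
        elif lev:
            labels.append("IP")
        else:
            labels.append("normal")
    return h, t, labels


# Hypothetical data: one point far off the trend, one point far out in x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]
noise = [0.1, -0.1, 0.1, -0.1, 5.0, -0.1, 0.1, -0.1, 0.1, -0.1, 0.0]
y = [2 * xi + 1 + ni for xi, ni in zip(x, noise)]

h, t, labels = hat_classify(x, y)
for xi, yi, hi, ti, lab in zip(x, y, h, t, labels):
    print(f"x={xi:5.1f}  y={yi:6.2f}  leverage={hi:.3f}  t={ti:8.3f}  {lab}")
```

As the abstract notes, the labels only inform the researcher's decision: refitting the regression with and without each flagged group and comparing R² is the step that shows whether removal helps.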
