Stability and scalability of the CMS Global Pool: Pushing HTCondor and glideinWMS to new limits

dc.contributor.author: Balcas, J.
dc.contributor.author: Bockelman, B.
dc.contributor.author: Hufnagel, D.
dc.contributor.author: Anampa, K Hurtado
dc.contributor.author: Khan, F Aftab
dc.contributor.author: Larson, K.
dc.contributor.author: Letts, J.
dc.contributor.author: Marra Da Silva, J. [UNESP]
dc.contributor.author: Mascheroni, M.
dc.contributor.author: Mason, D.
dc.contributor.author: Yzquierdo, A. Perez-Calero
dc.contributor.author: Tiradani, A.
dc.contributor.institution: California Institute of Technology
dc.contributor.institution: University of Nebraska
dc.contributor.institution: Fermi National Accelerator Laboratory
dc.contributor.institution: University of Notre Dame
dc.contributor.institution: Quaid-I-Azam University
dc.contributor.institution: University of California San Diego
dc.contributor.institution: Universidade Estadual Paulista (UNESP)
dc.contributor.institution: Port d'Informació Científica
dc.contributor.institution: CIEMAT
dc.date.accessioned: 2022-04-28T19:07:12Z
dc.date.available: 2022-04-28T19:07:12Z
dc.date.issued: 2017-11-23
dc.description.abstract: The CMS Global Pool, based on HTCondor and glideinWMS, is the main computing resource provisioning system for all CMS workflows, including analysis, Monte Carlo production, and detector data reprocessing activities. The total resources at Tier-1 and Tier-2 grid sites pledged to CMS exceed 100,000 CPU cores, while another 50,000 to 100,000 CPU cores are available opportunistically, pushing the needs of the Global Pool to higher scales each year. These resources are becoming more diverse in their accessibility and configuration over time. Furthermore, the challenge of stably running at higher and higher scales while introducing new modes of operation such as multi-core pilots, as well as the chaotic nature of physics analysis workflows, places huge strains on the submission infrastructure. This paper details some of the most important challenges to scalability and stability that the CMS Global Pool has faced since the beginning of the LHC Run II and how they were overcome. [en]
dc.description.affiliation: California Institute of Technology
dc.description.affiliation: University of Nebraska
dc.description.affiliation: Fermi National Accelerator Laboratory
dc.description.affiliation: University of Notre Dame
dc.description.affiliation: National Centre for Physics Quaid-I-Azam University
dc.description.affiliation: University of California San Diego
dc.description.affiliation: Universidade Estadual Paulista
dc.description.affiliation: Port d'Informació Científica
dc.description.affiliation: Centro de Investigaciones Energéticas Medioambientales y Tecnológicas CIEMAT
dc.description.affiliationUnesp: Universidade Estadual Paulista
dc.description.sponsorship: U.S. Department of Energy
dc.description.sponsorship: National Science Foundation
dc.identifier: http://dx.doi.org/10.1088/1742-6596/898/5/052031
dc.identifier.citation: Journal of Physics: Conference Series, v. 898, n. 5, 2017.
dc.identifier.doi: 10.1088/1742-6596/898/5/052031
dc.identifier.issn: 1742-6596
dc.identifier.issn: 1742-6588
dc.identifier.scopus: 2-s2.0-85038444697
dc.identifier.uri: http://hdl.handle.net/11449/220987
dc.language.iso: eng
dc.relation.ispartof: Journal of Physics: Conference Series
dc.source: Scopus
dc.title: Stability and scalability of the CMS Global Pool: Pushing HTCondor and glideinWMS to new limits [en]
dc.type: Conference paper
