**Glossary of terms**

**Binomial model**: A statistical model used to analyse data with two possible outcomes, typically denoted as "success" and "failure."

**Covariate**: A variable that is not the primary focus of a study but is considered alongside the main variables to understand potential relationships or influences.

**Correlation**: A statistical measure that indicates the degree to which two variables change together, often expressed as a coefficient between -1 and 1.

**Data noisiness**: The presence of random or irrelevant variations in data, which can make it challenging to extract meaningful information.

**Design-based method**: A statistical approach that considers the specific design or structure of a study when making inferences from data.

**DHS (Demographic and Health Survey)**: A large-scale survey programme that collects data on demographic and health indicators in various countries.

**Elimination**: The reduction of a disease or condition to a defined target level, or its complete removal, within a specific area or population.

**Friction surface**: A spatial model representing the resistance or cost of moving between different locations in a geographic area.

**Geoconnect ID**: An identifier used to link or connect geospatial data to specific locations or regions.

**Georeferenced data**: Data that are associated with specific geographic coordinates, allowing them to be mapped and analysed in a geographic context.

**Geospatial**: Refers to data, information, or activities that have a geographic component and can be represented on a map.

**GLMM (Generalised Linear Mixed Model)**: A statistical model that extends the generalised linear model by incorporating random effects to account for correlation or hierarchical data structures.

**GTMP (Global Trachoma Mapping Project)**: A global initiative that ran from 2012 to 2015 to map the prevalence of trachoma, a neglected tropical disease that can cause blindness.

**Ground-truthing**: The process of verifying or validating data, typically by collecting on-the-ground information to confirm their accuracy.

**ITI (International Trachoma Initiative)**: An organisation dedicated to the elimination of trachoma as a public health problem.

**Landcover**: The physical and biological characteristics of the Earth's surface, including vegetation, water bodies, and built environments.

**Linear relationship**: A statistical relationship between two variables that can be represented by a straight line on a scatterplot.

**Logistic regression**: A statistical model used to analyse the relationship between a binary dependent variable and one or more independent variables.

**MBG (Model-Based Geostatistics)**: A statistical approach that applies explicit statistical models to spatially referenced data to predict values, with quantified uncertainty, at unsampled locations.

**Model calibration**: The process of adjusting model parameters to ensure that the model's predictions align with observed data.

**Model distribution**: The probability distribution used to describe the variability of data in a statistical model.

**Model validation**: The process of assessing how well a statistical or predictive model performs by comparing its output to real-world data.

**Multivariate**: Involving multiple variables or factors, often used in statistical analysis to understand complex relationships.

**NTD (Neglected Tropical Disease)**: A group of infectious diseases that primarily affect people in tropical and subtropical regions, often with limited access to healthcare.

**PBT (Probability of being Below Threshold)**: The probability that the true prevalence in an area is below a given threshold, such as a disease elimination target; used to judge whether that target has been achieved.

**Population density**: The number of people living per unit area, often measured in individuals per square kilometre or square mile.

**Population weighting**: A technique that weights values (for example, pixel-level prevalence estimates) by the number of people they represent, so that aggregated estimates reflect where people actually live.

**R software**: An open-source programming language and software environment commonly used for statistical analysis and data visualisation.

**Random effect**: In a statistical model, a term that represents random or unexplained variation, often accounting for correlation within hierarchical data.

**Raster file**: A data format that divides geographic information into a grid of cells, where each cell has a value representing a particular attribute.

**Shapefile**: A common file format for storing geographic vector data, such as points, lines, and polygons.

**Simulation study**: A research method that uses computer-generated data to model and analyse real-world phenomena.

**Spatial confounding**: A potential issue in spatial analysis where covariates are correlated with the spatial random effect or with unmeasured, spatially structured factors, so that estimated covariate effects can be biased or misleading.

**Statistically significant**: A result in a statistical analysis that is unlikely to have occurred by random chance alone, typically judged by whether its p-value falls below a pre-specified significance level (e.g., 0.05).

**Strata**: Divisions or subsets used to group data for analysis, often based on specific criteria or characteristics.

**TF (Trachomatous inflammation—follicular)**: A clinical sign of trachoma, indicating inflammation with follicles on the inside of the upper eyelid.

**Threshold**: A specific value or condition used to categorise or make decisions in data analysis.

**TT (Trachomatous Trichiasis)**: A late stage of trachoma, characterised by in-turned eyelashes that touch the eyeball, which can lead to blindness.

**Univariate**: Involving a single variable or factor, often used in statistical analysis to describe or analyse data with only one dimension.

**Variance of the spatial process**: A parameter of a geostatistical model that measures how much the underlying spatial process varies around its mean; how quickly spatial correlation decays with distance is governed by a separate scale (range) parameter.
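To make the Population weighting, PBT and Threshold entries concrete, here is a minimal Python sketch. It assumes you already have many simulated draws of pixel-level prevalence for one area (for example, predictive samples from a fitted MBG model) and a population count for each pixel (for example, read from a population-density raster). The function names, the randomly generated inputs and the default threshold of 0.05 (chosen to mirror the WHO elimination threshold of 5% TF in children aged 1 to 9 years) are illustrative assumptions, not part of the glossary.

```python
import numpy as np


def population_weighted_prevalence(prevalence_samples, population):
    """Population-weighted mean prevalence across pixels, one value per draw.

    prevalence_samples: shape (n_draws, n_pixels); each row is one simulated
        prevalence surface (hypothetical input, e.g. MBG predictive draws).
    population: shape (n_pixels,); people per pixel (hypothetical input).
    """
    weights = population / population.sum()  # weights sum to 1
    return prevalence_samples @ weights      # shape (n_draws,)


def probability_below_threshold(area_samples, threshold=0.05):
    """PBT: share of draws whose area-level prevalence is below the threshold.

    The 0.05 default is an assumption chosen for illustration.
    """
    return float(np.mean(area_samples < threshold))


# Usage with made-up data: 4 pixels, 1,000 draws of pixel-level prevalence.
rng = np.random.default_rng(0)
population = np.array([1200.0, 300.0, 50.0, 800.0])
samples = rng.beta(2, 60, size=(1000, population.size))

district = population_weighted_prevalence(samples, population)
pbt = probability_below_threshold(district, threshold=0.05)
print(f"PBT = {pbt:.2f}")
```

Weighting by population rather than averaging pixels equally means sparsely populated pixels contribute little to the area-level estimate, and computing PBT over every draw carries the model's uncertainty through to the final probability.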