Unplanned pump failures in asset-intensive industries such as pulp and paper cause significant production losses. Data-driven predictive maintenance based on anomaly detection has recently proven useful in industrial settings. However, this approach is hampered by the infeasibility of manually annotating multivariate sensor data for supervised learning. Unsupervised anomaly detection offers a promising alternative, but a key challenge remains evaluation: structured ground-truth labels are lacking and must be derived from sparse, unstructured maintenance logs. This paper addresses this gap by introducing a fully unsupervised framework that systematically transforms sensor data into window-level representations for model training and uses maintenance notifications to enable robust model evaluation. We apply this approach to a critical process pump in an industrial-scale paperboard mill, using one year of data. The framework includes a reproducible log-to-label pipeline that generates anomalous and normal time-series windows from industrial sensor and maintenance data. It also implements a comprehensive feature engineering process that extracts statistical, spectral, and temporal features from high-frequency sensor readings. We use the framework to compare five unsupervised anomaly detectors. Our experiments demonstrate the framework's practical usefulness and reveal several tradeoffs, most notably a critical one between detection accuracy and deployment feasibility. This work provides a practical framework for evaluating and deploying unsupervised anomaly detection models in real-world industrial settings where labeled data is largely unavailable. © 2025 The Authors.
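To illustrate what window-level feature extraction of this kind might look like, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not specify the window length, step size, sampling rate, or the exact feature set, so the helper names (`sliding_windows`, `window_features`) and the particular features chosen (mean, standard deviation, RMS, skewness and kurtosis; dominant frequency and spectral centroid; linear trend and zero-crossing count) are hypothetical examples of the statistical, spectral, and temporal categories described.

```python
import numpy as np


def sliding_windows(x: np.ndarray, size: int, step: int) -> list:
    """Split a 1-D sensor signal into fixed-size windows (hypothetical helper).

    Only full windows are returned; window size and step are assumptions.
    """
    return [x[i:i + size] for i in range(0, len(x) - size + 1, step)]


def window_features(x: np.ndarray, fs: float) -> dict:
    """Compute illustrative statistical, spectral, and temporal features.

    x  : samples of one window from a single sensor channel
    fs : sampling frequency in Hz (assumed known)
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("window must contain at least two samples")

    # Statistical features
    mean = x.mean()
    std = x.std()
    rms = np.sqrt(np.mean(x ** 2))
    centered = x - mean
    # Standardized moments; defined as 0 for a constant window
    skewness = np.mean(centered ** 3) / std ** 3 if std > 0 else 0.0
    kurtosis = np.mean(centered ** 4) / std ** 4 if std > 0 else 0.0

    # Spectral features from the one-sided power spectrum (mean removed)
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    total = power.sum()
    dominant_freq = freqs[np.argmax(power)] if total > 0 else 0.0
    centroid = (freqs * power).sum() / total if total > 0 else 0.0

    # Temporal features: linear trend (units per second) and zero crossings
    t = np.arange(n) / fs
    slope = np.polyfit(t, x, 1)[0]
    signs = np.signbit(centered)
    zero_crossings = int(np.count_nonzero(signs[1:] != signs[:-1]))

    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "dominant_freq_hz": dominant_freq,
        "spectral_centroid_hz": centroid,
        "trend_slope": slope,
        "zero_crossings": zero_crossings,
    }
```

For example, a 5 Hz sine sampled at 100 Hz for two seconds yields a dominant frequency of 5 Hz and an RMS of about 0.707. Each window's feature dictionary would then form one row of the feature matrix passed to an unsupervised detector; computing features per window rather than per sample is what lets maintenance-derived labels be attached at the window level for evaluation.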