Maximizing Efficiency with Model Complexity Reducer (MCR)

Designed to optimize models by reducing their complexity, without significant performance degradation.

May 17, 2024

Alejandro Martinez

Sr. Data Scientist


Access the complete research paper here. This blog post acts as a summary.

The rapid advancement of machine learning has led to the development of highly complex models that deliver impressive performance. However, the deployment of these models in resource-constrained environments, such as mobile devices, blockchain or real-time applications, poses significant challenges. Reducing model complexity while maintaining performance is essential to ensure feasibility and efficiency.

The Zero-Knowledge Machine Learning (ZKML) paradigm presents a unique challenge: generating ZK proofs can cost significantly more than standard inference because of the cryptographic processes involved. In this context, algorithmic complexity is a fundamental factor. The model-complexity-reducer (MCR) package is designed to optimize gradient-boosted tree (GBT) models by reducing their complexity, measured by the number of operations required for inference, without significant performance degradation.

High-complexity models can result in several issues:

Resource Consumption: Models with numerous parameters demand substantial computational power and memory, limiting their deployment in resource-constrained environments.
Inference Time: Increased complexity often leads to longer inference times, which are unsuitable for real-time applications.
Overfitting: Complex models are prone to overfitting, reducing their generalization capabilities.

Introducing MCR

The MCR package provides a robust solution for reducing model complexity while maintaining high performance. This is especially crucial for applications within the Zero-Knowledge Machine Learning (ZKML) paradigm, where computational efficiency is paramount.

MCR can be defined as an optimization process that seeks the best trade-off between performance and complexity, where complexity is defined as the number of steps the algorithm needs to perform to produce an inference.
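To make "number of steps per inference" concrete, here is a minimal sketch of how that quantity could be counted for a small tree ensemble. The nested-dict tree encoding and every function name below are assumptions made for illustration; they are not MCR's internal representation or API.

```python
# Hypothetical tree encoding, for illustration only (not MCR's format):
# an internal node is a dict with "feature", "threshold", "left", "right";
# a leaf is a dict with a single "value" key.

def is_leaf(node):
    return "value" in node


def count_internal_nodes(node):
    """Number of split nodes in one tree (a proxy for model size)."""
    if is_leaf(node):
        return 0
    return 1 + count_internal_nodes(node["left"]) + count_internal_nodes(node["right"])


def max_depth(node):
    """Worst-case number of comparisons needed to reach a leaf."""
    if is_leaf(node):
        return 0
    return 1 + max(max_depth(node["left"]), max_depth(node["right"]))


def ensemble_complexity(trees):
    """Worst-case comparisons for one inference: every tree is traversed once."""
    return sum(max_depth(tree) for tree in trees)


def predict(trees, x):
    """Sum the leaf values of all trees and count the comparisons actually made."""
    total, steps = 0.0, 0
    for tree in trees:
        node = tree
        while not is_leaf(node):
            steps += 1
            go_left = x[node["feature"]] <= node["threshold"]
            node = node["left"] if go_left else node["right"]
        total += node["value"]
    return total, steps


# Two toy trees: a depth-1 stump and a depth-2 tree that reuses the stump.
stump = {"feature": 0, "threshold": 0.5,
         "left": {"value": -1.0}, "right": {"value": 1.0}}
deeper = {"feature": 1, "threshold": 2.0,
          "left": stump, "right": {"value": 0.25}}
trees = [stump, deeper]

worst_case = ensemble_complexity(trees)       # 1 + 2 comparisons
prediction, steps = predict(trees, [0.9, 3.0])
```

Under this proxy, removing trees or making them shallower lowers the step count directly, which is the quantity that matters when every operation has to be proven.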

Using the MCR algorithm to optimize a trained model is straightforward. Given a trained model in Python, you only need to call the mcr function, which returns the new optimized model and the transformer. For example:

model, transformer = mcr(model=lgbm_reg,
                         X_train=X_train,
                         y_train=y_train,
                         X_eval=X_eval,
                         y_eval=y_eval,
                         eval_metric='auc',
                         transform_features=True)

The method has been tested on simple datasets, such as the well-known diabetes and breast cancer datasets, and on more complex ones, such as the Kaggle dataset "Airline Passenger Satisfaction". In these tests, the complexity reduction ranged from 50% to more than 99%, at a cost of roughly 0.8% to 2% in performance.
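For readers who want to report comparable figures for their own models, the sketch below shows one way to compute a relative reduction and to pick the least complex candidate that stays within a performance tolerance. The helper names, the candidate list, and the 2% tolerance are hypothetical, and the numbers are made up for illustration; the post does not state whether its percentages are relative or absolute, so the relative form here is an assumption.

```python
def relative_reduction(before, after):
    """Fraction by which a quantity shrank, e.g. 0.5 means it was halved."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


def pick_smallest_within_tolerance(baseline_score, candidates, max_relative_loss):
    """Return the lowest-complexity (complexity, score) pair whose relative
    score drop versus the baseline is within the tolerance, or None."""
    acceptable = [
        (complexity, score)
        for complexity, score in candidates
        if relative_reduction(baseline_score, score) <= max_relative_loss
    ]
    # Prefer lower complexity; break ties in favor of the higher score.
    return min(acceptable, key=lambda c: (c[0], -c[1]), default=None)


# Made-up numbers: (inference steps, AUC) for a baseline and three smaller models.
baseline_complexity, baseline_auc = 1000, 0.95
candidates = [(1000, 0.95), (400, 0.945), (50, 0.935), (5, 0.80)]

best = pick_smallest_within_tolerance(baseline_auc, candidates, max_relative_loss=0.02)
complexity_cut = relative_reduction(baseline_complexity, best[0])
auc_loss = relative_reduction(baseline_auc, best[1])

print(f"complexity reduced by {complexity_cut:.1%}, AUC reduced by {auc_loss:.2%}")
```

The smallest candidate is rejected here because its AUC loss exceeds the tolerance, which illustrates the trade-off: the goal is not the smallest model, but the smallest one that still performs acceptably.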

Conclusion

The practical application on the Airline Passenger Satisfaction dataset showed that MCR can drastically reduce model complexity, by 99.97% in that case, while only slightly decreasing performance: the AUC dropped by 2.08%.

MCR is an open-source package available to everyone, promoting accessibility and ease of use. The complexity of the optimization process is abstracted away from the user, allowing them to simply execute a function call to achieve optimized models. This simplicity, combined with the package’s effectiveness, makes MCR a valuable tool for various machine learning applications, especially those requiring lightweight and efficient models.

Start using it here: https://github.com/gizatechxyz/auto-zkml