ZKBoost: A novel XGBoost model for ZKML
The fastest available XGBoost implementation, completely open-source and with comprehensive benchmarks.
May 6, 2024
Alejandro Martinez
Sr. Data Scientist
The structured and immutable nature of blockchain data makes it ideal for the application of machine learning (ML) techniques, particularly supervised learning algorithms like XGBoost. Supervised learning involves training a model on a labeled dataset, where the input data (features) are used to predict an output (label). Given the clear, organized structure of blockchain data, which typically includes transaction details such as timestamps, amounts, sender and receiver addresses, and transaction metadata, it can be effectively used to train models to recognize patterns, detect anomalies, or predict outcomes. This is one of the reasons we created Giza Datasets to provide a cleaner, curated structure of this data to apply ML in blockchain use cases. There are also great tools like Cryo that make blockchain data available in a very structured manner for analysis and consumption.
While the AI x Crypto intersection presents innovative opportunities, the specific use of Large Language Models (LLMs) in these contexts are often misaligned with the inherent characteristics of blockchain data and performance requirements of its use-cases. The suitability of Large Language Models (LLMs) versus XGBoost for handling tabular data hinges on several key differences in their design, capabilities, and inherent strengths. While LLMs offer broad capabilities in natural language processing, XGBoost is specifically optimized for structured, tabular data, making it generally more effective for such applications. In the paper “Why do tree-based models still outperform deep learning on tabular data?”, the authors benchmark multiple Deep Learning and Transformers model architectures against tree-based models demonstrating how the latter outperform more complex models on real-world datasets. If you’re eager to go down the rabbit hole, Bojan Tunguz, a Data Scientist at NVIDIA, recently published a talk on why XGBoost is sufficient for handling tabular data, extending the existing body of supporting research on this topic.
Introducing a novel implementation for the CairoVM
At Giza, we’re excited to announce a significant advance in the efficiency of our proof system based on one of the most popular machine learning frameworks, XGBoost. This article details the benchmarking results that highlight these improvements and discusses the potential of our technology to enhance ZKML workflows for Web3 use cases.
This new implementation is targeting the CairoVM, a virtual machine for the Cairo language. Cairo is the first production-grade platform for generating STARK proofs for general computation. It's Turing-complete and it was created by StarkWare. The implementation is integrated inside Orion, our open-source ZKML framework.
Methodology
The time metrics of our custom implementation only depend on two factors: the number of trees and the depth of each tree. By not relying on external generic format conversion tools such as ONNX, we know exactly the structure of the algorithm on which we are going to run the inference. That is, the inference code is static and the metrics only depend on the number of steps to go through a tree and the number of trees.
This approach ensures that the times to run, prove, and verify the model are consistent and predictable, focusing only on tree complexity.
Device: Apple M2 Max, 32GB of RAM.
Prover/Verifier: Platinum prover, commit fed12d6
VM: CairoVM, commit c5839fd
Cairo: 2.6.3
Sierra: 1.5.0
Comparing to previous benchmarks
For cohesion, we adopted the same methodology conducted by the EZKL team in their latest benchmark. We compared previous results with our new implementation of tree ensemble in Orion, a key component of XGBoost. Here are the results:
Orion proving time is 9.83x faster than EZKL and 835.67x faster than Risc0. It's also observed that Orion’s standard deviation of the proving times are 18x more consistent than EZKL and 189.9x more consistent than Risc0.
Comparing the new tree ensemble implementation in Orion to the previous one, the results are the following:
The new tree ensemble implementation in Orion is 45.67x faster than the previous implementation. It’s also observed that the new implementation’s standard deviation of the proving times are 20.41x more consistent than the previous implementation in Orion.
Scaling the benchmarks
The purpose of our benchmark is to outline the different metrics of the new implementation for the XGBoost algorithm in Orion to show its performance. The tracked metrics are as follows:
Run Time (ms): The duration required to execute an inference for a given model on CairoVM. It is worth noting that the VM operates in proof mode.
Proving Time (ms): The duration required to generate a proof for a given model.
Verify Time (ms): The duration required to verify a proof for a given model.
Cairo Steps: The number of program steps for a given model.
Per Tree Per Sample Times
These benchmarks were designed to isolate the impact of tree depth on our XGBoost implementation performance. We tracked metrics at various depths, from 3 to 12. The results clearly illustrate how each step in the process scales with increasing complexity:
From the table above we observe the following:
There is a clear and consistent increase in run time as the depth increases. This starts from 2.92 ms at depth 3 and progresses to 43.23 ms at depth 12. This suggests that deeper structures require more computation time.
The number of steps increases clearly and consistently with increasing depth, starting at 521 at depth 3 and rising to 3157 at depth 12. However, in proof mode, the step count remains constant at 2^17 for depths ranging from 3 to 12. This constant is because the CairoVM in proof mode adds an infinite loop at the end of the program to fill the execution trace to the nearest power of two, which is necessary for efficient FFT processing on the prover's side. A more detailed explanation is provided later in the article.
Proving time does not show a clear increasing or decreasing trend with depth 3 to 12; the times are relatively similar. This consistency is due to the fixed number of steps in proof mode. Consequently, the model's proving time is determined by the power of two intervals in which the Cairo-steps lie, whatever the depth of the model.
Impact of Increasing Tree Counts
We further analyzed the impact of scaling the number of trees at a fixed depth, measuring how metrics reacted to increasing complexity from 5 to 200 trees at selected depths:
From the table above we observe the following:
As the number of trees increases, the run time increases exponentially. This is evident from a run time of 12.94 ms with 5 trees, escalating to 13,226.38 ms with 200 trees. This strong correlation suggests that the computational load increases significantly with more trees, likely due to more data processing or more complex operations being performed.
The number of steps increases with the number of trees. This can be seen from the number of steps of 3433 with 5 trees, which increases to 125797 steps with 200 trees. As with the benchmark discussed above, the number of Cairo steps in proof mode remains constant at 2^17 for a number of trees ranging from 5 to 200. This means that the proving time is also relatively constant.
From the table above we observe the following:
Run time shows a substantial increase as the number of trees increases. Starting at 33.35 ms with 5 trees, the run time escalates to 3929.31 ms with 70 trees. This sharp increase suggests that the computational demand grows significantly with the addition of more trees.
The number of steps grows as the number of trees increases, as illustrated by the step count rising from 6728 for 5 trees to 84376 for 70 trees. Similar to the previously discussed benchmarks, the number of Cairo steps in proof mode stays fixed at 2^17 for any number of trees between 5 and 70. Consequently, the proving time remains relatively unchanged.
From the table above we observe the following:
There is a clear trend of increasing run times as the number of trees increases. The run time jumps from 136.18 ms with 5 trees to 1056.83 ms with 20 trees.
As the number of trees increases, so does the number of steps—from 13066 with 5 trees to 54295 with 20 trees. Similar to the earlier benchmarks, the Cairo steps in proof mode maintain a constant level at 2^17, across a range from 5 to 20 trees. As a result, the proving time remains relatively constant.
The detailed examination of the tables focusing on varying depths and numbers of trees provides a comprehensive view of how different system metrics—run time, proving time, verification time and Cairo steps—respond to changes in these variables. Here are the final observations about the relationship between depth, trees, and the resultant metrics:
Across all depths examined (4, 6, and 9), run time consistently increases with the number of trees. This increase is not merely linear but often exponential, indicating a significant rise in computational demands as the system scales up in complexity and size.
The effect of depth is notably apparent in run times, where deeper levels (e.g., depth 9) see significantly higher run times even for the same number of trees compared to shallower depths (e.g., depth 4). This underscores the added complexity and computational demands of deeper structures.
All the models we tested lie at a Cairo-steps of 2^17 in proof mode. This means that whatever the depth or number of trees, as long as the Cairo-steps is the same power of two in proof mode, proving times will be relatively similar.
Understanding Constant Proving Time
Platinum Cairo prover heavily depends on the Fast Fourier Transform (FFT) for efficient polynomial operations during the proving process. FFT processes data most efficiently when the input size is a power of two. Thus, the CairoVM pads the execution trace to the nearest power of two in proof mode.
To better understand, let's take a simple Fibonacci Cairo program where the depth depends on `n`. Below are the benchmark results.
Cairo Steps: This represents the actual number of steps your program takes to compute the Fibonacci sequence to the given depth (n).
Cairo Steps in Proof Mode: These are the padded steps, adjusted to the nearest power of two, necessary for optimal FFT performance.
When `n` ranges from 1 to 1,000, the Cairo Steps in proof mode remain constant at 2^15. This uniformity means the prover's workload doesn't substantially change, leading to similar proving times despite variations in the actual Fibonacci computation steps. However, when `n` increases to 10,000 and 100,000, the proof mode steps jump to the next powers of two (2^16 and 2^20 respectively). Thus, the proving time remains fairly stable until a new power of two threshold is reached in the proof mode steps.
This is why in our benchmark, the proving times of the XGBoost models remain constant, since the number of steps in proof mode remains 2^17 for all the models we tested.
Example: Airline Passenger Satisfaction Dataset from Kaggle
To illustrate how our solution operates from end to end, we worked on a specific example using the airline passenger satisfaction dataset from Kaggle. This example demonstrates the seamless integration and efficiency of our implementation in a practical scenario.
We began by downloading the airline passenger satisfaction dataset and performing necessary preprocessing to prepare the data for modeling.
Using scikit-learn, we trained a XGBoost classifier on the prepared dataset, tuning it to identify patterns and do predictions.
After training the classifier, the model is integrated into the Giza stack, which takes over the subsequent processes:
Transpilation: The trained XGBoost model is transpiled into a Cairo program.
Execution: execute the Cairo program. This step computes the predictions and prepares the model for generating cryptographic proofs.
Proof Generation and Verification: Finally, the proof of the model’s predictions is generated, followed by the verification process. This step is crucial as it confirms the accuracy and integrity of the model's outputs.
Performance Metrics:
These performance metrics highlight the efficiency of our solution when dealing with real-world datasets and executing complex machine learning models within Orion’s new XGBoost implementation. The consistent performance across run, prove, and verify times illustrates the practicality and innovation of our approach to zero-knowledge machine learning (ZKML).
Preserving XGBoost performance
Our solution prioritizes maintaining the integrity and performance of the original XGBoost model when transpiled and executed within the Orion. This new implementation has deterministic model behavior and has no performance degradation.
We have engineered our transpilation process to maintain a one-to-one correspondence with the original XGBoost algorithm's execution. This ensures that the model running on the CairoVM exhibits the same decision-making process as it would in a conventional computing environment. By preserving the model's internal logic and parameters, we guarantee that the outcomes remain consistent and predictable, mirroring the original XGBoost performance.
Future work
In the newly developed implementation of ZKBoost presented, all processes are currently executed sequentially, without incorporating parallelization either during runtime or in the proving stages. Acknowledging the significant potential for enhancing efficiency through parallel computing, we are dedicating our efforts towards the development of a universally applicable framework for proof parallelization. This approach aims to be model-agnostic, thereby ensuring its applicability across various machine learning algorithms, not limited to XGBoost alone. Concurrently, specific to the ZKBoost algorithm, we are exploring strategies for parallelizing tree construction, which is anticipated to substantially reduce both the execution and proving times.
Our future work, therefore, will focus on these two main areas: Establishing a robust, scalable method for general parallelization in ZKML workflows, and optimizing the ZKBoost algorithm to leverage parallel tree building. This dual approach promises to push the boundaries of current processing speeds and efficiency, setting a new standard for performance in the field.
Conclusion
In conclusion, this new implementation of XGBoost in Cairo (ZKBoost) represents a significant advancement in the application of trust-minimized ML for Web3 use cases. The structured, tabular nature of blockchain data, coupled with the efficiency and predictive power of ZKBoost, has shown considerable promise in enhancing the functionality and security of these systems. Our benchmark results clearly demonstrate the superior performance of our new tree ensemble implementation, significantly reducing runtime and proving times while maintaining a high standard of accuracy and consistency.
With these advancements, the future of blockchain and zero-knowledge machine learning (ZKML) looks promising. As we continue to develop and refine our methodologies, focusing on scalability and the parallelization of tree construction, we anticipate even greater improvements in performance. These enhancements will not only accelerate the proving processes but will also broaden the applicability of our solutions across a variety of Web3 use cases. Our ongoing research and development are driven by the goal to achieve an optimal balance between efficiency and reliability, ensuring that our technology remains at the forefront of ZKML.
The repository for the benchmarks is publicly available.
Run ZKBoost in Giza CLI now.