Sample, Scrutinize and Scale

Created: 2025-03-19 02:44:24 +0000

Last modified: 2025-03-19 20:56:50 +0900

Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification

url: https://arxiv.org/abs/2502.01839

pdf: https://arxiv.org/pdf/2502.01839

html: https://arxiv.org/html/2502.01839v1

abstract: Sampling-based search, a simple paradigm for utilizing test-time compute, involves generating multiple candidate responses and selecting the best one – typically by having models self-verify each response for correctness. In this paper, we study the scaling trends governing sampling-based search. Among our findings is that simply scaling up a minimalist implementation of sampling-based search, using only random sampling and direct self-verification, provides a practical inference method that, for example, elevates the reasoning capabilities of Gemini v1.5 Pro above that of o1-Preview on popular benchmarks. We partially attribute the scalability of sampling-based search to a phenomenon of implicit scaling, where sampling a larger pool of responses in turn improves self-verification accuracy. We further identify two useful principles for improving self-verification capabilities with test-time compute: (1) comparing across responses provides helpful signals about the locations of errors and hallucinations, and (2) different model output styles are useful for different contexts – chains of thought are useful for reasoning but harder to verify. We also find that, though accurate verification can be elicited, frontier models demonstrate remarkably weak out-of-box verification capabilities and introduce a benchmark to measure progress on these deficiencies.
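The loop the abstract describes (sample several candidate responses, have a verifier check each one, keep the best) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `sampling_based_search`, its parameters, and the toy `generate` and `verify` stand-ins are all hypothetical, and a real setup would replace them with calls to a language model. The sketch also omits the cross-response comparison that the abstract mentions as a useful verification signal.

```python
from itertools import cycle
from typing import Callable, List, Tuple


def sampling_based_search(
    question: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], bool],
    num_samples: int = 6,
    num_verifications: int = 3,
) -> Tuple[str, List[float]]:
    """Sample candidates, score each by repeated verification, return the best.

    `generate` and `verify` are assumed callables standing in for model calls.
    """
    # Step 1: sample a pool of candidate responses.
    candidates = [generate(question) for _ in range(num_samples)]

    # Step 2: score each candidate by the fraction of verification passes.
    scores = []
    for candidate in candidates:
        passes = sum(verify(question, candidate) for _ in range(num_verifications))
        scores.append(passes / num_verifications)

    # Step 3: pick the highest-scoring candidate (first one wins on ties).
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index], scores


def make_toy_generator(answers: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: cycles through fixed answers."""
    pool = cycle(answers)

    def generate(question: str) -> str:
        return next(pool)

    return generate


def toy_verify(question: str, response: str) -> bool:
    """Hypothetical stand-in for self-verification: accepts only "42"."""
    return response == "42"


if __name__ == "__main__":
    generate = make_toy_generator(["41", "42", "43"])
    best, scores = sampling_based_search("What is 6 * 7?", generate, toy_verify)
    print(best, scores)
```

In this toy setting the verifier is deterministic, so repeating it adds nothing. With a stochastic model-based verifier, averaging several verification passes per candidate is one simple way to spend more test-time compute on verification.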
