Non-stochastic Budgeted Online Pricing with Semi-Bandit Feedback

Xiang Liu; Hau Chan; Minming Li; Weiwei Wu; Long Tran-Thanh

doi:10.1609/aaai.v39i18.34089

JOURNAL ARTICLE

Year: 2025
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 39 (18)
Pages: 18978-18986
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

We consider a general non-stochastic online pricing bandit setting in a procurement scenario where a buyer with a budget wants to procure items from a fixed set of sellers to maximize the buyer's reward by dynamically offering purchasing prices to the sellers, where the sellers' costs and values at each time period can change arbitrarily and the sellers determine whether to accept the offered prices to sell the items. This setting models online pricing scenarios of procuring resources or services in multi-agent systems. We first consider the offline setting when sellers' costs and values are known in advance and investigate the best fixed-price policy in hindsight. We show that it has a tight approximation guarantee with respect to the offline optimal solutions. In the general online setting, we propose an online pricing policy, Granularity-based Pricing (GAP), which exploits underlying side-information from the feedback graph when the budget is given as the input. We show that GAP achieves an upper bound of $O\left(n \frac{v_{\max}}{c_{\min}} \sqrt{B/c_{\min}} \ln B\right)$ on the $\alpha$-regret, where $n$, $v_{\max}$, $c_{\min}$, and $B$ are the number, the maximum value, the minimum cost of sellers, and the budget, respectively. We then extend it to the unknown budget case by developing a variant of GAP, namely Doubling-GAP, and show its $\alpha$-regret is at most $O\left(n \frac{v_{\max}}{c_{\min}} \sqrt{B/c_{\min}} \ln^2 B\right)$. We also provide an $\alpha$-regret lower bound $\Omega\left(v_{\max} \sqrt{Bn/c_{\min}}\right)$ of any online policy that is tight up to sub-linear terms. We conduct simulation experiments to show that the proposed policy outperforms the baseline algorithms.
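As a rough illustration of the offline benchmark mentioned in the abstract (the best fixed-price policy in hindsight), the sketch below brute-forces a single price over a known sequence of costs and values. The abstract does not spell out the accounting model, so the details here are assumptions: a seller accepts when its cost is at most the offered price, the buyer pays the offered price and collects the seller's value, values are non-negative, and procurement stops as soon as the remaining budget cannot cover another payment. The function names `fixed_price_reward` and `best_fixed_price` are hypothetical and do not come from the paper.

```python
from typing import List, Optional, Tuple


def fixed_price_reward(price: float,
                       costs: List[List[float]],
                       values: List[List[float]],
                       budget: float) -> float:
    """Reward of offering the same `price` to every seller in every round.

    Assumed model (not taken from the paper): a seller accepts iff
    cost <= price; the buyer pays `price` and gains the seller's value;
    procurement stops once the remaining budget is below `price`.
    """
    remaining = budget
    reward = 0.0
    for round_costs, round_values in zip(costs, values):
        for cost, value in zip(round_costs, round_values):
            if cost <= price:
                if remaining < price:
                    return reward  # budget exhausted
                remaining -= price
                reward += value
    return reward


def best_fixed_price(costs: List[List[float]],
                     values: List[List[float]],
                     budget: float) -> Tuple[Optional[float], float]:
    """Brute-force the best single price in hindsight.

    Under the assumed model only observed costs need to be tried: raising
    the price between two consecutive costs keeps the same accepting
    sellers but spends the budget faster.
    """
    candidates = sorted({c for round_costs in costs for c in round_costs})
    best_price, best_reward = None, float("-inf")
    for price in candidates:
        reward = fixed_price_reward(price, costs, values, budget)
        if reward > best_reward:  # strict: ties go to the lower price
            best_price, best_reward = price, reward
    return best_price, best_reward


# Toy instance: 3 rounds, 2 sellers, budget 4.
toy_costs = [[1.0, 2.0], [2.0, 1.0], [1.0, 3.0]]
toy_values = [[1.0, 5.0], [4.0, 1.0], [1.0, 6.0]]
print(best_fixed_price(toy_costs, toy_values, 4.0))
```

On this toy instance a price of 1 buys three low-value items, a price of 3 exhausts the budget after one purchase, and a price of 2 does best. An online policy such as GAP has to approach this kind of benchmark without seeing the costs and values in advance.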

Keywords: Computer science, Economics, Econometrics, Mathematical optimization, Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.24

Topics

Advanced Bandit Algorithms Research

Social Sciences → Decision Sciences → Management Science and Operations Research

Auction Theory and Applications

Social Sciences → Decision Sciences → Management Science and Operations Research

Smart Grid Energy Management

Physical Sciences → Engineering → Electrical and Electronic Engineering

Related Documents

Online Influence Maximization With Semi-Bandit Feedback Under Corruptions

Distributed Online Stochastic-Constrained Convex Optimization With Bandit Feedback

Online Second Price Auction with Semi-Bandit Feedback under the Non-Stationary Setting

Online Learning for Computation Peer Offloading with Semi-bandit Feedback

Stochastic Convex Optimization with Bandit Feedback