expensive and running the standard greedy algorithm is in-
feasible. Fortunately, submodularity can be exploited to im-
plement an accelerated version of the classical greedy algo-
rithm, usually called LAZY-GREEDY (Minoux 1978a). In-
stead of computing ∆(e|A_{i−1}) for each element e ∈ V, the
LAZY-GREEDY algorithm keeps an upper bound ρ(e) (ini-
tially ∞) on the marginal gain sorted in decreasing order. In
each iteration i, the LAZY-GREEDY algorithm evaluates the
element on top of the list, say e, and updates its upper bound,
ρ(e) ← ∆(e|A_{i−1}). If after the update ρ(e) ≥ ρ(e′) for all
e′ ≠ e, submodularity guarantees that e is the element with
the largest marginal gain. Even though the exact cost (i.e.,
number of function evaluations) of LAZY-GREEDY is un-
known, this algorithm leads to orders of magnitude speedups
in practice. As a result, it has been used as the state-of-the-
art implementation in numerous applications including net-
work monitoring (Leskovec et al. 2007), network inference
(Rodriguez, Leskovec, and Krause 2012), document sum-
marization (Lin and Bilmes 2011), and speech data subset
selection (Wei et al. 2013), to name a few. However, as the
size of the data increases, even for small values of k, run-
ning LAZY-GREEDY is infeasible. A natural question to ask
is whether it is possible to further accelerate LAZY-GREEDY
by a procedure with a weaker dependency on k. Or even bet-
ter, is it possible to have an algorithm that does not depend
on k at all and scales linearly with the data size n?
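To make the lazy evaluation scheme concrete, the following is a minimal sketch (ours, not pseudocode from Minoux (1978a)); the max-heap bookkeeping and the interface of f, which we call on a list of elements, are illustrative assumptions:

```python
import heapq

def lazy_greedy(f, V, k):
    """Sketch of LAZY-GREEDY: keep upper bounds rho(e) on the marginal
    gains and re-evaluate only the element with the largest bound."""
    A = []
    f_A = f(A)
    # rho(e) starts at +infinity; heapq is a min-heap, so we store
    # -rho(e). The index i breaks ties without comparing elements.
    heap = [(-float('inf'), i, e) for i, e in enumerate(V)]
    heapq.heapify(heap)
    while len(A) < k and heap:
        _, i, e = heapq.heappop(heap)
        rho = f(A + [e]) - f_A                  # refresh rho(e) = Delta(e | A)
        if not heap or rho >= -heap[0][0]:
            A.append(e)                         # bound is exact and maximal
            f_A += rho
        else:
            heapq.heappush(heap, (-rho, i, e))  # stale bound; try again later
    return A
```

Submodularity is what makes the early exit sound: marginal gains only shrink as A grows, so a freshly updated ρ(e) that dominates all other (older) bounds must dominate the true gains as well.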
In this paper, we propose the first linear-time algorithm,
STOCHASTIC-GREEDY, for maximizing a non-negative
monotone submodular function subject to a cardinality con-
straint k. We show that STOCHASTIC-GREEDY achieves
a (1 − 1/e − ε) approximation guarantee to the optimal
solution with running time O(n log(1/ε)) (measured in
terms of function evaluations) that is independent of k.
Our experimental results on exemplar-based clustering
and active set selection in nonparametric learning also con-
firm that STOCHASTIC-GREEDY consistently outperforms
LAZY-GREEDY by a large margin while achieving practi-
cally the same utility value. More surprisingly, in our exper-
iments we observe that STOCHASTIC-GREEDY sometimes
does not even evaluate all the items and shows a running
time that is less than n while still providing solutions close
to the ones returned by LAZY-GREEDY. Due to its inde-
pendence of k, STOCHASTIC-GREEDY is the first algorithm
that truly scales to voluminous datasets.
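The algorithm itself is presented later in the paper; as a rough illustration of how an O(n log(1/ε)) evaluation count can arise, the sketch below restricts each of the k greedy rounds to a random subsample of size about (n/k) log(1/ε), so the k rounds together cost roughly n log(1/ε) marginal-gain evaluations. The sample-size formula and variable names are our illustrative assumptions, consistent with the stated bound rather than a definitive implementation:

```python
import math
import random

def stochastic_greedy(f, V, k, eps=0.1):
    """Sketch of a subsampled greedy step: each round draws a random
    subset R of the remaining elements and adds the best member of R."""
    A = []
    f_A = f(A)
    remaining = list(V)
    # Sample size chosen so that k rounds cost ~ n*log(1/eps) evaluations.
    s = math.ceil((len(V) / k) * math.log(1.0 / eps))
    for _ in range(k):
        if not remaining:
            break
        R = random.sample(remaining, min(s, len(remaining)))
        # Evaluate Delta(e | A) only on the subsample R.
        gains = {e: f(A + [e]) - f_A for e in R}
        best = max(gains, key=gains.get)
        A.append(best)
        f_A += gains[best]
        remaining.remove(best)
    return A
```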
Related Work
Submodularity is a property of set functions with deep the-
oretical and practical consequences. For instance, sub-
modular maximization generalizes many well-known com-
binatorial problems including maximum weighted match-
ing, max coverage, and facility location, to name a few. It
has also found numerous applications in machine learning
and artificial intelligence such as influence maximization
(Kempe, Kleinberg, and Tardos 2003), information gath-
ering (Krause and Guestrin 2011), document summariza-
tion (Lin and Bilmes 2011) and active learning (Guillory
and Bilmes 2011; Golovin and Krause 2011). In most of
these applications one needs to handle increasingly larger
quantities of data. For this purpose, accelerated/lazy vari-
ants (Minoux 1978a; Leskovec et al. 2007) of the celebrated
greedy algorithm of Nemhauser, Wolsey, and Fisher (1978)
have been extensively used.
Scaling Up: To solve the optimization problem (1) at
scale, there have been very recent efforts to either make
use of parallel computing methods or treat data in a
streaming fashion. In particular, Chierichetti, Kumar, and
Tomkins (2010) and Blelloch, Peng, and Tangwongsan
(2011) addressed a particular instance of submodular
functions, namely maximum coverage, and provided a
distributed method with a constant factor approximation
to the centralized algorithm. More generally, Kumar et al.
(2013) provided a constant approximation guarantee for
general submodular functions with bounded marginal gains.
Concurrently, Mirzasoleiman et al. (2013) developed a
two-stage distributed algorithm that guarantees solutions
close to the optimum if the dataset is massive and the
submodular function is smooth.
Similarly, Gomes and Krause (2010) presented a heuristic
streaming algorithm for submodular function maximization
and showed that under strong assumptions about the way
the data stream is generated their method is effective. Very
recently, Badanidiyuru et al. (2014) provided the first one-
pass streaming algorithm with a constant factor approxima-
tion guarantee for general submodular functions without any
assumption about the data stream.
Even though the goal of this paper is quite different
and complementary in nature to the aforementioned work,
STOCHASTIC-GREEDY can be easily integrated into ex-
isting distributed methods. For instance, STOCHASTIC-
GREEDY can replace LAZY-GREEDY for solving each sub-
problem in the approach of Mirzasoleiman et al. (2013).
More generally, any distributed algorithm that uses LAZY-
GREEDY as a subroutine can directly benefit from our
method and provide even more efficient large-scale algorith-
mic frameworks.
Our Approach: In this paper, we develop the first central-
ized algorithm whose cost (i.e., number of function evalua-
tions) is independent of the cardinality constraint, which in
turn directly addresses the shortcoming of LAZY-GREEDY.
Perhaps the closest, in spirit, to our efforts are approaches
by Wei, Iyer, and Bilmes (2014) and Badanidiyuru and
Vondrák (2014). Concretely, Wei, Iyer, and Bilmes (2014)
proposed a multistage algorithm, MULTI-GREEDY, that
tries to decrease the running time of LAZY-GREEDY by
approximating the underlying submodular function with a
set of (sub)modular functions that can be potentially eval-
uated less expensively. This approach is effective only
for those submodular functions that can be easily decom-
posed and approximated. Note again that STOCHASTIC-
GREEDY can be used for solving the sub-problems in each
stage of MULTI-GREEDY to develop a faster multistage
method. Badanidiyuru and Vondrák (2014) proposed a dif-
ferent centralized algorithm that achieves a (1 − 1/e − ε)
approximation guarantee for general submodular func-
tions using O(n/ε log(n/ε)) function evaluations. However,