RDD.countApprox(timeout, confidence=0.95)
Approximate version of count() that returns a potentially incomplete result within a timeout (in milliseconds), even if not all tasks have finished. confidence (default 0.95) is the desired confidence level for the estimate.
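As a rough mental model only, the idea can be imitated in plain Python. This is a simplified sketch and not Spark's implementation: it counts each partition in a thread pool, stops waiting after the timeout, and scales the partial count from the finished partitions up to all partitions. The helpers `count_approx` and `extrapolate`, and the assumption that unfinished partitions are about as large as finished ones, are hypothetical choices made for illustration.

```python
import concurrent.futures
from typing import Iterable, List, Sequence


def _count(partition: Iterable) -> int:
    # Count the elements of one partition (stands in for a Spark task).
    return sum(1 for _ in partition)


def extrapolate(finished_counts: List[int], num_partitions: int) -> int:
    """Scale a partial count up to all partitions (simplified estimator)."""
    if not finished_counts:
        return 0  # nothing finished: this sketch simply reports 0
    partial = sum(finished_counts)
    if len(finished_counts) == num_partitions:
        return partial  # every partition finished, so the count is exact
    # Assumption: unfinished partitions are, on average, as large as finished ones.
    return round(partial * num_partitions / len(finished_counts))


def count_approx(partitions: Sequence[Iterable], timeout_ms: int) -> int:
    """Count partitions in parallel, waiting at most timeout_ms milliseconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(partitions)))
    futures = [pool.submit(_count, p) for p in partitions]
    done, _ = concurrent.futures.wait(futures, timeout=timeout_ms / 1000)
    # Do not block on stragglers: their results are ignored.
    pool.shutdown(wait=False)
    return extrapolate([f.result() for f in done], len(partitions))
```

When every partition finishes inside the timeout the result is exact, mirroring the example: `count_approx([range(i * 100, (i + 1) * 100) for i in range(10)], 1000)` gives 1000. The sketch ignores `confidence`, which in Spark relates to the error bounds of the estimate rather than to how long the job waits.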
Examples
>>> rdd = sc.parallelize(range(1000), 10)
>>> rdd.countApprox(1000, 1.0)
1000