How should we build applications for large-scale networked embedded systems, now in the incarnation of the Internet of Things, when we do not want to rely on a persistent connection to a remote data center? We present the design and implementation of Djenne, a system that aggregates the computational power of distributed devices and exploits their increased capacity to perform substantial work close to the devices themselves. The challenge that Djenne addresses is dependability: how can such systems cope with failures and with the dynamics of wireless network links? Our design uses the actor model of computation and relies on replicated services, both to improve reliability and to create opportunities for parallelism that increase task throughput. The key innovations in our work are adaptive mechanisms that reroute data when system conditions change significantly and a holistic recovery approach for cases in which computations must be repeated in the distributed system. Our experimental evaluation shows that Djenne can improve throughput by 30% to 190% across different use cases while remaining resilient to intermittent failures.
Franco Fummi, Davide Quaglia, Francesco Stefanni
Tao Tao, Isaac Amundson, Kenneth D. Frampton