Balanced graph partitioning is a well known NP-complete problem with a wide range of applications. These applications include many large-scale distributed problems such as the optimal storage of large sets of graph-structured data over several hosts, or identifying clusters in on-line social networks. In such very large-scale distributed scenarios, state-of-the-art algorithms are not directly applicable, because they typically involve frequent global operations over the entire graph. In this paper, we propose a distributed graph partitioning algorithm, called Ja-be-Ja1. The algorithm is massively parallel: each graph node is processed independently, and only the direct neighbors of the node, and a small subset of random nodes in the graph need to be known. Strict synchronization is not required. These features allow Ja-be-Ja to be easily adapted to any distributed graph-processing system from data centers to fully distributed networks. We perform a thorough experimental analysis, which shows that the minimal edge-cut value achieved by Ja-be-Ja is comparable to state-of-the-art centralized algorithms such as Metis. In particular, on large social networks Ja-be-Ja outperforms Metis.