Modern mobile networks are increasingly complex from a resource management perspective, with diverse combinations of software, infrastructure elements and services that need to be configured and tuned for correct and efficient operation. It is well accepted in the communications community that the appropriately dimensioned, efficient and reliable configuration of systems like 5G, or indeed its predecessor 4G, is a massive technical challenge. One promising avenue is the application of machine learning methods that take a data-driven, continuous-learning approach to automated system performance tuning. We demonstrate the effectiveness of policy-gradient reinforcement learning as a way to learn and apply complex interleaving patterns of radio resource block usage in 4G and 5G, in order to automate the reduction of cell edge interference. We show that our method can increase overall spectral efficiency by up to 25% and overall system energy efficiency by up to 50% in very challenging scenarios, by learning how to do more with fewer system resources. We also introduce a flexible phased and continuous learning approach in which a bootstrap model is trained in a simulated environment and then transferred to a live system for continuous contextual learning.
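To make the policy-gradient idea concrete, the following is a minimal REINFORCE-style sketch, not the paper's actual method: an agent repeatedly selects one of a small set of candidate resource-block interleaving patterns, and a noisy scalar reward stands in for a measured spectral-efficiency signal. The pattern count, reward values, and learning rates here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4  # number of candidate interleaving patterns (illustrative)
# Toy stand-in for mean spectral-efficiency reward of each pattern;
# in a real system this would come from network measurements.
true_reward = np.array([0.2, 0.5, 1.0, 0.3])

theta = np.zeros(K)  # policy logits, one per pattern


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def train(steps=5000, lr=0.1):
    baseline = 0.0  # running reward baseline to reduce gradient variance
    for _ in range(steps):
        p = softmax(theta)
        a = rng.choice(K, p=p)                            # sample a pattern
        r = true_reward[a] + 0.1 * rng.standard_normal()  # noisy observation
        baseline += 0.05 * (r - baseline)
        grad_logp = -p
        grad_logp[a] += 1.0                               # grad of log pi(a)
        theta[:] += lr * (r - baseline) * grad_logp       # REINFORCE update
    return softmax(theta)


probs = train()
best_pattern = int(np.argmax(probs))
```

Under these toy assumptions the policy concentrates its probability mass on the pattern with the highest expected reward; the real system replaces the synthetic reward with live interference and throughput feedback.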