The long latencies introduced by remote accesses in a large multiprocessor can be hidden by caching. Caching also decreases the network load. We introduce a new class of architectures called Cache Only Memory Archi-tectures (COMA). These architectures provide the programming paradigm of the shared-memory architectures, but have no physically shared memory; instead, the caches attached to the processors containallthe memory in the system, and their size is therefore large. A datum is allowed to be in any or many of the caches, and will automatically be moved to where it is needed by a cache-coherence protocol,which also ensures that the last copy of a datum is never lost. The location of a datum in the machine is completely decoupled from its address. We also introduce one example of COMA: the Data Diffusion Machine (DDM), and its simulated performance for large applications. The DDM is based on a hierarchical network structure, with processor/memory pairs at its tips. Remote accesses generally cause only a limited amount of traffic over a limited part of the machine.