This dissertation presents the design, implementation, and evaluation of a system that exploits fine-grain implicit parallelism in concurrent constraint programming language. The system is able to outperform a C implementation of an algorithm with complex dependencies without any user annotations. The concurrent constraint programming language AKL is used as a source programming language. A program is divided during runtime into tasks that are distributed over available processors. The system is unique in that it handles both and-parallel execution of goals as well as or-parallel execution of encapsulated search. A parallel binding scheme for a hierarchical constraint store is presented. The binding scheme allows encapsulated search to be performed in parallel. The design is justified with empirical data from the implementation. The scheme is the most efficient parallel scheme yet presented for deep concurrent constraint systems. The system was implemented on a high-performance shared-memory multiprocessor. Extensive measurements were done on the system using both smaller benchmarks as well as real-life programs. The evaluation includes detailed instruction-level simulation, including cache-performance, to explain the behavior of the system.