Several approaches to intelligent tutoring monitor the learner's actions in order to understand his or her abilities. This presupposes that those actions can be observed. In traditional single-user command interfaces this is straightforward, and as a result action tracking has received little study. In more advanced interfaces and applications, however, it is no longer self-evident how action tracking should be done, or even what should count as an action. This paper addresses two aspects of today's systems that make action tracking hard: the increasing naturalness of interfaces, and the introduction of environments for human collaboration. The DIVE environment is extremely advanced in both respects, being a virtual reality environment for unconstrained human collaboration. We characterize the possibilities and limitations of action tracking in DIVE and suggest an architecture for the task.