End-to-end delay is a critical factor in the perceived quality of service of Voice over IP (VoIP) applications. Sicsophone is a complete VoIP system that couples the low-level features of audio hardware with a standard jitter buffer playout algorithm. Accessing the sound card directly eliminates intermediate buffering and provides the fine-grained timer control needed by a soft real-time application such as VoIP. A statistically based approach for inserting packets into the audio buffers is used together with a scheme that inhibits unnecessary fluctuations in the system. We also present mouth-to-ear delay measurements for selected VoIP applications and show that the techniques described in this paper can save several hundred milliseconds. A prototype for both UNIX and Windows platforms has been implemented, demonstrating that our system adapts to network conditions while maintaining low delay.
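The abstract does not define the playout algorithm itself. For illustration only, the Python sketch below assumes that the "standard" jitter buffer algorithm is the widely used exponentially weighted estimator of Ramjee et al. It tracks a smoothed one-way delay `d` and delay variation `v`, then targets a playout delay of `d + 4v`. The hysteresis threshold `threshold_ms` is a hypothetical way to suppress needless fluctuations. The class name, the threshold, and its default value are assumptions and do not come from Sicsophone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayoutEstimator:
    """Hypothetical sketch of an adaptive jitter-buffer playout estimator.

    Assumes the classic EWMA delay/variation estimator plus a
    hypothetical hysteresis band. This is not Sicsophone's actual code.
    """
    alpha: float = 0.998002            # smoothing factor used in the classic estimator
    k: float = 4.0                     # safety margin in units of delay variation
    threshold_ms: float = 5.0          # hypothetical hysteresis band (ms)
    d: Optional[float] = None          # smoothed one-way delay (ms)
    v: float = 0.0                     # smoothed delay variation (ms)
    playout_ms: Optional[float] = None # playout delay currently in effect (ms)

    def update(self, delay_ms: float) -> None:
        """Feed the measured network delay of one received packet."""
        if self.d is None:
            self.d = delay_ms          # seed the estimate with the first sample
            return
        self.d = self.alpha * self.d + (1 - self.alpha) * delay_ms
        self.v = self.alpha * self.v + (1 - self.alpha) * abs(self.d - delay_ms)

    def playout_delay(self) -> float:
        """Return the playout delay, changing it only when the new target
        differs from the current value by more than threshold_ms."""
        if self.d is None:
            raise ValueError("no delay samples observed yet")
        target = self.d + self.k * self.v
        if self.playout_ms is None or abs(target - self.playout_ms) > self.threshold_ms:
            self.playout_ms = target
        return self.playout_ms
```

In practice such an estimator would typically update on every packet. The new playout delay would be applied only at the start of a talkspurt, so that speech within a talkspurt is not stretched or compressed.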