Block-matching and 3D filtering (BM3D) is a state-of-the-art denoising algorithm for images and video that takes full advantage of both the spatial and the temporal correlation of the video. Its performance comes at the price of searching for and filtering a large number of similar blocks, which incurs heavy computation and memory access. Because of the large frame size and search range, area, memory bandwidth, and computation are the major bottlenecks in designing a feasible architecture. In this paper, we introduce a novel structure that increases the data reuse rate and reduces the internal static random-access memory (SRAM). Our target is a BM3D chip that processes phase alternating line (PAL) video in real time. We propose an application-specific integrated circuit (ASIC) architecture of BM3D for the 720×576 BT656 PAL format. The chip runs at a 100 MHz system frequency with a 166 MHz, 32-bit double data rate (DDR) memory interface. For a noise level of σ = 25, we achieve real-time denoising and a peak signal-to-noise ratio (PSNR) improvement of about 10 dB with a single iteration of the BM3D algorithm.
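As a rough sanity check on the real-time target, the figures quoted above can be turned into a per-pixel cycle budget and a peak external-memory bandwidth. The sketch below is illustrative only and is not part of the proposed design: it assumes the standard PAL rate of 25 frames per second, and it reads the 166 MHz figure as the DDR clock with two transfers per cycle. All function and variable names are hypothetical.

```python
# Back-of-the-envelope budget derived from the figures quoted in the abstract.
WIDTH, HEIGHT = 720, 576       # active frame size (from the abstract)
FPS = 25                       # ASSUMPTION: standard PAL frame rate
SYSTEM_CLOCK_HZ = 100_000_000  # system frequency (from the abstract)
DDR_CLOCK_HZ = 166_000_000     # ASSUMPTION: 166 MHz is the DDR clock, not the data rate
DDR_BUS_BITS = 32              # DDR bus width (from the abstract)


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Number of pixels that must be processed each second for real-time video."""
    return width * height * fps


def cycles_per_pixel(clock_hz: int, pixel_rate: int) -> float:
    """Average number of system clock cycles available per pixel."""
    return clock_hz / pixel_rate


def ddr_peak_bytes_per_second(clock_hz: int, bus_bits: int) -> int:
    """Theoretical peak DDR bandwidth: two transfers per clock cycle."""
    return clock_hz * 2 * bus_bits // 8


pixel_rate = pixels_per_second(WIDTH, HEIGHT, FPS)
budget = cycles_per_pixel(SYSTEM_CLOCK_HZ, pixel_rate)
ddr_peak = ddr_peak_bytes_per_second(DDR_CLOCK_HZ, DDR_BUS_BITS)

print(f"pixel rate      : {pixel_rate} pixels/s")
print(f"cycle budget    : {budget:.2f} cycles/pixel")
print(f"DDR peak (ideal): {ddr_peak / 1e9:.3f} GB/s")
```

Under these assumptions, the design has fewer than ten clock cycles per pixel and a theoretical peak of about 1.33 GB/s of external bandwidth, which illustrates why data reuse and on-chip buffering dominate the architecture.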