A Review Of 2D-DCT Computation Types
Transformations play a major role in today’s digital signal processing and image processing, and they are essential across many sectors. Transformation techniques are useful because they make analysis easier, more reliable, and more relevant. The response to a particular input can be computed in the time domain, but doing so takes many more computations and is less intuitive; the signal is therefore converted to the frequency domain. Various transformation tools are used, namely the FFT, DFT, DCT, and DWT. The DCT expresses a finite sequence of data points as a sum of cosine functions oscillating at different frequencies. The 2-D DCT is a direct extension of the 1-D DCT. 2D-DCT algorithms are computation intensive and involve a large number of multiplication and addition operations, which increases chip area and degrades performance; hence it is necessary to make 2D-DCT computations efficient.
Keywords: DSP, DCT, processors
INTRODUCTION
Customers have quickly become accustomed to the high standard of today’s information technology, and their demands continue to grow. The growing use of computers has increased the use of digital signal processing (DSP). In DSP, three domains are used for signal representation: the time (or spatial) domain, the frequency domain, and the wavelet domain. A signal can be represented in any one of these domains, each of which captures important characteristics of the signal. When more detail is required than the time-domain representation alone can provide, the signal has to be represented in the frequency domain. Frequency-domain analysis, also called spectral analysis, separates the signal into its spectral components to give a compact and meaningful representation. There are many frequency-domain transformations, such as the FFT, DFT, DCT, and DWT. Among these, the DCT has a strong ‘energy compaction’ property, which is why it is frequently used in signal and image processing.
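The energy compaction property can be illustrated with a small numerical sketch. The code below is only an illustration under stated assumptions: it uses the orthonormal DCT-II computed directly from its definition, and the 8-sample ramp signal and the helper names dct_1d and energy_fraction are hypothetical choices made for this example.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence, computed directly from the definition."""
    n = len(x)
    out = []
    for k in range(n):
        # Orthonormal scaling: sqrt(1/n) for the DC term, sqrt(2/n) otherwise.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

def energy_fraction(coeffs, keep):
    """Fraction of the total energy held by the first `keep` coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:keep]) / total

# Hypothetical smooth signal (a gentle ramp), chosen only for illustration.
samples = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
coeffs = dct_1d(samples)

# Share of the signal energy captured by the two lowest-frequency coefficients.
compaction = energy_fraction(coeffs, 2)
print(f"energy in first 2 of {len(coeffs)} coefficients: {compaction:.4f}")
```

For a smooth input such as this ramp, almost all of the energy ends up in the first two coefficients, so the remaining coefficients can be coarsely quantized or dropped with little loss. This is the behaviour that compression schemes built on the DCT rely on.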
A 2D-DCT implementation with fast reconfigurability makes it possible to swap designs in and out over time, so that a designer can meet requirements with a minimum amount of resources. In the conventional DCT, an 8-point 1D-DCT requires 64 multiplications and 56 additions, and an 8-point (8×8) 2D-DCT requires 1024 multiplications and 896 additions. As the length of the DCT increases, the number of multiplication and addition operations also increases, leading to larger chip area and performance degradation. The primary task in 2-D DCT computation is computing the DCT coefficients, which requires a large number of mathematical operations. One of the main objectives is therefore to minimize the complexity of these operations as much as possible while maintaining low delay and high throughput. Nowadays, more and more embedded systems use hardware to control and process data by exploiting parallelism and flexibility, and the need for powerful computation is increasing rapidly. When it comes to advancing the processing power of a computer, the first thing taken into account is the processor’s operating frequency. It is also essential to design such systems so that tasks or algorithms are parallelized across multiple cores, boosting performance while reducing the resource footprint.
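The operation counts quoted above can be reproduced by counting the arithmetic in a direct (matrix-vector) DCT. The sketch below is a minimal illustration, not the method of any of the surveyed papers. It assumes the orthonormal DCT-II and a 2D transform built from 1D transforms over the rows and then over the columns; the names OpCounter, dct_1d_direct, and dct_2d_row_column, as well as the constant test block, are hypothetical.

```python
import math

N = 8  # block length used in the text

def dct_matrix(n):
    """n x n orthonormal DCT-II matrix; row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

class OpCounter:
    """Tallies the multiplications and additions performed."""
    def __init__(self):
        self.mults = 0
        self.adds = 0

def dct_1d_direct(x, c, counter):
    """Direct 1D DCT: n multiplications and n - 1 additions per output value."""
    out = []
    for row in c:
        acc = row[0] * x[0]
        counter.mults += 1
        for coeff, sample in zip(row[1:], x[1:]):
            acc += coeff * sample
            counter.mults += 1
            counter.adds += 1
        out.append(acc)
    return out

def dct_2d_row_column(block, counter):
    """2D DCT as 1D DCTs over all rows, a transpose, then 1D DCTs over all columns."""
    c = dct_matrix(len(block))
    rows_done = [dct_1d_direct(row, c, counter) for row in block]
    transposed = [list(col) for col in zip(*rows_done)]
    cols_done = [dct_1d_direct(col, c, counter) for col in transposed]
    return [list(r) for r in zip(*cols_done)]  # transpose back to [u][v] order

counter_1d = OpCounter()
dct_1d_direct([float(i) for i in range(N)], dct_matrix(N), counter_1d)

counter_2d = OpCounter()
flat_block = [[128.0] * N for _ in range(N)]  # hypothetical constant 8x8 block
coeffs_2d = dct_2d_row_column(flat_block, counter_2d)

print("1D:", counter_1d.mults, "multiplications,", counter_1d.adds, "additions")
print("2D:", counter_2d.mults, "multiplications,", counter_2d.adds, "additions")
```

An 8×8 block needs 16 one-dimensional transforms (8 rows and 8 columns), so the 2D totals are 16 times the 1D totals. Fast algorithms reduce these counts by exploiting the symmetry of the cosine matrix, which this direct form deliberately does not do.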
LITERATURE SURVEY
Mariem Makni [1] compares and evaluates the performance of FPGA soft-cores for embedded multi-core systems. Selecting the most efficient and suitable soft-core for a specific software application is a great challenge for designers. The paper gives an overview of the soft-core processors used in embedded systems and compares different open-source and commercial soft-cores, such as OpenFire, LEON3, and MicroBlaze, on the basis of their major architectural features. It also evaluates the impact of the selected soft-core processors on total execution time and FPGA area consumption using various applications.
Ravi Jani and Kunjal Mehta [2], in "Fast Fourier Transform implementation using Microblaze and uclinux", state that in certain multimedia and signal processing applications the FPGA's computational capacity proves inadequate. Designers have come up with a number of approaches to overcome this limitation, and two of them are implemented: a hardware-based approach and a software-based approach. The hardware-based approach resorts to a multiprocessor architecture to multiply the performance of the system on chip. The software-based approach is useful when the hardware capacity of the FPGA is limited.
F. Cariccia, P. Cariccia [3], in "Multimedia SoC: a Systolic Core for Embedded DCT Evaluation", propose a reconfigurable systolic 2D-DCT architecture and give physical implementation results on a Xilinx Virtex-E FPGA. In terms of performance, the core is able to process 128 XGA (1024 × 768) frames/s when running at 110 MHz. Systolic DCT architectures therefore appear well suited to FPGA implementation, especially when high throughput is sought.
D. W. Trainor, J. P. Heron and R. F. Woods [4] present a novel FPGA (Xilinx XC6200 series) implementation of a 2D (8×8)-point DCT. They show that developing a suitable architectural style can produce high-quality circuit designs for a specific technology. To fit the DCT implementation on a single FPGA chip, they use distributed arithmetic and exploit parallelism and pipelining. The resulting design processes image data at 25 frames per second at VGA resolution.
M. Thiruveni Raguraman and D. Shanthi Saravanan [5], in "FPGA Implementation of Approximate 2D Discrete Cosine Transforms", aim to contribute to the design of the discrete cosine transform (DCT), which is frequently used. In many cases it is enough to produce approximate outputs rather than exact outputs, which in turn reduces circuit complexity. A number of applications, such as image and video processing, need higher-dimensional DCT algorithms. Approximate multiplier-free 2D DCT architectures are coded in Verilog, simulated in ModelSim to check correctness, synthesized to evaluate performance, and implemented on a Virtex-E field-programmable gate array (FPGA) kit. A comparative analysis of the approximate 2D DCT architectures is carried out in terms of speed and area.
Abdessalem Ben Abdelali [6], in "Efficient BinDCT hardware architecture exploration and implementation on FPGA", performs a large design exploration of the BinDCT module. First, a detailed study of the BinDCT, decomposed into a multi-stage architecture, is carried out. Several architectures of the whole 2D-BinDCT are then obtained by exploring and combining different hardware implementation solutions for the BinDCT stages. The timing of the explored solutions is determined by taking into account the stage pipeline and the coefficient calculation order, which is fixed so as to ensure the best latency while avoiding data dependency violations.
S. E. Tsai and S. M. Yang [7], in "A Fast DCT Algorithm for Watermarking", propose a fast discrete cosine transform (FDCT) algorithm that uses the energy compactness and matrix sparseness properties in the frequency domain to achieve higher computation performance. For a JPEG image of 8×8 block size in the spatial domain, the algorithm decomposes the 2D DCT into one pair of 1D DCTs, with the transform computed in only 24 multiplications. The 2D spatial data is a linear combination of the base images obtained by the outer product of the column and row vectors of the cosine function, and the inverse DCT is equally efficient. Implementation of the FDCT algorithm shows that embedding a watermark image of 32×32 pixels in a 256×256 digital image can be completed.
Ankita Selokar and A. C. Kailuke [8], in "FPGA Implementation of Forward 2D-DCT and Inverse 2D-DCT Based On Row-Column Decomposition Method", present an FPGA implementation of the 2D forward DCT and inverse DCT. The 2D-DCT is computed by combining two 1D-DCTs connected by a transpose buffer. The forward 1D-DCT, which requires adders, subtractors, registers, and multipliers, is applied row-wise and then column-wise. For the inverse transform, the 1D-DCT is applied column-wise and then row-wise. The architecture has features that make it well suited to VLSI implementation, and it can be used to compute either the forward or the inverse 2D DCT. The design is then synthesized using Xilinx ISE 14.2.
Shahrukh Agha and Farman Ullah Jan [9], in "DCT and Motion Estimation", describe a shared-memory multiprocessing system and multiprocessor systems based on multistage interconnection, especially for FFT algorithms. A theoretical analysis of the speed-up of the FFT algorithm (1D and 2D) is presented, together with an implementation of the multiprocessing system. Multiprocessing also appears beneficial in terms of power compared with a single processor at the same throughput, at the cost of more area: for the 2D DFT and 2D DCT, running multiple processors at a low frequency is more power efficient than running a single processor at a higher frequency. The work also shows different ways of parallelization or mapping.
S. Varkeessheeba [10], in "Performance Evaluation of Various Discrete Cosine Transforms", notes that many DCT algorithms have been proposed to achieve high-speed DCT and low power consumption. The CORDIC algorithm is widely used in software-defined radio, wireless communications, and medical imaging applications. The algorithm is very hardware efficient because it performs a combination of shift-add operations and removes the dependence on multipliers. The article discusses the CORDIC algorithm and the performance of various DCTs in digital signal processing, and compares various discrete cosine transforms in terms of power consumption and accuracy.
Transformation      Power consumption
DCT                 1.058 W
1D DCT              13.1 W
2D DCT              2.488
DA based DCT        5.78
CORDIC              0.184
Table 1: Power analysis of various transformations
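The shift-add structure that makes CORDIC attractive in hardware, as noted in the discussion of [10], can be sketched in a few lines. This is a generic rotation-mode CORDIC written for illustration and is not the architecture evaluated in [10]. The fixed-point format (16 fractional bits), the iteration count, and the name cordic_cos_sin are assumptions made for this example.

```python
import math

FRAC_BITS = 16            # assumed fixed-point format: 16 fractional bits
ITERATIONS = 16           # assumed number of CORDIC micro-rotations
ONE = 1 << FRAC_BITS

# Table of arctan(2^-i) in fixed point; in hardware this would be a small ROM.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# The micro-rotations stretch the vector by a known gain, so the start vector
# is pre-scaled by the inverse gain instead of multiplying at the end.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))
START_X = round(GAIN * ONE)

def cordic_cos_sin(angle):
    """Rotation-mode CORDIC for |angle| <= pi/2.

    The loop body uses only shifts, additions, and subtractions on integers.
    """
    x, y = START_X, 0
    z = round(angle * ONE)  # residual angle still to be rotated
    for i in range(ITERATIONS):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return x / ONE, y / ONE

# pi/16 is one of the angles that appear in the 8-point DCT cosine terms.
cos_val, sin_val = cordic_cos_sin(math.pi / 16)
print(cos_val, math.cos(math.pi / 16))
print(sin_val, math.sin(math.pi / 16))
```

Each iteration costs two shifts and three add/subtract operations, with no multiplier in the loop. A CORDIC-based DCT uses rotations of this kind in place of the cosine multiplications, trading a fixed number of iterations for the removal of multipliers.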
CONCLUSION
The sources reviewed above show that many researchers have implemented the 2D-DCT for various applications using a single-core system on chip. Transformation techniques are very useful because a problem is often easier to understand in one domain than in another, which makes analysis more reliable. However, the 2D-DCT demands a huge number of computations, which require large memory space and long processing time, and this is not always feasible. To overcome these problems, multi-core processors can be used: such systems can be integrated on a single FPGA chip, provided that the soft-core processor delivers adequate performance.