
Tensilica has introduced the Xtensa LX4 dataplane processor (DPU) for system-on-chip (SoC) designs. It is aimed at applications that require extensive data processing.

The Xtensa LX4 DPU supports wider local data memory bandwidth of up to 1,024 bits per cycle, wider VLIW (very long instruction word) instructions up to 128 bits for increased parallel processing, and a cache memory pre-fetch option that boosts overall performance for systems with long off-chip memory latency.

Steve Roddy, vice-president of marketing and business development at Tensilica, said: ‘The strength of Tensilica’s DPUs is the ability to combine control and digital signal processing functions in cores that can be optimised to provide 10x to 100x performance improvement compared with a standard RISC or DSP core.

‘With Xtensa LX4, Tensilica offers intellectual property (IP) cores that range from an ultra-small programmable DPU, as exemplified by a 1GigaMAC-per-second DSP in 0.01mm² (in 28nm process technology), up to the ConnX BBE64-128, a high-performance licensable DSP IP core with more than 100 GigaMAC-per-second performance,’ he added.

Tensilica’s Xtensa LX4 DPU has four times the local data memory bandwidth of the Xtensa LX3 DPU, with up to two 512-bit load/store operations per cycle.

Designers can now create super-wide SIMD (single instruction multiple data) DSPs that pump more data into more MAC (multiply accumulate) units each clock cycle for extremely fast performance.
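As a rough illustration of the kind of kernel this targets, the sketch below shows a multiply-accumulate loop in plain, portable C. It uses no Tensilica intrinsics, and the function name dot_i16 is hypothetical. With 16-bit samples, one 512-bit load would hold 32 elements, so two such loads per cycle could in principle supply 32 elements from each input array. How much of that a given design achieves depends on the DPU configuration and on the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply-accumulate over two 16-bit sample arrays.
 * Each iteration is one MAC; iterations differ only in the element index,
 * which is the shape of loop a SIMD datapath can process many lanes at a time. */
int64_t dot_i16(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Widen before multiplying so the 16x16 product cannot overflow. */
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

The wide accumulator is a design choice for the sketch only; real DSP configurations choose accumulator widths to suit the application.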

This makes Xtensa LX4 DPUs suitable for wired and wireless baseband processing, video pre- and post-processing, image signal processing, and various network packet processing functions.

This enhanced local memory bandwidth is in addition to Tensilica’s existing customisable local port and queue interfaces that provide unlimited point-to-point data and control signal bandwidth.

Tensilica now offers both the Port/Queue interfaces that allow connections between Xtensa DPUs and other system blocks and the ultra-high-bandwidth local memory connections.

With Xtensa LX4, Tensilica doubles the maximum width of its Flexible Length Instruction Extensions (FLIX) instructions from 64 to 128 bits.

This allows up to twice as many independent operations to be executed per clock cycle.

Every wide FLIX instruction is seamlessly intermixed with the shorter base Xtensa instruction set so there is no mode switch penalty when using FLIX.

With FLIX, the Xtensa LX4 DPU can deliver the ultra-high-performance characteristics of a specialty VLIW processor with smaller code size than competing VLIW DSPs.

Tensilica’s Xtensa C/C++ compiler automatically extracts parallelism from source code and bundles multiple operations into single FLIX instructions.

An Xtensa LX4 DPU with wide FLIX instructions running parallel operations at a low clock frequency can often match the performance of larger, higher-clocked non-VLIW cores while consuming less energy to complete the same task.

The data pre-fetch option reduces cycle counts in long-latency designs by fetching data from system memory ahead of its use.

This way, the data is ready and waiting when the application code needs it, reducing the cycles the DPU would otherwise spend waiting for data.

Tensilica said the benefits are seen most when streaming data from contiguous memory locations.

It is said to be a much simpler alternative for memory access optimisation than adding a separate DMA (Direct Memory Access) engine, which requires additional software programming and application code tuning.
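The difference between access patterns can be sketched in plain C. The example assumes nothing about how the pre-fetch option is configured and calls no prefetch API, since the option is described as a hardware feature; both function names are hypothetical. The first loop walks a buffer in address order, which is the streaming case said to benefit most. The second visits every stride-th element, a pattern that typically gives a sequential prefetcher less to work with as the stride grows.

```c
#include <stddef.h>
#include <stdint.h>

/* Contiguous walk: each access follows the previous one in memory. */
uint64_t sum_contiguous(const uint32_t *buf, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += buf[i];
    }
    return total;
}

/* Strided walk: touches only every stride-th element of the buffer.
 * A stride of zero would never advance, so it is rejected up front. */
uint64_t sum_strided(const uint32_t *buf, size_t n, size_t stride)
{
    uint64_t total = 0;
    if (stride == 0) {
        return 0;
    }
    for (size_t i = 0; i < n; i += stride) {
        total += buf[i];
    }
    return total;
}
```

Neither loop needs setup code to start a transfer or a completion check afterwards, which is the kind of extra programming a separate DMA engine would add.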

Tensilica provides tools that automate the creation of DPU hardware and the creation of the matching software development tool set.

Because the underlying base Xtensa instruction set is never changed, designers retain access to the third-party application software and development tools available for Xtensa, even after heavily customising the DPU.

Customisable Xtensa DPUs are compatible with major operating systems, debug probes and in-circuit emulator (ICE) solutions. They come with an automatically generated, complete software development toolchain that includes an advanced integrated development environment based on the Eclipse framework, a compiler, a cycle-accurate SystemC-compatible instruction set simulator, and the industry-standard GNU toolchain.

Tensilica has also introduced the Vectorization Assistant tool, which suggests ways developers can improve compiler vectorisation of their C code when running on SIMD DSPs.

The Vectorization Assistant explains what is preventing further vectorisation so the software developer can improve the C source code to take advantage of the DPU’s parallel execution units.
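One obstacle of this kind that is common in C generally is possible pointer aliasing. The sketch below is an illustration in standard C99, not output from the Vectorization Assistant, and both function names are hypothetical. In the first version the compiler must allow for the output and input arrays overlapping, which can force it to keep the loop scalar or add run-time overlap checks. In the second, the restrict qualifier is the programmer's promise that they do not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Without restrict, a store to out[i] might change a later in[j],
 * so the compiler has to be conservative about reordering. */
void scale_may_alias(int16_t *out, const int16_t *in, int16_t gain, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = (int16_t)(in[i] * gain);
    }
}

/* With restrict (C99), the caller promises out and in never overlap,
 * which removes that dependence from the compiler's analysis. */
void scale_no_alias(int16_t *restrict out, const int16_t *restrict in,
                    int16_t gain, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = (int16_t)(in[i] * gain);
    }
}
```

Both versions compute the same result for non-overlapping buffers; calling the restrict version with overlapping buffers is undefined behaviour, so the qualifier should only be added where the promise actually holds.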

The base Xtensa LX4 DPU can reach speeds of more than 1GHz in 45nm process technology (45GS) with an area of 0.044mm².
