DataPelago: Redefining Data Processing With Accelerated Computing

Rajan Goyal, Co-Founder & CEO

Established data management and analytics platforms have long excelled at managing databases, running analytics, and executing queries to deliver business insights. However, we have reached an inflection point where the current generation of hardware and software can’t keep pace with the relentless growth of the global datasphere. DataPelago transcends conventional boundaries with a groundbreaking, accelerated, domain-specific data processing engine capable of handling the sheer volume, velocity, diversity, and complexity of modern datasets.

DataPelago is the first Universal Data Processing Engine, empowering organizations to extract value from all data at scale, regardless of its structure, enabling rapid decision-making and providing a competitive edge.

To build the data processing engine of the future, DataPelago Co-Founder and CEO Rajan Goyal assembled a multidisciplinary team with decades of experience across system architecture, software and cloud computing, data management, and data center infrastructure.

A pioneer in accelerated computing, Goyal began his journey with a master’s degree in computer science from Stanford. His career has spanned more than 25 years in Silicon Valley, giving him a deep understanding of domain-specific computing that has played a crucial role in shaping DataPelago’s vision.

His expertise was honed during his first decade at Cisco, where he used accelerated computing to enhance data plane operations and build Layer 4 to Layer 7 services. He later joined Cavium as a distinguished engineer, developing domain-specific accelerators for the company’s OCTEON family of multi-core processors, which are used in applications such as networking and storage data paths.

Reflecting on this period, Goyal remarks, “Accelerated computing is the art of hardware-software co-design. This was my first formal way of articulating and building products using domain-specific principles.”

His tenure as CTO of Fungible further cemented his standing as a champion of accelerated computing for data movement in data centers. There, he assembled an engineering team that pioneered hardware-software co-design and was instrumental in developing the industry’s first Data Processing Unit (DPU), now well established as the third computing socket in data centers alongside the CPU and GPU. With more than 150 issued and pending patents and over 15 years dedicated to domain-specific computing, Goyal has helped solve complex technological problems in storage, big data, and multimedia processing, leading multiple products from inception to multi-billion-dollar revenue.

Navigating the Data Evolution

The tech industry is facing three irreversible trends that will reshape how data is stored and processed.

The first is the change in the nature of data. A decade ago, data primarily existed in structured formats such as ERP transactions, sales data, and similar table-based information. The conventional business intelligence (BI) analytics workflow entailed transferring data from online transaction processing (OLTP) systems to data warehouses to extract trends and insights. The data environment has since changed significantly. The explosion of unstructured data, including JSON files, logs, and multimedia files, presents new challenges, and traditional processing solutions based on CPUs and basic software architectures cannot handle the complexity and volume of today’s data.

The tension between rising unstructured data volumes and the challenge of processing them quickly and cost-effectively is exacerbated by the second irreversible trend: the race for artificial intelligence (AI). The rise of generative AI (Gen AI) and large language models (LLMs), which have an insatiable appetite for data that must be continually cleaned and processed, has made the need for advanced hardware to process unstructured data undeniable.

Finally, these systems are constrained by the slowing of Moore’s Law. Where it once predicted the rapid doubling of computing power, it now marks the boundary of what general-purpose hardware can deliver. That performance ceiling in high-performance data processing points directly to the need for accelerated computing solutions.

DataPelago is responding to these changes by creating a new data processing standard for the accelerated computing era. The engine leverages heterogeneous computing environments in the cloud to process data queries in the most efficient way possible based on the data being processed and the hardware resources available. It provides universal support for any silicon architecture—CPU, GPU, and FPGA—and all data types.
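
To make the idea of heterogeneous query processing concrete, the sketch below shows how an engine of this kind might route each operator in a query plan to the most efficient available backend. This is a minimal illustration in Python, not DataPelago’s actual design: the operator model, cost heuristics, and backend names are all assumptions.

```python
# Hypothetical sketch of heterogeneous query dispatch. The operator
# attributes, thresholds, and backend list below are illustrative
# assumptions, not DataPelago's actual architecture.
from dataclasses import dataclass

@dataclass
class Operator:
    name: str          # e.g. "scan", "filter", "hash_join"
    rows: int          # estimated input cardinality
    parallelism: str   # "data_parallel" or "branchy"

# Backends assumed to be present in this cloud environment.
AVAILABLE_BACKENDS = {"CPU", "GPU", "FPGA"}

def choose_backend(op: Operator) -> str:
    """Pick a backend for one operator using simple cost heuristics."""
    # Large, data-parallel operators amortize the transfer cost to a GPU.
    if (op.parallelism == "data_parallel"
            and op.rows > 10_000_000
            and "GPU" in AVAILABLE_BACKENDS):
        return "GPU"
    # Streaming, fixed-function stages (e.g. scanning, decoding) map
    # well onto an FPGA pipeline when one is available.
    if op.name in {"scan", "decode"} and "FPGA" in AVAILABLE_BACKENDS:
        return "FPGA"
    # Small or branch-heavy work stays on the CPU.
    return "CPU"

# A toy three-operator plan: scan, then filter, then join.
plan = [
    Operator("scan", rows=500_000_000, parallelism="data_parallel"),
    Operator("filter", rows=500_000_000, parallelism="data_parallel"),
    Operator("hash_join", rows=2_000_000, parallelism="branchy"),
]
for op in plan:
    print(f"{op.name} -> {choose_backend(op)}")
```

A production engine would replace these fixed heuristics with a cost model informed by data statistics and runtime feedback, but the principle is the same: match each stage of a query to the silicon that executes it most efficiently.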

Recognizing these irreversible shifts early, DataPelago is pioneering a new model for data processing that dramatically improves the cost and performance of classic BI analytics pipelines while enabling emerging Gen AI data pipelines. By harnessing the power of accelerated computing, the company is introducing a step-function improvement to handle data complexity, scale, and application demands, breaking the performance ceiling.

“We are providing a sustainable alternative that aligns with the needs of modern digital transformation. Our engine delivers exceptional performance while reducing the total cost of ownership,” says Goyal.

Acknowledging that next-generation data processing requires advancements across the entire processing stack, DataPelago focuses on every aspect—from query optimizers and runtime reconfiguration to distributed data layers. This holistic approach ensures that it addresses the evolving challenges and opportunities, ushering in a new era of innovation in the data landscape.

Next-Gen Data Processing for Growing Enterprise Demand

In his discussions with CIOs, Goyal often uncovers a shared objective across industries: the creation of a secondary data pipeline for Gen AI and LLMs. However, this strategic aspiration introduces two challenges, the first being the substantial cost of continually cleaning and processing the vast amounts of data required to train an LLM.

“In scenarios where traditional setups demand ten servers, our technology achieves comparable outputs with just one,” explains Goyal. “This reduction in hardware not only cuts operational and licensing expenses but also boosts enterprises' ability to process queries and manage data, ultimately enhancing productivity and efficiency.”

The second hurdle concerns data integration, which has compelled enterprises to look for solutions that can unify existing and new data pipelines. Rather than implementing separate systems that create data silos, they seek a unified platform that can manage both traditional BI analytics and emerging AI pipelines.
DataPelago addresses this need with a versatile stack that supports traditional BI analytics alongside cutting-edge AI-driven processes. A key feature is its streamlined data preparation for AI and machine learning (ML) models, which significantly reduces the time, cost, and labor required. This makes data scientists more productive, enabling them to retrain models more frequently on the latest data and improve the AI capabilities of their chatbots, recommendation systems, and other AI-based applications.

In addition, DataPelago’s engine is composable and frictionless. The platform’s modular architecture enables plug-and-play with open-source building blocks, so it can be used as a full solution or as a component of a broader Lakehouse ecosystem. DataPelago can be deployed into existing environments without any changes to data, workflows, tools, or processes.

In Scenarios Where Traditional Setups Demand Ten Servers, Our Technology Achieves Comparable Outputs With Just One

The company’s go-to-market strategy includes forging strategic alliances with major cloud service providers like AWS, Google Cloud and Azure, as well as emerging AI-centric platforms. This approach positions the firm as a sought-after data engine for AI factories, ensuring that enterprises have the necessary tools to efficiently manage and utilize their data for AI-driven applications.

As the complexity and volume of data increase, enterprises are compelled to seek the most cost-effective and high-performance data processing methods.

“We Are Providing A Sustainable Alternative That Aligns With The Needs Of Modern Digital Transformation. Our Engine Delivers Exceptional Performance While Reducing The Total Cost Of Ownership”

A Visionary Future in Accelerated Computing and Data Processing

Goyal is optimistic about DataPelago's future. His vision for the company, fueled by his passion and enthusiasm for domain-specific computing and data processing, propels it forward and brings tangible advancements to the industry.

Under Goyal’s guidance, DataPelago is poised for strategic expansion. The company’s roadmap includes entering new markets, supported by significant investments in research and development (R&D) and talent acquisition. These initiatives are about more than growth: by engaging a diverse client base, they are devised to address practical operational bottlenecks, fine-tuning the engine’s functionality and sharpening its ability to meet real-world demands.

As it evolves, the engine will cater to a broader range of use cases, paving the way for accelerated computing to tackle the complexities of modern data environments. By making its solutions more accessible, self-sufficient, and easy to adopt, DataPelago intends to expand its reach across industries and markets.

Goyal’s deep expertise in accelerated computing, along with his team's profound knowledge of hardware-software co-design, positions DataPelago not merely as a participant in the data evolution but as a principal architect of its future. Its innovative approach to tackling the intricate challenges of modern data needs ensures that DataPelago remains an industry leader, ready to set new standards and drive significant advancements in the domain.