Qubole: Cloud-enabled Platform for Big Data Analytics

Ashish Thusoo

Co-Founder & CEO

Sensitive information is stored in many types of environments, big data among them: an estimated 50 percent of organizations use sensitive data within their big data implementations. It has therefore become imperative for data analysts to ensure data quality before the data is fed to the system. Biased or incorrectly sampled data may produce wrong conclusions that adversely affect business decisions. There is a need for authenticated data derived from business intelligence, along with professional analysts, to drive value for clients through streamlined big data solutions. Qubole, a Mountain View, CA-based firm, simplifies the management, provisioning, and scaling of big data analytics workloads by leveraging data stored on Amazon Web Services, Microsoft Azure infrastructure, or Google Compute. "The Big Data industry is shifting from on-premise to cloud as the preferred deployment model. We offer the first fully packaged migration service which enables companies to experience the benefits in the cloud," says Ashish Thusoo, co-founder and CEO of Qubole.

The firm's big data as a service (BDaaS) platform, Qubole Data Service (QDS), is hosted on the Amazon Web Services (AWS) cloud platform and enables clients to migrate their on-premise Cloudera-based big data implementations to the cloud. Clients can use QDS's unified interface to run workloads ranging from ad hoc analysis and machine learning to Amazon's Elastic MapReduce (EMR). With the help of QDS Workbench, data scientists and analysts can drive their ad hoc workloads using an easy-to-use SQL query composer or the SmartQuery builder tool. "Qubole customers have seen savings up to 80 percent for workloads using spot instances and have admins supporting hundreds of users."

The Data Engines of QDS are automated and optimized for the cloud, which in turn enables clients to evaluate innovative open-source tools and engines. By setting a central system policy, QDS can control the security and encryption practices in the client's environment. The BDaaS platform optimizes the Data Engines (Hive, Spark, Pig, Presto, and MapReduce) so that clients can run them with minimal operational administration and a common metastore database.

Apart from the QDS platform, the firm's open-source SQL optimization project, Quark, helps data analysts simplify and optimize access to data. Using Quark, clients can create and manage base tables for effective utilization. It also supports OLAP cubes on partial data and enables clients to choose the technology stack that is best suited to their architectural and cost constraints. "Our goal at Qubole is to get companies up and running on Big Data in days, with a minimal upfront investment so clients can concentrate on analysis and value-add actions instead of infrastructure and operations," says Thusoo.

One of its clients, Pinterest, a web and mobile application firm that manages a photo-sharing website, leveraged the firm's big data solution to provide a personalized search engine that can streamline data search and to scale its data infrastructure. Initially, Pinterest used Amazon's EMR to run its Hadoop jobs, but as the firm scaled to a few hundred nodes, EMR became unstable. Pinterest used the QDS platform to migrate its Hadoop jobs to Qubole, which reduced operational costs considerably.

Beyond the unified big data platform, Qubole aims to embrace and lead changing trends so that clients can achieve heightened business productivity and value. It also plans to expand its solutions and capabilities in the big data domain to help analysts streamline their business operations.