Open source Apache Kafka – which provides powerful distributed processing of continuous data streams – continues to gain traction within enterprises. Ben Bromhead, the CTO and a co-founder of open-source-as-a-service platform provider Instaclustr, gives his perspective on Kafka’s growing popularity, along with strategy considerations for getting the most out of the technology.
Q. Give us an overview of Apache Kafka and how it works.
A. Apache Kafka is a highly available, highly performant distributed streaming platform. It can be used to manage streams of data as a publish/subscribe messaging system, it can be used to process and perform arbitrary functions over streams of data, and it can also be used to safely store and replicate a stream of data for consumption later on. More broadly, Kafka is used to move data between systems/applications, or to transform – or react to – streams of data.
Under the hood, Kafka is implemented as a distributed log: producers (clients that put data into Kafka) append to the log, and consumers (clients that get data from Kafka) read from the log starting at a given log offset. Many use cases and patterns can be built on this simple, distributed, high-performance architecture.
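To make the log-and-offset idea concrete, here is a minimal in-memory sketch. It is a toy model under stated assumptions, not Kafka's client API: the names `ToyLog`, `Consumer`, `append`, `read`, and `poll` are hypothetical, and the sketch ignores distribution, replication, and persistence entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Toy append-only log; a hypothetical stand-in, not Kafka's API."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Producer side: add a record to the end and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Consumer side: return up to max_records starting at offset."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer tracks its own offset, so readers stay independent."""
    log: ToyLog
    offset: int = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance only past what was actually read
        return batch


log = ToyLog()
for event in ["signup", "click", "purchase"]:
    log.append(event)

fast = Consumer(log)
slow = Consumer(log)
first_batch = fast.poll()               # reads everything appended so far
slow_batch = slow.poll(max_records=1)   # reads only the first record
```

Because reading never removes records, the two consumers can move through the same log at different speeds, and either one can start again from an earlier offset.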
Q. What are some typical enterprise use cases for Kafka?
- Message broker – Passing messages between loosely-coupled services.
- Activity feeds – Real-time feeds of user activity; this can be clicks, follows, page views, etc.
- Metrics – Similar to an activity feed, operational time series data is often fed and aggregated via Kafka.
- Logging – Kafka can be used as the basis for log aggregation.
- Event sourcing – Similar to an activity feed, state changes are logged as an ordered set of events into a Kafka stream.
- Stream processing – Arbitrary transformation over streams of data.
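As a rough illustration of the last two items, the sketch below replays an ordered list of events to rebuild state (event sourcing) and lazily filters the same events (a simple stream transformation). The account-balance scenario and the names `apply_event`, `replay`, and `large_movements` are assumptions made for this example; a plain Python list stands in for a Kafka stream.

```python
def apply_event(balance, event):
    """Apply one state change to the current balance."""
    kind, amount = event
    if kind == "deposit":
        return balance + amount
    if kind == "withdraw":
        return balance - amount
    return balance  # ignore event kinds this sketch does not model


def replay(events, initial=0):
    """Event sourcing: rebuild current state by replaying events in order."""
    balance = initial
    for event in events:
        balance = apply_event(balance, event)
    return balance


def large_movements(events, threshold):
    """Stream processing: lazily pass through only events at or above threshold."""
    for kind, amount in events:
        if amount >= threshold:
            yield (kind, amount)


# Hypothetical ordered event stream; in practice this would be read from a log.
events = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]

balance = replay(events)
flagged = list(large_movements(events, threshold=30))
```

The state is never stored directly in this model; it is always derived from the ordered events, which is why the ordering guarantee of the underlying log matters.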
Q. What accounts for Kafka’s growing adoption among enterprises?
Kafka is primarily used to power real-time insights and events within enterprise applications at scale. This means getting the most recent events to the systems that need them, so they can influence customer-facing or internal decisions as quickly as possible.
Q. How does the fact that Kafka is open source influence that popularity?
Open Source is now the de facto standard. Enterprise customers can get up and running immediately to test and try new ideas without the fear of vendor lock-in or complex contract negotiation.
Q. What are some of the challenges for enterprises self-managing Kafka?
Kafka brings a raft of benefits to the enterprise, but it is a new, rapidly evolving distributed system that can be challenging to run without prior expertise or significant investment in skills and tooling. Fundamentally, though, companies know that they should focus on their own core business and adopt enabling technologies like Kafka in the lowest-friction manner possible, which often involves outsourcing operations.
Q. What factors should enterprises consider when deciding whether to enlist a managed services provider versus deploying Kafka on their own?
Enterprises need to weigh a few factors when deciding whether to run Kafka themselves or use a managed service.
- Cost – Managed services tend to be more cost-effective once you take into account the advanced tooling, 24/7 support, and economies of scale they provide. Remember that the cost of a managed service includes the staff and development costs you would otherwise need to cover yourself; it’s never just the infrastructure costs.
- Opportunity costs – Running any technology yourself means making a decision between spending engineering resources on supporting technologies or focusing on your core business.
- Risk – Managing a service, especially a new technology, means you are exposed to a certain amount of technical and HR risk. Managing it yourself means you will go through the pain and outages as you discover the edge cases of any given technology. Once you have learnt those lessons, you need to retain the employees who have that knowledge. A capability within an organization can be lost rapidly when two or three key members leave.
- Capability – Generally, managed services will be able to deliver features and capabilities on top of their open source offerings that would be cost-prohibitive to build internally.
Q. Why did Instaclustr select Kafka as the latest technology to add to its Open-Source-as-a-Service platform?
Instaclustr is building out an Open Source Data Platform as a service that is based on key technologies that have an amazing community, have enterprise adoption, and fit a real use case. We determine this by looking at which technologies our existing customers are already using alongside other Instaclustr services (e.g. Apache Cassandra) and seeing how we can make their lives easier. Kafka really adds an amazing set of capabilities to any enterprise architect’s toolbag, and it complements the existing highly available, highly scalable capabilities we already provide.
Q. How does Instaclustr distinguish itself as a managed Kafka provider?
We primarily distinguish ourselves from other providers via two main points:
- Fiercely Open Source – We don’t do anything proprietary with our managed services; we generally run unchanged upstream versions of the technologies (pending hotfixes, etc.). Changes, features, improvements, and fixes always get upstreamed back to the community. This means zero technology lock-in, no license fees, and absolute portability for our customers. It is also critical for companies looking to adopt the cloud while maintaining agility through cloud independence.
- Platform approach – Instaclustr provides a suite of technologies that are all proven to be highly available, highly scalable, and work well together. This makes Instaclustr the central platform to implement state changes and manage data within your real-time applications. We ensure our supported technologies work well together and integrate in a simple fashion.
About Ben Bromhead
Ben Bromhead is Chief Technology Officer and Co-Founder at Instaclustr, which delivers open source big data technology solutions at scale. Ben is located in Instaclustr’s California office and is active in various open source communities. Prior to Instaclustr, Ben had been working as an independent consultant developing NoSQL solutions for enterprises, and he ran a high-tech cryptographic and cyber security formal testing laboratory at BAE Systems and Stratsec.