Cassandra and Hadoop – Introducing the KassandraMRHelper
Here at Knewton we use Cassandra for storing a variety of data. Since we follow a service-oriented architecture, many of our internal services are backed by their own data store. Some of the types of...
Kankoku: A Distributed Framework For Implementing Statistical Models (Part 2)
The focus of my internship project this summer was to extend Kankoku (Knewton’s scientific computing framework) to operate in a more distributed fashion. There are a few reasons that drove this change...
Kankoku: A Distributed Framework for Implementing Statistical Models
As future-facing as Knewton’s adaptive learning platform may be, the concept of a personalized classroom has a surprisingly rich history. The idea has intrigued educators and philosophers for decades....
How Knewton Cutover the Core of its Infrastructure from Kafka 0.7 to Kafka 0.8
Kafka has been a key component of the Knewton architecture for several years now. Knewton has 17 Kafka topics consumed by 33 services. So when it came time to upgrade from Kafka 0.7 to Kafka 0.8, it was...
Rolling Out the Mesos Slave Roller
A few months ago, Knewton started running most services via Docker containers, deployed to an Apache Mesos cluster with a Marathon scheduler. This new infrastructure makes it easy to deploy and manage...
Distributed Tracing: Design and Architecture
The previous blog post talked about why Knewton needed a distributed tracing system and the value it can add to a company. This section will go into more technical detail as to how we implemented our...
Distributed Tracing: Observations in Production
Previous blog posts have explained Knewton’s motivation for implementing distributed tracing, and the architecture we put together for it. At Knewton, the major consumers of tracing are ~80 engineers...
Digging Deep Into Cassandra Thrift Buffer Behavior
Everyone who works in tech has had to debug a problem. Hopefully it is as simple as looking into a log file, but many times it is not. Sometimes the problem goes away and sometimes it only looks like...
Simplifying Cassandra Heap Size Allocation
As discussed previously, Knewton has a large Cassandra deployment to meet its data store needs. Despite best efforts to standardize configurations across the deployment, the systems are in a...
Analyzing Java “Garbage First Garbage Collection” (G1GC) Logs
Garbage Collection can take a big toll on any Java application, so it’s important to understand its behavior and impact. After a JVM upgrade of Knewton’s Cassandra database, we needed a tool to compare...