
Ping-pong, Snacks, and Segways: On Startup Culture

Knewton CEO Jose and COO David square off.

I have to admit, I’ve always been a little skeptical of the idea of the “fun-loving tech startup” — companies full of geeks lounging on bean bag chairs and playing ping-pong. The idea that a pampering environment could stimulate creativity and lead to wild success in the great internet frontier always seemed a bit trite. Weren’t these really just a bunch of geeks who got lucky, now attributing their success to some nebulous inspiration found between the primary-colored walls of their offices?

While there is certainly a component of geeky-fashionable extravagance to this trend (do Google employees really need to ride around on Segways?), since interning at Knewton, I’ve come to appreciate the value of a business making its employees as comfortable as possible at work. While I didn’t always go for the bean bag chairs, I can say for sure that my creativity and problem-solving abilities benefited from the fact that I was free to grab a snack and relax on the deck any time I needed to clear my head.

Of course, it wasn’t just the availability of snacks that made a difference for me. It was the company culture: I knew that it was perfectly acceptable for me to take a break at any time and do whatever would make me most comfortable and productive, even if that meant taking an afternoon off. It made me feel like I was there on my own terms.

I think that this — the feeling of being there on your own terms — is an essential part of the term “startup” as it’s currently used, and also what is best about these companies. The beauty of this business model is that it recognizes that a company is ultimately a collection of people, and that its success should be measured by the value it brings to the people who compose it and the people it serves. While compensation is obviously important, truly successful companies bring other forms of value as well — personal well-being, creative freedom, and personal investment in a worthy cause to name a few.

And if for the people who work at Google that means Segways, more power to them.

What Our Interns Did This Summer

This summer we were lucky to have a rockstar set of interns on our tech team. Here’s a quick look at some of what they accomplished:

Andersen Chen worked on models for estimating the way student proficiencies change over time. Most existing methods are focused on point estimates resulting from tests and surveys, so they don’t concern themselves with how students improve over time — but that’s exactly what we’re focused on!  Andersen worked with our data science team, using both simulated data and anonymized data from our courses, to consider student proficiency as a process changing over time.

Andersen recaps his work in a final presentation to the company.

Zack Newman experimented with Finagle, an open-source remote procedure call (RPC) library written by Twitter. He built a simple HTTP/JSON-to-Thrift proxy server using Finagle. He patched our fork of Apache’s Kafka to add hooks so that our systems could respond appropriately to its rebalances. In addition, he worked with the data science team on modeling student proficiency for concepts on which they might not have been explicitly tested. Zack also had some great slides in his final presentation:

“You may think I’ve spent the whole summer eating,” Zack said in his final presentation.
Actually, we were wondering if you spent it Google image searching.
OK, that looks like a pretty complicated slide. Maybe he did do something after all.

Matthew Suozzo worked on adding configuration management extensions and unit tests to the User and Course Admin services, but his main task was an update to Apache’s Thrift project. Matthew also added support for the JSON protocol and Union structures to the Python Thrift compiler.

Matt during his final presentation.

Dylan Sherry added system health statistics collectors to the Knewton Recommender System (KRS) using Brightcove’s “diamond” library, and supported instrumentation of KRS via Etsy’s “statsd” library. Dylan also helped support new features for Knewton Crab Stacker and considered preliminary steps towards a full-stack dev environment management system, dubbed the Knewton Development Environment. Dylan’s final contribution was to enable diamond to gather information on KRS caches.

Dylan during his final presentation.

Joy Zheng built a web-based data dashboard to quickly aggregate data in a large-scale, distributed, real-time system for hundreds of thousands of students across the country using Knewtonized courses. The dashboard had to be responsive, display large amounts of data sensibly, be searchable, and not impact current production services. We solved these problems by using optimizations such as threading, lazy caching, and lazy display, as well as some efficient in-memory manipulations. From an implementation perspective, the backend for this dashboard interacts with several services deployed in the cloud environment and entailed using Java, JSPs, and Thrift and connecting to services using Cassandra. The front end was built using HTML, CSS, and JavaScript, and is currently deployed in production.

Joy shows the Pearson data dashboard during her final presentation.

Tarik Tosun worked on models of student pacing and progress in an adaptive learning environment. Modeling this behavior, which is of interest to Knewton, poses several challenges, and Tarik, working with the data science team, was able to develop some very intriguing solutions.

Tarik during his final presentation.

If you’re interested in solving interesting world problems, working with exciting data sets, building your professional skills, and learning from the best in the industry, check out our jobs page. We’re always looking for software engineers, data scientists, and other smart people to join our team!


New Post on N Choose K: Web-based, Real-time Data Monitoring

Check out the latest post on the Knewton tech blog, written by one of our star summer interns, Harvard student Joy Zheng.

Here’s a sneak peek:

THE PROBLEM

How do you quickly visualize and validate data in a large-scale, distributed, and real-time system? How do you make it deployable in multiple cloud environments? And how do you track changes to this data over time?

These are questions that I set out to answer in my internship this summer, as I worked to develop a web-based interface that displays how much data the Knewton system is receiving from hundreds of thousands of Pearson students across the country.

At a very high level, this is how things work: Hundreds of thousands of college kids across the country use a recommended textbook published by Pearson. These books have an online component with which the students interact, learn, take tests and quizzes, etc. Information regarding all this activity flows through to us at Knewton. We process it to make recommendations to each student about how to progress through the course, so that the student can achieve the milestones set by the professor.

Read the rest of the post here.


Introducing the Interns

Every summer, the seams of our office stretch a bit to accommodate a few good interns. This year is no exception. We’ve got a crop of impressive college students, recent graduates, and grad students here to prove — once again — that age is no measure of errand-running talent.

Just kidding. The only coffee these interns will be fetching is their own — and they’ll need it. All of them will be digging into some pretty complex projects while they’re here.

Here’s a bit of background on each of our seven new team members.

Andersen Chen (bottom center) is a rising junior at Brown University, majoring in Mathematics and Computer Science. Andersen is from Brookline, MA, and this is his first time living in NYC for an extended period of time. He’s looking forward to using his math skills to solve problems in adaptive learning — and eating plenty of Chobani yogurt from the Knewton kitchen. You can find Andersen on the web at andersenchen.com.

Matthew Suozzo (top right) grew up on the Upper West Side of NYC and just completed his freshman year in his native ‘hood, at Columbia University’s School of Engineering. Matthew is a Computer Science major; he has big plans this summer to build a new desktop (computer, that is, not surface). Matthew is also on the Ultimate Frisbee team at Columbia, and although it’s true that he’s been a great addition to our weekly games so far, we promise that’s not the (only) reason we hired him.

Zack Newman (bottom left) is also studying Computer Science at Columbia. He hails from Sterling, VA, right near D.C., and is glad to be in New York this summer where he can obtain food after the late hour of 8 p.m. Zack enjoys reading, going to concerts, and sleeping. As for secret talents, he can touch his nose with his bottom lip — although he warns that you’ll probably regret asking him to demonstrate that one.

Kevin Mullin (bottom right) just completed the first year of his two-year Masters program in Computer Science at NYU. Originally from Chicago, Kevin is looking forward to a (brief) break from studying this summer. Though Kevin says his favorite kitchen snack is water, we’re hopeful he’ll be tempted to expand his horizons by the end of the summer. (Yogurt-covered almonds? Popcorn? Chobani? The possibilities are endless.)

Joy Zheng (top left) is a rising sophomore at Harvard University, where she’s trying to decide whether to major in computer science, math, or a combination of the two. Joy, who is from Seattle, is particularly excited about NYC’s food scene: she’s a self-described “huge foodie,” not to mention a turducken expert (she’s made one for the past two Thanksgivings). At Harvard, Joy is one of the head problem writers for the 2013 Harvard-MIT Math Tournament and also works on operations for the Harvard International Review.

Dylan Sherry (top center) is a fresh graduate of MIT, where he studied electrical engineering and computer science. Dylan was born in San Diego, CA and grew up on the other side of the country, in Portland, ME. He plays jazz saxophone, and is looking forward to checking out all the great music around the city. Dylan also loves trying new recipes (these almond and raspberry shortbread thumbprint cookies are a favorite), mountain biking, kayaking, hiking, camping, and most recently, knitting. Plus — Dylan once dissected a human arm. And that’s all we’ll say about that.

Tarik Tosun (unpictured) is originally from Wilmington, Delaware. He just graduated from Princeton University with a degree in Mechanical and Aerospace Engineering. Tarik is a guitarist, pianist, and vocalist, and like Dylan, he’s excited to explore the New York music scene this summer. Tarik is also really into Robotics; he’ll be a Ph.D. student in Robotics/Mechanical Engineering at the University of Pennsylvania in the fall. You can find him online at tariktosun.com.

Data Visualization: The Tweeting World

When I first arrived for my internship at Knewton, I had a bit of free time to get settled in and decided I wanted to do something fun (but still related to data science). So I made a map of the world using Twitter tweets.

I began by writing a script in Python to extract geotagged tweets using Twitter’s API. I ignored the text contained in the tweets, and only stored the longitude and latitude of the tweets as they streamed in. Within an hour or so, I had collected around 150,000 locations.
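The original script isn't shown, so here is a minimal sketch of the extraction step only. It assumes each tweet arrives as one JSON object per line and carries a GeoJSON-style `coordinates` field with longitude first, as in Twitter's classic tweet payload. The function name `extract_locations` and the sample records are made up for illustration, and connecting to the API itself is left out.

```python
import json


def extract_locations(lines):
    """Yield (longitude, latitude) pairs from an iterable of JSON-encoded tweets."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # streams commonly include blank keep-alive lines
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip records that are not valid JSON
        if not isinstance(tweet, dict):
            continue
        point = tweet.get("coordinates")
        # Tweets without a geotag carry null here; the text is ignored entirely.
        if not isinstance(point, dict) or point.get("type") != "Point":
            continue
        lon, lat = point["coordinates"]
        yield (lon, lat)


# Hypothetical sample records standing in for the live stream.
sample = [
    '{"text": "hello", "coordinates": {"type": "Point", "coordinates": [-73.99, 40.73]}}',
    '{"text": "no location", "coordinates": null}',
    '',
]
locations = list(extract_locations(sample))
```

Storing only the coordinate pairs keeps the collected data small, which matters when the stream produces well over a hundred thousand points in an hour.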

Plotting the tweets on a scatter plot in MATLAB yielded a nice visualization. We can already see rough outlines of populated areas:

This prompted me to investigate further. It was clear that these points represented a rough map of the world, but there were a few things that needed to be improved:

    1. Above a certain threshold, it was hard to see the relative density of tweets. For example, it was impossible to differentiate between cities and rural areas in the eastern half of the US because it was completely blue.
    2. Multiple tweets coming from the same location, or being quantized to the same location, only showed up as one point.
    3. The map, by itself, didn’t make any probabilistic inferences. We’d mapped out our data, but we hadn’t derived anything from it.

I wanted to approximate the probability density function of tweets, that is, the relative likelihood that a tweet comes from a given location. To do that, I used a technique called kernel density estimation. By smoothing out our data points, kernel density estimation infers a probability distribution from the given data.
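To make the idea concrete, here is a small sketch of kernel density estimation in plain Python. The post doesn't say which kernel or bandwidth was used (and the plots were made in MATLAB), so the isotropic Gaussian kernel, the `bandwidth` value, and the sample points below are all assumptions. It also treats longitude and latitude as flat coordinates, which ignores the curvature of the Earth.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a density function estimated from (x, y) points.

    Each point contributes one Gaussian bump of width `bandwidth`;
    the estimate at a location is the average of all the bumps there.
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical data: three points clustered together and one on its own.
points = [(-74.0, 40.7), (-73.9, 40.8), (-74.1, 40.6), (2.3, 48.9)]
density = gaussian_kde_2d(points, bandwidth=1.0)

near_cluster = density(-74.0, 40.7)   # inside the cluster
near_single = density(2.3, 48.9)      # at the isolated point
far_away = density(-120.0, 10.0)      # far from every point
```

Because overlapping bumps add up, repeated or nearby points raise the estimate instead of collapsing into a single dot. A larger bandwidth gives a smoother map, while a smaller one preserves more local detail.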

Kernel density estimation got me this:

Areas of high tweet density are red, and areas of even higher tweet density are yellow.

This visualization turned out pretty much as I expected. Places that are generally uninhabited, such as the Amazon Rainforest, don’t have as many tweets. The Western world, especially the east coast of the US, the UK, and the Netherlands, has the highest density of tweets. Note that the UK and the Netherlands have the highest tweet density of the European nations, probably because a majority of their citizens speak English.

China is noticeably absent from this map, which makes sense; Twitter is blocked by the Chinese government.

There are a few interesting things to note:

      1. Southeast Asia was surprisingly active in terms of tweets.
      2. Australia was surprisingly dormant. Perhaps it was because I sampled the tweets at around 11 AM EST, which meant that it was pretty late at night there. But that didn’t stop Southeast Asia and Japan from tweeting.
      3. Notice that the western half of the US (not including the West Coast) had a sparser tweet density than the eastern half of the US. There seems to be a dividing line at around 100°W longitude.

This exercise shows why it is an awesome time to be a data scientist. So much data is readily available online for us to analyze, and we now have the tools to analyze it efficiently. In just a few hours, I was able to go from an idea to a visualization. This summer, I look forward to using these same tools to rapidly iterate through mathematical models to make inferences from student data.

How to Build a Robot with Python Scripting on Android

This post was written by Zlata Barshteyn, a former intern here at Knewton. 

Every few months at Knewton, we have something called Hack Day, where team members take a day off from regular work tasks to tackle projects they’ve been itching to develop. These “hacks” typically fall into one of four categories: product, business, culture, or performance.

A Robot for the Office

For my Hack Day project, I worked with my mentor on building a LEGO Mindstorms NXT robot that could serve as a minion around the office. We programmed it to drive around and avoid colliding with obstacles, with help from an infrared sensor. Mounting an Android phone on the robot and having the robot communicate with the phone expanded its capabilities, as the phone had a camera and access to the Web, which the robot did not.

Equipped with a camera and Internet access, the robot was pretty powerful. It could provide virtual tours of the Knewton office, driving around and broadcasting its camera feed to a web page that could be accessed remotely. Viewers could even drive the robot themselves through built-in controls on the page, making the tours extremely flexible and open-ended. With text-to-speech capabilities, the robot could say each of the Knewton team’s SVN commit messages as they came in, and say a phrase typed in by a viewer on the web page. With enough time and LEGO bricks, the robot could even be trained to fetch snacks from the kitchen and deliver them personally.

To get the robot to communicate with the phone, we had two options: write something from scratch to send and receive data over Bluetooth, or use the Cellbots app or Python library. The Cellbots team provides software and code libraries to facilitate communication between Android phones and various types of robots, such as the NXT. Using the app would provide many features and controls from the get-go, but we wanted to write and customize code ourselves, so we went with the Python library.

Running Python on Android

Now, the question is, how would Python code run on an Android phone? After all, Android apps are written in Java. That’s where Scripting Layer for Android, a handy Android app, comes in. Once you install it on your phone, you can run files written in a multitude of scripting languages (such as Python) by putting them into the app’s directory on the phone, then opening the app and selecting them. It even comes with “Facades”, which are basically APIs that give you simplified control over things such as Bluetooth and the camera in the scripting language of your choice.

This was our reasoning: If we could run Python scripts on the phone, then we could have the phone host a web server, fetch a video feed using the SL4A Webcam Facade, and display it on the server. Then, whoever wished could navigate to the server’s address and watch a live feed of what the robot “saw” through the Android phone’s camera. Controls could then be placed on the page, allowing someone to turn off the robot’s autonomous navigation and drive it around the office themselves, or even have it speak a given phrase, using the SL4A Text-to-Speech Facade.
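As a rough illustration of the text-to-speech part of that plan, the sketch below speaks a list of phrases through an SL4A-style handle. It assumes the handle exposes a `ttsSpeak(message)` call, which is how the Text-to-Speech Facade is exposed in SL4A's Python bindings as far as I know; check the facade documentation for your SL4A version. The helper `speak_all` and the `FakeDroid` stand-in are hypothetical names so the code can run off the phone.

```python
def speak_all(droid, phrases):
    """Speak each non-empty phrase through an SL4A-style handle.

    Returns the list of phrases that were actually spoken.
    """
    spoken = []
    for phrase in phrases:
        text = phrase.strip()
        if not text:
            continue  # nothing to say for blank input
        droid.ttsSpeak(text)  # assumed Text-to-Speech Facade call
        spoken.append(text)
    return spoken


class FakeDroid:
    """Stand-in for testing on a computer: records calls instead of speaking."""

    def __init__(self):
        self.calls = []

    def ttsSpeak(self, message):
        self.calls.append(message)


fake = FakeDroid()
result = speak_all(fake, ["Fixed the build", "  ", "Hello from the robot"])
```

On the phone itself, the handle would typically come from `import android` followed by `droid = android.Android()`, in place of `FakeDroid()`.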

Setting Up a Web Server on Your Phone

The first step is to find the IP address of the phone, in order to access the web server from another computer. Check out this blog post (scroll down to “Gleaning your IP address”). Do you want a web page that only lists the contents of some directory on your phone or SD card? If so, you can just use Python’s built-in SimpleHTTPServer, which is the server demonstrated in the previously mentioned blog post. As you can see, it only requires 4 lines of code:

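# Python 2 standard library (in Python 3 this module is http.server)
import SimpleHTTPServer
from os import chdir
chdir('/sdcard/')  # serve files from the SD card
SimpleHTTPServer.test()  # start the server on port 8000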

To run the server and view the page, do the following:

1. Save this code in the SL4A directory
2. Open the SL4A app on your phone
3. Run the code you just wrote
4. Open up a web browser on your computer
5. Navigate to the IP address you saved earlier, port 8000
(so if the IP address was 192.168.0.2, you’d go to http://192.168.0.2:8000)

You will then see a listing of the contents of /sdcard/. As you can see, this is a fairly limited web server, and is only useful for, say, browsing your photos. It definitely wouldn’t be enough for what we wanted to do with the robot (embed a video feed and controls). For that, we’d need a more sophisticated web server.

Further Possibilities

Luckily, using SL4A means that anything you can do with Python on your computer, you can do on your phone. That means that you can even run CherryPy on Android with no hassles, or any other Python web server, really. Once that’s done, you can use the SL4A Facades in your server code, meaning you can use the Webcam Facade to stream a video feed from your phone’s camera, the Text-to-Speech Facade to have the robot say out loud a phrase the user typed in, or any of the other Facades in a variety of creative ways to have your very own office robot/minion.
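Here is a minimal sketch of what the control side of such a server might look like. It uses Python 3's standard `http.server` module rather than CherryPy, and it is not the code we ran on the robot. The `/drive` and `/say` paths, the `dir` and `phrase` parameters, and the list of allowed moves are all hypothetical. On SL4A's Python 2 the module names would differ.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical set of drive commands the page's controls could send.
ALLOWED_MOVES = {"forward", "back", "left", "right", "stop"}


def parse_command(path):
    """Turn a request path into a (command, argument) pair, or None if unrecognized."""
    url = urlparse(path)
    query = parse_qs(url.query)
    if url.path == "/drive":
        direction = query.get("dir", [""])[0]
        if direction in ALLOWED_MOVES:
            return ("drive", direction)
    elif url.path == "/say":
        phrase = query.get("phrase", [""])[0].strip()
        if phrase:
            return ("say", phrase)
    return None


class RobotHandler(BaseHTTPRequestHandler):
    """Answers GET requests such as /drive?dir=left or /say?phrase=hello."""

    def do_GET(self):
        command = parse_command(self.path)
        if command is None:
            self.send_error(404, "Unknown command")
            return
        # A real robot would act here, e.g. hand the phrase to text-to-speech.
        body = ("%s: %s\n" % command).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the control server until interrupted."""
    HTTPServer(("", port), RobotHandler).serve_forever()
```

Calling `serve()` starts the server on port 8000, the same port as the simple file server above. Keeping the parsing in its own function means the allowed commands can be checked without starting a server or touching the robot.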

The bottom line is, if you love developing in Python or any other scripting language, you don’t have to suddenly stop just because you’re working on the Android platform, thanks to Scripting Layer for Android.