Big, Bigger, Biggest? Steven Skiena's Algorithms Help Answer the Questions.

Steven Skiena knows algorithms — in fact, he wrote a best-selling book on them. His Algorithm Design Manual is considered the primer for algorithms if you want to get a job at Google.

Fast forward to today: Skiena, a Distinguished Teaching Professor in Stony Brook University’s Department of Computer Science, has designed algorithms to solve a wide spectrum of problems — from recoding genes for the development of vaccines to determining the significance of historical figures, the subject of his most recent book, Who’s Bigger: Where Historical Figures Really Rank, co-authored with Charles B. Ward, an engineer at Google.

To put historical figures in their numerical place, Skiena and Ward used quantitative analysis to evaluate data culled from the more than 800,000 entries in Wikipedia. The user-generated encyclopedia proved to be the perfect source for such an exercise. The authors were interested in the concept of people as memes — ideas that propagate from mind to mind. Those “ideas” that continue to flourish and generate interest, or are repeatedly referenced in other entries, have a higher profile than those that fade away. In the book, they cite the meme of Betsy Ross “as the woman who first sewed the American flag” as an example: “It does not really matter whether she actually sewed the flag (the evidence isn’t very strong here) but catching this meme is valuable as a cultural reference in American history and the evolution of gender roles.”
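The book’s actual methodology blends several signals, but the notion that a figure who is referenced by other well-referenced articles matters more can be illustrated with a minimal PageRank-style sketch. The toy link graph below is entirely hypothetical; it only shows the mechanics of link-based significance, not the authors’ real pipeline.

```python
# Minimal PageRank sketch on a hypothetical toy link graph.
# Articles referenced by other well-referenced articles score higher,
# loosely mirroring how oft-cited Wikipedia figures rank as "bigger."

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:  # dangling page: spread its rank evenly over all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical articles: "Lincoln" is referenced by both of the others.
toy = {
    "Lincoln": [],
    "Civil War": ["Lincoln"],
    "Gettysburg": ["Lincoln", "Civil War"],
}
ranks = pagerank(toy)
print(max(ranks, key=ranks.get))  # the most-referenced figure wins
```

Because the most heavily referenced node accumulates rank from its neighbors, “Lincoln” ends up on top in this toy graph — the same intuition, at vastly larger scale, that drives link-based significance scores.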

So, were there any surprises among the rankings? “The biggest ‘surprise’ was how generally reasonable our computational rankings are,” says Skiena. “Our rankings of historical memes correlated very well with various other published rankings, autograph prices and public polls.”

In Skiena and Ward’s analysis, Jesus is history’s most significant figure, followed by Napoleon, Muhammad, William Shakespeare and Abraham Lincoln. According to Skiena, while the rankings satisfy a certain public fascination with “who’s on top,” they also help us understand what forces influence historical significance, the effectiveness of human decision processes (are the historical figures included in our children’s textbooks truly the most significant ones?) and generally what these large data sets say about our culture. For example, the authors found that women remain significantly underrepresented in the historical record compared with men, a gap analogous to requiring that the average 18th-century woman in Wikipedia be four IQ points smarter than the average man.

Skiena’s fascination with algorithms — and their practical applications — began during his high school years, when he first learned to program a computer. “It wasn’t very common for kids to be interested in computer programming back then,” explains Skiena. But in 1977, when he developed a program that could successfully predict the outcomes of NFL games, his computer model became “news,” and he was given a weekly column in a local newspaper to tout his predictions. Skiena then developed an algorithm for betting on jai alai, a game he came to know and love while on vacation with his family in Florida, and later wrote a book (Calculated Bets) on the mathematics of gambling.

While in graduate school at the University of Illinois at Urbana-Champaign, he was part of a team that won a competition sponsored by Apple to design the Computer of the Year 2000. “That was in 1988,” says Skiena, “and when the iPad was introduced in 2010, it looked astonishingly like what we would’ve designed.”

For Skiena, the goal of computer algorithm design is to find “correct and efficient methods for solving problems.” Analyzing large-scale data sources is one of the ways he does this. “Character string data is interesting,” he says, “whether it’s the 3 billion character string from sequencing the human genome, or gigabytes of text from the millions of articles in Wikipedia.” In his Data Science Laboratory at Stony Brook, Skiena and his students use large-scale text analysis to chart the frequency, sentiment and relationships among millions of people, places and things. The Lydia project, launched 10 years ago, was built to monitor the news and blog world, and became the foundation of General Sentiment, a social media analysis startup at which Skiena serves as co-founder and chief scientist.
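Lydia’s real pipeline is far more sophisticated, but the basic idea of charting sentiment around named entities can be sketched with a simple lexicon-based scorer. The mini-lexicon and sample text below are invented for illustration; production systems use large, carefully weighted word lists and proper entity recognition.

```python
import re

# Hypothetical mini-lexicon; real sentiment systems use far larger,
# carefully weighted word lists.
LEXICON = {"gain": 1, "win": 1, "success": 1,
           "loss": -1, "fail": -1, "crisis": -1}

def entity_sentiment(text, entity):
    """Score the sentences mentioning an entity by counting lexicon hits."""
    score, mentions = 0, 0
    for sentence in re.split(r"[.!?]", text):
        words = re.findall(r"[a-z]+", sentence.lower())
        if entity.lower() in words:
            mentions += 1
            score += sum(LEXICON.get(w, 0) for w in words)
    return {"entity": entity, "mentions": mentions, "score": score}

sample = ("Acme posted a big gain this quarter. "
          "Rivals of Acme suffered a loss. "
          "The success surprised analysts.")
print(entity_sentiment(sample, "Acme"))
```

Run over millions of articles instead of three sentences, tallies like these become time series of how the world is talking about a person, place or company.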

Skiena’s approach to biology is similar. He designs algorithms to answer such questions as “What gene affects what other genes?” and “Does nature always select the best sequence for gene encoding?” Skiena has made significant contributions in several areas of algorithm design. He developed new DNA sequence assembly techniques to put together genomes like that of Borrelia burgdorferi, the bacterium that causes Lyme disease.
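Real assemblers (including Skiena’s) are far more involved, but the core idea of sequence assembly — stitching short reads back together by their overlaps — can be shown with a toy greedy shortest-superstring merge. The reads below are a made-up example, not actual genome data.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, 0, 1)  # (overlap length, index i, index j)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)]
        reads.append(merged)
    return reads[0]

# Hypothetical short reads cut from the string "ATGGCGTGCA"
reads = ["ATGGCG", "GCGTGC", "TGCA"]
print(greedy_assemble(reads))  # reconstructs "ATGGCGTGCA"
```

The greedy heuristic works on this toy input; assembling a real genome from millions of error-prone reads is what makes the algorithmic engineering hard.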

“Data analysis is both an art and a science,” Skiena has said. “The right data representation lets us hear what the numbers are trying to tell us.”

For one of his current research projects, Skiena and his team dip back into Wikipedia to explore other topics in natural language processing. A current focus is on using deep learning techniques to build concise representations of the meanings of words in all major languages, and on using these powerful features to recognize named entities and measure sentiment and other properties of texts.
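The lab’s actual systems learn such word representations from full Wikipedia dumps with deep models; the underlying idea — words with similar meanings get nearby vectors — can be sketched with a few hypothetical toy vectors and cosine similarity.

```python
import math

# Hypothetical toy word vectors; real embeddings are learned from
# billions of words and have hundreds of dimensions.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u)) *
            math.sqrt(sum(b * b for b in v)))
    return dot / norm

def most_similar(word):
    """Nearest neighbor among the other words in the toy vocabulary."""
    return max((w for w in VECTORS if w != word),
               key=lambda w: cosine(VECTORS[word], VECTORS[w]))

print(most_similar("king"))  # "queen" lies closer than "apple"
```

Features like these, learned rather than hand-assigned, are what let a system recognize that two texts mention similar entities or carry similar sentiment, even across languages.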

As the cost of storing data continues to drop, and the ability to collect data increases, the challenge for Skiena is making intelligent use of this massive volume of information. “Stony Brook has given me the freedom to pursue a very eclectic group of projects,” says Skiena, “and with access to such great students here at Stony Brook, the possibilities are limitless.”

By Joanne Morici