Planet Knight-Mozilla OpenNews

July 31, 2015

Sonya Song

Cataloguing Internet Censorship

Note: this blog post is a republication of my recent contribution to China Outlook with permission. China Outlook is “an online, subscription-only newsletter that specialises in writing and research about China’s future. Based in Hong Kong, it is editorially independent.” 

The composition in this photo is a very visual allegory for the attitude of many citizens towards what the government wants them to believe. The photo was taken outside a disco club by the name of “Propaganda” at Wudaokou, Beijing, in 2005. This sleeper is most likely a migrant worker from the surrounding countryside. He was using his shoes as a pillow, the only comfort he could afford, while resting on the hard concrete stairs. Credit: the same as the author of this blog.
  
Just before 6 am on 26 June 2013, rioting broke out in Lukqan Township in Xinjiang Uyghur Autonomous Region in northwest China, home to millions of Uyghur Muslims. At least 24 people were killed by suspected Islamists, who set about them with swords and knives.

About seven hours later the state-run Xinhua News Agency broke the news on its English news wire service, followed closely by numerous Chinese news portals that covered the story with a Chinese translation. 

A few hours later, Chinese speakers living in the United States first heard the news on the BBC and CNN, both of which quoted the original report from Xinhua. But when they began to look on Chinese websites, they could find hardly a trace of the story. The censors had been at work. 

For those living in China and for China observers, such censorship of important events is not unusual. China has never been far from the bottom of the Press Freedom Index published by Reporters Without Borders and is presently in 173rd position, with just six countries worse. Foreign websites are routinely blocked and Chinese websites are under continual, close scrutiny.

According to the Harvard Berkman Center for Internet and Society, China devotes “substantial technical, financial, and human resources” to develop the apparatus of censorship and has instituted “by far the most intricate filtering regime in the world.” Since censorship is a common practice in China, censored information has become an alternative perspective that we should not neglect when seeking to understand this country. 

Censorship is a crude tool at the best of times and often the material censored carries crucial information for people both inside and outside China – even if it is too inconvenient for the Chinese authorities. The outbreak of SARS in 2002-3, for instance, was censored from Chinese media for five months, presumably to avoid spoiling the harmonious atmosphere created for the 16th National Congress of the Communist Party. However, the decision was not without implications: it allowed the virus to travel irreversibly across continents until a worldwide epidemic emerged. 

So too with other subjects, which, like SARS, are not only inconvenient but also crucial; in fact, it would hardly make sense to devote huge resources in terms of human labour and computing power to monitoring and eliminating subject matter that is merely trivial.

At the same time, the authorities’ decision to censor information that is inconvenient, even if it is important, provides an opportunity to observe China from the standpoint of what it discards, rather than what it consumes. And that is precisely what a number of researchers outside China have now begun to do – namely, to examine and assess news stories that have been censored from the Chinese media. 

Of course, the analysis of deleted stories will never provide a full picture of media control. In many cases journalists familiar with a particular regimen will know not to write certain kinds of stories. This is a form of pre-censorship. The journalists in a newsroom may often be privy to certain information that they know it would be foolish to circulate. But with articles that have appeared and then just as rapidly have disappeared, there is a different situation. In these cases, the material has been published, but is subsequently judged to be unsuitable and is removed. 

But how to collect this censored information before it vaporizes? In recent years scholars and institutes have been trying to uncover information censored from news portals and social media in China. A common technique involves two steps: collect and check. First, information published online in China is collected using big data techniques and, in the second stage, is repeatedly and continuously checked for availability. Once a link appears broken, it is “red flagged” for suspicion. 
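The two steps can be sketched in a few lines of Python. This is a minimal illustration only, not the code used by any of the studies mentioned here: the function names `http_status` and `check_round` are assumptions made for the example, and a real crawler would also need scheduling, rate limiting and retries.

```python
from typing import Callable, Iterable, Optional, Set
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # No usable answer at all (DNS failure, timeout, refused connection).
        return None


def check_round(collected: Iterable[str],
                fetch_status: Callable[[str], Optional[int]] = http_status) -> Set[str]:
    """Step two: re-check every collected link and return those that look broken."""
    flagged = set()
    for url in collected:
        status = fetch_status(url)
        if status is None or status >= 400:
            flagged.add(url)  # "red flagged" for suspicion, not yet proof of censorship
    return flagged
```

Running `check_round` repeatedly over the same collection gives the continuous checking described above; a link that is flagged is only a candidate, since a broken link can also have technical causes.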

While it is possible for articles to be removed completely for editorial purposes, in practice this is rare. In such cases, as with corrections to, say, the New York Times website, it is usually possible to identify the corrections through the use of italics or some similar device.

Even stronger evidence to rule out alternative explanations beyond censorship can be obtained by comparing deletions in a variety of news media to see if they cover similar topics. If so, then censorship is a strong possibility, because similar deletions reflect the systematic control of content, which in turn is a good indication of regulated behaviour.

One recent censorship study conducted jointly by Michigan State University and the City University of Hong Kong focused on NetEase and Sina, two major news portals in China. 

From November 2011 to October 2012, the researchers found on average that two articles were deleted from each website per day and that the deletions from the two websites followed similar patterns. In particular, domestic news had a significantly higher probability of being deleted compared to international news: twice as likely for NetEase, and six times for Sina. Beijing stories had twice the probability of deletion compared to news covering other places in China. Surprisingly, very few articles on Tibet appear to have been deleted, a fact that the researchers put down to pre-censorship. Compared to neutral stories, for NetEase, positive news had one third the probability of being deleted whereas negative news nearly four times, and for Sina, negative news had three times the probability of being deleted. 

From a list of 13 news topics, five were strongly associated with deletions: politics, business, foreign affairs, food and drugs, and military. These topics frequently included sensitive keywords or phrases, such as land acquisition, death toll, social unrest, poor working environment, food safety, and disputed territories. 

These findings are in line with sociological theory on censorship, which suggests that the elimination of improper political news helps keep ideological purity, the removal of military news reflects a concern over national security, the ban on news covering disputed territories indicates the protection of national interest, and the expurgation of news on unsafe foods is one approach to maintaining social order. Like all modern states, China is wrestling with the impact of the growth in social media. It is aware of just how quickly online media can amplify the impact of events and invite participation, as seen in the Arab Spring movement. In his well-received book Rewire, Ethan Zuckerman, director of the MIT Center for Civic Media, narrates how this movement started with a family’s protest against government corruption in Tunisia, spread beyond one town, and eventually reached over a dozen countries. 

Over a decade before the Arab Spring, China’s leadership had foreseen the potential threat of online media and started developing censorial strategies and tools. What it aims to constrain is the mobilizing power of online media, as indicated by a study conducted by Gary King and his colleagues at Harvard University. 

From the messages deleted from nearly 1,400 Chinese social media platforms, they observed that the state aims to prevent and suppress ongoing and potential collective activities. This is in contrast to the widely held view that first and foremost the Chinese censors target harsh criticism of the state. Hence, on the one hand, social media are censored to prevent mobilization, and on the other hand, news media are censored to eliminate possible triggers for such mobilization. That is why international news was found to have been deleted much less often than domestic news on NetEase and Sina: remote events are not relevant enough to provoke strong reactions among citizens.

China is not the only country to censor social media. Censorship exists in all societies and all forms of media. For example, there is presently a growing international debate on the ease with which pornography can be accessed online and whether or not this is a danger to children. Websites regarded as promoting Islamic fundamentalism are routinely banned in certain countries. In China, whilst censorship is pervasive, the debate over who controls online access to information and what are its limits has barely begun.



July 31, 2015 04:40 PM

June 15, 2015

Erika Owens

Heading to the AMC

I am so excited to be heading back to the Allied Media Conference this year. And not just that, but I get to facilitate a session with Knight-Mozilla Fellow alum Gabriela Rodriguez while two of our current fellows Tara Adiseshan and Francis Tseng facilitate a session of their own.

I have only been to the AMC once before, but I've been wanting to get back ever since. Personally, it was an experience where I built relationships with activists in Philly during a long car ride, learned about political perspectives I never encountered in college, saw amazing art in museums and on buildings throughout Detroit, and generally had my assumptions and comfortable resting places challenged. It was the first time I had been to a conference that had personal and professional significance. Since then, I've encountered hints of the AMC spirit, such as from events with Aspiration Tech, who are facilitating the community technology network gathering on Thursday.

As my professional and personal work has continued at the intersection of media, technology, community, and activism, I've longed to return to the AMC. But the timing hasn't worked out. Till this year.  I am able to attend, and I won't be there alone. I'm again carpooling with a couple of friends and will be joined by inspiring colleagues. I know that there is a huge amount of overlap between the activism and tech work that many AMC participants are involved with and the work we do at OpenNews. I know there are some amazing Knight-Mozilla Fellowship candidates in the crowd and I cannot wait to chat with them. (Please do say hi or ping me on Twitter if you want to chat.)

It's already clear that the community technology network gathering will be a highlight. The agenda makes it clear we'll have the time and space to talk through how our work in tech affects us as individuals and within the communities we serve. The coordinators of the gathering include folks who have spent years leading open source community efforts and the chatter I've seen on Twitter has shown participants will have a lot to teach and learn from each other. It also appears that the agenda has been evolving a bit over the past few days--I love being able to learn from how organizers put together a schedule, choose activities, and make the big (and small) decisions that go into creating a safe, supportive space that encourages open dialog. I'm confident the network gathering and AMC as a whole will be such a space.

And so I get to spend a weekend immersed in an activist world, talking about my favorite subjects, and even facilitating a panel about one of them--you guessed it--journalism technology. Then, a few days later, I get to take all that I learn at the AMC and apply it to helping run our own conference, SRCCON. It's going to be a busy week in the midwest. I can't wait.

June 15, 2015 09:23 PM

January 12, 2015

Dan Sinker

OpenNews: 2015 Fellowship Onboarding is GO

I love Los Angeles. Peel back the Hollywood veneer and, at its core, it’s a city that believes in putting in the work.

Which is why I’m excited to be in LA this week with our 2015 cohort of Knight-Mozilla Fellows to start the work of the fellowship year. With a distributed fellowship like ours, where fellows will spend far more time apart than together, it’s important to start the experience building the pathways of collaboration, community, and sharing that we want our fellows to continue to utilize throughout their fellowship year. It’s also an opportunity to meet somewhere warm and to celebrate the start of an amazing year.

We’re not just celebrating the start of the fellowship year at this onboarding, we’re also welcoming our final fellow for 2015: Kavya Sukumar, who will be spending her fellowship year at Vox Media.

Kavya is a developer-journalist who appreciates both elegant code and well-written prose. Everything about journalism fascinates her and she wants in on it all. She has reported and written stories, analyzed data and built a CMS. She has more than eight years of experience working at technology companies as well as in newsrooms. Kavya was a software engineer at Microsoft when the journalism bug bit her. She has a graduate degree from Medill School of Journalism where she was a Knight Scholar. She is currently a Data & Interactives Editor with The Palm Beach Post’s investigative team.

We’re thrilled to have Kavya join the already-amazing cohort of 2015 Knight-Mozilla Fellows, and excited to have all of them together with us in LA this week. There’s so much more to come in 2015 from OpenNews, and it feels great to kick off an incredible year with these incredible people.

January 12, 2015 05:00 PM

December 17, 2014

Sonya Song

Q&A on Censorship with the Oxford Internet Institute

After presenting my study on China's censorship of online news at the Oxford Internet Institute (OII), I had a great talk with David Sutcliffe, the editor of the OII Policy and Internet Blog, and went through the following questions. The full conversation is published in the blog post titled "Uncovering the patterns and practice of censorship in Chinese news sites".
  1. How much work has been done on censorship of online news in China? What are the methodological challenges and important questions associated with this line of enquiry?
  2. You found that party organs, ie news organizations tightly affiliated with the Chinese Communist Party, published a considerable amount of deleted news. Was this surprising?
  3. How sensitive are citizens to the fact that some topics are actively avoided in the news media? And how easy is it for people to keep abreast of these topics (eg the “three Ts” of Tibet, Taiwan, and Tiananmen) from other information sources?
  4. Is censorship of domestic news (such as food scares) more geared towards “avoiding panics and maintaining social order”, or just avoiding political embarrassment? For example, do you see censorship of environmental issues and (avoidable) disasters?
  5. You plotted a map to show the geographic distribution of news deletion: what does the pattern show?
  6. What do you think explains the much higher levels of censorship reported by others for social media than for news media? How does geographic distribution of deletion differ between the two?
  7. Can you tell if the censorship process mostly relies on searching for sensitive keywords, or on more semantic analysis of the actual content? ie can you (or the censors..) distinguish sensitive “opinions” as well as sensitive topics?
  8. It must be a cause of considerable anxiety for journalists and editors to have their material removed. Does censorship lead to sanctions? Or is the censorship more of an annoyance that must be negotiated?
  9. What do you think explains the lack of censorship in the overseas portal? (Could there be a certain value for the government in having some news items accessible to an external audience, but unavailable to the internal one?)

December 17, 2014 04:17 AM

Talk on News Censorship


I'm fortunate to be funded by the Knight-Mozilla OpenNews Fellowship program to attend a conference on China and the New Internet World organized by the Oxford Internet Institute.  There I will give a presentation on China's news censorship.  I've uploaded the full paper and the slides online; please feel free to download them for more information.  Also, I have more data and preliminary findings that are unpublished, and I'd love to share and discuss them.  My email address is songyan at msu dot edu.

Prior and Ongoing Research on Internet Censorship

Internet censorship has been attracting much attention from various academics and institutes.  For example, the Open Net Initiative (ONI) has been constantly testing the availability of websites in 74 countries and rating government control of content related to politics, social issues, Internet tools, and conflict/security (Palfrey, 2010).  The Open Internet Tool Project (OpenITP) surveyed circumvention tool users living in China to understand how they bypass the Great Firewall in hopes of building better tools to serve the needs of internet users in China and other censored regimes (Robinson et al., 2013).

Among the empirical studies focused on online media, Bamman et al.’s (2012) work claimed to be “the first large–scale analysis of political content censorship” that investigates messages deleted from Sina Weibo, a Chinese equivalent to Twitter.  They found 16.25% of posts were deleted after publication and recognized some characteristics related to post deletions, including 295 sensitive keywords and outlying provinces such as Tibet and Qinghai.  Beyond Sina Weibo and on an even larger scale, King et al. (2013) collected data from nearly 1,400 Chinese social media platforms and analyzed the deleted messages with the aid of linguistic software.  In contrast to the previous presumption that harsh criticism of the government is the censors' target, King et al. found that it is in fact ongoing and potential collective activities that the state aims to prevent and suppress.

Research Methods in a Nutshell

To the best of our knowledge, however, censorial practices in online news media have never been studied, let alone extensively investigated through computational approaches.  Therefore, our study may be the first empirical attempt to systematically examine the news articles deleted from Chinese cyberspace.

We developed scripts to collect news articles published on NetEase and Sina, two major news aggregators headquartered in China.  Meanwhile we continuously checked whether or not these articles remained available and we marked a news article as deleted once its link was found broken.  In fact, to make sure that the news story was really deleted due to its content rather than editorial or technical reasons, we searched across the websites for the articles with the same title but under a different link.  Only when duplicates were unavailable did we claim that a particular story was deleted. 
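The confirmation rule in the last two sentences can be written down compactly. The sketch below is an illustration under assumptions, not our actual scripts: `confirmed_deleted`, `links_by_title` and `is_available` are hypothetical names, and matching duplicates by exact title is a simplification.

```python
from typing import Callable, Dict, List


def confirmed_deleted(title: str,
                      links_by_title: Dict[str, List[str]],
                      is_available: Callable[[str], bool]) -> bool:
    """A story counts as deleted only if no link carrying its title is still reachable.

    links_by_title maps a headline to every link found for it across the websites;
    is_available reports whether a single link still resolves.
    """
    links = links_by_title.get(title, [])
    # An unknown title is never reported as deleted.
    return bool(links) and not any(is_available(link) for link in links)
```

For example, a story whose original link is broken but whose duplicate under another link still loads would not be counted, which is what rules out editorial moves and technical glitches.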

After collecting thousands of deleted news stories, we ran a regression over these data to detect patterns associated with deletion.  The technique we adopted is ReLogit (King and Zeng, 2001a and 2001b), a logistic regression for rare events data.  This tool was developed by political scientists to analyze rare events, such as wars and coups.  It is an appropriate tool for our study because the overall deletion rates across the two websites were under 1%, as summarized below.

Findings and Conclusions

During the course of our study, on each website, about two articles were deleted per day and the overall deletion rate was 0.05% on NetEase and 0.13% on Sina Beijing.
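As a rough sense of scale, the two figures together imply how many articles each portal published per day. The snippet below is only back-of-the-envelope arithmetic: it treats "about two" as exactly 2, uses the rounded rates quoted above, and the resulting volumes are implied values, not numbers reported in the paper. The function name is made up for this example.

```python
def implied_daily_volume(deletions_per_day: float, deletion_rate: float) -> float:
    """Articles published per day implied by a daily deletion count and a deletion rate."""
    return deletions_per_day / deletion_rate


# Assumption: exactly 2 deletions per day on each site; rates as quoted (0.05% and 0.13%).
netease_volume = implied_daily_volume(2, 0.0005)   # on the order of 4,000 articles a day
sina_volume = implied_daily_volume(2, 0.0013)      # on the order of 1,500 articles a day

print(round(netease_volume), round(sina_volume))
```

With deletions this rare relative to the volume published, an ordinary logistic regression would see almost nothing but non-deleted articles, which is the motivation for a rare-events method.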

Several similar patterns have been found across the two news portals: 
  • Domestic news had a significantly higher chance of being deleted than international news: twice as likely for NetEase, and about six times for Sina Beijing.
  • News covering Beijing had twice the chance for deletion compared to news covering other places in China.
  • Tibet as a subject matter had little relation with deletion. 
  • National, compared to local, news was significantly associated with deletion for both websites: For NetEase, one and a half times as likely to be deleted, and for Sina Beijing one third times as likely to be deleted.
  • Nature of events was another strong indicator. Compared to neutral stories, for NetEase, positive news had one third the chance of being deleted whereas negative news had nearly four times the chance, and for Sina Beijing, negative news had three times the chance of being deleted.
  • Five out of 13 coded news topics were strongly associated with news deletions, including politics, business, foreign affairs, food and drugs, and military, although the strengths varied across the categories and the websites.
From this evidence, we reached the following conclusions: 
  • The two Chinese news portals deleted news with similar patterns.
  • These similarities point to the practice of systematic control, the quintessential component of the definition of censorship (Peleg, 1993). 
  • Hence, for the first time, we have confirmed and quantified the online news censorship in China. 

Taboo Words

Beyond news deletion, I've been examining comment deletions as well.  I've created some word clouds with the help of Wordle and highlighted the keywords most commonly found in deleted comments.  They're not included in the paper or the slides. 

These keywords are aligned with our general understanding of taboo topics, such as land acquisition, death toll, social unrest, food safety, pollution, and lamentable work environment. 

image

Comments Prohibited and Suppressed

A second research topic of mine is how comments are manipulated and what patterns are associated with the manipulation.  Various types of manipulation have been observed, including disabling the commenting function, screening and filtering submitted comments before publication (i.e., pre-censorship), and deleting comments after publication (i.e., post-censorship).  This topic isn't included in the paper or the slides.

To make this research topic more understandable, I'll first elaborate on the general practice of Chinese news portals.  Most of the time, news portals welcome and encourage comments because interactions boost web traffic.  However, a small portion of news stories have their commenting feature disabled.  There are two ways to implement this.  On NetEase, a notification under the story says "commenting is disabled" and the comment button is unavailable.  Sina takes a more subtle approach: it shows no such notification, and users can submit comments as usual, but the comments are never displayed on the website.  These are pre-censorship techniques.  As for post-censorship, both websites simply remove comments quietly after their publication.  A third type of manipulation differs from passively pre- or post-censoring comments: it is to proactively hire Internet commentators, the so-called 50 Cent Party, to propagate orthodox ideas endorsed by the government.

The following time-series chart demonstrates the first type of comment manipulation, which is to prohibit comments.  In this way, party organs attempt to impose official opinions through one-way communication on issues such as North Korea, outlying provinces, controversial territories, major criminal cases, and so on.
image
More subtly, Sina "allows" comments but never shows some of them on the website.  I've figured out how to send parameters to the API to request the numbers of pre-censored comments and drawn the following chart, which shows the news stories that have no comments at all although their commenting function is "available".

image

The third time-series chart exhibits the amount of comment deletions on a weekly basis.  The topics found in the deleted comments are fairly aligned with those deleted from news stories. 

image

This study was funded by the Google Policy Fellowship 2012 and was a collaboration between the Quello Center for Telecom Management and Law at MSU and the Center for Communication Research at the City University of Hong Kong.  Please send your comments and questions to songyan at msu dot edu.  Thank you for reading this post.

December 17, 2014 04:17 AM

October 22, 2014

Dan Sinker

OpenNews: Announcing our 2015 Knight-Mozilla Fellows!

We’ve been looking for Knight-Mozilla Fellows for four years now, and so you begin to notice patterns during the process. There’s that moment when you worry that there won’t be enough applicants, and then that other when you worry there will be too many. There’s that melancholy time when you realize that you won’t have a fellowship cohort quite like the current one and then the exhilaration when you realize that’s exactly right.

But the most important moment is the one when all the pieces begin to come together and you begin to see not an applicant but instead a fellow. That moment is magic: the sheer volume of applications (417 this year–our largest pool ever) disappears and where there was once a mass of qualifications and ideas, you begin to see truly extraordinary individuals.

It’s a great pleasure today to introduce those individuals–our 2015 Knight-Mozilla Fellows–to you. These folks will spend 10 months in 2015 experimenting in some of the best newsrooms in the world (they’ll be joined by one more Fellow, at Vox Media, who will be announced later this year), on a mission to try new things, to document them in the open, and to connect with the broader community of people writing code in journalism.

The work that the Knight-Mozilla Fellows do during their fellowship year doesn’t fit easily into a single sentence. Over the year a fellow will play the role of coder, teacher, mentor (and mentee), adventurer, colleague, and friend. They’ll push themselves, and journalism, in new directions. They’ll do work that has real impact–on themselves and on the web.

It’s a tall order, but a thrilling one, and the people we have lined up to do the work of a Knight-Mozilla Fellow in 2015 are among our very best yet. I can’t wait for you to meet them:

Tara Adiseshan | NYT/Washington Post

Tara Adiseshan is a designer and data visualization engineer who is excited about civic media, learning tools, and community platforms. From designing search futures at Autodesk to conducting user research around rainwater harvesting in rural India, Tara has had the opportunity to apply design methodologies and build solutions in a variety of disciplinary spaces. Tara believes that access to and understanding of information and data can be a key leverage point through which social systems change. Tara will be a Fellow at the Coral Project, a collaboration between the New York Times, the Washington Post, and OpenNews.

Follow Tara on Twitter at @taraandtheworld

Juan Elosua | La Nacion

Juan Elosua is a Spanish telecommunications engineer with broad experience in tech consultancy and financial services IT. In 2011, he discovered data journalism and became a data addict and freelance developer, and can now be found turning data upside down to extract knowledge from it. He strongly believes open data will play a key role in shaping the future of modern societies, and has trained journalists to help them find stories and work efficiently on data-related projects.

Follow Juan on Twitter at @jjelosua

Livia Labate | NPR

Livia Labate is a user experience designer and manager with a passion for in-house practice development. Livia is interested in how open source tools empower news creation and dissemination, and shape access to information and social participation. With over 15 years of industry experience, she has worked with large organizations such as Comcast and the BBC as well as heavily contributing to the development of the Information Architecture community of practice through the IA Institute. More recently, Livia has led Marriott’s Digital Standards and Practices group, focusing on stewardship and governance of digital experiences.

Follow Livia on Twitter at @livlab

Linda Sandvik | the Guardian

Linda Sandvik is a creative technologist and proto-MacGyver who likes to make things that inform, educate, and empower people and communities. She previously worked in local government and at Last.fm, and is a co-founder of Code Club, and her particular interests lie in using play and technology to help people discover their natural affinity for teaching themselves new things. She has a passion for open data, open knowledge, and serious games.

Follow Linda on Twitter at @hyper_linda

Julia Smith | CIR

Julia Smith is a design professional from Omaha, NE. She’s held a variety of roles in journalism and IT, having worked as a designer and developer on news sites, mobile applications, enterprise software, and corporate websites. She is fascinated with civic media and loves exploring the connections between storytelling, design, and technology to create experiences that empower community change.

Follow Julia on Twitter at @julia67

Francis Tseng | NYT/Washington Post

Francis Tseng is a programmer and interaction designer interested in natural language processing, internet socializing, demystifying technology, and systems modeling. After two years at IDEO, he became a Knight Foundation prototype grant recipient in 2014. He is currently teaching the News Automata course at the New School’s Design + Journalism program and designing and building _critical_ software with friends at Public Science. Francis will be a Fellow at the Coral Project, a collaboration between the New York Times, the Washington Post, and OpenNews.

Follow Francis on Twitter at @frnsys

October 22, 2014 06:07 PM

Erika Owens

Recruiting and selecting an amazing group of fellows

2015 Knight-Mozilla Fellows

OpenNews has just announced the 2015 Knight-Mozilla Fellows. This is our fourth cohort of fellows and we've learned a lot over the past few years about where to look for people who will challenge and inspire us for the next year (and beyond). So how do we do it? Here are some details about what we've tried, what we've learned, and the learning we still have to do.

A quick look at the numbers

Each year, we've seen a ~60% increase in overall applications. Last year, I did some analysis to see if this increase was skewed by a particular group, or if we were also increasing (or at least holding steady) the diversity of our applicant pool. It seemed that both gender and geographic diversity increased as the applicant pool grew.

This year, we added a question so that applicants could self-identify as a woman, person of color, or member of another group under-represented in technology. We're going to use these self reported identifications going forward in our analysis. We may evolve them a bit due to clarity--for example, "person of color" is a pretty US-centric phrase, and we have applicants from around the world. The third category was an open response and the answers really showed how many different perspectives can be covered in diversity: people identified as being formerly incarcerated, LGBT, self taught, from rural areas. It was great to see this wide diversity of experience.

The numbers - 2015 cohort

  • 416 - Valid applications
  • 104 (25%) -  Identify as a woman
  • 88 (21%) - Identify as a person of color
  • 73 (18%) - Identify as a member of another under-represented group

Where are people from? (some example countries)

  • 42 (10%) - Argentina
  • 139 (33%) - USA
  • 26 (6%) - India
  • 11 (3%) - Kenya
  • 7 (2%) - Germany

How'd people hear about the fellowship?

  • 69 (17%) - Twitter
  • 53 (13%) - Friend
  • 33 (8%) - Email lists
  • 26 (6%) - Mozilla
  • 10 (2%) - La Nacion
  • 10 (2%) - Facebook

Compared to last year, the overall number of women applicants doubled and the proportion jumped by a quarter. We maintained a similar level of geographic diversity, with 42 applicants from Argentina due to La Nacion's stellar outreach efforts. The proportion of US applicants is actually exactly the same as last year, which is encouraging given that we had fewer fellows based with international news organizations this year.

Our fellow alumni network and events support still keep us connected to communities internationally. This connection is also evident in how people said they heard about the fellowship--friends continue to be a major referrer. Many people contacted their personal networks (thank you!) or shared contacts for us to reach out to about the Fellowship. Also, people heard about the fellowship from a lot of email lists: AdaCamp Alumni, TechLadyMafia, OpenNews’ community list, and a wide variety of local tech, civic hacking, and data viz groups. These outreach efforts to individuals and through wider-reaching lists and Twitter have been a key component of ensuring that amazing people do indeed apply.

Connecting with potential applicants

Last year I also wrote about our recruitment efforts and we followed that advice again this year: build relationships throughout the year and do a lot of direct outreach. Beyond that, I want to call out two important parts of our efforts: responsive communications and a program that is designed to be as accessible as possible.

We do our best each year to give people the information they need to feel able to apply--a sense of the program, an idea of what we're looking for, and a hefty helping of encouragement to, yes, apply. We redoubled these efforts this year. In addition to blog posts from Fellows, news orgs, and working news nerds, we developed a series of blog posts that responded to questions we heard from applicants. What's different about a fellowship? Am I a competitive applicant? What's exciting about working in journalism? Additionally, we held office hours, had Q+A sessions on our community calls, responded to email and tweeted questions, and jumped on quick calls to chat with potential applicants. We tried to make ourselves easy to reach, by whichever method was preferred by the applicant. All of our communications efforts were designed to be open and responsive to the needs of applicants.

Similarly, the fellowship program overall is designed to be accessible. Part of this is baked into the structure of the stipends and supplements for the fellowship--they scale with family size and cover things like moving expenses, health care, and child care costs. We want everyone who is interested in pursuing a fellowship to feel able to do it. We very consciously run the Knight-Mozilla Fellowship in a way that is considerate of people’s needs as humans: we try to make travel easy, food tasty, sleep and rest a feature of all events, and generally support people in doing what they need to do to feel healthy and productive.

Part of creating a program that feels welcoming and accessible is also about culture and expectations. We want people from all types of backgrounds and all different temperaments to be able to participate in the fellowship. We do our best to achieve this by removing as many barriers to entry or biases as we can. For example, some fellowship programs require applicants to pitch a "big idea." But not everyone is good at coming up with big ideas (and the people who are often come from privileged groups). We want to make sure our fellowship is open to people who are good at that and people who are good at other things. Instead of asking for a litany of tech skills, we ask people to talk about projects they've worked on and to describe their role in the project. People who communicate best visually can wow us with a stunning project link, while people who communicate well via text can write an engaging description. These are just some examples that come from active decisions. We challenge ourselves to think: who does this include? Who might it exclude? This skepticism, this questioning of our own biases, helps create a program that feels accessible to a wide range of people.

Selecting the Fellows

Each year, we get applications from a larger number of people with a staggering variety of skills and experiences. Our job gets harder, but for good reason--the recruitment effort seems to really pay off!

In the selection process, we make sure there are multiple chances for people to shine. Frequently, someone who has an interesting application shows how they are even more brilliant than words on the screen could capture when we get to chat with them. Throughout this process, Dan Sinker and I get to look over all of the applications and participate in all of the interviews. This helps us identify when a prospective fellow who interviews with one organization might be a better fit for another organization. We also have a perspective on the full fellowship cohort--we can see what personalities might fit well together and help make sure the cohort reflects a diversity of experience.

This process is made more difficult by the fact that we have so many tremendously skilled applicants. The first stage in the selection process involves cutting the entire pool down to about 25% of the applications, a more manageable number for news organizations to review. A few years ago, this process mostly involved just removing unqualified applicants. But for the past two years, it's been arduous. We make sure at this stage to include in that shortened list a healthy number of women and people of color as well as to retain a mix of geographic representation. At each stage of the process, we keep track of these areas.

Now, I want to be very clear about something: this proactive attention to ensuring that a diverse pool of applicants is present at every stage of the selection process does NOT mean that unqualified people get pushed through because of some or another "special status." It's just not possible. We have too many applicants who are too awesome to be able to push a less skilled applicant through. At every stage of the process we have to say no to incredibly amazing people, of all backgrounds. And, due to our recruitment efforts, the structure of our program, and being proactive in tracking diversity throughout the selection process, we're also able to say yes to incredibly amazing people, of all backgrounds.

Questions we still have

This is such an amazing process to be able to participate in, and I'm so grateful to all of our applicants as well as to everyone who documents their recruitment and selection work, from whom we have been able to learn. We are not done learning, not by a long shot.

All those incredibly amazing people we are unable to accept into the program? We need to do a better job at staying connected with them throughout the year. We stay in touch, we connect people with job opportunities or events or other collaborations that may arise, but I know we could be doing more. How do other organizations do this? How do you make sure that when someone raises their hand to get involved, you're actually able to keep up with them in a way that supports their work and leadership development?

As the fellowship program continues, our fellowship alumni community grows. Part of recruitment is also follow-through with the current fellows--what happens after their fellowship? How do they stay connected and build on the experiences they had during their fellowship year? I know lots of programs have been at this for decades and must have great suggestions on alumni support.

And finally, I would *love* some advice on program evaluation. I've outlined some of the analysis we've done with the data we have about applicants. I've mentioned how we challenge ourselves to push against our own biases. But, wow, I know there's so much more we could be doing. I bet there are improvements we could make to the selection process, and information we could learn from the applicants that would help us better support them and the community. If you're into data and forms and cognitive biases, yes, let's talk.

A community of fellows

With this announcement, we now have 25 people in the Knight-Mozilla Fellowship community. As our group of fellows has grown, we’ve also seen a growing number of fellowship opportunities in technology. Through the Knight-Mozilla Fellowship, we’ve seen the way that a fellowship program can propel individuals to do fantastic work and become leaders, while also serving as a point of connection between news organizations and journalism tech groups around the world. We’ve learned from other organizations, and we’re excited to share what we’ve done with other fellowship programs as well. We’ve seen that thoughtful outreach works. It is possible to create a space that feels welcoming by designing a program with the needs of many different humans in mind, and that is able to adapt and respond to those needs. And through an intentional selection process, we are able to welcome truly inspiring people to the fellowship.

October 22, 2014 03:28 PM

October 08, 2014

Dan Sinker

OpenNews: Elections Code Convening. Open your code with us!

This year has been a year of trying new things at OpenNews. One of the big things we’ve been doing is experimenting with ways of bringing newsroom developers together to open up projects. We call them Code Convenings, and we’re opening up applications for our third Convening today.

The idea behind Code Convenings is pretty simple: we’ve found that often the thing that holds code back from being open-sourced is just finding the time to do that last-mile abstraction work and create first-class documentation. Code Convenings bring devs together for a couple days to do exactly that. We feed you, put you up in a hotel, and give you the time and space to do the work that’s necessary to get some great code out.

Our first code convening was in Portland, Oregon this spring, and resulted in four great projects being opened up–since then, they’ve been used and reused numerous times. Our second code convening brought together folks to collaborate on a single code base, resulting in the creation of the California Civic Data Coalition. We considered both these convenings prototypes: opportunities to try things out with a reduced number of variables. As a result, we invited folks to take part, but kept the lead-up quiet–no need to promote while we were still figuring things out. Well, we think we’ve got this relatively figured out now, so we’re going public for the last Code Convening of the year.

We’re hosting an OpenNews Code Convening in New York City November 13 & 14, and we want your news developers to take part. This will be coming soon after the midterm elections in the United States, and so we’ve chosen “Elections” as the organizing theme of this convening. If your newsroom has been working on some interesting code this election cycle that you’d like an opportunity to open up to the larger journalism code community, you should apply.

We’re moving pretty quickly here: The application opens today and closes on October 17. We’ll be selecting a maximum of five projects, and will notify folks if theirs have been chosen by October 21. You’ll need to commit two people (or, if you can only send one, work with us to find a good partner) to the two days of the convening, and we’ll cover food and travel. It will be so awesome.

This is a great opportunity to get code out into the world: take it!

October 08, 2014 09:10 PM

August 26, 2014

Erika Owens

Journalism & Data at MozFest: We want your session ideas

At the Mozilla Foundation's annual festival for the open web, OpenNews organizes two tracks of hands-on hacking and building focused on journalism and data. The call for proposals is open till August 29, so get pitching!
 

What happens at MozFest?

MozFest takes over the Ravensbourne building in London for 2.5 days on October 24-26. Ravensbourne's nine stories become home to a physical representation of the possibilities of the open web: from teaching to music, science to journalism, 1,000 participants dig deep into the content, code, and questions that drive the open web. Rather than slide-driven lectures, MozFest sessions are collaborative, interactive, and focused on uncovering new possibilities for what it means to work online, in the open.
 

What makes a good journalism or data session?

Journalism drives the web both through content and code, and MozFest sessions reflect that wide range. Data is a part of what helps us understand the world around us. Whether that be through visualizations, sensors, or good old-fashioned research, MozFest is a place to talk about it. The most important part of a good session is that it is something you are passionate about--whether that be a tool you want to share, a question you want to wrestle with, or a problem you want smart people to help you solve.
 
Much like SRCCON, MozFest session leaders play the role of facilitator rather than lecturer. Participants come ready to be generous with their expertise and time and they are oriented toward next steps and building, not just discussing. Session leaders help create that space by using personal experience as a frame for discussion and prototyping, not the sole focus. The team from Aspiration Tech, who help manage MozFest, also have some great guides on break out sessions and facilitating. The audience for MozFest is a little different from SRCCON though--1,000 open web lovers rather than 150 news nerds--which means if there's something you've been wanting to share with a broader audience, MozFest is a great place to do it. Also, there are a limited number of travel stipends available for MozFest, so if you're pitching a session and would require support to attend, please let us know.
 
Check out some writeups of MozFest from 2013 to get a feel for the event and the types of sessions that have worked well in the past.
 

Pitch!

The deadline is coming up in a few days, but that still leaves two months to refine your session idea, and OpenNews staff will be available to help talk through session prep. For now, we just want to hear: what do you want to share, build, or discuss?

August 26, 2014 03:22 PM

August 15, 2014

Noah Veltman

OpenNews: T-minus 36 hours

Tomorrow is the deadline to apply for the 2015 Knight-Mozilla OpenNews fellowships. Almost exactly two years ago, I was sitting in my friend’s living room in Queens, hemming and hawing about whether I should bother applying. I sent my application in a few hours before the deadline, which was nowhere near the record for cutting it close. Brian Abelson, my brilliant co-fellow who was placed at the New York Times, famously got his application in with eleven seconds to spare. If you’re interested in the intersection of code, journalism, and community, you should apply. It’s not too late.

I’ve written elsewhere about my own fellowship (See: 1, 2, 3, 4). Many others have written eloquently on what makes developing in the newsroom so great. But let me summarize some of the things that make the fellowship such a once-in-a-lifetime experience:

You’ll have a cohort of six incredible co-fellows to learn from and collaborate with. The magic of the OpenNews program is that the team takes care to select a diverse group of newsroom partners and a diverse group of fellows, so everyone brings something special to the table. You’ll be one of seven people with totally different backgrounds and expertise and that combination makes for beautiful, unexpected things. In my class we had Manuel, who, before he became a fellow, was launching fucking satellites into fucking outer space. We had Friedrich, the Harry Houdini of web scraping, who seems to know everything about open data and opening data there is to know. We had Stijn, who in addition to being obsessed with news analytics, also happens to play a mean gypsy jazz guitar. The list goes on.

You’ll get to travel the world meeting other amazing people. When you’re a fellow, flying to Buenos Aires to hack on projects with a thousand rogue technologists is in your job description. Hard to believe, right? OpenNews gives you the resources, financial and otherwise, to explore the entire universe of news nerds and civic technologists. By the end of the year you’ll have discovered so many intriguing organizations and people your head will be spinning.

You’ll have an unreasonable amount of freedom to pursue things you’re interested in. The OpenNews fellowship is a lot less structured than other programs, by design. You’re let loose in a newsroom to discover the things that you’re passionate about and the projects where your time is best spent. Some of us came in with projects in mind at the start, some didn’t, but all of us found our fellowships taking us in great unexpected directions.  The OpenNews team, Mozilla, et al. are there to support you in the ways you need and then get the hell out of the way and trust your instincts.

People will suddenly take you strangely seriously. Before I became a fellow, I was a complete outsider to this world. I had never worked in a newsroom. I didn’t know what NICAR was. There was no reason for anyone to give me the time of day. And yet they did. I was overwhelmed with how warm and welcoming everyone was throughout my fellowship year and how much people cared about what I had to say. The OpenNews imprimatur carries a tremendous amount of weight in the news nerd community.

And by the way, they’ll pay you fairly. A lot of fellowship stipends are so meager they turn you back into a starving student for the duration of your fellowship. OpenNews pays better than most, and is generous with things like relocation costs, housing supplements, equipment, and travel costs (details).

There are two hesitations I often hear from people who are considering applying for the fellowship.

“I don’t think I really fit the profile, [I’m not a developer/I’ve never worked in news/etc.]”

This was me, two years ago.  I almost didn’t apply because of this worry.  But here’s the thing: there is no profile.  Fellows come from all walks of life.  Some of them have worked in news before, some haven’t.  Some are master coders, some less so.  Most of them didn’t study computer science.  The roster of ex-fellows includes statisticians, artists, chemists, activists, engineers, academics, and even a medical doctor. The fellowship is less about what you’ve done and more about what you’ll be able to do given 10 months, a bunch of smart collaborators, and a lot of freedom.  The one thing the fellows all have in common is that we all thought we didn’t fit the profile when we applied.  Don’t let self-consciousness about your resume stop you from applying.

“But what comes AFTER the fellowship?”

This was me, one year ago.  Remember what I said above about how by the end of the year, you’ll have encountered so many intriguing new things your head will be spinning?  This is the real problem with the fellowship.  It lights up all kinds of synapses you didn’t know you had and introduces you to jobs and technologies and organizations you didn’t know existed.  By the end of the year you won’t be worried about having something to do next.  Your problem will be that you’ll have a thousand things you want to do next.  And this is pretty much the textbook definition of a “good problem to have.”  Like the question of “what makes a good fellow?” there’s no one particular answer to what people do afterwards.  Some of us have stayed in newsrooms, others have gone to tech companies or academia or something else entirely.  Mark Boas is living in a house on a hill in Tuscany and I imagine that his life basically looks like A Walk in the Clouds.  If you’re chosen, I promise that life-after-fellowship will be the least of your worries.

Got questions/concerns?  Ask me.  Ready to give it a shot? You’ve got 36 hours to apply.

August 15, 2014 04:30 PM

Dan Sinker

OpenNews: Breaking and Fixing (and one day left to apply)

They say that news “breaks.” And when they do, it conjures images of daybreak, shedding new light on the world. But news also breaks things apart: our understanding, our assumptions, how we thought the world was. This week feels a lot like that.

When we talk about the Knight-Mozilla Fellowships–applications for which close tomorrow night–we talk often about the experience of being in the room when news breaks.

But working in journalism isn’t just about being around when things break, it’s about staying in that room as the real work begins. Because news isn’t simply about breaking things: At its best, it is about fixing, about healing, about reaching understanding.

Looking at news break this week, it’s clear that understanding is no longer achieved through the printed page or the broadcast booth–things move too quickly for that now. From parsing the Snowden documents to covering the streets of Ferguson, Missouri, real understanding now comes in new ways.

Those new ways mean bringing new skills into newsrooms and with those skills new ideas, perspectives, and backgrounds. It means experimenting with new forms of storytelling and new tools on the backend. It means collaboration, sharing, and working in the open.

This is what Knight-Mozilla Fellows do every day. And it’s why you should apply to join their ranks in 2015. If you love to build things on the web, if you’re a creative thinker who solves problems in code, if you’re a civic hacker, a data scientist, a web designer, or just a self-taught coder, join us.

As a Fellow, you will do many things: work in some of the best newsrooms in the world, have colleagues that will challenge and champion you in equal measures, write open-source code that gets used by thousands. But most important is helping bring understanding to a world that desperately needs it.

But it’s almost too late to apply. Deadline is tomorrow, Saturday, August 16, at midnight Eastern. Don’t hesitate. Change the world. Apply.

August 15, 2014 03:36 PM

August 13, 2014

Erika Owens

Who should apply to the Knight-Mozilla Fellowship? You.

The application deadline for the Knight-Mozilla Fellowship is this Saturday, August 16 at midnight. You may have already read the info on the website and caught up with what fellows have experienced, what next year's news organizations have planned, how we've set up the program to be welcoming to a variety of life circumstances, and how this fellowship can reshape your career.
 
But you might still be thinking, could that really be me?
 
Yes, yes it could. If you apply. We often get asked "what makes an ideal applicant?" Over the past three years, we've found that what sets a candidate apart is evidence that, as OpenNews Director Dan Sinker puts it, they are a "creative problem solver." This means you're someone who enjoys solving problems through code--whether that be with a captivating design testing the limits of your JavaScript skills or with writing a Python script to parse a giant data set or with programming an Arduino to monitor what the hell is happening in the back of your fridge. :o)
 
We look for people who show a curiosity about the world around them--a key trait shared with journalists--and who turn to code to help answer the questions that arise. There is no single path to acquire the skills to be able to solve problems in this way. A few of our fellows studied computer science in college, but most didn't. Some have worked in software development, but some came from universities. Some have worked in journalism before, but most became immersed in journalism through the fellowship.
 
The way we get to know you, future applicant, is through the short application form. We say there are no trick questions and we mean it. This application is a chance to get to know, as much as is possible in a few questions, how you think, how you approach problems, the role you've played on prior projects. I promise, there is no "right" answer to any of these questions; the best answer is whatever reflects you, your work, and your interest in applying your skills to journalism.
 
If a question has you stumped--email us. We're here to help. For example, not everyone has an active GitHub account. Some people have worked (so far!) on proprietary code they cannot share publicly. We understand this and that's why we have an upload field for you to add a screenshot, wireframe, something to help us understand your project. We don't want any of the questions to get in the way of you applying--the application is just to ease the info transfer of your awesomeness into our review forms. If something's getting in the way of that, just shout. And if you're unsure of your own awesomeness, check out what Noah and Aurelia have to say about that.
 
What should you be considering as you fill out that application? The Fellows shared their thoughts today:
  • "You don't need a computer science background, just an ability to solve problems and find the answers you need to make digital things." - Brian Jacobs, Fellow at ProPublica
  • "We all have some technical background, but you certainly don't need to know everything about everything, no one does." - Ben Chartoff, Fellow at Washington Post
  • "The mandatory thing is curiosity and a passion for learning." - Gabriela Rodriguez, Fellow at La Nacion
You only have a few days left to get any other questions answered and fill in your application. Feel free to email me or stop by office hours tomorrow from 11am-4pm EDT in #opennews on irc.mozilla.org. 
 
So, who should apply? You. You, who loves to solve problems through code. You, who tackles new projects with inventiveness and enthusiasm. You, who has oddball interests you want to share. You, who wants to help change the world. Go ahead, you, apply now.
 
(Photo credit: Flickr user 18_2rosadik36)

August 13, 2014 02:54 PM

August 11, 2014

Dan Sinker

OpenNews: Last week to apply, a look at the total Fellowship package

This is the final week to apply to become a 2015 Knight-Mozilla Fellow. The application closes at the stroke of midnight Eastern time, Saturday August 16.

The last few weeks, we’ve had our news partners and our current Fellows make the case for why YOU should apply to become a Knight-Mozilla Fellow.

Becoming a Knight-Mozilla Fellow is a thrilling opportunity, one that will plunge you head-first into the problem sets of journalism, and allow you to experiment and build compelling solutions. We tell our Fellows that they should “follow your passions” in approaching their builds and projects.

But those passions require time, and moving to a new city (Fellows live in the city their host newsroom is located in) requires real dedication. As a result, being a Fellow is an adventure, but it’s also a commitment: of thought, of talent, and of time.

At OpenNews, we recognize that commitment and work to live up to it by offering a generous stipend and significant supplements to it that reflect the needs of the lives our Fellows lead.

In addition to the $60,000 Fellowship stipend, we offer a series of supplements to help offset the cost of housing, healthcare, moving, and more:

We want the year that you are a Knight-Mozilla Fellow to be amazing. We want you to make things that last long beyond your Fellowship year. We know that the first step on that is knowing that you’re taken care of during your Fellowship year, and we do our best to make sure you are.

The end of this week–midnight Eastern Saturday night–is all the time that’s left to apply. Don’t hesitate: make the commitment to apply.

August 11, 2014 10:24 PM

August 10, 2014

Nicola Hughes

A Retrospec For The Future

OpenNews Fellows 2012. From left: Mark Boas (Al Jazeera), Cole Gillespie (Zeit Online), Nicola Hughes (The Guardian), Dan Shultz (Boston Globe), and Laurian Gridinoc (BBC)


As applications open for the 2015 round of OpenNews fellows, I reflect on my time and the first round of fellowships. The process was very different. We weren’t OpenNews then, we were just Knight-Mozilla fellows. We had no idea what we were getting ourselves into. But we made friends, immediately.

Looking back, what surprises me is how quickly we became comfortable with each other. Conversation, conferences and visits were so easy. I miss them and I miss the working dynamic, if you could call what we did when we got together “work”. It was so much fun.

I also recall meeting one of my best friends during my fellowship at The Guardian. She is a programmer to the core and a role model of mine.

I have met some of the subsequent fellows and they are of the same oddball ilk. People you want in the community. People who make the community, give it the right flavour.

I was at a round table discussion recently held by Undercurrent about building diversity in the workforce. One thing that stuck with me was the power with which switching one little word changes the ethos of a company. When hiring, instead of asking yourself if the candidate will be a cultural fit you should ask if this candidate will be a cultural add.

This, I believe, is the beauty of the OpenNews programme. It does not look to fit fellows to a news partner. It looks to add to it in ways those partners do not yet realise they can gain from.

Similarly, what the fellows get from the fellowship is not an easy fit but a brave addition to anyone’s life journey.

A quick word about what I’ve been up to: After finishing my fellowship at The Guardian I am now a Data Journalist at The Times & Sunday Times working full time scraping, parsing and analysing data as part of a data-driven investigative unit. We’ve had front page stories with both titles.

August 10, 2014 07:28 PM

August 08, 2014

Dan Sinker

OpenNews: One week to apply, our newsroom partners make the case

Becoming a Knight-Mozilla Fellow means being embedded in some of the best news organizations in the world. That means you won’t just be in the room when news breaks, you’ll be creating compelling new ways to break it. You won’t just have colleagues to learn from, but peers excited to learn from you too. And you won’t just be another set of hands in the newsroom–you’ll be experimenting, trying new things, and tackling major newsroom projects.

The deadline to apply to become a 2015 Fellow is August 16, just a week away, and the newsrooms that are partnering with us–the Guardian, NPR, the Washington Post, the New York Times, Vox Media, the Center for Investigative Reporting, and La Nacion in Buenos Aires–have articulated the opportunities fellows will have if they’re embedded with them. At the Guardian in London, incoming Executive Editor of Digital Aron Pilhofer sees “the unique vantage point” of a Knight-Mozilla Fellow:

You will be fully part of the London newsroom, able to collaborate with reporters, editors, graphics editors, interactive developers, designers and more. You’ll also have the ability to collaborate with business-side teams as well, including the Guardian’s world class digital development, analytics and product teams.

But as a Knight-Mozilla Fellow, your goal isn’t just to improve the Guardian; it’s to improve journalism as a whole, with one of the world’s most important newsrooms as your laboratory.


NPR wants a fellow to join their unique hybrid Visuals team in Washington, D.C. For Brian Boyer, the NPR Visuals editor, the fellow will be a teammate–plus:

You’ll be our teammate: making stuff with us, learning what we’ve learned, teaching us what you know and what you’re learning elsewhere during your fellowship year.


The Washington Post and the New York Times are teaming up with Mozilla and OpenNews to build a next-generation community platform for news. As the Washington Post’s Greg Barber writes, they’re bringing two Knight-Mozilla fellows into the New York-based team as well:

One thing we know for sure is that we want Knight-Mozilla Fellows with us, doing what they do best: experimenting and breaking boundaries. We want fellows to push the work our core team is doing in new directions, to think of things we haven’t, to be independent operators within this deeply collaborative project.


Vox Media sees their fellow as someone that can bridge their seven media sites and help “open source the elements that would be beneficial to the larger journalism community.” Writes Chief Product Officer Trei Brundrett:

We have benefited greatly from open source as we have aggressively built a media company from scratch. Now we’re eager to give back as an active member of the OpenNews community. This year at our hack week, VAX, we kicked off the process by making it easier for our teams to share our work with the open news community and releasing some code, but there is still much left to do. We want you to help us shape that commitment.


The Center for Investigative Reporting is looking for someone who “loves visual data” to help bolster their dataviz work. Writes Jennifer LaFleur, CIR’s Senior Editor for Data Journalism:

We work with many graphic designers and have featured their incredible work. But we’ve never had anyone dedicated to making our reporting and data analysis really shine. When it comes to news apps, we’ve been pretty good at faking it, but we know that we can really up our game.

We need to be able to tell readers things they don’t already know and are actually worth knowing. We need your help to communicate that information more effectively. We’ll challenge you to help users understand complex concepts and help us understand the best way to distill millions of relevant records into a compelling presentation.


In Buenos Aires, La Nacion’s data team is “motivated by the possibility to produce changes with our work, using technology to open data, especially in a country where there is no transparency law and with high levels of corruption,” explains data manager Momi Peralta Ramos. Their fellow would join their team in opening datasets and making them accessible to the Argentinian people. As their whole team explains in a subtitled video:


The opportunity to work with these incredible news organizations is yours. If you love to code and want to spend 10 months deeply immersed in the problem-sets of journalism, then apply now.

August 08, 2014 02:00 PM

August 05, 2014

Dan Sinker

OpenNews: So what is the Fellowship experience like anyway?

We are under two weeks until the end of our search for our 2015 Knight-Mozilla Fellows. If you love to code and want to spend ten months having an impact in journalism, you should apply.

Of course, you may have questions about what being a fellow is actually *like*, so over the last two weeks our current Fellows have written about their experiences. Each experience–like each one of our fellows–is different, and the takeaways are unique. The Knight-Mozilla Fellowships are about writing great open-source code, but they are also about so much more. And what that is, is up to you.

Harlo Holmes, who has spent her fellowship year at the New York Times, likens becoming a fellow to “Scrooge McDuck taking a swim in his vault.” Except in this case, the vault isn’t filled with money but instead “a veritable treasure trove of code libraries, frankenstein-y demos and PoCs, and wacky ideas.”

Ben Chartoff’s fellowship at the Washington Post has been all about learning:

I know so much more than I did half a year ago, and have so many more people and communities I can learn from. I may not be in school anymore, but I’m certainly a student. Today, tomorrow, and for the rest of my career, I will be learning every day, and I’m figuring out how to life-long-learn because of this stupendous, magical, yes-it’s-really-that-great fellowship.


For Gabriela Rodriguez, her fellowship at La Nacion–which involved moving her entire family of four from Portland, Oregon to Buenos Aires, Argentina–has been “una aventura!” Gaba has been focused on bringing more voices into journalism, she explains in Spanish, through her work on the VozData project, and by organizing “cafés de DATA” with the civic hacker community across the Rio de la Plata in Montevideo, Uruguay.

Brian Jacobs applied to become a Knight-Mozilla Fellow two years in a row, and his second time was the lucky one, landing him a fellowship at ProPublica in New York. His time as a fellow has been about doing the unexpected:

I’m working on a project that involves visualizing NASA data, integrating with repositories of satellite imagery, processing it in Photoshop, in the command-line, making it interactive in a news application, helping to create what I hope will be something really beautiful and worthwhile to explore. Working with data from space is basically the coolest thing I could be doing right now. Did I expect to be doing this? Not really. All I did was follow my interests, because I have less of a job description and more of a general mandate to work with incredibly smart people and make interesting things.


Marcos Vanetta moved from Buenos Aires to Austin, Texas for his fellowship year at the Texas Tribune. It is his first time in America and his first time in a newsroom, and he has adapted quickly. He writes in Spanish that after only four months, he’s writing software and participating in projects that are visited by thousands of people every day.

Aurelia Moser, who has had a joint East African fellowship with both Ushahidi and Internews Kenya, has juggled her collaboration with news partners, fellows, and many others in the journalism code community (the workflow has been tricky enough that she’s managing it with Github issues). And it’s embracing working in the open that has impacted her fellowship year the most:

Some of the more tacit benefits are nearly impossible to articulate without being gushy. It’s the stranger famery you’ll experience in the news community that clashes with your impulse to imposter syndrome; the kind where you’ll get requests to collaborate on projects from strangers instead of just your friends. Pre-fellowship, I never really had comments on my Github projects and my public code persona was pretty weak; 5 months in, I get regular email about blog posts I’ve written and repos I’ve open-sourced.


Each of our current fellows has had a singular experience. They have learned more about journalism, more about their coding skills, and more about working with others, and about themselves. As Ben Chartoff says in his post, “This fellowship has already changed my life.”

And, a year ago, each of them was where you are right now: Wondering if they should apply, wondering what it would mean to their lives. They know the answer now because they applied. You have until midnight Eastern August 16 to find out for yourself.

August 05, 2014 09:29 PM

July 18, 2014

Dan Sinker

OpenNews: Why Develop in the Newsroom 2015 (part two)

This week, as part of our search for our 2015 Knight-Mozilla Fellows, who spend 10 months writing open code in the newsroom, we have asked others that develop in the newsroom why they do what they do.

The answers–we highlighted a couple on Wednesday–are still flowing in, but we wanted to touch on two great ones, both from members of the team at Vox Media.

Lauren Rabaino, a product manager at Vox, outlines ten compelling reasons to write code in journalism. One hits on the fact that, in journalism, you’re constantly having to learn new things:

In order to execute on products that work, you have to force yourself to learn about processes and history and key players for topics you previously knew nothing about. Working in a newsroom with journalists is like going back to school, but more fun (there’s often a lot more cursing and whiskey and no tests except whether you’ve met the user’s needs).

Another of Lauren’s reasons hits hard at why *I* do this work: the ability to solve new problems:

The information industry has come far in recent years in evolving how we do storytelling in a digital world, but there’s still so much more to do, so much more progress to make, so many more problems to solve. This is a world that has immense and ever-growing potential at building the kinds of information solutions that help people live richer, more informed lives. And you can be a part of that. You can shape that. You can lead that. We need more leaders in this space.

For Ryan Mark, who recently joined the Vox team after a long stint developing at the Chicago Tribune, coding in journalism is personal:

I build for news because I’m building for myself. News and information, learning and knowledge is an extremely important part of my life. The free flow of knowledge that the internet has made possible has brought me happiness, wonder and purpose. I couldn’t imagine not being a part of it.

The application to become a 2015 Knight-Mozilla Fellow is open until August 16. If you love to code, want to learn new things, challenge yourself, and help make information more open, you should apply today.

July 18, 2014 10:07 PM

July 16, 2014

Dan Sinker

OpenNews: Why Develop in the Newsroom? 2015 Remix.

One month from today, August 16, the search for our 2015 Knight-Mozilla Fellows will come to a close. Knight-Mozilla Fellows do amazing work–they spend 10 months embedded in newsrooms writing code to help solve journalistic problems–but they don’t do that work alone. When you become a Knight-Mozilla Fellow, you join two communities: a community of fellows (both your peers and alumni from the program), and a community of developers working in the newsroom.

To mark this final month of our 2015 Fellowship search, we’ve invited a lot of voices to talk about their experiences coding in the newsroom. Later in the month you’ll hear from our fellows (both current and past) and our news partners as well. But this week we’re going to hear from the community of developers currently doing this work in newsrooms big and small around the world.

The developer community in journalism is a dynamic one, and there isn’t one single reason anyone decides to start coding in a newsroom instead of a startup or in the enterprise. Instead, developers start coding in newsrooms for all sorts of reasons.

This week (as we’ve done in the past), we’ve asked developers to share their reasons and experiences with you. These stories–we’ll share a few a day–are wonderful; each one a unique argument to join a singular community.

For Jeremy Bowers, a developer at the New York Times, journalism offers something different than traditional coding jobs. He explains:

We’ve got soul.

We’ve got a mission.

We’re self-critical.

We’ve got stacks of interesting structured data aching to be investigated and summarized. Our reporters are staring down the federal government, tracking people who are otherwise invisible and watching the epidemics most people don’t even know about.

Aaron Williams, who codes at the Center for Investigative Reporting, echoes Bowers when he says that, in traditional programming, “it’s not often the code you write influences the politics of the community.” But, Williams also adds:

I develop in a newsroom because, honestly, it’s just plain fun.

On any given day you may have to write a web crawler to harvest crime logs from your local law enforcement agency or use Mechanical Turk to crowdsource analysis of PDFs you received from a public records request.

On other days you’ll need a better map than Google offers and end up creating your own slippy map tile set. Or you may start picking up libraries like pandas and SPSS to do some serious data analysis on a 25 GB data dump you’re trying to clean in another Terminal window.

Needless to say, you’ll stay busy and you’ll become a better developer than you ever thought.

Have fun and change the world while you do it: Become a 2015 Knight-Mozilla Fellow by applying today.

PS. if you’re a developer in the newsroom and want to contribute your voice to this collection as well, just let me know.

July 16, 2014 09:46 PM

June 19, 2014

Dan Sinker

OpenNews: Building New Communities with the New York Times and the Washington Post

Community is at the core of what we do at Knight-Mozilla OpenNews–helping to build and strengthen the community of people writing code in journalism. And community is a big part of what has made Mozilla successful–the global community of contributors that has helped to build the Firefox web browser.

Community is also at the core of journalism: whether it’s geographic communities that form the bedrock of local news or the communities of interest that form around subjects as broad as basketball and politics, journalism has always had community at its core.

Which is why it’s exciting to announce that today, Knight-Mozilla OpenNews, the New York Times, and the Washington Post are joining forces to create a next-generation community platform for journalism. The web offers all sorts of new and exciting ways of engaging with communities far beyond the ubiquitous (and often terrible) comments sections at the bottom of articles. We’re looking forward to writing code together to enable them.

We don’t see this project as a single product, but instead as building blocks for engaging communities throughout the web. Open source at its core, and focused on giving users unprecedented control over their identity and contributions, this is a project we believe in.

It’s also a unique collaboration between two of the largest and most respected news organizations in the world. Enabling that kind of collaboration is something that we’ve worked on from the beginning at OpenNews. While this is a huge project–the grant is equal to the one that enables us to do our core work at OpenNews–it also feels like a natural extension of what we do.

Finally, this is a project that has the opportunity not only to improve community engagement in journalism, but to strengthen the web itself. Technologies like Backbone.js, D3, and Django have all been forged and tested in the demanding environment of the newsroom, and then gone on to transform the way people build on the web. We don’t know that there’s a Backbone lurking inside this project, but we’re sure as hell going to find out.

There’s much more to come, and we’ll be getting down to work soon. But for now, here’s to new experiments, to thinking big, and to communities, new and old–and all the things we can accomplish, together.

June 19, 2014 01:00 PM

June 16, 2014

Dan Sinker

OpenNews: Apply to become a 2015 Knight-Mozilla Fellow

I’m excited to announce that starting today, applications to become a 2015 Knight-Mozilla Fellow are open. The Fellowships offer an opportunity for people that love to code to get paid to spend ten months building new things in collaboration with some of the best news organizations in the world. Fellows spend their time following their passions, working in the open, sharing ideas, traveling the world, and writing transformative code.

2015 marks our fourth year of the fellowship program, and we’re going strong with seven incredible news organizations:

Our news partners offer a home base for each fellow, colleagues to bounce ideas off of and collaborate with, and plenty of problem-sets to work with. Knight-Mozilla fellows are in the newsroom when news breaks and get to feel the electricity in the air as the world changes.

This year’s partners represent some of the best we’ve yet assembled, pushing new boundaries in reporting, in visualizations, in presentation, and in the news product itself. From Argentina to England, from New York to the San Francisco Bay, our 2015 News Partners are trying new things and breaking new ground—and *you* can join them.

The Knight-Mozilla Fellowship year is an amazing chance for a creative coder, civic hacker, data geek, engineer, or technologist to challenge herself, to write amazing code, and to help journalism transform on the open web. This is a golden age of web-native journalism, and you can be on the cutting edge of it.

If you’re up for the challenge (and you should be), you have until August 16th to apply. We’ve made the application fast: just a few quick questions and links to your best stuff. You have a couple months, but should apply today.

June 16, 2014 09:10 PM

May 30, 2014

Erika Owens

Why you should attend SRCCON

It's hard to put into words how excited we are for our first conference, SRCCON! After seeing how well small group conversations and collaboration can work at the Mozilla Festival, we knew we had to bring that experience stateside. And bonus, it's in Philly.

SRCCON is designed to be a place for developers, interactive designers, and others who love to code in and near newsrooms to build and create together. But since it's the first time this event is happening, it might take a little convincing to get the time off work to attend. So here's some thoughts about why you (and your boss) should be excited about your participation in SRCCON.

  • The conference is going to be small. 125 people, that's it. As such, it's geared to lots and lots of small group conversations and workshops, which means you get to dictate what you want to talk about and take away from the event. If there's a question your newsroom is really struggling with, discuss it at SRCCON. If there's a project you are super excited about, discuss it at SRCCON.

    •  Related, other people are going to want to discuss these same concerns and when you talk from your experience they'll remember "oh yeah, [your name and org] grappled with that too and they learned X."

  • The conference is not an isolated event. It's part of a community ecosystem. As such, it will likely give you concrete next steps, whether it be other speaking opportunities, writing opportunities, connections with peers at newsrooms (and the possibility of support for something you want to work on together, like during a code convening), or training opportunities. It's connected with OpenNews' other work (which you and other folks in the community shape) and it is meant to make it easier for you to directly connect with other folks and organizations too.

  • We've got your back and want you to be there. Erin Kissane put a lot of effort into the code of conduct for the event to make explicit what our expectations are for participation and what we will do if anyone feels unsafe. It's really important to us that SRCCON be accessible and open to anyone who would like to attend. We've set up a travel scholarship and put a lot of effort into outreach to help make sure the conference reflects the diversity of the communities we serve.

  • And, in bottom line fashion: it's affordable. Tickets are $150 and cover you for two days, a couple meals, and snacks. It's also happening in Philadelphia, which is not expensive by big-city standards. $2 slices of pizza are right around the corner from the venue, as are many other diversions.

Now that you're convinced, pick up a ticket. And please, once you have, help us spread the word. Can't wait to see this community in person, and have the chance to show everyone what makes Philly such an exciting city for journalism, technology, and civic hacking.

May 30, 2014 02:37 PM

February 27, 2014

Erika Owens

OpenNews at NICAR 2014


The lobby of the Marriott in downtown Baltimore is already crowded with news nerds and the National Institute for Computer-Assisted Reporting annual conference has only just begun. A bunch of OpenNews staff and Fellows are in town and we've got a lot of exciting stuff to share with the NICAR community this year. This is your cheat sheet to where to find OpenNews and what you'll want to chat with us about.

Sessions

OpenNews staff and Knight-Mozilla Fellow alumni are participating in several sessions:

  • Thursday, 3pm - Mike Tigas will be on a panel entitled Docs! Docs! Docs!
  • Friday, 9am - Ryan Pitts will co-lead a demo entitled A tour of the new Census Reporter
  • Friday, 10am - Mike will co-lead a hands-on training entitled Liberate data from PDFs with Tabula
  • Friday, 11am - Noah Veltman will be on a panel entitled Creating maps: Principles, mistakes and potential
  • Friday, 3pm - Ryan Pitts will be on a panel entitled DataViz for everyone: A practical guide to going responsive
  • Saturday, 3pm - Noah Veltman and Erika Owens will be on a panel entitled Crossing the language boundaries across your newsroom: journo to dev and back
  • Sunday, 10am - Brian Abelson will co-lead a demo entitled Build your bot army

 

OpenNews News

We've got a lot to talk with you about:

  • Last week, we launched Source Jobs, a listing of journalism code job postings. During NICAR, talk to any OpenNews staff person about how to get your organization signed up to list jobs.
  • We have the date and location picked for the first ever SRCCON (Source Conference, an event specifically for the news nerds community focused on making and thinking together). We'll see you in Philadelphia on July 24 and 25.
  • You're excited about the lightning talks on Friday and so are we. Enjoy the energy in the ballroom, and then check Source over the next two weeks for articles expanding on the ideas covered in the talks.
  • As NICAR closes, we're going to launch yet another feature on Source: Guides. You’re going to be heading back to your job, and guides will offer some context for putting the new things you learned this weekend into practice.
  • News partners are central to our work, from our Fellowships and Hack Days to a new program we're developing this year to be more engaged with teams in smaller newsrooms. Wherever you're based, we want to talk with you about opportunities for collaboration.

 

OpenNews Event

We're also organizing our own event during NICAR, in partnership with the Pop Up Archive. On Sunday, join us at the Newseum for a design day around the topic of archiving news apps. We've arranged for a van to transport people from the conference hotel in Baltimore to DC on Sunday morning. Join the road trip and then spend the day working with archivists, educators, coders, and journalists to sketch out some ways to handle the conundrum of keeping news applications accessible over time.

 

Say Hello!

Many OpenNews staff, plus current and alumni Knight-Mozilla Fellows are in town. OpenNews staff in the crowd include me, Erin Kissane, Ryan Pitts, and Kio Stark. Fellows at the event include 2014 Fellows Ben Chartoff, Harlo Holmes, Brian Jacobs, Aurelia Moser, Gabriela Rodriguez, Marcos Vanetta; 2013 Fellows Brian Abelson, Annabel Church, Sonya Song, Mike Tigas, Noah Veltman. And, if you'd like to say hi but cannot locate a person or are attending NICAR virtually, you can always ping us @opennews or @source.

February 27, 2014 05:05 PM

February 19, 2014

Dan Sinker

OpenNews: Announcing Source Jobs

As journalism continues to break new ground on the web, news organizations large and small are hiring developers, designers, and others who bring new skills and ideas to journalism. Growing the community of talented developers working in news is one of the things we try to do at OpenNews. Our Fellowship program, our sponsorship of hack days, our website Source–it’s all part of trying to build the community of folks coding in news. Today we’re taking a very direct path to that: We’re launching a new section on Source that will list the latest journalism-code jobs.

Source is designed (from the database up) around the people building journalism on the web. Jobs is a natural complement to the project breakdowns, behind-the-scenes articles, Q&As, and learning pieces that we feature on Source: you can learn how it’s done, and then you can go and do it in some of the best newsrooms around the world.

The listings are lightweight: a one-sentence job description and a link to the full listing on an external site. They’re also self-serve. Today we’ve also opened up an organizational backend on Source so news orgs can list their own jobs. Erin Kissane explains how to get the keys to your organization page on the official announcement.

This is an exciting time for journalism and an exciting time to code in news. We’re thrilled to be able to play a small part in helping to bring talent into newsrooms. And we can’t wait to see the code all these new jobs produce!

Source Jobs is the first of many new features to come on Source, all possible thanks to our renewed grant that puts additional emphasis on community building and Source in particular. Expect much more to come soon–including dates and a location for the SRCCON conference, which we’ll be announcing at the NICAR conference next week.

February 19, 2014 03:49 PM

February 07, 2014

Erika Owens

Smart people working on a tough problem: NICAR news apps archive designathon

Sometimes I look around the room at a conference and am just awed by the intelligence and energy and generosity of everyone in the room. The annual NICAR conference, to me, is one of the embodiments of what is possible when you get smart people together to actually learn, teach, and talk with one another. Though NICAR is not focused on the news nerds community exclusively, it is one of the few events that brings a huge chunk of that group together in person. We learn and mentor one another, but don't necessarily get a lot of time to build, to do hands-on work with those colleagues at other newsrooms.

So this year, OpenNews is teaming up with the Pop Up Archive and Newseum to host a designathon the Sunday of NICAR. We'll take a brief roadtrip from Baltimore to DC to work together at the Newseum on the topic of archiving news apps. The idea was sparked by Jacob Harris and a discussion he ran at Newsfoo. On Source, Jacob laid out the archiving conundrum facing data journalists:

"Narrative journalists rarely think about this infrastructure. It’s just there for everything they write, because everything they write goes through the CMS and there are strong archival and financial reasons to syndicate, index and archive that content for posterity. But, then there’s us data journalists. Remember, we decided to pitch our tents outside the CMS so we can build exciting and new types of interactive website experiences. Which often means that our work is invisible in this greater world. It doesn’t show up in site search. It doesn’t show up in Google News. It isn’t rankable on the homepage. Our projects look like they belong to the website, but they are also fundamentally apart and often invisible when running. When they are mothballed, they can vanish almost completely."

On March 2, we'll spend the day working with news nerds and archivists, educators and journalists to get a better handle on the scope of the archiving issue for news apps and on the tactics or tools we can use to deal with it. Later in March, a group will reconvene for two days of hacking to build out the designs sketched out during NICAR.

Want to get involved?

As Jacob notes, there are a lot of people in other disciplines already working on archiving web content. With NICAR's proximity to DC and many of those institutions, we have an excellent opportunity to get heads-down and do some planning and sketching. Please, join us on March 2 at the Newseum.

And...if spending a couple of days chatting and building with super smart news nerds sounds of interest to you, just wait for the SRCCON announcement during NICAR.

February 07, 2014 07:00 PM

January 28, 2014

Friedrich Lindenberg

OpenInterests.eu: relating lobbying, expert groups and public finance in the European Union

During last weekend's #EPhack in Brussels, I built a minimalistic frontend for OpenInterests.eu. The site lets everyone explore which people, companies and institutions have political or financial interests in decisions of the European Union institutions.

OpenInterests.eu
OpenInterests.eu interface prototype from the #hackEP coding session.

What is it good for?

While it's still an early prototype, my hope is to offer accessible briefing pages on individual actors, find surprising overlaps between different categories of activities (such as lobbyists which are part of expert groups), and offer summary reports about financial beneficiaries of EU procurement, lobbying activity and expert group consultations.

In some ways, the site brings together three projects that I've been involved in over the past three years: a (still unreleased) effort to build a more accessible version of the EU lobby register; an analysis of the Commission's direct expenditure (FTS) we did as part of OpenSpending; and the contract awards data we extracted as part of OpenTED. Each of these datasets would make for an excellent news application on its own. By combining them, I hope to discover a type of serendipitous overlap that would reveal real-life interactions.

A possible concern with this approach is that I would need to consider yet another aspect for the picture to be fully relevant (e.g. find data on the ways in which organisations involved in lobbying benefit from EU policy).

While that's always going to be true, having some basic information about actors and their relations available makes for a good research tool. Of course, I hope to integrate new sources of information, such as the MEPs declarations of financial interest. But the result will never be a complete picture of influence in a complex environment like Brussels.

How it works

Most of the work behind OpenInterests.eu is focussed on data extraction: the application includes a set of scrapers for the EC/EP register of interests, register of expert groups, the EC's financial transparency system and TED, the joint European procurement system.

Once the data has been extracted, it gets loaded into a common database where some cleaning is performed as well as geo-coding and, soon, reconciliation with OpenCorporates. What comes out of this process is still fairly messy, though - and we're going to need editorial work to fix the data eventually.

The application itself is powered by grano, my everlasting code name for a social network analysis tool. In its current revision, grano provides a graph framework where each actor and relationship are defined by a set of properties. Since we're combining different data sources, the provenance of each property's value is tracked individually - turning all 'facts' in grano into sourced 'assertions'.
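To make the idea of sourced assertions more concrete, here is a minimal Python sketch. It is not grano's actual API: the `Assertion` and `Entity` classes, the source labels and the sample values are all hypothetical and exist only to illustrate tracking provenance per property value.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """One sourced claim about a property (hypothetical, not grano's API)."""
    value: str
    source: str


@dataclass
class Entity:
    """An actor whose properties are lists of sourced assertions."""
    properties: dict = field(default_factory=dict)

    def assert_property(self, name, value, source):
        # Keep every claim together with its provenance instead of overwriting.
        self.properties.setdefault(name, []).append(Assertion(value, source))

    def values(self, name):
        # Distinct values claimed for a property, in the order first seen.
        seen = []
        for assertion in self.properties.get(name, []):
            if assertion.value not in seen:
                seen.append(assertion.value)
        return seen

    def sources(self, name, value):
        # The datasets that back a given value for a property.
        return [a.source for a in self.properties.get(name, []) if a.value == value]


# Hypothetical actor that appears in two of the scraped datasets.
actor = Entity()
actor.assert_property("name", "Example Consulting Ltd", "lobby_register")
actor.assert_property("name", "Example Consulting Limited", "ted_contracts")
actor.assert_property("country", "BE", "lobby_register")
actor.assert_property("country", "BE", "ted_contracts")
```

With this shape, conflicting values from different sources sit side by side, and the interface can show which dataset each one came from rather than silently picking a winner.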

The web interface of OpenInterests.eu sits on top of this graph, providing a bespoke interface for the EU datasets. In the future, I hope to remove more of the current attribute tables and make the presentation of the information much more specific to the semantics of the data (show maps for geographic information, tables for funding data, and text styled for easy reading).

What's next?

While there are a lot of potential additional data sources for the OpenInterests.eu graph, my next step has to be to improve the quality of the information that is there. This includes pushing forward the de-duplication of entities (including linking companies to OpenCorporates), providing context and documentation on the data that is included and – perhaps most importantly – publishing a full excerpt of all data in simple formats. Some of the included material - such as four years' worth of EU procurement data - isn't publicly available at all yet.

Following that, I'm hoping to add some basic reporting functions: listing the companies which have received the most in grants and contracts, which organisations are particularly active in expert groups, and who has the largest lobbying budget. Additionally, the graph structure will start to provide its own metrics, such as the centrality of directorates and people, or the betweenness of think tanks and industry forums.
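As a rough illustration of what such reports could look like, the sketch below totals awards per recipient and computes degree centrality, the simplest centrality measure, over a small membership graph. The sample records, the function names `top_recipients` and `degree_centrality`, and the data layout are assumptions made for this example, not the site's actual code. Betweenness needs shortest-path computations and would more likely be delegated to a graph library such as NetworkX.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (recipient, amount in EUR) pairs.
awards = [
    ("Acme Research", 120_000),
    ("Beta Consulting", 80_000),
    ("Acme Research", 50_000),
]

# Hypothetical undirected edges: (actor, expert group) memberships.
memberships = [
    ("Acme Research", "Group A"),
    ("Acme Research", "Group B"),
    ("Beta Consulting", "Group A"),
]


def top_recipients(records, n=10):
    """Sum amounts per recipient and return the n largest totals."""
    totals = Counter()
    for recipient, amount in records:
        totals[recipient] += amount
    return totals.most_common(n)


def degree_centrality(edges):
    """Share of the other nodes each node is directly connected to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    node_count = len(neighbours)
    if node_count < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (node_count - 1) for node, adj in neighbours.items()}


ranking = top_recipients(awards, n=2)
centrality = degree_centrality(memberships)
```

Here `ranking` lists recipients by total amount received, and `centrality` gives each actor and group a score between 0 and 1 based on how many direct connections it has.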

All of this is, as usual, overly ambitious but fun to think about. If you want to help make some of this work, get in touch or submit a pull request on GitHub :)

January 28, 2014 12:00 AM

December 30, 2013

Dan Sinker

OpenNews: Kicking 2014 Off Early

2013 was an incredible year for OpenNews. Our Knight-Mozilla Fellows did fantastic work; Source continued to grow as a hub for the incredible work done by the news nerd community; we helped to sponsor more than 50 news hack days around the world, and much much more. But 2013 is almost over and, in these waning days of it, I wanted to tell you about some amazing stuff that’s happening right out of the gate in 2014:

Surprise Sixth Fellow!

When we announced our 2014 Fellows at the Mozilla Festival in London this year, our friends at the Knight Foundation approached us about adding a sixth fellow, to be hosted by the team doing great work at the Washington Post. We jumped at the opportunity, in part because we received so many stunning applicants for our original fellowship search we were excited to revisit the list and find someone amazing to work with. And today, I’m thrilled to announce our sixth 2014 fellow:

Ben Chartoff designs and creates data visualizations. He is committed to building data literacy and numeracy through viscerally clear and compelling visuals. At the Sunlight Foundation in Washington, DC, Ben worked to demonstrate the value of open government and open data as essential elements in a democracy. He has a background in both the arts and sciences, and strives to bring both beauty and rigor to every project. He is passionate about most things, including food and backpacking.

Ben will be joining our five other fantastic 2014 Knight-Mozilla Fellows at our Fellowship Onboarding event in San Francisco in mid-January. We’re so excited!

We’re growing in 2014!

One of the most exciting aspects of our new grant is the ability to add some staff to OpenNews. And today I’m so excited to announce that in 2014, Erin Kissane will be joining us as Director of Content and Ryan Pitts will be joining us as Director of Code. We’ve been lucky enough to work with Erin and Ryan extensively on the Source project, but starting in 2014 (Erin immediately, Ryan a little later in the year), they’ll be joining as full-time partners in OpenNews. We’re *thrilled* to have them on board and excited about what that’ll mean for everything we can accomplish together.

And much more to come

In early December, Erin, Ryan, Erika Owens, Kio Stark, and I got together in New York City for two days of building a calendar and a plan for 2014. There is so much to come this year, from SRCCON (our maker-heavy Source conference for the journalism-code community) mid-year, to two Code Convenings that will bring news developers together to open-source code, to learning and hacking events around the world, and much, much more. 2014 is going to be an incredible year.

Here’s a quick look at our whiteboarded calendar, with much much more to come:

Get ready for maximum OpenNews ass-kickery in 2014!

December 30, 2013 05:18 PM

December 17, 2013

Brian Abelson

Scrape the Gibson: Python skills for data scrapers

Most of the code in this post is based on a workshop my fellow ex-OpenNews fellow @pudo gave at Hacks Hackers Buenos Aires Media Party.


Two years ago, I learned I had superpowers. Steve Romalewski was working on some fascinating analyses of CitiBike locations and needed some help scraping information from the city’s data portal. Cobbling together the little I knew about R, I wrote a simple scraper to fetch the json files for each bike share location and output it as a csv. When I opened the clean data in Excel, the feeling was tantamount to this scene from Hackers:




Ever since then I’ve spent a good portion of my life scraping data from websites. From movies, to bird sounds, to missed connections, and john boards (don’t ask, I promise it’s for good!), there’s not much I haven’t tried to scrape. In many cases, I don’t even analyze the data I’ve obtained, and the whole process amounts to a nerdy version of sport hunting, with my comma-delimited trophies mounted proudly on Amazon S3.


In this time, I’ve learned a lot about what not to do when scraping websites. This trial-and-error process has involved, among other fiascos, having my office’s IP address permanently banned from IMDB. At one point, I got so frustrated with R’s error handling that I just wrote my own library to do it. In the past year, however, I’ve started using python for scraping, and have learned a tremendous amount from my fellow ex-OpenNews Fellow Friedrich Lindenberg. In this post I hope to share some of this knowledge.


Basics

There are many amazing libraries for parsing HTML in python – pyquery, scrapy, and cssselect to name a few – but I’ve always preferred BeautifulSoup for its deceptive simplicity. Below, we’ll walk through a simple example of scraping missed connections from CraigsList. As I further develop and refine this code, I’ll introduce some best practices I’ve learned from @pudo.


Let’s go through a simple example of retrieving missed connections from craigslist. To run the code on your computer you’ll need to have four python modules installed: requests, BeautifulSoup, dataset, and thready. To install these, you can run this command in your terminal:

sudo pip install beautifulsoup4 requests dataset thready

If this returns an error, try installing pip and running the command again:

sudo easy_install pip
sudo pip install beautifulsoup4 requests dataset thready

If everything works you should be able to open a python terminal and import the libraries with no errors:


python
>>> import requests
>>> from bs4 import BeautifulSoup
>>> import dataset
>>> from thready import threaded


Alright, now that we’re ready to get started, let’s walk through a basic scraper for CraigsList missed connections. In the code snippet below I add detailed comments that explain what I’m doing for each line. You can follow along with each of these scripts on this github repository.


#### 00-basics.py

import requests
from bs4 import BeautifulSoup
from pprint import pprint
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

# The base url for craigslist in New York
BASE_URL = 'http://newyork.craigslist.org/'

def scrape_missed_connections():
    """ Scrape all the missed connections from a list """
    
    # Download the list of missed connections

    # here we're using requests,
    # a python library for accessing the web

    # we add "mis/" to the url to tell requests
    # to get the missed connections 
    # on newyork.craigslist.org

    response = requests.get(BASE_URL + "mis/")

    # parse HTML using Beautiful Soup
    # this returns a `soup` object which
    # gives us convenience methods for parsing html

    soup = BeautifulSoup(response.content)

    # find all the posts in the page.

    # here we're telling BeautifulSoup to get us every
    # span tag that has a class that equals pl

    # these tags might look something like this:
    # <span class='pl'> {content} </span>

    missed_connections = soup.find_all('span', {'class':'pl'})

    # Get all the links to missed connection pages:
    for missed_connection in missed_connections:
        
        # for each span list, find the "a" tag which 
        # represents the link to the missed connection page.

        link = missed_connection.find('a').attrs['href']
        
        # join this relative link with the 
        # BASE_URL to create an absolute link

        url = urljoin(BASE_URL, link)
        
        # pass this url to a function (defined below) to scrape 
        # info about that missed connection

        scrape_missed_connection(url)

def scrape_missed_connection(url):
    """ Extract information from a missed connection's page. """

    # retrieve the missed connection with requests

    response = requests.get(url)

    # Parse the html of the missed connection post

    soup = BeautifulSoup(response.content)

    # Extract the actual contents of some HTML elements:

    # here we're using BeautifulSoup's `text` attribute for retrieving
    # the plain text within each HTML element.

    # see an example of what this page looks like here:

    data = {
        'source_url': url,
        'subject': soup.find('h2', {'class':'postingtitle'}).text.strip(),
        'body': soup.find('section', {'id':'postingbody'}).text.strip(),
        'datetime': soup.find('time').attrs['datetime']
    }

    # Print it prettily. 
    pprint(data)

if __name__ == '__main__':
    scrape_missed_connections()


If all goes well you should see a series of python dictionaries printed to your console:

{
    'body': 'I saw you on your way home from work last night. I hoped to see you on the way to work this morning, and I did. Actually, I usually see you on the way to work. I wanted to say hello this morning, and I stupidly smiled at you wanting you to smile back. You looked at me in acknowledgement but that seemed about it. I will have to talk to you the next time I see you and tell you how cute you look in that hat you were wearing.',
    'datetime': '2013-12-18T10:50:29-0500',
    'source_url': 'http://newyork.craigslist.org/brk/mis/4249329594.html',
    'subject': 'A train to work this morning - m4w'
}
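
One step in the scraper that is easy to gloss over is urljoin, which turns the links found on the listing page into full urls. Here is a minimal sketch of how it behaves, using only the standard library; the example.com link is a made-up placeholder, and the relative path is simply taken from the sample output above:

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

BASE_URL = 'http://newyork.craigslist.org/'

# a site-relative link, like the ones found in the listing page
relative = urljoin(BASE_URL, '/brk/mis/4249329594.html')

# a link that is already absolute is returned unchanged
# (example.com is a hypothetical address, for illustration only)
absolute = urljoin(BASE_URL, 'http://example.com/other.html')

print(relative)
print(absolute)
```

This is why the scraper can pass whatever it finds in the href attribute straight to urljoin without first checking whether the link is relative or absolute.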


Databases

We could stop here and probably be fine, but it’s usually a better idea to save the data you scrape into a database. This way, if the script breaks midway through execution, we can retain the information we scraped up until that point. In addition, by using a database, we can also quickly construct an API or app on top of the data we scrape. Luckily, @pudo wrote an amazing python library called dataset that makes writing to a database as easy as writing json to a file. To incorporate it into our script, we only need to change three lines:


01-dataset.py
import dataset

# connect to a database

# here we're just going to use sqlite3 which is a lightweight
# SQL store, ideal for most simple scraping jobs.  However, we could
# easily use MySQL or PostgreSQL by simply swapping out the path
# to the database:

db = dataset.connect('sqlite:///missed_connections.db')

...

# now instead of simply printing our data to the console,
# let's put it into our database

# here db['posts'] signifies that we are going to insert data
# into the 'posts' table of the database. We'll use "upsert"
# instead of "insert" because we'll probably want to run the
# scraper a few times to test it, and this way we won't continually
# add duplicate records to our database.

db['posts'].upsert(data, ['source_url'])

Putting it all together, our new script should look something like this.
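
If you are curious what an upsert amounts to, here is a rough sketch of the same idea written directly against sqlite3 from the standard library. This is not how dataset implements it internally; the upsert_post helper and the table layout are my own assumptions, chosen to mirror the dictionary our scraper builds:

```python
import sqlite3

def upsert_post(conn, data):
    """Update the row with a matching source_url, or insert a new one."""
    cur = conn.execute(
        "UPDATE posts SET subject = ?, body = ?, datetime = ? "
        "WHERE source_url = ?",
        (data['subject'], data['body'], data['datetime'], data['source_url']))
    # rowcount is 0 when no existing row matched the source_url
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO posts (source_url, subject, body, datetime) "
            "VALUES (?, ?, ?, ?)",
            (data['source_url'], data['subject'], data['body'], data['datetime']))
    conn.commit()

# an in-memory database, so nothing is written to disk
conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE posts (source_url TEXT, subject TEXT, body TEXT, datetime TEXT)")

post = {
    'source_url': 'http://newyork.craigslist.org/brk/mis/4249329594.html',
    'subject': 'A train to work this morning - m4w',
    'body': 'I saw you on your way home from work last night.',
    'datetime': '2013-12-18T10:50:29-0500',
}

# run the "scraper" twice, the second time with a changed subject
upsert_post(conn, post)
post['subject'] = 'A train to work this morning - m4w (edited)'
upsert_post(conn, post)

row_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
subject = conn.execute("SELECT subject FROM posts").fetchone()[0]
print(row_count, subject)
```

Running the scraper twice leaves a single row per source_url, with the most recent values, which is exactly the behavior we want while testing.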


Caching

One of the most common scraping problems is realizing your script is buggy midway through execution and having to start over from scratch. This isn’t too big of a problem if you’re scraping a few pages, but if you’re trying to pull in everything from IMDB or CraigsList, you’ll slowly drive yourself insane when, three hours into a big job, you realize you forgot to grab an important piece of data. One easy way to deal with this problem is to cache the html files that you’re scraping (in other words, save them to a local file).


To implement this, we need to write a function that checks whether we’ve already saved a local version of the page and, if so, loads the cached version rather than hitting the site’s server again. If not, we’ll proceed as normal: request the page from the site’s server and then save a version of it locally:


02-caching.py
import os
from hashlib import sha1

import requests


# a directory for caching files we've already downloaded
CACHE_DIR = os.path.join(os.path.dirname(__file__), 'cache')

def url_to_filename(url):
    """ Make a URL into a file name, using SHA1 hashes. """

    # use a sha1 hash to convert the url into a unique filename
    # (encode first, because sha1 expects bytes on Python 3)
    hash_file = sha1(url.encode('utf-8')).hexdigest() + '.html'
    return os.path.join(CACHE_DIR, hash_file)


def store_local(url, content):
    """ Save a local copy of the file. """

    # If the cache directory does not exist, make one.
    if not os.path.isdir(CACHE_DIR):
        os.makedirs(CACHE_DIR)

    # Save to disk.
    local_path = url_to_filename(url)
    with open(local_path, 'wb') as f:
        f.write(content)


def load_local(url):
    """ Read a local copy of a URL. """
    local_path = url_to_filename(url)
    if not os.path.exists(local_path):
        return None

    with open(local_path, 'rb') as f:
        return f.read()


def get_content(url):
    """ Wrap requests.get() """
    content = load_local(url)
    if content is None:
        response = requests.get(url)
        content = response.content
        store_local(url, content)
    return content


Now, every time we request a new missed connection, we should use our get_content function instead of requests.get(). Merging this code in, our script should now look like this.
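
To convince yourself that the cache works without touching anyone’s server, you can swap the real download for a stub. The sketch below is a condensed variant of the functions above, not the script from the repository: make_cached_fetcher and fake_fetch are hypothetical names, the url is a placeholder, and the cache lives in a temporary directory.

```python
import os
import tempfile
from hashlib import sha1

def make_cached_fetcher(fetch, cache_dir):
    """Wrap fetch(url) -> bytes so that each url is only fetched once."""
    def get_content(url):
        name = sha1(url.encode('utf-8')).hexdigest() + '.html'
        path = os.path.join(cache_dir, name)
        # cache hit: read the saved copy instead of fetching again
        if os.path.exists(path):
            with open(path, 'rb') as f:
                return f.read()
        # cache miss: fetch, then save a local copy for next time
        content = fetch(url)
        with open(path, 'wb') as f:
            f.write(content)
        return content
    return get_content

calls = []

def fake_fetch(url):
    """Stand-in for requests.get(url).content that records each call."""
    calls.append(url)
    return b'<html>stub page</html>'

get_content = make_cached_fetcher(fake_fetch, tempfile.mkdtemp())

first = get_content('http://example.com/mis/1.html')
second = get_content('http://example.com/mis/1.html')

print(len(calls))
```

The second call is served from disk, so the stub is only called once. One thing to keep in mind: a cache like this never expires, so delete the cache directory when you want fresh pages.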


Multithreading

Up to this point, our script has only been capable of downloading a single missed connection at a time. It turns out that a single processor is capable of juggling multiple tasks at once via something called “multithreading.” This is different from distributed processing, where a set of tasks is executed across a series of networked computers. In the case of our task – scraping multiple missed connections – this means that instead of simply looping through the list of missed connections, we’ll first collect all the urls of the missed connection pages and then download and parse these pages using multiple threads within a single process. Because a scraper spends most of its time waiting on the network, running several requests at once saves a lot of time.

It turns out that, once again, @pudo has solved this problem for us. With a simple module he wrote named thready, we can pass this list of urls to our function that scrapes each missed connection and very quickly and easily increase the speed with which we parse all the pages. This is implemented by modifying our scrape_missed_connections function as follows:


03-multithreading.py
from thready import threaded

...

def scrape_missed_connections():
    """ Scrape all the missed connections from a list """

    response = requests.get(BASE_URL + "mis/")
    soup = BeautifulSoup(response.content)
    missed_connections = soup.find_all('span', {'class':'pl'})

    # create an empty list of urls to scrape 
    urls = []
    for missed_connection in missed_connections:

        link = missed_connection.find('a').attrs['href']
        url = urljoin(BASE_URL, link)
        
        # iteratively populate this list 
        urls.append(url)


    # download and parse these missed connections using
    # multiple threads
    threaded(urls, scrape_missed_connection, num_threads=10)


Now when we execute this script, it should run much, much faster than our previous scripts. Be warned, however: many sites do not appreciate you requesting multiple pages at once and may ban you from the site for overloading their servers. Make sure to exercise caution and be respectful when utilizing multiple threads to scrape a site.
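
If you would rather not install thready, the same pattern can be sketched with the thread pool that ships with Python 3. To be clear, this swaps in concurrent.futures for thready; fake_scrape and the example.com urls are placeholders I made up so the snippet runs without hitting a real site.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_scrape(url):
    """Stand-in for scrape_missed_connection: waits, then returns a result."""
    time.sleep(0.05)  # pretend we are waiting on the network
    return {'source_url': url}

# hypothetical urls, for illustration only
urls = ['http://example.com/mis/%d.html' % i for i in range(10)]

start = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    # map() hands the urls to the worker threads and
    # returns the results in the same order as the input
    results = list(pool.map(fake_scrape, urls))
elapsed = time.time() - start

print(len(results), 'pages in %.2f seconds' % elapsed)
```

The max_workers argument plays the same role as num_threads above. Keeping it small is the easiest way to stay on a site’s good side.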


With these three simple skills, you should be able to start scraping the web like a true hacker. Enjoy!

December 17, 2013 05:00 AM

December 12, 2013

Sonya Song

Psychology of Sharing on Social Media: Attention, Emotion and Reaction

I am very glad to see my Boston Globe/Facebook study well received by curious readers and featured by several organizations, such as the Harvard Nieman Journalism Lab, Chartbeat, Social Fresh, and ISHP Consulting. Meanwhile, I've been giving talks on this research at different places, including the Boston Globe, Mozilla Festival in London, Spiegel Online in Hamburg and Hacks/Hackers Berlin. If you find this research interesting and want to further the discussion, please buzz me on Twitter @sonya2song or drop me a line at sonya2song#gmail. Please also feel free to download the slides (last updated on December 9, 2013) developed for my presentations.

In the previous study, I presented data analysis that examined how users read and share Boston Globe posts on its Facebook Page. In this extended analysis, I’ve included qualitative analysis with a focus on content, cognition and emotion. My goal is to help newsrooms better promote their stories on and attract more attention from social media.

To achieve this goal, I’ve been digging into psychology literature for inspiration. Overjoyed, I’ve discovered some theories and findings that are portable to the social media environment:

  • Two modes of thinking, fast and slow, attract different types of attention.
  • Sharing on social media is
    • Charged with emotions,
    • Bounded by self-image management, and also
    • By concerns over relationship with others.

Again, this report is based on the three key metrics featured by Facebook Insights: reach, engaged users, and talking about this. According to Facebook, reach is defined as “the number of unique people who have seen your post”; engaged users as “the number of unique people who have clicked on your post”; and talking about this as “the number of unique people who have created a story from your Page post. Stories are created when someone likes, comments on or shares your posts; answers a question you posted; or responds to your event”. These metrics are counted as absolute numbers of unique visitors in various ways and reflect user behavior from passive consumption to active interaction.


-    *    -    *    -    *    -    *    -    *    -    *    -    *    -    *    -


BIG PICTURES and BREAKING NEWS, OH YEAH!


In the proto-analysis of Boston Globe traffic on Facebook, I reported the findings on image size and the “BREAKING” label. The general pattern is that illustrating a post with an image is associated with higher traffic compared to no image, as is a large image compared to a thumbnail. This pattern holds across the three key Facebook metrics (Figure 2). In addition, the mere word “BREAKING” is associated with a higher reach, although not with engagement or talking about this (Figure 1). In fact, not only BREAKING NEWS but also other uppercase words are associated with a higher reach, including WEATHER WATCH, MAJOR UPDATE, BIG PICTURE, NOW LIVE, etc.

As a hardworking journalist, you may tell me it’s upsetting to know that readers are attracted to this kind of superficial stuff like BIG PICTURES and BREAKING NEWS. But the good news for you is that the attention triggered by primitive tricks is fairly cheap. To gain more engaged attention, sophisticated messages would be a better choice, which we’ll discuss in the section on cognitive strain and System 2.

Figure 1: "Breaking" is associated with higher "reach"

Figure 2: Larger images are associated with higher traffic
 

As a not quite positive example, MIT Technology Review may show us how to gain little attention. Looking at its Facebook Page, we can see a lot of big T’s, which are of course the logo of the magazine. It’s quite obvious that the stories are shared as links and the logo is automatically extracted by Facebook. As such, these stories lack an interesting or even relevant visual companion. The repeated T’s may also have made fans blind to this symbol. The sad situation is that, even though the Review publishes a lot of thrilling stories, its Facebook presence is far from compelling; you may have noticed the small numbers of shares and likes in Figure 3.

Figure 3: Facebook Page of MIT Technology Review


-    *    -    *    -    *    -    *    -    *    -    *    -    *    -    *    -


Thinking, Fast and Slow


To understand how we deal with simple and complex stimuli (e.g., text, pictures, puzzles, etc.), Daniel Kahneman’s (2011) Thinking, Fast and Slow is a good read. In this book, Kahneman examines various theories and findings related to two thinking modes of humans: System 1 (fast) and System 2 (slow).

 

System 1 – Unconscious Attention

 

System 1 deals with innate skills that are crucial for survival and it works fast and automatically. One example is that we withdraw our fingers from fire before we realize what happened. Another is driving a car on an empty highway. System 1 is completely involuntary: for instance, when seeing 1 + 1 = ____, we feel compelled to fill in the blank. In other words, System 1 can’t be turned off, since it’s crucial for our survival. When hearing a sudden noise like an explosion, we’ll turn our heads toward the source and wonder if danger is near. Besides innate skills, abilities gained through prolonged practice can also be handled by System 1, such as envisioning the next moves for chess masters, or 2^10 = 1,024 for computer scientists.

System 2 – Conscious Attention

 

By contrast, System 2 handles learned skills, such as a foreign language, logical reasoning, mathematics, etc. It’s slow and effortful, which is how we feel when calculating 23 x 67 = ____. In other words, it demands much attention, like driving on the left for someone who hasn’t done it before. Unlike System 1, System 2 isn’t always on standby, and that’s why we are vulnerable to marketing and advertising techniques. In addition, System 2 can be impaired when we are tired, hungry or in a bad mood. Imagine how tough it is to take a test after staying up all night to prepare for it.

Division of Labor and Law of Least Effort

 

Between System 1 and System 2, labor is divided. Most of the time, we’re in the fast mode of System 1. Meanwhile, System 1 assesses the environment and determines if it needs to call on System 2 for extra effort. When difficulty, conflict or pressure is detected, System 2 is mobilized and takes control. System 2 processes information perceived by System 1 and corrects it if necessary. System 2 also overrides the impulses of System 1. For instance, we have to make an effort to suppress emotions at work when an argument escalates. Hence, System 2 has the last word.

However, laziness is in our nature as animals, and we tend to conserve energy for unexpected threats in the future. Since mental effort also consumes resources (e.g., glucose), its capacity is limited. Therefore, we have a constant, natural aversion to making an effort and let System 1 take the lead.

Now consider how we use social media: We browse posts fairly fast. Although occasionally we jump into an online debate, while quickly scrolling a large or small screen, we generally feel relaxed, relieved, and even pleased. In other words, we come to social media for an easy time rather than challenges.

 

Cognitive Ease and System 1


This easy time on social media is what Kahneman (2011) calls “cognitive ease”. He describes cognitive ease as “a sign that things are going well—no threats, no major news, no need to redirect attention or mobilize effort” (p. 59). In other words, System 1 is in charge and System 2 is dozing.

Figure 4: Causes and consequences of cognitive ease


Figure 4 illustrates a variety of causes and consequences of cognitive ease. (Maybe you have sensed the cognitive theories advertisers have been exploiting: repetition, clarity, and happiness are more likely to make you feel better and persuade you to believe some product or service deserves your money.) Like clear display, legibility affects how easily we parse a piece of information.

 

BREAKING NEWS and Legibility


Highly legible text will result in cognitive ease and pamper System 1 pretty well. Look at the following example given by Kahneman:

Adolf Hitler was born in 1892.
Adolf Hitler was born in 1887.

The experiment shows that more people would believe the first statement is true compared to the second—the fact is that Hitler was born in 1889. It’s as simple as you have guessed: the first statement is in bold font and is more legible.

If we port this psychological finding to social media, we may realize BREAKING NEWS is playing a similar role and thus attracting more attention, though nearly unconsciously. Since we can’t customize the size, color or font of text we publish on social media, using uppercase is the only way we can change the legibility of our posts. If you’re desperate to overcome this barrier on social media, you may find lolcats inspiring.

Like uppercase text, large pictures are the same kind of eye candy: they soothe System 1 and attract nearly unconscious attention from social media users.

 

Simple Text and Readability


Another way to cater to System 1 is to make the text simple and easy to parse. Readability is used to measure this aspect. A number of formulas have been established to measure readability.

As expected, the average number of syllables per word was negatively correlated with all three metrics, namely “reach”, “engaged users”, and “talking about this”. The implication for social media editors is to prepare messages that cater to fast reading, because people are often guided by System 1 on social media, even though the audiences of different media outlets are used to language at different levels of readability (check out Figure 8 for the readability level across newsrooms). Presumably, once people are led away from Facebook, they will switch gears from fast to slow mode. But before that, each post has merely a split second to catch people’s attention.

On the subject of simple language, Orwell stresses that, regardless of literary use, simple words enhance language as an instrument of expressing thought. You may test yourself with the following example and see if you can smoothly slide your focus from the first to the last word of this sentence.

“Objective considerations of contemporary phenomena compel the conclusion that success or failure in competitive activities exhibits no tendency to be commensurate with innate capacity, but that a considerable element of the unpredictable must invariably be taken into account.” – Politics and the English Language, George Orwell, 1946


This paragraph is actually “translated” by Orwell from Ecclesiastes to make the point that text can be unnecessarily difficult. The original text reads as follows:

“I returned and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.”

To make Orwell’s point more precise, the original text scores at 18.5 while the translation at 27.5, based on the Flesch-Kincaid formula that I’ll cover in the section on cognitive strain and System 2.

 

Priming Effects and Asking Questions on Social Media


Related to System 1, psychologists have discovered many interesting phenomena, such as priming effects. Priming is an overarching concept that describes how people’s behavior is shaped by influences they are barely conscious of. For instance, ask two groups of people to complete SO_P; before this task, one group saw EAT and the other saw WASH. The EAT group would more likely finish it as SOUP and the other as SOAP. Here, EAT primes SOUP and WASH primes SOAP.

Another experiment relevant to unconscious influence was conducted at the University of Newcastle (Bateson et al., 2006). An honesty box was placed in an office to collect payment for tea and coffee. Different posters were displayed in turn above the price list. The posters had no text but featured one of two themes, eyes or flowers. The results (Figure 5) showed that more money was paid during the “eyed” weeks than during the floral weeks. This experiment nicely demonstrated that “priming phenomena arise in System 1 and you have no conscious access to them” (Kahneman, 2011, p. 57).

Figure 5: Eyes on you, Bateson et al., 2006

If we apply this concept to social media, what could we find? I started exploring evidence with a small task. The research question is: Could questions raised on social media prime people’s behavior of answering them? That’s to say, would a question mark generate more comments on a post?

Figure 6: Asking questions is associated with more comments

Yes, questions were significantly correlated with more comments (note: not with likes or shares), after controlling for news section, sharing time and other factors. The general pattern is that more reach is associated with more engagement and more engagement with more likes, shares and comments (see Figure 10). The posts with a question, by contrast, are more likely to appear above the trend line, indicating better performance, despite some outliers beneath. Based on a sample collected within two weeks, posts with a question are associated with 80% more comments.

Again, the caveat is that you shouldn’t overuse this technique, just as with the temptation of BREAKING NEWS. Although people say there’s no such thing as a dumb question, you’ll be well aware when your question isn’t that smart. In addition, “the effects of the primes are robust but not necessarily large” (Kahneman, 2011, p. 56). That suggests that content itself remains the main driver of traffic, while promotion helps to some extent.

In discussing a variety of priming effects and anchoring effects, two phenomena related to the fast thinking mode (System 1), Kahneman notes that it’s human nature to be influenced unconsciously, although we can make extra effort to rein in our System 1. The following quote should ease both readers and journalists who may have developed moral concerns over these findings:

Your thoughts and behavior may be influenced by stimuli to which you pay no attention at all, and even by stimuli of which you are completely unaware. The main moral of priming research is that our thoughts and our behavior are influenced, much more than we know or want, by the environment of the moment. Many people find the priming results unbelievable because they do not correspond to subjective experience. Many others find the results upsetting because they threaten the subjective sense of agency and autonomy… If the stakes are high you should mobilize yourself (your System 2) to combat the effect. (p.128).

 

Cognitive Strain and System 2

 

Now you hardworking journalists are about to learn some encouraging and exciting findings of this research: Your sophisticated messages deserve more attention, not necessarily of higher quantity but probably of higher quality.

Figure 7: Causes and consequences of cognitive strain


In contrast to cognitive ease, cognitive strain has the opposite causes and consequences (Figure 7). Guided by System 2, people become more vigilant and make fewer errors, but meanwhile become less creative and have to exert more effort.

Given the pretty flowchart, in practice, how shall we activate System 2 and engage people’s slow thinking? Here’s an interesting experiment. There are some tricky questions that people often get wrong. Three of them are included in Shane Frederick’s Cognitive Reflection Test. This test is so tricky that even students from top schools would give wrong answers. However, when the test was given in a washed-out poor print, the error rate dropped from 90% to 35% (Alter et al., 2007). This experiment has demonstrated that cognitive strain mobilizes System 2 and System 2 engages slow and careful thinking.

Complex Text and More Comments


On social media, what kind of cognitive strain can we use to engage readers? More complex text may be one way to do it, such as sentences with more words and words with more syllables.

To measure the complexity of the text on Facebook, I adopted the Flesch-Kincaid Grade Level in my study. This instrument has been tuned to reflect the number of years of education a US reader needs to understand the given text. For instance, an article that scores 5.2 can be understood by fifth graders and above. The F-K Level is built upon the average number of syllables per word and the average number of words per sentence and is calculated as follows:

Flesch-Kincaid Grade Level = (11.8 * syllables per word) + (0.39 * words per sentence) - 15.59
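
For readers who want to try the formula on their own posts, here is a minimal Python sketch. The fk_grade function is the formula above and nothing more; the syllable counter is a crude heuristic of my own (it counts runs of vowels), not the method used in this study, so scores for real text will only be approximate.

```python
import re

def fk_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return (11.8 * (syllables / float(words))
            + 0.39 * (words / float(sentences))
            - 15.59)

def estimate_syllables(word):
    """Crude assumption: one syllable per run of vowels, at least one."""
    return max(1, len(re.findall(r'[aeiouy]+', word.lower())))

def fk_grade_of_text(text):
    """Estimate the grade level of a non-empty piece of English text."""
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(estimate_syllables(w) for w in words)
    return fk_grade(len(words), len(sentences), syllables)

# 100 words in 5 sentences with 150 syllables:
# 11.8 * 1.5 + 0.39 * 20 - 15.59 = 9.91
grade_from_counts = fk_grade(100, 5, 150)

# six one-syllable words in two short sentences score below zero
grade_from_text = fk_grade_of_text('The cat sat. The dog ran.')

print(round(grade_from_counts, 2), round(grade_from_text, 2))
```

Very short, simple text can score below zero, as the second example shows; the grade-level reading only makes sense for ordinary prose.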

On Facebook, various media outlets present themselves quite consistently with the styles reflected in their own publications. The readability scores in Figure 8 are calculated with the latest 200 posts from the Facebook Pages of seven media outlets.

Figure 8: The median score of Flesch-Kincaid. BuzzFeed: 4.83, Boston.com: 6.01, Boston Globe: 7.23, Washington Post: 7.37, CNN: 9.69, New Yorker: 12.91, The Economist: 14.62.

Here comes the exciting news for journalists. More complex text was correlated with more comments (note again: not with shares or likes)! The effect was modest though: 12 more points in the F-K Level was correlated with 12% more comments. In Figure 9, the saturation and size of the dots indicate the readability score of posts. Larger and more saturated dots are more likely to fall above the trend line, indicating better performance.

Figure 9: Harder text is associated with more comments

The fast and slow thinking modes may help interpret this evidence. When the text appears difficult, some social media users give up and move on to the next post. Those who decide to read the complex posts are in fact engaged in slow thinking, and slow thinking allows them to understand the message better and to form an opinion. Hence, just as poor print is correlated with better answers, complex text appears to be correlated with more comments from readers.

Besides the psychological perspective, other factors may help explain this evidence as well. 1) The overall complexity of the text may imply the importance of a post, and therefore more complex text may attract more views and clicks. And 2) complex messages tend to be longer, and longer messages are displayed in larger blocks of text. As such, more complex and thus longer posts take slightly more time for Facebook users to parse and therefore attract slightly more attention. These two parameters (importance of a message and parsing time) were not controlled in my statistical analysis, so their effects can’t be ruled out.

Tension + Relief = Conversation


The general pattern among the three Facebook KPIs is that more reach is associated with more engagement, and more engagement with more likes, shares and comments (Figure 10). After log transformation, the relationship between reach, engaged users, and talking about this is roughly linear. Meanwhile, we can easily discern outliers above and beneath the trend lines. So why did those stories generate more or less activity than expected?

Figure 10: Reach vs. Engagement vs. Talking about this (log-transformed)

To quantify this question, some people have developed a metric called conversation rate. This metric is calculated as the ratio of “talking about this” to “reach”. Let’s first look at the most and least conversational stories, and then I’ll give a quick summary of the patterns observed in their content.
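To make the metric concrete, here is a small Python sketch that computes the conversation rate and ranks posts by it. The `Post` class, the function names, and all numbers in the example are hypothetical placeholders, not data from the Boston Globe.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    title: str
    reach: int               # number of people who saw the post
    talking_about_this: int  # Facebook's "talking about this" count


def conversation_rate(post: Post) -> float:
    """Ratio of "talking about this" to "reach"."""
    if post.reach <= 0:
        raise ValueError("reach must be positive")
    return post.talking_about_this / post.reach


def rank_by_conversation_rate(posts: List[Post]) -> List[Post]:
    """Most conversational posts first."""
    return sorted(posts, key=conversation_rate, reverse=True)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    posts = [
        Post("Album review", reach=20000, talking_about_this=40),
        Post("Swans return to the Public Garden", reach=15000, talking_about_this=900),
    ]
    for p in rank_by_conversation_rate(posts):
        print(f"{p.title}: {conversation_rate(p):.2%}")
```

Because the rate is a ratio, a post with modest reach can still rank at the top if a large share of its audience reacts to it.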

Most Conversational Stories
  1. Oklahoma City Thunder star Kevin Durant today pledged $1 million to recovery efforts after yesterday's devastating tornado. http://b.globe.com/191OR6r
  2. Romeo and Juliet, the swans who reside at the Boston Public Garden during the summer (and at Franklin Park Zoo during the winter), returned there today in a sign that the spring season is truly here. See photos: http://b.globe.com/ZOid4O
  3. The lilacs are in full bloom at Arnold Arboretum.  This photo was taken yesterday, known officially as Lilac Sunday at the Arboretum.  Stop by if you have a chance.    Globe staff photo / Yoon S. Byun
  4. Say hello to the CapeFlyer. It had its inaugural run today and is scheduled to have its official debut next weekend, the first time in about 25 years service from Boston to Cape Cod will be offered. http://b.globe.com/13xiois  Would you ride it?
  5. The Marathon bombing sheared off the right leg of Marc Fucarile (pictured, with his fiancee Jen Regan) in a millisecond. It spared the left, but not by much. Now, he and his family are in a painful waiting game to see if his “good” leg can be saved. http://b.globe.com/11igVlA
  6. A child was pulled from the rubble of Plaza Towers Elementary School in Moore, Okla., after an EF-4 tornado struck. The tornado, with winds up to 200 mph, was up to a mile wide and left behind large areas of devastation. http://b.globe.com/12pP8KY
  7. “It was one of the greatest moments in Boston sports history,” writes the Globe’s Dan Shaughnessy about the Bruins’ thrilling win over the Maple Leafs. “And then came a miracle… the Bruins scored and scored and scored.” http://b.globe.com/18H5GTZ
  8. The Boston Athletic Association is inviting all runners who failed to finish 2013 Boston Marathon to run in next year's race.  This affects 5,633 runners.
  9. Brad Marchand scored the Bruins' game-winning goal over the Rangers at 15:40 of overtime. Story: http://b.globe.com/10vZlak    (Photo credit: AP)
  10. After learning she had an 87% chance of developing breast cancer, actress Angelina Jolie underwent a preventative double mastectomy.  Jolie shares her story in a powerful The New York Times op-ed today: http://nyti.ms/18HZFX3     EPA photo

Least Conversational Stories 
  1. Keith Reddin’s thriller “Almost Blue” at the Charlestown Working Theater, isn’t so much blue as noir http://goo.gl/PwlBT
  2. #Recipe for paella-stuffed peppers http://goo.gl/PwlBT
  3. New: Matthew Gilbert's Buzzsaw column. As the cult favorite, "Arrested Development," returns with a season-sized “episode dump,” Globe critic Matthew Gilbert asks, does giving viewers too much leave them with nothing to talk about? http://b.globe.com/10a7Sg5
  4. Make mom feel even more special with these stylish Mother’s Day gifts.
  5. The Phoenix Suns named 33-year-old Ryan McDonough, formerly of the Boston Celtics, as their new general manager.
  6. Album review: The soundtrack for Baz Luhrmann's film adaptation of "The Great Gatsby," curated by Jay-Z, is a fantastical reimagining of that era, putting ‘20s jazz in the modern context of pop and hip-hop. Oddly enough, the one thing the soundtrack is missing is heart.
  7. Creative restlessness and a sense of adventure are at the heart of Iron & Wine’s latest album, “Ghost on Ghost,” which Sam Beam will celebrate with a show at Berklee Performance Center tonight.
  8. Book review: The beloved author of “The Kite Runner,” Khaled Hosseini, returns to the rugged landscape of his home country, Afghanistan with "And the Mountains Echoed."
  9. Jon Lester gave up six runs in six innings in Chicago as the White Sox defeated the Red Sox, 6-4.
  10. Yahoo is buying Tumblr for $1.1 billion. Do you think this will help rejuvenate the Yahoo brand? Is Tumblr a good investment?

Here is my quick summary of patterns related to conversational potential of stories.
  • Beautiful and pleasant stuff was the most conversational, such as photo slides.
  • Also highly conversational: there’s a problem, but there has been (or will be) a solution:
    • A tie, broken by a miracle win in sports
    • Runners who failed to finish the marathon, invited back to run it
    • Marathon bombing victims, given medical care
    • A natural disaster, but children were saved
    • A high chance of cancer, minimized by intervention
    • Two cities apart for 25 years, recently connected
  • The least conversational:
    • Arts related (music, movies, books, etc.)
    • Factual information (sports scores, settled business deals, etc.)

The conversations may also be explained through the fast and slow thinking framework. Two cases of beautiful photos (swans and lilacs) likely attract System 1 and help with conversations. By contrast, the other eight top conversational stories don’t feature beautiful photos at all. Instead, they first present a problem (tied game, tornado, cancer, disconnection, etc.) and then provide a solution or a triumph. I call this pattern tension-relief; the two parts are joined by a turning point. As a turning point disrupts the flow of a message, it may slow down people’s thinking and engage System 2.


Two Routes and One Goal


Figure 11: How to attract System 1 and System 2 for higher traffic

To sum up, there are various ways to attract attention on social media. Beautiful photos, simple messages and uppercase words likely attract System 1 for some unconscious attention. To attract more conscious and meaningful attention, we can use surprises, sophisticated language, and turning points in our messages. Both approaches would help gain more attention and thus more traffic on social media (see Figure 11).

Some may say these approaches are deceitful. I believe, however, the judgment hinges on the goal. If our goal is as sincere as to reach out to a larger audience and increase the civic impact of a newsroom, the use of these techniques is justified and appropriate. Here I want to quote Kahneman’s thought on this:

All this is very good advice, but we should not get carried away. High-quality paper, bright colors, and rhyming or simple language will not be much help if your message is obviously nonsensical, or if it contradicts facts that your audience knows to be true. The psychologists who do these experiments do not believe that people are stupid or infinitely gullible. What psychologists do believe is that all of us live much of our life guided by the impressions of System 1…

—How to write a persuasive message
Daniel Kahneman (2011, p. 64)


-    *    -    *    -    *    -    *    -    *    -    *    -    *    -    *    -


Which Number is Larger, #Comments or #Shares?

 

Figure 12: Comments vs. shares (log-transformed)

There are three ways for people to express themselves on Facebook: like, share and comment. In general, likes exceed shares and comments because a like is the cheapest expression people can afford. The exceptions are negative or controversial messages (Figure 12), where a “like” would contradict people’s cognition. This pattern isn’t unique to Facebook; it appears on other online media as well. On YouTube, despite a large number of views, thumbs up or down accounted for only 0.22% of total views, and comments for an even smaller 0.16% (Cha et al., 2007); on Wikipedia, 4.6% of the visits were related to edits; on Flickr, 20% of the users ever uploaded photos (Auchard, 2007).

The question becomes tricky when I ask which number is larger, comments or shares. The best answer to it is: It depends. In Figure 12, the line marks shares matching comments; above it more shares and below it more comments.

Here’s a pair of cases (Figure 13). The first post is a Rolling Stone cover featuring the Boston bombing suspect. It collected about 600 comments and 100 shares (you may also notice that it only got 57 likes, fewer than both shares and comments). The second example is an alternative cover featuring the hero police and the victims, which attracted about 100 comments and 600 shares. These two posts are about the same topic but the activities they induced are distinct.

Figure 13: Contrast between comments and shares

To answer why people react by sharing or commenting, let’s examine more examples. They’re the extreme cases: the posts where shares exceed comments by the widest margin, and the posts where comments exceed shares by the widest margin.
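One way to pick out such extreme cases is to rank posts by the log ratio of shares to comments, which matches the log-transformed axes of Figure 12. The Python sketch below is only an illustration: the function names and the add-one smoothing (to avoid dividing by zero) are my assumptions, and the counts are the approximate figures quoted above for the two Rolling Stone covers.

```python
import math
from typing import List, Tuple


def log_share_comment_ratio(shares: int, comments: int) -> float:
    """Positive when shares dominate, negative when comments dominate.

    Adding 1 to both counts is an assumption made here to avoid
    division by zero for posts with no shares or no comments.
    """
    return math.log10((shares + 1) / (comments + 1))


def rank_by_share_dominance(posts: List[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Sort (title, shares, comments) tuples, most share-heavy first."""
    return sorted(posts, key=lambda p: log_share_comment_ratio(p[1], p[2]), reverse=True)


if __name__ == "__main__":
    posts = [
        ("Rolling Stone cover", 100, 600),   # about 100 shares, 600 comments
        ("Alternative cover", 600, 100),     # about 600 shares, 100 comments
    ]
    for title, shares, comments in rank_by_share_dominance(posts):
        print(title, round(log_share_comment_ratio(shares, comments), 2))
```

The top of the ranked list corresponds to "More Shares than Comments" and the bottom to "More Comments than Shares"; a value of zero sits on the line in Figure 12.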

More Shares than Comments

  1. You stayed classy, Chicago:  The Blackhawks took out a full-page Boston Globe ad today to send along thanks and praise to the Bruins and the people of Boston.  http://bo.st/14BGdHL
  2. San Francisco City Hall was all lit up with the colors of the rainbow flag last night following the Supreme Court's decision that cleared the way for gay marriages to resume in California following a bitter, five-year legal battle.   Story: http://bo.st/18inxlD     EPA photo
  3. This is possibly the best fireworks over Boston photo we've ever seen.  Photo by Globe staff photographer Matthew J. Lee    We hope everyone had a great Fourth of July!    More photos of the celebration in Boston: http://bo.st/18AIVmv
  4. DeVann Vincent shared with us an alternative Rolling Stone cover.
  5. David Ortiz doubled in his first at-bat to become baseball's all-time leader in hits as a designated hitter and hit a two-run homer an inning later, leading the Red Sox to an 11-4 win over the Seattle Mariners Wednesday night. http://bo.st/16t6Wpz
  6. "Boston Strong" is the central theme at the 10th Annual Revere Beach Sand Sculpting Festival, taking place through Sunday.
  7. One of our favorite photo collections of the year: Winners of the National Geographic Traveler 2013 Photo Contest. The Eastern Screech Owl is seen here doing what they do best. You better have a sharp eye to spot these little birds of prey.
  8. A two-headed turtle born last month at the San Antonio Zoo has become so popular that it now has its own Facebook page. You find out what Thelma and Louise are up to here: http://on.fb.me/1e4rdWB
  9. The Red Sox are back in first place after a 15-inning win that began in July and ended in August. Stephen Drew finally got the decisive hit that lifted Boston to a 5-4 victory over the Seattle Mariners in a game that ended 14 minutes after midnight on Thursday. http://bo.st/169Fcrn
  10. The Red Sox handily beat the Mariners, 8-2, at Fenway Park tonight. http://bo.st/158PjQu

More Comments than Shares

  1. The latest cover of Rolling Stone magazine features a photo of Marathon bombings suspect Dzhohkar Tsarnaev. Is this appropriate?  http://bo.st/15lJKKX
  2. Very sad news: Two-year-old Logan Stevenson of Western Pennsylvania died last night in his mother's arms after serving as his parents' best man at their wedding last weekend.
  3. This dead shark was found lying in front of the Sea Dog Brew Pub in Nantucket this morning.
  4. Both Tedeschi Food Shops and CVS have pledged not to carry the Rolling Stone issue with Marathon bombing suspect Dzhohkar Tsarnaev on the cover.
  5. A British researcher claims that regular sex could be the secret to looking up to seven years younger.
  6. Following the George Zimmerman verdict, former Obama adviser Van Jones tweeted this image displaying Martin Luther King Jr. wearing a hoodie…  Do you think this image depicting MLK in a hoodie is appropriate?
  7. Nancy Kerrigan and Tonya Harding will be back in the spotlight in February as part of a new documentary to air on NBC during the Olympics. Will you watch it?
  8. The Bruins will send forward Tyler Seguin to Dallas in exchange for Loui Eriksson as part of a multi-player deal on Thursday, TSN reported. The deal sends Seguin, Rich Peverley and minor-leaguer Ryan Button in exchange for Eriksson, Joe Morrow, Reilly Smith and Matt Fraser. http://bo.st/17Or9Jv
  9. A Connecticut eighth-grader who misspelled the correct answer to a "Jeopardy!" question and lost money over it says he was cheated.  Do you agree?
  10. We're in the middle of a heat wave, so let's talk about ice cream.  What's the best ice cream shop in New England? Please post your favorite here in the comments.  Thanks!


-    *    -    *    -    *    -    *    -    *    -    *    -    *    -    *    -


Emotion and Social Sharing


One perspective for examining the above 20 posts is emotion. Some psychologists study the relationship between emotion and dissemination of information (Rimé, 1991; Peters, 2009). When stories are episodes about other people, emotions would play a role as follows:
  • People share stories charged with different emotions with different audiences, such as family and friends as opposed to strangers.
  • Among different types of emotions, some are shared more than others. In an offline setting, interest, happiness and disgust were found to be shared more.
  • The more intense an emotion is, the more likely a story will be shared.

Some scholars have already applied emotion research to the media sphere. For instance, Berger and Milkman (2012) examined 6,956 articles collected from the front page of the New York Times and the frequency of email shares. They found that high-arousal emotions were associated with more email shares, including both positive (awe and amusement) and negative (anxiety and anger) emotions. In addition, the investigators found readers were less likely to share a story when it became less emotional. They controlled for other factors like position on the front page and discussed the complex relationship between emotion and transmission of information. Their findings are in line with prior studies conducted in an offline setting.

Please note this study was carried out in 2008, when the investigators focused on new media rather than social media and on narrowcast via email rather than broadcast via social media. Still, their findings have implications for social media. From the above 20 posts on Facebook, we can also observe a pattern related to emotions. Happy (gay marriage, Red Sox’s victory) and interesting (best fireworks, two-headed turtles) stories were shared more than commented on. By contrast, sad (death, loss) and contemptible (various suspects, sport scandal) stories were commented on more than shared.

Social Relation Maintenance

Looking at Facebook, we recognize it as a venue connecting friends and family members. This characteristic may contribute to the divide between shares and comments, because commenting would allow users to keep their involvement within a semi-anonymous space and away from their friends and families.

There may be other reasons explaining why some stories are more sharable than others on Facebook. Facebook is famous for a positive emotional climate (Pew Research, 2012) and harmonious social relations; guess why no dislike button has ever been introduced to Facebook.

We can suspect a variety of motivations for maintaining good social relations. Some scholars believe good social relationships indicate interpersonal attraction and trust, help receive social support, and assist in transmitting information and resources crucial for professional performance (Podolny and Baron, 1997). Others name reciprocity, postulating that people give while expecting returned favors at a later point (Putnam, 1993).

As diverse as these motivations are, people generally aim to maintain good social relations. Hence, such behavior should also be observed on Facebook, a hub of social connections, and its users should try to please others and avoid offending them. That means, if a post warns of severe weather, people are more likely to share it among social connections (information exchange). On the other hand, if a political or religious story will certainly upset some users’ parents, friends, or bosses, they will feel reluctant to throw it in their faces by sharing it. Or, if it’s aligned with the belief of a social circle (sports triumphs, gay marriage, sense of justice), sharing is preferred.

Nonetheless, when a story reads as controversial and emotional, people develop a compulsion to voice their opinions. In these cases, we’ll notice a lot of comments under a post even though it may not result in many shares.

 

Self-Image Management


Besides maintaining relations with others, people also tend to improve and manage their presence in social life. Goffman (1959) compares everyday life to theatrical performance; in both scenarios people recognize a specific setting and present themselves in front of a known audience. Beyond face-to-face interactions, people also attempt to build ideal self-images by associating themselves with tangible objects, such as branded goods. As Thompson and Hirschman note: “Consumption serves to produce a desired self through the images and styles conveyed through one’s possessions” (1995, p. 151).

Similar phenomena have also been observed on the Internet. For example, personal website owners may follow a motto that “we are what we post”, because they can manage their online self-images by presenting brand logos and products at whim, as opposed to a real life with financial constraints (Schau & Gilly, 2003). On dating websites, people get fairly cautious because profiles and photos are often carefully selected if not completely misleading. We have also heard that teens are obsessed with and exhausted by frequent improvements of their online presence. Similarly, we suspect that Facebook Pages (brands, celebrities, news media, etc.) are also used to craft and improve people’s online presence, or an ideal self-image.

On social media, “I share therefore I am”. The news stories curated and shared by Facebook users inevitably signal their ideal self-images as well. Take Wired as an example: it frequently “geeks out” its fans. Complex circuits and charts are often highly shared, as shown in the following screenshots. I believe Wired has a high-tech audience, but still these “geeky” stories can be curated by non-technical readers as decorations.






Another example is the Facebook Page of the New York Times, as shown in the following screenshots. Its posts that are hip and smart (science, health, cute tips) are highly shared.






-    *    -    *    -    *    -    *    -    *    -    *    -    *    -    *    -


Summary and Implications for Newsrooms

 

In this post, I have highlighted my research on social media, especially how Boston Globe posts are perceived on Facebook. It’s a follow-up study after my first analysis that was focused on empirical evidence. Explaining why various types of reading and sharing behavior are observed is the aim of this study. The theories I've adapted and the conclusions I've reached include:

  • People constantly switch between fast and slow thinking modes.
  • On social media, people are mostly guided by the fast mode.
  • To cater to the fast thinking mode, we should make better use of images and keep our language simple.
  • On the other hand, don’t hesitate to tell complicated stories, because they may engage people in slow thinking and result in receiving more feedback.
  • If you want to encourage a discussion, please ask questions.
  • If you want to make a story conversational, show a turning point with a tension and a relief, because a turning point attracts attention.
  • If you want to see more shares, consider infusing emotions into the stories; both positive (awe and amusement) and negative (anger and anxiety) emotions may work.
  • Besides emotions, smartness also works, because people try to present themselves in an ideal way. Show some good taste in science, health, and hip stories and they’ll have a better chance to be picked up.
  • If your stories aren’t widely shared but are feverishly commented on, it may be caused by the controversies in them, because people cherish social relations and avoid upsetting families, friends and colleagues by sharing offensive stories.

This guideline will grow even longer as my research develops, but I by no means present it as a recipe that newsrooms should adopt strictly, because 1) overuse of these strategies may wear out readers and 2) creativity works best.

I think of this list as a nutrition facts label: it helps newsrooms investigate why some posts take off while others never do, depending on which ingredients were included or left out. Still, it’s completely up to individual social media editors to craft their best strategies to promote the already-published news stories.

That said, in my opinion, sharing news stories on social media may be more relevant to advertising than journalism, because the stories have been finished as a final product. Coming next is a battle for attention and an effort to persuade readers to share.



-    *    -    *    -    *    -    *    -    *    -    *    -    *    -    *    -


Details in Statistics 

 

The statistical tool I used for this analysis is negative binomial regression, which lets me control for various aspects of news stories. The factors I’ve controlled for are the news section (defined by the Boston Globe), the time of day of the share, and the day of the week. When analyzing each of the three KPIs, I’ve controlled for the other two. For a more detailed explanation of omitted variables, please check the section on “Independent Variables, Dependent Variables, and Negative Binomial Regression” in my previous blog post focused on empirical evidence: Proto-analysis of Boston Globe Traffic on Facebook.
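For readers less familiar with this family of models: negative binomial regression is usually fitted with a log link, so each coefficient acts multiplicatively on the expected count. The Python sketch below shows how such a coefficient translates into a "percent more comments" statement. It assumes a log link, and the coefficient value is a made-up placeholder, not an estimate from my analysis.

```python
import math


def percent_change(beta: float, delta: float = 1.0) -> float:
    """Percent change in the expected count when a predictor rises by `delta`,
    assuming a log-link count model with coefficient `beta`."""
    return (math.exp(beta * delta) - 1.0) * 100.0


def beta_for_percent_change(pct: float, delta: float = 1.0) -> float:
    """Inverse of percent_change: the coefficient implied by a given percent change."""
    return math.log(1.0 + pct / 100.0) / delta


if __name__ == "__main__":
    hypothetical_beta = 0.05  # placeholder value, for illustration only
    print(round(percent_change(hypothetical_beta), 2))            # effect of a 1-unit increase
    print(round(percent_change(hypothetical_beta, delta=3), 2))   # effect of a 3-unit increase
```

Note that the effects compound rather than add up: a three-unit increase multiplies the expected count by exp(3 * beta), which is slightly more than three times the one-unit percentage.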

Another note is that Facebook has recently redefined the metrics, so future research may show different numbers. Nonetheless, the underlying theories should remain applicable, as we humans are quite consistent in our behavior.

 

Acknowledgement 

 

As a Knight-Mozilla OpenNews Fellow, I’ve been receiving constant support from the Boston Globe, and their support has made this research happen. I’m very grateful to the staff at the Globe for sharing the data with me, because their kind offer has helped me as well as many other newsrooms understand how people read and share news on social media. I hope this research helps the Globe secure its leading position in the online media world, not only through responsive design but also through strategic use of social media.

Working with the Boston Globe is a privilege, and this privilege is offered through OpenNews. I can't express how grateful I feel to Dan Sinker and Erika Owens, the people behind the fellowship program, for their trust and support. This year has been amazing!

In addition, I want to thank Steve Wildman, my advisor at Michigan State University, for his continuous support in my research. Daniel Kahneman’s Thinking, Fast and Slow, the book Steve recommended, has inspired me a lot in this research. I also want to thank Professor Ann Kronrod and my fellow student Guanxiong Huang at MSU. Their expertise in advertising and marketing has helped me enrich this research.

December 12, 2013 01:30 AM

November 19, 2013

Noah Veltman

My OpenNews year-end report card

A year ago I arrived in London for the Mozilla Festival, still a little unsure of what I had gotten myself into. I didn’t know a single person there. Everyone was scary smart and talking about things I had never heard of. Some of them sported Lovecraftian beards. I mostly just tried to keep my mouth shut and soak it all in. On the long flight back to San Francisco, my head was spinning.

Fast forward to last month, when MozFest 2013 came back to London. Those same people I was so hopelessly intimidated by a year earlier? Now they all felt like old friends. Instead of sitting quietly in the corner I was running all over the building, leading a workshop, giving tutorials, presenting my work, writing code, and even speaking at the plenary session.

It’s amazing what can happen in a year.

Before I started my fellowship, I laid out some goals, and then I assessed my progress at the halfway mark. Yesterday I listed some of the lessons I’ve learned this year. I think it’s important to reflect frankly on not just the highs of my fellowship but also the parts where I blew it, so with that in mind, here are what I consider some of my bigger successes and failures this year:

Successes

Learning new technologies.

Before this year, I had never heard of Node.js, Django, or Leaflet. I had never opened a shapefile in my life. I had only a very vague concept of what D3 was or how OpenStreetMap worked. I had written maybe 100 total lines of Python and exactly zero lines of Ruby. I approached JavaScript tentatively, like it was a bucking bronco waiting to throw me off. Sed? Awk? What are those, Star Wars characters?

In the past year, I’ve had a chance to dive into all of these things and lots more. In some cases, I’ve even developed something approaching expertise and been asked to teach others. That is insane, and also awesome. This was the year I went from a Fisher-Price toolset to a real one, from knowing a few tricks to having genuine confidence that I can crack any coding challenge that comes up.

Getting things published on BBC News Online.

The first thing I got published for the BBC was just a measly line chart. It was about 6pm on a Friday, and a reporter and I were the only two people left in the office. Someone from the business desk needed a chart for their story, and fast. I knocked out a quick stock price chart conforming to BBC style and it went live shortly thereafter:

image

I have made hundreds, if not thousands, of line charts in my life. There’s nothing special about this one at all. But oh, what a feeling to see that up on the BBC. I was ready to dance in the streets.

Since then, I’ve gotten some more substantial work published on things like the elections in Pakistan and Margaret Thatcher’s funeral. Yesterday I was working on visualizing the history of the World Cup. It’s never a dull day here.

Of course, by far my most popular project was The Secret Life of Cats. Given how often complete strangers ask me about it, I’m convinced that will be my epitaph. Few people know this, but the original idea was the result of a decade-long effort by BBC R & D scientists to achieve Absolute Internet, a state that was previously only theoretical and thought to be unreachable within the laws of thermodynamics:

image

Open-sourcing some useful things.

When I applied for the fellowship last year, they asked for a link to my GitHub account. This was a problem. I had a username, but I didn’t have a single shred of code posted. I quickly created one little Potemkin repo in an effort to trick the OpenNews powers-that-be that I was hip to this whole GitHub thing. I don’t think it worked, but I guess it didn’t hurt.

Since then I’ve gotten better and better about getting over my code shame and putting things online. I’ve also gotten better at packaging to distribute, sanding down the rough edges of a library and thoroughly documenting it (I might be the only person who actually enjoys writing documentation). I’ve got a long way to go in this regard, but I’m headed in the right direction.

Discovering a love of teaching.

If you had asked me a year ago I would have said I didn’t have much to teach anybody else or the right temperament for that role. But then a funny thing happened. Teaching others, and figuring out the obstacles to learning new skills in the newsroom, became a central theme of my fellowship.

It turns out I really enjoy working with people as they learn, trying to find that lightbulb moment when some piece of the web suddenly goes from mystical to mechanical and the moving parts click into place. I’ve mostly learned this stuff on my own, making progress in fits and starts and getting frustrated along the way, so I think my experience is a lot closer to the average code-curious journalist than someone who got a real coding education. I know all too well what it feels like to want to tear your hair out when you copy and paste an example from a tutorial and it refuses to work, or when the goddamn text box just won’t go where you want no matter how many numbers you change, or when an experienced coder explains how to “just” do something that winds up taking you six hours.

As a coder, you will always get in over your head and get stuck. But it’s easy to forget how thick the fog is when you’re just starting out. You don’t just get stuck, you have no way to formulate a plan to get unstuck. When something goes wrong, you are rudderless, without the nose for debugging and breaking down the problem into manageable bits that you develop with experience. It can be really dispiriting. I hope I can maintain some perspective on that as I move farther away from the beginner end of the spectrum.

Saying yes to too much.

This year has been a constant flood of surprising opportunities to learn and do new things. There was always another event, another project, another idea, another chance to say “yes, and…” I chased every rabbit wherever it led. I pushed myself out of my comfort zone whenever I could. That crazy, frenetic blend made this year what it was, and I wouldn’t trade that for anything.

Failures

Not writing.

Once upon a time, I wrote things for a living, and I could sit down and knock out ten pages before lunch. But writing is a muscle. You use it or lose it, and as my days have become less about words and more about code, that muscle has totally atrophied. Now a feedback loop has kicked in, and I fear the blank page much more than the blank command prompt. I really should have forced myself to write more about my work this year.

Not collaborating with my fellow fellows.

This is probably my most profound failure this year. My fellow fellows are a remarkable bunch. Every one of them brings unique talents to the table, and my favorite parts of this year have been spending time with them in different corners of the globe. While I’ve collaborated with them casually quite a bit, tapping someone on the shoulder for advice or expertise, I really wish we had found a chance to work together closely on something more ambitious. I’m not sure what exactly that something would be, which is probably why it didn’t happen, but I consider that a big missed opportunity.

Holding on to bad habits.

This year I had hoped to move away from developing by the seat of my pants and towards more sound practices, things like build automation, test-driven development, and more reusable modules. While I’m a much better developer than I was a year ago, I don’t think I succeeded on this front. I still sometimes end up with code like the messy office where you know exactly where everything is but nobody else would stand a chance. Part of this is because the newsroom is unusually forgiving of code sins; timetables and code lifespans are both short. But I also had the unfair luxury of being a one-man band on lots of projects, either because it was a solo effort or because my contribution could be easily compartmentalized. I wish I had done more of the deep collaboration that forces you to get smarter about structuring your code to serve other masters.

Neglecting photography.

Photography has been an important hobby of mine for a long time, but this year my camera barely made it out of the bag. I was also determined at the start of the year to do some sort of project around the place of photojournalism in the news app world. This is something I care deeply about. It worries me that the medium of stills seems to be an afterthought for most people in this space, like we invented the audio slideshow and called it a day. Even animated gifs seem to get more play than traditional photojournalism. I had some interesting conversations with photo editors about this, but beyond that I failed utterly in my plan to turn it into anything of substance.

Saying yes to too much.

The downside of saying “yes, and…” to every opportunity was clear. There are only so many hours in a day, and I stretched myself too thin. I ended up with an elephant boneyard with dozens of unfinished projects big and small. I really should have prioritized those better, starting fewer and finishing more.

November 19, 2013 03:07 PM

November 18, 2013

Noah Veltman

What I've learned as an OpenNews fellow

In just over a week, my OpenNews Fellowship at BBC News will end in the traditional fashion, with Dan Sinker pushing me into the Atlantic Ocean on an ice floe.  It’s been an incredible ride, one of the best years of my life. I’ve pretty much run out of superlatives for it. Above all else, my fellowship has been a learning experience, a crash course in the frontiers of code in the newsroom.  So what have I learned?

You have to run your own race.
The sheer volume of inspiring work and great new ideas coming out of the news nerd community presents a challenge. Every day I see literally dozens of new tools, resources, and news apps worth my time. But the time isn’t there, and that used to drive me nuts. I would bookmark things to come back to later and the list would just grow and grow. It will overwhelm you if you let it.

I hear a variation of this problem echoed by journalists and students who are trying to get started learning to code.  There is just so much out there. Where do you even start?  How do you dip your toe into Class 5 rapids?

In this world, your plate will always be overflowing with opportunities to learn, and that’s exactly how it should be. Any domain worth mastering is impossible to master. News development is changing too fast for any one person to keep up, and it’s a hydra; everything you do ends up opening five new avenues to explore. You have to find a way to let go, to stop trying to take it all in or “keep up.” There will always be more cool stuff out there than you can read, learn, or use, and that’s OK.

Doing something different ≠ Experimenting
I often hear people in newsrooms talk about “experimenting” when all they really mean is just “changing things up.” Experiments are designed to test a hypothesis, emphasis on designed. You need to understand what exactly you’re trying to test, and then you need a plan for testing that specific question and assessing the results without getting faked out. It’s the difference between retrospective medical studies, which have so much noise in the data they rarely produce meaningful insights, and double-blind clinical trials, which are the gold standard of medical research for a reason.

When thinking about new ideas in the newsroom, put on your scientist hat. Turn your idle speculation about what will or won’t work into a testable hypothesis. Figure out what counts as success or failure ahead of time. Don’t just gather all the data you can get your hands on and see what you can find out later. You’ll wind up drunk on metrics without any useful conclusions. For a smarter take on this issue, read my colleague Stijn Debrouwere’s piece on cargo cult analytics.

The bubble is a big challenge.
One of the hardest parts of doing deep data projects and interactives is maintaining empathy about your audience. You spend dozens of hours with your data and it becomes your best friend. You guys go on long walks together. Maybe you rent a tandem bike if the weather’s nice. You end up cramming every possible angle into your story and adding lots of big, beautiful charts and widgets. Surely everyone will want to investigate the nooks and crannies of this fascinating topic as much as you did.

Cut to next morning, when your readers skim your story for 10 seconds on their tiny phones while they walk into the subway station for their morning commute. Whoops.

If you work in data journalism, you’re probably the sort of person that loves deeply exploring data. Meanwhile, for your readers, your story is just one of dozens they might come across during the brief cracks in their day.  It’s easy to forget this. Your journalist/coder peers will ooh and aah over inside baseball sorts of achievements. And it will be hard to kill your babies; you spent weeks on this stuff, and now you want to get it all on the page.

The end result is that we still produce lots of bloated stuff with a good story buried somewhere inside, gasping for oxygen.

My favorite formulation in response to this is what the ProPublica team calls the “near” and “far” view: making sure you give the big picture up front so someone who will only give you 10 seconds gets something out of it, then offering the opportunity to explore and personalize the story in greater depth for someone who will give you a full 10 minutes. Think of it as progressive enhancement for attention: some people will have tiny screens, some people will have cinema displays. You want to serve them all.

And while we’re on the subject: a data dump is not data journalism. Just throwing up a giant dataset online without adding context or conclusions is a capitulation, the equivalent of printing the notes from your steno pad in the morning paper. Once you have the data, your job is just getting started.

Conferenceitis is a serious medical condition.
Feeling lethargic? Eating too many finger sandwiches? Tweeting about airports a lot?

You may be suffering from conferenceitis. Talk to your doctor today.

Conferences can be fun, but it’s easy to go overboard. I certainly did this year. After you go to enough events, fielding the same questions over and over, not only are you not getting your other work done, but you aren’t even producing original thoughts anymore. You just end up quoting yourself. You accidentally develop a spiel instead of just having a conversation. That sucks.

There aren’t a lot of shortcuts to real wisdom; it comes one fumbling step in the dark at a time. My favorite events of the year have been the ones that skip the forest and stick to the trees, especially MozFest and NICAR. In both cases the sessions focused on teaching something useful and applicable, not on grand principles or the Future of All Things. Beyond events like those, I plan to only conference in moderation from now on. So join my colleague Friedrich Lindenberg and me in 2014 for the launch of DeskCon, a new kind of unconference where we all sit at our desks and finally get some work done.

Data journalism offers new and exciting ways to screw up.
People tend to presume a certain authority and accuracy of computer-assisted reporting methods, but these methods are only as smart as their human practitioners. In reality, they offer a delightful bouquet of new ways to screw up, many of them subtle enough to avoid detection until they produce maximum embarrassment. Remember that time I left out every country starting with S? Or when half the points on my chart were wrong because of Daylight Savings Time? Or when I mistook a moving car for a housecat in a GPS trace? I sure do.

Data doesn’t just radiate truth and meaning on its own. It’s a volatile raw material, one you have to treat with great caution and care to glean any legitimate insight. The bad news? This takes a lot of hard work. The good news? Hey, maybe you won’t get replaced by a robot after all!

Things have a way of coming full circle.
It would be hard to overstate what a radical change this year was for me. I jumped into a totally new industry. I packed up and moved 5000 miles away to a strange island full of fried food and royal corgis. And yet I’m constantly surprised by the ways threads from my past lives keep showing up again. A big focus of this year has been about the value of open source; my first job after college was actually working at a PR firm representing open source companies and organizations, back when GitHub was just a twinkle in SourceForge’s eye. I find tech policy work from my past resurfacing in the newsroom through issues like censorship, online surveillance, open government, and internet standards. I even get to dust off my mothballed political science degree when it’s time to get wonky about election coverage.

When I talk to journalism students or recent grads, I hear a note of panic as they struggle to plot out their future career path and wonder how to connect the dots. This year has been a good reminder that you don’t get to connect the dots ahead of time. They only connect in retrospect, after lots of zig-zagging along the way. If you just seek out interesting work with interesting people and never stop learning, wonderful things will happen.

Community is everything.
The beating heart of all of this is the incredible news nerd community, a motley crew of journalists, coders, civic hackers, and all manner of hybrids that somehow manages to be both so tight-knit and yet so welcoming to all comers. It’s amazing to me how all of these people theoretically working for competitors can be so totally on the same team, giving freely of their time to share their work, collaborate across organizations, and help us all get better.  I’ve benefitted from the kindness and genius of my peers more times than I can count; I hope I’ve been able to give something back.  I’m very proud to be a part of OpenNews, building connective tissue to help grow this community even more.  I can’t wait to see what the future holds for it.

November 18, 2013 04:57 PM

November 15, 2013

Noah Veltman

Working with developers in the newsroom

Last month I co-led a “Web Developer Literacy” session for reporters and editors at the Online News Association conference. I expected a lot of questions about particular technologies, but the discussion wound up focusing much more on process and office politics, touching on tough questions like:

How do you integrate developers into a team of reporters?
How do you spec out digital projects when you have no idea what’s feasible?
How can developers, designers, and reporters work together effectively in the crucible of a newsroom?

These are far from solved problems, and newsrooms have some particular handicaps.  They typically lack the time or money for a strong project management function. Needs are unpredictable (I don’t know of many software companies where a product is conceived in the morning and then launched before lunchtime). Decisionmakers are unlikely to come from technical backgrounds, and they’re still adjusting to the relatively new phenomenon of developers in the newsroom.

Despite those challenges, though, lots of interactive teams seem to be converging on certain successful principles.  Here’s the short version of what I said last month:

Clarification: when I say “developer” below I mean a newsroom developer who works on interactives, graphics, data journalism projects, etc. How much this applies to a developer who works on your CMS or your mobile app is a separate question.

Talk to a developer early and often.

One of the worst things you can do is let the editorial horse get way out of the barn and then drop your request on a developer’s desk at the last minute. Supposedly “technical” questions have real implications for design and storytelling, and you need that perspective when the project is taking shape, not after all the important decisions have been made. 

Even something as simple as geography matters. If a reporter and developer are working together on a project, they should probably sit next to each other. If that’s not possible, get them on chat or have them pick up the phone often. Email and tickets are great, but asynchronicity is the enemy when you’re working against a deadline.

Ask a lot of questions, especially ones you’re afraid are stupid. Odds are someone else in the room has the same question. When a developer lapses into obnoxious developer-speak, swallow your pride and ask them to translate. Don’t just nod and make a mental note to go Google “S3” later. Having the conversation right then will clue you in, but it will also help your developers understand where you’re coming from.

Developers are journalists, not technicians.

Your news developers may not be writing or calling sources, but they are journalists, and should be treated as such. You need everyone invested in the common cause of the story and the audience. If they aren’t, and they feel like their job is only to worry about the technical details, the thread will get lost along the way and you will end up with a beautifully designed, beautifully coded piece of crap.  Your developers will be gatekeepers who spend their time saying no to things instead of contributing ideas and working with you to solve the problems that matter.

Talk up front about what might change.

In a newsroom, information rarely comes as a perfect batch, especially on a breaking story. It comes in bits. It gets revised and replaced. As you prototype things or explore some data, you’ll wind up adjusting your original approach. Things will change. That’s OK. But it can save everyone a lot of time and aggravation if you express that uncertainty before you send a developer down the rabbit hole.

Whether an idea is firm or experimental, whether data is going to change or not (spoiler alert: it’s going to change), whether a project is definitely going live next week or is definitely maybe probably not going live next week: these will significantly affect how a developer approaches a task under the hood. The best thing you can do is simply be upfront about what you know and don’t know, the possible ways the project might zig and zag. This way your developers won’t paint themselves into a corner, and they’ll free up more time for other work.

Don’t think “possible” vs. “impossible.”

A lot of questions you get as a developer start with “Would it be possible to…” Almost anything can be done given enough time, enough developers, and enough duct tape, but if you just keep throwing changes into the pot one at a time without a sense of opportunity cost, it’s not going to end well. You will almost never get to produce your ideal version of an interactive. Many good ideas will be left on the cutting room floor. The starting point for discussing a new one should be about the timetable and the existing priorities.

Respect a developer’s concerns, but be ready to push back.

Developers can seem like they’re being too precious about technical issues. Maybe they’re demanding a lot of time to test a new app before launching it, or telling you you can’t do things a certain way because it would overload your servers, or expressing concerns about using a third-party API. They often have a good reason. If something breaks later, it will fall to them to clean up the mess; they need to mind the technical store. And a choice between two JavaScript libraries that seems like inside baseball may mean a difference of a full day’s work.

But this doesn’t mean you should be a supplicant, going along with whatever a developer says because they’re using a bunch of jargon or you’re afraid to step on their turf. Developers have plenty of biases, and they can easily lose sight of how one technical bugaboo balances against other tradeoffs. Challenge them on things. Ask them to explain their reasoning. That’s how we all get better. If they get prickly about it, they’re doing something wrong, not you.

This is a two-way street.

This isn’t just about reporters and designers working to better understand where developers are coming from. The reverse is equally important. What works for a software company does not always work in the newsroom. Developers should strive to better understand the reporting process, the importance of design, and the unique demands of news. They need to let go of some of their technical dogma and get used to working without a net on most projects.  They need to care about your audience, which may consume news differently than they do.  Above all, they need to learn to truly work as a team with non-developers, and that comes back to communication again, being able to explain the why of complex choices to non-developers and give competing priorities a fair hearing.

Have any thoughts about this?  I’d love to hear them.

November 15, 2013 06:13 PM

November 14, 2013

Sonya Song

Proto-analysis of Boston Globe Traffic on Facebook

Update on 7/18/2013: In this post, you'll find a fair amount of explanation of statistics and key metrics. If you're already familiar with them, please refer to a neat summary published by the Nieman Journalism Lab.

Last week, I gave a little talk at the Boston Globe, presenting my preliminary analysis of how Boston Globe articles were received on its Facebook Page. Through my analysis, I hoped to answer two questions. What types of stories are shared by the Boston Globe staff on the social media platform? In turn, how do different types of shared stories affect Facebook users’ reading and sharing? By answering these two questions, I aimed to find out how well the staff’s intentions were aligned with readers’ interest as measured through three metrics offered by Facebook, and whether there were gaps between the intentions and perceptions that would signal room for improvement.

 

 Highlights of the Study

  • I examined 215 stories shared in two weeks on the Facebook Page of the Boston Globe.
  • I found several attributes correlated with attention:
    • Image size (none, thumbnail, single-column, and double-column)
    • With or without a “breaking” label in the caption
    • Time of sharing (hour and weekday)
    • News topic defined by editors (business, metro, sports, etc.)
    • Related to the Boston Marathon bombing or not
  • There were gaps between staff’s efforts and Facebook users’ reading and sharing.

 

Facebook Insights and its Metrics

 

I exported the data through Facebook Insights, a built-in feature for Page administrators, to a spreadsheet file and later analyzed them in R, an open-source statistical tool. I kept the dataset fairly small to save time, especially since I cleaned data and labeled some of the variables manually, as automation was infeasible for them. In total, I examined 215 stories shared from May 7 to 21 this year.

My analysis was completely dependent on the three metrics Facebook Insights features: reach, engaged users, and talking about this. According to Facebook, reach is defined as “the number of unique people who have seen your post”; engaged users as “the number of unique people who have clicked on your post”; and talking about this as “the number of unique people who have created a story from your Page post. Stories are created when someone likes, comments on or shares your posts; answers a question you posted; or responds to your event”. These metrics are counted as absolute numbers of unique visitors in various ways and reflect user behavior from passive reading to proactive sharing.

The next section discusses statistical details that may be unfamiliar to some readers. If you would rather skip them, jump directly to the section on findings and implications.

 

Independent Variables, Dependent Variables, and Negative Binomial Regression


The statistical tool I used for this analysis is negative binomial regression, and I want to explicate the two terms, regression and negative binomial, to justify my choice of research method. Regression is a statistical process employed to estimate the relationships among variables. Variables serve different functions in an analysis: some are labeled independent variables and others dependent variables. Dependent variables measure the outcomes we expect to increase or decrease, such as life expectancy, happiness, and crime rate. Independent variables measure the factors that affect, predict, or are associated with those outcomes, such as educational level, blood pressure, and police numbers. Independent and dependent variables are by no means predetermined; they are assigned according to the research question. For instance, we can estimate a graduate’s income from her educational level, or estimate how likely someone is to hold a master’s degree given her income.

In my case of analyzing Facebook data, I chose the three key metrics, namely reach, engaged users and talking about this, as dependent variables. The independent variables are different aspects of shared posts that possibly affect these outcomes. The aspects I included are news section, image size, “breaking” label, publication hour and weekday. In particular, I created a binary independent variable that marked stories as relevant or irrelevant to the Boston Marathon bombing, because this topic has been a beat followed closely by the Globe staff.

I chose regression because it allows me to assess the association of each independent variable with the dependent variable separately, holding the others constant. This is very important for the analysis. For example, more black women were reported to die of breast cancer than white women. Then could we assume that, biologically, black women confront a higher risk of the disease? Maybe not. If we include women’s occupation, education and income in the analysis, we could find that black and white women are not significantly different in developing breast cancer if they are at the same socioeconomic status (SES).

Taking the analysis of news stories as another example, we may observe that story A is read by more people than story B. Can we claim that story A is more interesting than story B? Again, maybe not. We may find story A was shared at 8 am when people tend to check Facebook on their commute to work, whereas story B was shared at 11 am when people are often busy working. Also, story A covers sports and story B covers international relations, while sports news is generally more popular than international news. Therefore, to control for the various aspects of news stories, I need to run a regression for more robust and reliable results.

On the question of which type of regression is most appropriate, the quick answer is Poisson regression, because it handles count data, such as how many times a week people watch TV, how many tornadoes break out in the US each year, and how many people are waiting in front of you at a cash register. Because the data I collected violated an assumption of Poisson regression (equal mean and variance), I chose an alternative approach called negative binomial regression, which is a good choice for handling the overdispersion in my data. For those interested in a description of these and other analysis methods, UCLA shares a lot of tutorials on statistical analysis, including negative binomial regression.
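The equal-mean-and-variance assumption can be checked informally by dividing the sample variance of a count variable by its mean. The sketch below is written in Python rather than the R used for the analysis, and the `reach_counts` values are made up for illustration; they are not the Globe's data. A ratio far above 1 hints at overdispersion, although a formal test would be run on the fitted model rather than on the raw counts.

```python
from statistics import mean, variance

# Hypothetical per-post reach counts, invented for this illustration only.
reach_counts = [120, 95, 4300, 210, 180, 15000, 330, 75, 990, 260]


def dispersion_ratio(counts):
    """Return sample variance divided by the mean.

    Under a Poisson model the two are equal, so the ratio is close to 1.
    """
    return variance(counts) / mean(counts)


ratio = dispersion_ratio(reach_counts)
print(f"mean = {mean(reach_counts):.1f}, variance/mean = {ratio:.1f}")
if ratio > 1:
    print("Variance exceeds the mean: the counts look overdispersed.")
```

This is only a rough screen on the raw counts; it does not account for the independent variables the way a fitted model would.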

Coefficients generated by negative binomial regression are log ratios. To make the findings more comprehensible, in the following section, I present the ratios using the exponentiated coefficients.
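To make the conversion concrete, here is a minimal Python sketch of turning a log-ratio coefficient into a ratio and a percent change. The coefficient is hypothetical, chosen so that the ratio comes out at 1.6; the function names are mine and not part of any statistics package.

```python
import math


def ratio_from_coefficient(coef):
    """Exponentiate a log-ratio coefficient into a multiplicative ratio."""
    return math.exp(coef)


def percent_change(coef):
    """Express the exponentiated coefficient as a percent change from baseline."""
    return (math.exp(coef) - 1) * 100


# Hypothetical coefficient, chosen so that exp(coef) equals 1.6.
coef = math.log(1.6)
print(f"coefficient    = {coef:.3f}")
print(f"ratio          = {ratio_from_coefficient(coef):.2f}")
print(f"percent change = {percent_change(coef):+.0f}%")
```

A coefficient of about 0.47 therefore reads as a ratio of 1.6, or a 60% increase over the baseline, and a coefficient of 0 reads as no difference.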

Findings and Implications


This study was inspired by Facebook’s report on good practices for media companies. Facebook collected a sample of news organizations using Facebook Pages and drew its conclusions from their various practices. By contrast, my study focused only on the Boston Globe, and my findings were not always consistent with the suggestions given by Facebook.

“Breaking” Label

Facebook found that ‘posts that included “breaking” or “breaking news” received a 57% higher engagement over posts that were not identified as breaking news.’ In contrast, I did not find any significant difference in engaged users or talking about this. The only difference I found was a significant 60% increase in reach. From this, we could infer that the “breaking” label increased reach without inhibiting engaged users or talking about this.

Image Size

In terms of illustrative images, four sizes can be observed in the posts on Facebook Pages: no image, thumbnail images, single-column images, and double-column images. Double-column images cannot be seen on users’ news feeds and are only available on Facebook Pages, but for research purposes I retained “double-column” as an image size. From the following chart, you can see how image size affected the amount of attention drawn from Facebook users. The ratios are exponentiated coefficients.
  • Quite obviously, illustrating a story with an image was better than with no image.
  • A thumbnail image appeared not to make a significant difference compared with no image.
  • The larger an image was, the more popular a shared story was likely to be.

Marathon Bombing

The stories about the Boston Marathon bombing attracted significantly more attention on Facebook. Across the three key metrics, reach, engaged users, and talking about this, these stories increased the metrics by 31%, 97%, and 64%, respectively. However, when I looked at how users engaged through likes, comments and shares, I realized people didn’t necessarily “like” bombing-related stories. It’s not surprising, because “liking” a horrible story may create a cognitive conflict for some people, and therefore they don’t feel comfortable “liking” it. Comments and shares on bombing-related stories increased by 90% and 80%, respectively. Again, the ratios here are exponentiated coefficients.

Sharing Hour and Weekday

Because the data set spanned only two weeks, I don’t consider correlations with the sharing weekday to be reliable. However, it’s large enough to compare the 24 hours of the day. The following chart shows how the stories were shared by the staff and perceived by Facebook users. From it, we can see:
  • More stories were shared during business hours.
  • However, across the three metrics, the performance was not great during business hours.
  • The traffic seemed to peak around 8 am and around 11pm - 2am EST.
    • West coasters may contribute to after-midnight lags.

I talked with Joel Abrams at the Boston Globe about why peaks appeared in the early morning and late night. We’ve conjured up two theories for the phenomenon. First, people check Facebook more frequently before and after work, for instance, on their commute or in bed. Second, quite uncooperatively, newsrooms share fewer stories during those “idling” hours because social media editors are also not at work. As such, those hours may see a shortage of new posts, and therefore there is less competition for readers’ attention. In the future, we could experiment with sharing stories in the early morning and late night to see if we could possibly boost traffic.

News Sections

There are in total 12 news sections predetermined by the Boston Globe staff: art, business, ideas, lifestyle, magazine, metro, news, opinion, slides, specials, sports, and upgrade. (Upgrade posts are advertising that invites people to upgrade their membership to subscribers.) The following chart shows how many stories the staff shared across topics and how different topics were associated with reach, engaged users and talking about this. Between the staff’s shares and the readers’ attention, there were in fact some gaps.


The regression analysis assessed with higher precision how different news sections affected stories' performance on Facebook. Art news was taken as the baseline and the other news sections were compared to it. The results are shown as ratios (e.g., 20% means only one fifth as good as art news, and 300% means three times as good as art news). Please note that the confidence intervals were exponentiated from regression estimates, which is why each interval extends further above the estimate than below it. Now we can sort the news sections by their impact on performance:
  • Sorted by the amount shared by staff, high to low are:
    • Metro, sports, news, lifestyle, art, business, opinion, slides/mag/upgrade, ideas, and specials.
  • Sorted by reach, top ones are:
    • Opinion, slides, lifestyle, and business
  • Sorted by engaged users, top ones are:
    • Opinion, metro, lifestyle, and business
  • Sorted by talking about this, top ones are:
    • Slides, opinion, sports, and metro.
  • The misalignment between staff’s shares and readers’ perception may be a starting point for adjustments.
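The asymmetry of the exponentiated confidence intervals mentioned above can be seen in a small Python sketch. It assumes a simple Wald-style interval (estimate plus or minus 1.96 standard errors on the log scale), which may differ from the interval method the R routine actually used; the coefficient and standard error are hypothetical.

```python
import math


def exponentiated_interval(coef, std_err, z=1.96):
    """Return (lower, ratio, upper) after exponentiating a symmetric log-scale interval."""
    lower = math.exp(coef - z * std_err)
    ratio = math.exp(coef)
    upper = math.exp(coef + z * std_err)
    return lower, ratio, upper


# Hypothetical estimate: a section performing three times as well as the baseline.
lower, ratio, upper = exponentiated_interval(coef=math.log(3.0), std_err=0.25)
print(f"ratio = {ratio:.2f}, interval = ({lower:.2f}, {upper:.2f})")
print(f"distance below = {ratio - lower:.2f}, distance above = {upper - ratio:.2f}")
```

The interval is symmetric on the log scale, so after exponentiation the upper bound sits further from the ratio than the lower bound does.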

To compare the two dimensions (staff’s posts and readers’ attention), I scatter-plotted them together on one chart. In this chart, the horizontal axis represents how many stories were shared by the staff, and the vertical axis denotes how the stories were perceived by Facebook readers, in terms of reach, engaged users, and talking about this. The data were log transformed so that the data points could be squeezed together for a more sensible view. The units in fact didn’t matter here, because what we hope to see is the ratio of outcome to effort, or efficiency. To indicate the efficiency of the readers’ responses to the staff’s efforts, I roughly grouped the news topics into high, medium and low and colored the background with yellow, grey and white. It appeared that, given the same number of posts, opinion generated more engagement and photo slides tended to go more viral. Meanwhile, we could see that the shared posts of opinion and photo slides were fairly scarce. There is a gap between the number of articles published by section and the traffic they capture, and this could be a fruitful point of analysis for future adjustment in article sharing choice. Specifically, this study suggests that more readers would be engaged if there were more posts of opinion, photo slides, business, and lifestyle.

Virality or Conversation Rate

The following chart shows a trend: when stories reached a larger number of readers, more readers would be engaged in more activities around the stories, with each dot representing one shared story. This trend appears as a roughly linear relationship among reach, engaged users, and talking about this. Meanwhile, we can easily discern some dots dangling beneath the trend lines, marked by the red circles. So why did those stories generate fewer activities?
The virality extent, or so-called conversation rate, helps to discover these underperforming stories. This metric is calculated as the ratio of talking about this to reach. I’ll list the least as well as most conversational stories and give a quick summary of the observed patterns in the content.
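
As a minimal sketch of the metric just described, the conversation rate can be computed and ranked as below. The `Post` class, its field names, and all numbers are made up for illustration; they are not taken from the Globe data.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    reach: int          # number of people who saw the post
    talking_about: int  # "talking about this" count for the post


def conversation_rate(post: Post) -> float:
    """Ratio of talking about this to reach; 0.0 if the post reached nobody."""
    if post.reach == 0:
        return 0.0
    return post.talking_about / post.reach


def rank_by_conversation_rate(posts):
    """Return posts ordered from most to least conversational."""
    return sorted(posts, key=conversation_rate, reverse=True)


# Hypothetical posts with invented numbers.
posts = [
    Post("Swans return to the Public Garden", reach=40_000, talking_about=2_000),
    Post("Album review", reach=30_000, talking_about=60),
    Post("Game recap", reach=50_000, talking_about=500),
]
ranked = rank_by_conversation_rate(posts)
for post in ranked:
    print(f"{conversation_rate(post):.4f}  {post.title}")
```

Sorting by the ratio rather than by raw counts is what surfaces stories that reached many readers but prompted little activity.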

Most conversational stories

  1. Oklahoma City Thunder star Kevin Durant today pledged $1 million to recovery efforts after yesterday's devastating tornado. http://b.globe.com/191OR6r
  2. Romeo and Juliet, the swans who reside at the Boston Public Garden during the summer (and at Franklin Park Zoo during the winter), returned there today in a sign that the spring season is truly here. See photos: http://b.globe.com/ZOid4O
  3. The lilacs are in full bloom at Arnold Arboretum.  This photo was taken yesterday, known officially as Lilac Sunday at the Arboretum.  Stop by if you have a chance.    Globe staff photo / Yoon S. Byun
  4. Say hello to the CapeFlyer. It had its inaugural run today and is scheduled to have its official debut next weekend, the first time in about 25 years service from Boston to Cape Cod will be offered. http://b.globe.com/13xiois  Would you ride it?
  5. The Marathon bombing sheared off the right leg of Marc Fucarile (pictured, with his fiancee Jen Regan) in a millisecond. It spared the left, but not by much. Now, he and his family are in a painful waiting game to see if his “good” leg can be saved. http://b.globe.com/11igVlA
  6. A child was pulled from the rubble of Plaza Towers Elementary School in Moore, Okla., after an EF-4 tornado struck. The tornado, with winds up to 200 mph, was up to a mile wide and left behind large areas of devastation. http://b.globe.com/12pP8KY
  7. “It was one of the greatest moments in Boston sports history,” writes the Globe’s Dan Shaughnessy about the Bruins’ thrilling win over the Maple Leafs. “And then came a miracle… the Bruins scored and scored and scored.” http://b.globe.com/18H5GTZ
  8. The Boston Athletic Association is inviting all runners who failed to finish 2013 Boston Marathon to run in next year's race.  This affects 5,633 runners.
  9. Brad Marchand scored the Bruins' game-winning goal over the Rangers at 15:40 of overtime. Story: http://b.globe.com/10vZlak    (Photo credit: AP)
  10. After learning she had an 87% chance of developing breast cancer, actress Angelina Jolie underwent a preventative double mastectomy.  Jolie shares her story in a powerful The New York Times op-ed today: http://nyti.ms/18HZFX3     EPA photo

Least conversational stories

  1. Keith Reddin’s thriller “Almost Blue” at the Charlestown Working Theater, isn’t so much blue as noir http://goo.gl/PwlBT
  2. #Recipe for paella-stuffed peppers http://goo.gl/PwlBT
  3. New: Matthew Gilbert's Buzzsaw column. As the cult favorite, "Arrested Development," returns with a season-sized “episode dump,” Globe critic Matthew Gilbert asks, does giving viewers too much leave them with nothing to talk about? http://b.globe.com/10a7Sg5
  4. Make mom feel even more special with these stylish Mother’s Day gifts.
  5. The Phoenix Suns named 33-year-old Ryan McDonough, formerly of the Boston Celtics, as their new general manager.
  6. Album review: The soundtrack for Baz Luhrmann's film adaptation of "The Great Gatsby," curated by Jay-Z, is a fantastical reimagining of that era, putting ‘20s jazz in the modern context of pop and hip-hop. Oddly enough, the one thing the soundtrack is missing is heart.
  7. Creative restlessness and a sense of adventure are at the heart of Iron & Wine’s latest album, “Ghost on Ghost,” which Sam Beam will celebrate with a show at Berklee Performance Center tonight.
  8. Book review: The beloved author of “The Kite Runner,” Khaled Hosseini, returns to the rugged landscape of his home country, Afghanistan with "And the Mountains Echoed."
  9. Jon Lester gave up six runs in six innings in Chicago as the White Sox defeated the Red Sox, 6-4.
  10. Yahoo is buying Tumblr for $1.1 billion. Do you think this will help rejuvenate the Yahoo brand? Is Tumblr a good investment?
Here is my quick summary of patterns related to conversational potential of stories.
  • Beautiful and pleasant stuff was the most conversational, such as photo slides.
  • Also highly conversational: there’s a problem but there have been (or would be) a solution:
    • Tie but broken by miracle win in sports
    • Failed to finish marathon but were invited back to do it
    • Marathon bombing victims but were given medical care
    • Natural disaster but children were saved
    • Chance of cancer but intervention minimized it
  • The least conversational:
    • Arts related (music, movies, books, etc.)
    • Factual information (sports scores, settled business deals, etc.)
  • The high and low engagement is consistent with prior research that higher emotional reaction leads to more frequent expression.

 

Limitations and Future Research

  • Limitations
    • The data set is fairly small (n = 215)
    • Hence, more sampling errors and biases in results
    • Also omitted to examine how the frequency of shares would affect readers’ perceptions (the more shared stories the better, or vice versa, or doesn’t matter?)
  • Future research
    • Time-series data
    • Demographics (gender, age range, location, etc.)
    • Devices (web vs. mobile, platform types, etc.)




November 14, 2013 11:22 PM

Brian Abelson

The Relationship Between Promotion and Performance:
Pageviews Above Replacement

This is the second in a series of two posts about pageviews. This post details some research I’ve conducted on the promotional correlates of the metric, while the previous post discussed the sometimes apocalyptic tone of the discourse surrounding the ‘death’ of pageviews.

The Meaning of Metrics


James Watt had a problem. The man erroneously known as the creator of the steam engine (he actually just added a small condenser to the existing steam engine which made it more efficient) needed a way to sell his new machine. While there were plenty of ways to measure how one steam engine compared with another, there wasn’t yet a good way to compare the power of a steam engine with that of less technologically-sophisticated alternatives.

So Watt came up with horsepower, which was based on this simple estimation of an average horse’s strength:

1 Horsepower = 745.699872 watts

With this new conversion factor, Watt was able to make the utility of his machine more marketable. Watt’s engine eventually replaced horses and water wheels as the primary power source for British Industry and was one of the most crucial factors in the Industrial Revolution of the 19th century.

The great irony in all of this though was that Watt’s main innovation was increasing the efficiency of the steam engine, not simply its raw power. Ultimately, he made a trade-off between the degree to which horsepower reflected reality and the degree to which it could be interpreted by a lay-person. In the end, he assumed that people would take the price of coal into account, and therefore make their decisions according to two metrics: resources expended and power generated.

Almost 250 years later, horsepower is still being used to sell engines. The problem, however, is that over time  —  as the cost of extracting and shipping resources around the world plummeted  —  the calculation of horsepower remained the same. One wonders how different the world might look if Watt had added a simple denominator that accounted for the resources required to generate a given amount of force; what if American auto-culture had been focused on selling efficiency and not horsepower?

Here, we begin to see what metrics are for and the effects they have over time. A metric is about communicating a complex concept in interpretable and actionable terms. But when it’s widely adopted (when an industry seeks to optimize its activities for a given metric), it ceases to be a mere reflection of reality. Instead, the measure comes to actively shape the industry, oftentimes leading to unforeseen manipulations and externalities. So just as the American auto-industry’s choice to optimize for power came at the expense of increased CO₂ emissions, incentivizing professional baseball players to hit more home runs led to the widespread use of steroids, and the success of the Economist’s Big Mac Index inspired Argentina to artificially depress the price of the burger.

In News, the dominant metric is currently pageviews, which in many ways is the digital equivalent to its analog predecessor: circulation size. While Watt used the power of a horse as the metaphor for the power of his steam engine, media organizations use pageviews as a proxy for their overall reach. They’ve done this in part because pageviews are relatively easy to measure and compare across articles and news outlets. However, the “landscape has changed” and earning money in a news organization is no longer a mere function of audience size. That being said, it hasn’t stopped media outlets from optimizing their activities to maximize this metric.

At their worst, companies use slideshows, link bait, and mugshot galleries to “juke the stats”. However, innocuous activities like placing an article on the homepage, sending out an email blast, or sharing a link on social media have the same effect. That this is the case is neither good nor bad; it simply needs to be acknowledged before one can meaningfully make comparisons across online content. But while the promotional arms of media outlets are focused on directing interest to particular stories, their metrics do not capture the impact of these energies. Just as Watt’s horsepower only accounted for the power of his engine (and not the energy required to generate it), so do pageviews fail to capture the effects of the resources expended in creating them.

Pageviews Above Replacement


What would pageviews look like if we controlled for promotion? This was the question I set out to explore earlier this year as a part of my OpenNews fellowship. Early on at the Times, I spoke with numerous journalists and editors who expressed a desire for a better way to make comparisons across varying pieces of content. For them, the difference between 100 and 100,000 pageviews was obvious, but what if one article “[got] to ‘Snow-Fall’” and the other didn’t?

An early inspiration for me in this process was the concept of Wins Above Replacement, or WAR  —  an advanced baseball statistic developed by sabermetricians. The idea is fairly simple: managers are interested in putting together a winning team. However, to do so, they must select the right combination of players that possess a variety of skill-sets. While it may be easy to identify high-impact players, the majority of a team’s roster is made up of backups, relief pitchers, and “role-players” for whom it is more difficult to judge relative talent.

WAR addresses this conundrum by calculating, for every player, “the number of additional wins that player would contribute to a team compared to a replacement level player at that position.” So, if a manager was in need of a shortstop and didn’t have much to spend, he could simply open a spreadsheet of available players, filter by shortstop, and sort by WAR. The players at the top of this list should presumably contribute the greatest number of wins to the team over the course of a season.

In many ways, media companies are in need of just such a metric: one that effortlessly communicates value and drives decision-making. However, at the present moment, online metrics are too focused on decontextualized outcomes. But by incorporating the influence of promotion on an article’s performance, we can create a set of baselines that would enable more meaningful comparisons across a wide range of content. We might call such a metric ‘Pageviews above replacement’ or PAR for short, as it would allow us to determine how well a certain article performs in comparison to a similar article that received the same level of promotion.

In an attempt to build a prototype of PAR, I collected as much data as I could on the promotional activities of the New York Times. Every 10 minutes or so, I pulled in the posts from 20 Times Facebook accounts, 200 Twitter accounts, and the contents of the homepage and ~25 section fronts. At the same time, I also collected metadata on articles and information on their performance. By cross-referencing these two sources by URL and time, I was able to construct a detailed database of 21,000 articles published on nytimes.com between July and August.

Exploring the Data


Throughout my stint at the Times I have often witnessed analysts, social media editors, journalists, and business executives spring into action when a particular piece of content is not meeting its expectations. In most cases, the solution is to intensify promotional efforts. Whether it be through a concentrated social media campaign or simply leaving the article on the homepage longer than normal, the inevitable result is that traffic to the article increases. Following this behavioral pattern, I expected to see a clear relationship in the data between promotional energies and pageviews.

In the chart below, the data is visualized over time. The y-axis denotes the total number of pageviews the article received over seven days while the x-axis represents when the article was published. Pageviews are transformed along a logarithmic scale in which the differences between points correspond to orders of magnitude, rather than the raw number (in other words, the actual difference between points lower in the scale is much smaller than those higher up). In this chart and the others that follow, I remove axis annotations for pageviews so as to protect the privacy of the New York Times. Finally, each dot is colored by the time the article spent on the homepage and sized by the number of times it was tweeted by a New York Times Twitter account. For context, I add a line through the middle that represents the average number of pageviews for articles published on each day.






At first glance, this chart appears to resemble a series of balloons floating upwards. The metaphor is apt  —  articles which spend longer on the homepage (reddish bubbles) and which are tweeted more by Times accounts (larger bubbles) always rise higher. Below the average line, there are no big red balloons.

Following this insight, we might wonder how strong the relationship between time on homepage, number of tweets, and pageviews is. The chart below visualizes these relationships by placing articles in a scatterplot, where the x-axis is the time an article spent on the homepage (log-scaled), the y-axis is the number of pageviews (once again, log-scaled), and the size of each point corresponds to the number of times the article was tweeted by Times accounts. In addition, the dots are now colored according to the four combinations of two variables  —  whether or not an article was from the AP or Reuters (“the wire” for short) and whether or not an article was tweeted by @NYTimes.

  • Original content, no @NYTimes tweets: 4,553 articles (21.7%)
  • Wire content, no @NYTimes tweets: 15,180 articles (72.2%)
  • Original content, tweeted by @NYTimes: 1,184 articles (5.6%)
  • Wire content, tweeted by @NYTimes: 89 articles (0.4%)



Through my explorations of the data, I found that these two variables were significantly associated with the number of pageviews an article received (discussed more below). Intuitively it makes sense  —  stories from the wire should not receive the same promotional energies as those that come from journalists working at the Times. Likewise, articles which are exposed to @NYTimes’ 10 million followers will invariably perform better than those which do not receive this boost.





This chart shows the stark contrast in the lives of articles that pass through the New York Times. On the left side of the graph, we see a stack of 13,000 articles (73% of the total) which were never promoted on the homepage. These articles exist in somewhat of an online ‘State of Nature’ and for most, their lives are nasty, brutish, and short. Of these, about two-thirds are wire content. The other third are stories from Times journalists, two percent of which were also tweeted by @NYTimes. Just to the right of this stack is content which was on the homepage for 10 - 100 minutes. Of these, a shockingly high 98% are wire articles. In this space, the articles that performed especially well were the fortunate one percent linked to by @NYTimes. Finally, on the right half of the chart are the six percent of articles which spent more than 100 minutes on the homepage. Over 90% of these articles are original content and almost 80% were promoted by @NYTimes.

What emerges from this visualization is a clear picture of four distinct classes of content on the New York Times’ site: (A) Wire articles which never reached the homepage, (B) Original articles which never reached the homepage, (C) Wire content which is featured on the homepage for a short period of time, and (D) Original content which receives promotion on both the homepage and across social media. While groups A and B encompass a wide variety of outcomes, groups C and D generally display a positive linear relationship between time on homepage and pageviews.

Predicting Pageviews


Given these clear relationships, I wondered how well I could predict pageviews. The idea here is that, by determining the factors that drive traffic to an article, we can create baselines that help us determine whether or not an article has underperformed or exceeded its expectations.

Surprisingly, I found that the three factors visualized above  —  whether or not an article was from the wire, whether it was tweeted by @NYTimes, and how long it spent on the homepage  —  accounted for over 70% of the variance in pageviews within my dataset. By including additional variables in my model  —  the type of content (video, interactive, blogpost, or article), the section the article came from, word count, promotion on Facebook and section fronts, and the highest point the article reached on the homepage  —  I was able to explain almost 90% of the variance in pageviews. This means that, given some basic information about an article and the degree to which it will be promoted, my model can, to a fairly high level of precision, predict how many pageviews that article will receive seven days after publication.
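
To make the idea of "variance explained" concrete, here is a minimal sketch of an ordinary least squares fit on the three factors named above, using only the Python standard library. It is not the model used in this research: the toy data are invented, the function names are hypothetical, and taking logs of pageviews and homepage minutes is an assumption based on the log-scaled charts.

```python
import math


def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, row):
    return sum(b * v for b, v in zip(beta, row))


def r_squared(rows, y, beta):
    """Share of the variance in y accounted for by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(beta, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def features(is_wire, tweeted, homepage_minutes):
    # Intercept, two indicator variables, and log-scaled time on the homepage.
    return [1.0, float(is_wire), float(tweeted), math.log1p(homepage_minutes)]


# Invented toy data: (is_wire, tweeted_by_nytimes, minutes_on_homepage, pageviews)
articles = [
    (1, 0, 0, 120), (1, 0, 0, 300), (1, 0, 30, 900), (0, 0, 0, 800),
    (0, 0, 0, 1500), (0, 1, 200, 60000), (0, 1, 600, 150000), (0, 0, 120, 9000),
    (1, 1, 45, 7000), (0, 1, 0, 12000),
]
X = [features(w, t, m) for w, t, m, _ in articles]
y = [math.log(pv) for *_, pv in articles]
beta = fit_ols(X, y)
fit = r_squared(X, y, beta)
print(f"R^2 on the toy data: {fit:.2f}")
```

With real data one would normally reach for a statistics library and evaluate the fit on held-out articles; the hand-rolled solver here only keeps the sketch self-contained.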

Of the variables included in the model, ten proved to have the highest predictive value:

+/- symbols signify the direction of correlation

While it may be tempting to latch on to any one of these variables, the general idea is that, holding all else constant, articles which receive more promotion  —  on the homepage, across section fronts, and on social media  —  will invariably receive more pageviews. While there are some serious problems in such an analysis  —  it is likely that content which performs well tends to attract the attention of homepage and social media editors  —  the power of the relationship suggests that, in general, the Times can selectively pick and choose the content that garners the most attention by simply manipulating their homepage and principal Twitter account. This is certainly not an earth-shattering insight, but the degree to which it holds true suggests that these factors cannot be ignored.

Below, I visualize a scatterplot of the actual number of pageviews versus the number my model predicted. Once again, dots are colored by the four categories outlined above, with lines pointing to the average number of pageviews for each category.




While this chart mainly communicates the predictive power of the model, it also reveals some other interesting insights. For instance, there is a much higher level of variance (or predictive error in the model) at lower levels of pageviews. This is the manifestation of the ‘State of Nature’ described above. Since promotional energies are such a strong predictor of performance, articles that receive very little promotion are harder to predict. Further investigation confirmed that the majority of the error in my model was present in articles which never reached the homepage.

More importantly, the relationship between actual and predicted pageviews represents a fundamental building block of PAR. In the graph, the straight gray line signifies the threshold of a perfect prediction. Articles that fall above this line can be thought to have exceeded their expected number of pageviews while those below the line have underperformed. Computing the error of the model, or the difference between actual and predicted pageviews, allows us to calculate the degree to which any given article has performed in relationship to its “replacement”  —  or a hypothetically similar article which received the same level of promotion.
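
The difference between actual and predicted pageviews can be sketched in a few lines. Measuring it on a log scale is an assumption made here to match the log-scaled charts; the function names and the numbers are hypothetical.

```python
import math


def par(actual_pageviews, predicted_pageviews):
    """Log-scale difference between actual and predicted pageviews.

    Positive values mean the article exceeded what its level of promotion
    predicted; negative values mean it underperformed.
    """
    return math.log(actual_pageviews) - math.log(predicted_pageviews)


def describe(value, tolerance=1e-9):
    if value > tolerance:
        return "exceeded expectations"
    if value < -tolerance:
        return "underperformed"
    return "as predicted"


# Invented numbers: (actual, predicted) pageviews after seven days.
examples = {"story_a": (2000, 1000), "story_b": (500, 1000), "story_c": (1000, 1000)}
scores = {name: par(actual, predicted) for name, (actual, predicted) in examples.items()}
for name, score in scores.items():
    print(name, round(score, 3), describe(score))
```

Because the difference is taken in logs, an article that doubles its prediction gets the same score whether the prediction was 100 or 100,000 pageviews.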

Calculating PAR


While I am still in the nascent stages of this research, I explored what PAR might look like if we were to use it to evaluate the performance of various sections on the New York Times. Below I visualize the average PAR  —  or the degree to which a given article’s actual number of pageviews deviated from the number my model predicted  —   for the top sections in terms of the number of articles produced. In the chart, the x-axis represents standard deviations  —  or the distance a data point is from its mean  —  so that we may compare multiple variables on the same scale. The x-position of reddish dots signifies a section’s average PAR. These dots are also sized by the average number of times NYT Twitter accounts shared a link to an article in a given section. Concurrently, the x-position of blue dots signifies the average pageviews for articles from each section. These dots are sized by the total number of articles published in that section. Finally, the lines connecting these two sets of dots are colored by the average time articles from these sections were promoted on the homepage.
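
One way to put section averages on a common scale of standard deviations is sketched below. Whether the chart standardizes across sections or across individual articles is not stated, so standardizing the section averages is an assumption, and the records are invented.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def section_means(records):
    """Average a per-article value (PAR or pageviews) within each section."""
    by_section = defaultdict(list)
    for section, value in records:
        by_section[section].append(value)
    return {section: mean(values) for section, values in by_section.items()}


def standardize(values_by_key):
    """Express each value as its distance from the mean, in standard deviations."""
    values = list(values_by_key.values())
    mu = mean(values)
    sigma = pstdev(values)
    return {key: (value - mu) / sigma for key, value in values_by_key.items()}


# Invented per-article PAR values: (section, PAR)
records = [("Opinion", 2.0), ("Opinion", 4.0), ("Sports", 1.0), ("World", -1.0)]
z = standardize(section_means(records))
for section, score in sorted(z.items(), key=lambda item: item[1], reverse=True):
    print(f"{section:8s} {score:+.2f}")
```

Applying the same `standardize` step to average pageviews and to average PAR is what allows the two to be placed on one axis.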




As the annotations on the chart suggest, PAR significantly levels the playing field. While Business, US, Technology, World, and Sports all rank much lower with regard to raw pageviews, PAR accounts for their relatively low level of promotion (visualized by the blue lines and small red dots). In turn, these sections rank in the middle of the pack with regard to PAR. On the other hand, Opinion, Magazine, and Real Estate pieces, which rank high in terms of pageviews, are not so far ahead when calculating PAR. It is important to note, of course, that content which garners an extreme amount of traffic will still rank high in terms of PAR, as these signify outliers from the norm, and the model will never be able to fully account for these deviations.

Next Steps


While the research I’ve outlined above represents a very small step in creating a better set of metrics for online media, it powerfully suggests that the placement of promotional data alongside pageviews gives us a better understanding of what the metric actually means. To make these insights actionable, we might imagine an application which tracks, in real time, how articles are being promoted across an organization’s social media accounts and website. This application would then make a prediction of how many pageviews (or really any outcome, for that matter) an article should have received at an arbitrary point in time given the level of promotion it garnered up to that point. By concurrently tracking the performance of an article over time, it could also make predictions about the future, which would aid in deciding whether additional promotion of an article will have a meaningful effect. Researchers from MIT, Carnegie Mellon, and the Qatar Computing Research Institute have recently created a similar application for Al Jazeera, and have outlined their findings in an academic article. I have also begun work on an open-source application (documentation is coming!) which tracks an arbitrary set of Twitter accounts, Facebook pages, websites, and RSS feeds, and detects and archives when links that match a certain pattern appear. This application could be used in tandem with Google Analytics (or any other analytics software) to create a dataset similar to the one I’ve outlined above.
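
The link-detection step can be illustrated with a short sketch. This is not the open-source application mentioned above; `PromotionLog`, `find_matching_links`, the source label, and the example URLs are all hypothetical, and fetching posts from Twitter, Facebook, or RSS is left out entirely.

```python
import re
from datetime import datetime, timezone

# Rough URL matcher; real-world text may need a more careful pattern.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def find_matching_links(text, pattern):
    """Return URLs in `text` that match `pattern`, minus trailing punctuation."""
    urls = [url.rstrip(".,;:!?)") for url in URL_RE.findall(text)]
    return [url for url in urls if re.search(pattern, url)]


class PromotionLog:
    """Archive of when and where links matching a pattern were seen."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.events = []

    def record(self, source, text, seen_at=None):
        seen_at = seen_at or datetime.now(timezone.utc)
        found = find_matching_links(text, self.pattern)
        for url in found:
            self.events.append({"url": url, "source": source, "seen_at": seen_at})
        return found


log = PromotionLog(r"nytimes\.com/")
found = log.record(
    "twitter:@example_account",  # hypothetical source label
    "Read this: http://www.nytimes.com/2013/some-story.html, "
    "and also http://example.org/unrelated",
)
print(found)
```

Joining the archived events to an analytics export by URL and time would then give, for each article, the promotion it had received up to any given moment.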

An important task in furthering this research will be to apply it to contexts outside of the New York Times. The variance in promotional strategies across media organizations is vast, and it is likely that the factors which drive pageviews for the Times are not the same for nonprofit news sites like ProPublica or social media-oriented outlets like Quartz. The Times is most certainly an outlier with regard to the level to which its homepage drives traffic, and in the future, as social media becomes an increasingly important mechanism for content discovery, the significance of homepages will invariably diminish. Another issue is the fact that, more and more, news organizations are serving different content to different users based on their perceived preferences. In this world, it will be very difficult to determine the degree to which certain pieces of content are “promoted” more than others.

In sum, the PAR approach is a band-aid  —  it does not fully dissolve the importance of pageviews nor does it anticipate the changes in the media landscape to come. It cannot tell you anything about the broader impact a piece of reporting has on society, nor does it ‘Save Journalism’ in one fell swoop. At best, PAR helps communicate the importance of measuring inputs, or the energy expended by a news organization to achieve a certain goal. While measuring promotional input is an important first step, we might also imagine a set of metrics which account for the time it took to put a story together or the money spent on pulling in content from the wire. As my esteemed colleague James Robinson likes to say, news analytics is about “measuring the relationship between impact and effort.” For too long we have been focused on measuring the former while ignoring the latter. It’s time to change that.

November 14, 2013 05:00 AM

November 08, 2013

Erika Owens

The journalism presence at MozFest grows

October of all of the things ended with an extraordinary Mozilla Festival. We got to welcome 5 more Knight-Mozilla Fellows and had to create a whole new track to accommodate all of the excellent session ideas. Here's a quick rundown of what happened at MozFest and some plans that have grown out of the event.

OpenNews at MozFest by the numbers:

  • 2014 Fellows: 5 (joining 13 Fellows from 2012 and 2013)

  • Tracks: 2 (journalism and open data)

  • Sessions: 35

  • Total session participants: 1,000

  • Range of participants per session: 5-175

  • Session facilitators hailed from: at least 7 countries, several news orgs, companies, and government agencies

The journalism footprint at MozFest nearly doubled from 2012. In response to a ton of data-related session proposals, we created a second track focused on open data, which covered government and elections data, data access, and more. Over the course of two days, we managed to schedule about 9 sessions in every time slot. Walking around the 8th floor, I saw groups engaged in intense discussions, sketching out plans, and getting hands-on training--and working on projects live, on wifi! (Major thanks to Mozilla staffer Ryan Watson for keeping the internet functional this year.)

 

Conferencing the MoFo way

One of the things that makes MozFest special is that even big sessions are oriented around small groups. Rather than PowerPoints and lectures, it's about breakout groups and collaboration. It's a new format for a lot of people, facilitators and participants alike. But when it works, it's incredibly effective--and thanks to fantastic planning and highly engaged participants, the 8th floor got to see the range of awesome stuff that's possible when you throw out the script.

 

Moving forward

MozFest can be a whirlwind of a weekend, but sessions were designed to have next steps after the Festival in mind as well. Some work that we expect will continue:

  • Making open data useable and understandable: Several of the data sessions grew out of existing projects or conversations--open elections, US ODI, and CivOmega. As the US government shutdown highlighted, and as was discussed in an OpenNews community call, even open data can have its limitations, and there are several threads the OpenNews community is pursuing coming out of these sessions.
  • International data access: In some countries, open government data has not yet been embraced and data access remains a serious challenge. Eva Constantaras from Internews Kenya (a co-host to one of our 2014 Knight-Mozilla Fellows) led the way at MozFest on making connections with colleagues on at least three continents to work together on data access and training internationally.
  • Impact and metrics: These remain topics of high interest in journalism, and to many of the 2013 Knight-Mozilla Fellows. Sonya Song is going to share her findings on social media sharing with multiple news organizations. An API for the Open Gender Tracking project grew out of a session on impact and will help more people tap into Open Gender Tracking.
  • Building connections between journalists and developers: I was thrilled (and admittedly, surprised) to see Joanna Geary present an idea for creating a buddy program that her group developed during my session, just two days later at a Hacks/Hackers London meetup. It was awesome to see an immediate concrete outcome like that. It was also fantastic to see multiple MozFest facilitators speak to dozens of H/H London members, many of whom had spent the weekend at MozFest. It was great to see the MozFest community connections in action.

As folks continue to recover from the October of all of the things and November of fewer but still too many things, we expect there to be even more MozFest followup. Kio Stark from Source's Learning section found many leads at MozFest and Source will likely soon be brimming with case studies, Q+As, and project writeups that have some MozFest DNA.

November 08, 2013 09:33 PM

November 01, 2013

Noah Veltman

Lying with charts for fun & profit

A few months ago I discovered that Wikipedia provides detailed hourly data dumps of how many pageviews each article gets, and the former political science major in me quickly sprang into action. I wanted to look at article traffic for candidates during the run-up to the 2012 election; I figured I would find all sorts of interesting patterns and glean new insight into American politics and information-seeking behavior. It was going to be great. As usual, I was wrong.

Before I could even investigate the data, I had to jump through a few hoops. The hourly dumps include EVERY Wikimedia page in one giant tab-separated list, so you’re talking about terabytes of data in total just to grab a very short list of presidential and senate candidates. It also turns out that, shockingly, some major party Senate candidates from the 2012 election don’t even have Wikipedia articles. To further muck things up, because the end of daylight saving time occurs during the campaign, you have to do some time-shifting to get everything to match.
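
Pulling a short list of articles out of one hourly dump might look like the sketch below. It assumes each line holds a project code, a page title, a view count, and a byte count separated by whitespace; the sample lines and numbers are invented, and the time-shifting step is not shown.

```python
def filter_pagecounts(lines, titles, project="en"):
    """Sum view counts for the wanted titles in one hourly dump.

    Assumed line layout: <project> <page_title> <views> <bytes>,
    separated by whitespace (spaces or tabs).
    """
    wanted = set(titles)
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip malformed lines
        proj, title, views = fields[0], fields[1], int(fields[2])
        if proj == project and title in wanted:
            counts[title] = counts.get(title, 0) + views
    return counts


# Invented sample lines in the assumed layout.
sample = [
    "en Barack_Obama 1500 123456",
    "en Mitt_Romney 2100 234567",
    "de Barack_Obama 90 1000",
    "en Main_Page 99999 1",
    "garbage",
]
hour_counts = filter_pagecounts(sample, ["Barack_Obama", "Mitt_Romney"])
print(hour_counts)
```

Since `lines` can be any iterable, an open file handle can be passed in directly, so a large dump is streamed line by line rather than loaded into memory.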

Once the data wrangling is done, if you plot the hourly pageviews for Romney (in red) and Obama (in blue) as a stacked area chart, it looks like this:

image

You see certain spikes there that line up with key live events.

image

OK, so this is mildly interesting.  The story here seems to be that people run to their computers to look up the candidates during the debates, on election day, and during the conventions, when something is happening right at that moment on TV.  The disparity between the activity during the GOP convention and the Democratic convention makes some sense, since Obama is more of a known quantity.  And if you zoom in on the conventions, you see that everyone is looking up Romney during the GOP convention, but it’s about 50/50 for the Democratic convention:

image

image

But what if we take the same data and aggregate it by day instead of by hour?

image

Now the story looks quite different.  The conventions and debates are really just blips.  All the action is on election day.  Actually, most of it is the day AFTER election day, East Coast time, because the big traffic rush comes during Obama’s acceptance speech, which took place after midnight Eastern Time.

We could also plot the data as cumulative traffic instead:

image

Now it mostly just looks like a slow and steady climb, with Romney getting somewhat more traffic up until election day, when Obama’s numbers get a gentle bump.

These three charts are in some sense showing the same data, but the immediate takeaways are quite different.
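The daily and cumulative views above come down to a few lines of arithmetic, sketched here in Python. This is an illustration under stated assumptions, not the code behind the charts: the function names are hypothetical, the hourly numbers are invented, and a fixed UTC offset is used as a simplification that ignores the daylight saving shift mentioned earlier.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import accumulate


def daily_totals(hourly, utc_offset_hours=0):
    """Sum (utc_datetime, views) pairs into calendar days at a fixed UTC offset."""
    shift = timedelta(hours=utc_offset_hours)
    totals = defaultdict(int)
    for stamp, views in hourly:
        # The day a given hour "belongs to" depends on where the clock is.
        totals[(stamp + shift).date()] += views
    return dict(sorted(totals.items()))


def cumulative(totals):
    """Turn per-day totals into a running total, in date order."""
    return dict(zip(totals, accumulate(totals.values())))


# Invented hourly counts around election night, with timestamps in UTC.
hourly = [
    (datetime(2012, 11, 6, 23), 100),
    (datetime(2012, 11, 7, 3), 300),
    (datetime(2012, 11, 7, 6), 900),   # 1 a.m. Eastern, 10 p.m. Pacific
    (datetime(2012, 11, 7, 15), 200),
]

eastern = daily_totals(hourly, utc_offset_hours=-5)
pacific = daily_totals(hourly, utc_offset_hours=-8)
print(eastern)
print(pacific)
print(cumulative(eastern))
```

With these made-up numbers, the late-night spike lands on November 7 when days are cut at midnight Eastern and on November 6 when they are cut at midnight Pacific, so the same rows produce a different "biggest day" depending on the cutoff.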

As another quick example, let’s look at a line chart of the same pageview data for 2012 senate candidates:

image

This looks a bit different.  There are two massive spikes, and everything else is tiny by comparison.  It turns out both one-hour spikes belong to Elizabeth Warren, the now-senator from Massachusetts, who spoke at the Democratic convention.

image

This chart seems to tell the story that Warren had two breakout moments where lots of people were looking into her online, and the rest of the Senate field was quiet (including Ted Cruz, who spoke at the Republican convention but didn’t draw nearly the same amount of traffic).  But what about the little sawtoothed pile that starts around August 20?

image

If we try aggregating by day, as with the presidential election, we get the answer:

image

Oh right, that guy.  When Todd Akin made his ill-advised comments, he apparently had a lot of people run to their computers to look up who he was.  But unlike Warren’s convention speech, it wasn’t a second-screen, live TV moment sort of thing.  It was news that spread more gradually, over the course of about two days.

We also see that many other candidates got some attention on election day.  The person with the biggest daily peak turns out not to be Warren, but rather Tammy Baldwin from Wisconsin, now the first openly gay US senator.  She didn’t make waves during the campaign, but her historic election brought a bunch of curious Wikipedia viewers after the polls closed.

image

Had the “days” been grouped on a cutoff besides midnight Eastern Time, so that the late-night election speeches and results were grouped in with the day before, we would have seen yet another story.  We could also look at total pageviews by candidate and get a different impression:

image

And let’s not forget that Wikipedia traffic is far from a great proxy for information-seeking behavior generally.  It suffers from all kinds of biases.

So which of these charts is the accurate one?  Which one tells the story?  All of them?  None of them?

The lesson, as usual: data does not speak for itself.  It’s something you can mold into different forms, all of them “true,” none of them the whole truth.  The way you slice and scale things matters.  Context matters.  Even something as prosaic as time zones can have a big impact on what story comes out of your work.  Always think carefully about what your data is and is not telling you.

A more detailed version of the presidential pageview chart is available here.

November 01, 2013 06:51 PM

October 31, 2013

Noah Veltman

The Command Line Murders: Teaching the Terminal with a Detective Noir

Last weekend at the Mozilla Festival, a group of journalists sat down to solve a murder mystery on the command line.

image

Each person got a set of folders containing text data files full of information about the mean streets of Terminal City. The files listed who lived there, the vehicles they owned, the clubs they belonged to, the streets they lived on, and so forth. The formats were varied - some of them were tab-separated tables, some were plain text, some had instructional header or footer rows.

More importantly, 99.9% of the text was junk. It was gibberish, or excerpts from Alice in Wonderland, or names of random 2012 Olympic athletes. But buried at key points in these large files were actual clues that, when followed, would eventually lead you to the identity of the murderer. With so much nonsense text to sift through, the only way to crack the case in a reasonable amount of time would be to use the command line to quickly search, filter, and inspect the data.

I thought this might be a stickier way to teach the basics of the command line than drily walking through a lot of examples, because it more closely mimics a real world data journalism scenario: you inherit a big dump of messy data without any context. There’s too much data to hold in your head, and you don’t even really know what’s in it, or how the files are structured. You have to probe and get your bearings, and then you have to be careful with your inquiries, spot checking and duplicating results as you go.  You can only see one slice of the big picture at a time.

The key thing about this whodunit exercise is that it’s freeform. You don’t have instructions to follow; you have a situation, and it’s up to you to experiment and find a path to the solution, once you figure out what the solution would even look like. There are many different ways you could find the answer.  Some might be more efficient but trickier to implement, others might be simple and stepwise but easier to follow and modify. This is an important part of getting comfortable with the command line: understanding that it consists of small pieces that do one thing well, and you can combine them in infinite ways to get what you need.

Why worry about teaching journalists the command line in the first place?  I can think of a few reasons why it comes in handy even if you have no plans to become a developer:

  1. A lot of really useful tools for journalists end up stranding you on the command line. You hear that piece of software X is exactly what you need to convert that weird file, or build a certain type of chart, or make a map, so you go try to download it.  But you end up on a GitHub page with installation instructions that are way over your head and involve fifteen different steps on the command line.

    In a perfect world you could just copy and paste the commands from the documentation and cross your fingers and hope it all works. In the real world, those tools almost never just “work,” and the documentation usually leaves out some important details. So you get some weird error message during setup, or output you weren’t expecting.  You’ll be stuck unless you have some idea of how the commands are structured and what you might need to change.

  2. Command line tools are a lot more efficient at processing text than desktop software or even custom scripts, and this starts to matter if you have a massive dataset. You can open a 5MB file in Excel, but not a 5GB one. If you’re a data journalist and you encounter a really huge quantity of data, using the command line for filtering/searching/cleaning can save you a lot of headaches.

  3. It’s useful to stop thinking of “data” as a special category, something you only interact with delicately and indirectly, with a piece of software like Excel as your liaison. Data is text, and text is data. Virtually any sort of data a journalist encounters can be treated as just a big pile of text, and once you understand that, you can get more creative in how you interrogate and modify it, because it all boils down to searching and replacing, reading text in and spitting it back out.
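To make the "data is text" point concrete, here is a small Python sketch of the two moves that last item describes: searching lines and replacing text as it streams through. On the command line you would reach for tools like grep and sed instead. The function names and the sample lines are invented for illustration and are not taken from the mystery files.

```python
import io
import re


def grep(lines, pattern):
    """Yield (line_number, line) for each line matching a regular expression."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


def substitute(lines, pattern, replacement):
    """Yield each line with every match of the pattern replaced."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(replacement, line)


# Invented sample text: mostly noise, with one line worth finding.
sample = (
    "Alice fell down the rabbit hole\n"
    "CLUE: the suspect drives a blue Honda\n"
    "more gibberish here\n"
)

hits = list(grep(io.StringIO(sample), r"^CLUE"))
cleaned = "".join(substitute(io.StringIO(sample), r"\bblue\b", "BLUE"))
print(hits)
print(cleaned)
```

Both functions read text in and spit text back out one line at a time, so they can be chained together in the same way as small command line tools.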

As for the mystery, you can give it a try yourself (you only need the file clmystery.zip). This version was kind of a rush job, with not nearly as much hardboiled, Sam Spade flavor as I would have liked, but pretty soon I’ll start working on the next case and hopefully introduce more advanced commands like sed and awk.  Get to work, gumshoes!

October 31, 2013 02:20 PM

October 24, 2013

Dan Sinker

OpenNews: Meet our 2014 Knight-Mozilla Fellows

265 applicants. When our search for our 2014 Knight-Mozilla Fellowships ended at midnight, August 17, that’s what we were staring at: 265 of some of the most talented developers, hackers, data scientists, and makers I’d ever come across. The number of slots we had for them? Five.

The process to narrow from 265 to five wasn’t easy–at every step in the process we’d have a gut check, constantly revising our narrowed lists upward to make sure we didn’t miss anyone amazing. By the time we’d winnowed the lists down to an impossibly small 25 candidates, our news partners–the New York Times, ProPublica, the Texas Tribune, La Nacion, Ushahidi and Internews Kenya–all asked the same question: Can we choose them all?

But, together, we narrowed down to a final five.

These five Fellows come at a turning point for the Knight-Mozilla OpenNews project as well. As I announced last week, OpenNews will be continuing not just for 2014, but for 2015 and 2016 as well, supported by a substantial grant from the Knight Foundation. This grant allows us to expand far beyond fellowships: we’ll be hosting our own conference, SRC CON; we’ll be holding “code convenings” to build collaborative newsroom code; we’ll be supporting hack days around the world and bringing learning opportunities to smaller newsrooms. But we will always see our Knight-Mozilla Fellows as the beating heart inside OpenNews: a chance to invest deeply in talent and ideas and new blood for a growing community.

2014 marks our third cohort of Knight-Mozilla Fellows, and the five fellows I’m announcing today have their work cut out for them to match the incredible ideas, projects, and people that came before them.

That said, they’re going to blow it all away. Our new fellows are amazing and I am so excited for you to meet them. We started at 265 and now we have five–meet our 2014 Knight-Mozilla Fellows:

Harlo Holmes | New York Times

Harlo Holmes is a media scholar, software programmer, and activist. As a research fellow with The Guardian Project, she primarily investigates topics in digital media steganography, metadata, and the standards surrounding technology in the social sciences. She harnesses her multi-faceted background in service of responding to the growing technological needs of human rights workers, journalists, and other do-gooders around the world. Follow her @harlo or at harloholm.es.

Brian Jacobs | ProPublica

Brian Jacobs is a designer and interactive developer. He’s passionate about multi-faceted visual tools that are civic-minded, scientific, journalistic, or otherwise educational, to benefit the people and their habitat. He’s worked in commercial and academic contexts, on GIS projects in West Virginia, web apps in Philadelphia, and towards an urban data processing and visualization platform for the MIT SENSEable City Lab, in Singapore. He’s excited about the future of open data, particularly collaborative and semantic web initiatives that can afford reproducible access to cleaner, more interdisciplinary data. Brian is also intensely interested in bagels, hikes, and sci-fi camp. Follow him @btjakes.

Aurelia Moser | Ushahidi / Internews Kenya

Aurelia Moser is a data munger and code monkey based in New York City. With a background in library metadata and lab work, she builds visualizations and narratives around data, supported dually by passions for data preservation and open information. Equal parts experimenter and educator, she organizes NYC Nodebots meetups and coordinates curricula for Girl Develop It, a non-profit teaching women how to code in low-cost classes. For fun, she runs a radio show based on the semantic web, and digs studying, silent discos, and shoegaze. Follow her @auremose or at algorhyth.ms.

Gabriela Rodriguez | La Nacion

Gabriela Rodriguez is an activist and hacker who loves the intersection between media and technology. She grew up in Uruguay and now lives in Portland, OR (USA). She is a software developer with a passion for free software and open knowledge. She co-founded the Uruguayan nonprofit DATA, which works with open data and transparency in South America. Follow her @gaba.

Marcos Vanetta | Texas Tribune

Marcos Vanetta is a biomedical engineer truly passionate about software and technology. He is an experienced web developer and an open source enthusiast. Marcos is an active member of the Hacks/Hackers community in Buenos Aires and the lead developer of Mapa76 (aka Analice.me). You can find him at a rock & roll concert or at your closest hackathon. Follow him @malev or at malev.com.ar.

All five fellows will be with us in London this weekend for the Mozilla Festival. If you’re there, do seek them out, say hello, and find out more about them. And, if you’re at MozFest, be sure to track me down and say hi as well.

October 24, 2013 12:00 PM

October 23, 2013

Noah Veltman

On journalism and learning to code (again)

In a recent piece for The Atlantic, Olga Khazan argues that learning to code is a poor use of time for most aspiring journalists, who could instead be using that time honing their other skills. Like many of my colleagues who have committed acts of code in a newsroom, I found that it really rubbed me the wrong way, for two main reasons.

First, the author doesn’t seem to have done any reporting for the piece beyond a second-hand tweet and extrapolating from her personal experience. She could have picked up the phone to test her assumptions or gain outside insight. She could have asked hiring managers in newsrooms how much they actually value coding skills. She could have asked j-school faculty why they were or were not adding more technology education to their curriculum. She could have asked news developers what their experience is like working with reporters and dividing up roles on a project. She could have asked journalists-turned-coders how and why they chose to learn. Had she done any of these things, I imagine the piece would have been a lot more accurate, interesting, and constructive.

Second, and more importantly, the article falls victim to a lot of fallacies about code and journalism that keep coming back up in this whole discussion. To name a few:

Conflating learning to code, learning to make things for the web, and technological literacy

One of the most maddening parts of this debate is the way every possible thing that might involve a computer ends up lumped together under the umbrella of “coding.” Let’s introduce some nuance. Broadly, you have at least three different categories where a journalist might seek (or be nudged) to improve, and they’re only loosely related.

  1. Technological literacy - Understanding your medium is valuable. When reporters or designers don’t have any sense of the constraints or tradeoffs in making things for the web, everybody loses. The resulting work is worse, and all sides waste a lot of time due to poor communication and mismatched expectations. I also think a lot of journalists working on web projects overestimate how neatly “technical” decisions can be isolated. Supposedly “technical” decisions tend to have real editorial and design implications, especially when they have to be made hastily on a deadline. If you can’t have an informed conversation about those decisions, you’re handing over the keys to the people who can.

    Now, is actually learning to code yourself a good use of time if you’re just trying to gain a better understanding of the web? I honestly don’t know. It’s a good and important question. But I’m pretty sure reading the JavaScript Wikipedia entry isn’t going to get you anywhere.

  2. Learning to code for research and analysis - Khazan talks a lot about positioning yourself to get hired and a lot less about whether technical skills might help you keep the job by actually being good at it. If you want to work on a subject like school performance, crime, government spending, or any of the countless others that involve complex data, having a technical toolset is important. A little bit of code can give you a big leg up in terms of finding, cleaning, and exploring data. If you think you can compartmentalize the “data” work and give it to someone else, or that you’re fine only reporting stories you can find browsing Excel, someone else is going to eat your lunch.

  3. Learning to make things for the web - If I set aside my own bias as a web developer, this is probably the category I’m least sanguine about for a broad audience. I certainly think a basic working knowledge of HTML and how the web functions is necessary, but I’m willing to buy the argument that we shouldn’t send journalists who really are just looking to write too far down the web rabbit hole. This is mostly because, whereas a journalist who dabbles in using code to analyze data can get real immediate value out of a few tricks, the same is less true of the web. Once you get past the frisson of excitement you get the first time you switch from web consumer to a person who just made a real live web page, there’s a long road before you can make something complex that could go on your news organization’s website. You have to put in a lot of reps before you can wrestle with all the little gotchas of making something for public consumption on every imaginable browser and device.

    That doesn’t mean I would discourage a young journalist from poking around with web technologies. Far from it.  I love the web, and I happen to think it’s a lot of fun even when the stuff you’re making kind of sucks. But if it turns out not to be your idea of a good time and you want to draw the line at the basics, more power to you.

Arguing against “every journalist must learn to code”

There are some people out there who make it sound like all journalists have to become software developers. This is a silly position. And I sympathize with Khazan that the people who beat the “learn to code!” drum indiscriminately do everyone a disservice, and may even put more people off coding than they draw in. But I don’t think it’s fair to claim that “everyone” is “always” telling journalists to learn to code; arguing against that reductive version is a straw man. Last I checked, journalism schools don’t exactly suffer from a dangerous glut of technical education.

Of course “every journalist must learn to code” is a silly proposition, just as “no journalist should learn to code” is. Journalism is not a monolith. It depends on what sorts of stories you’re trying to tell and in what media. I will freely grant that some journalists have goals that won’t benefit much from technical savvy or coding skills. Khazan may be one of them. But it’s strange to take an anecdotal case and just suppose that it applies to a majority, or even a substantial minority, of young journalists.

“Serious coding is for people with computer science degrees”

I run into this assumption a lot, that people who code in newsrooms must largely be trained computer scientists. It’s really not true. Anecdotally, very few of the newsroom coders I know have a computer science background. I’m pretty sure Chris Groskopf dreams in Python, and he was a philosophy major.

But don’t take my word for it: I actually tried to gather some data on this subject since it keeps coming up. What did I find? Only 1 in 4 news developers studied computer science in school, and nearly half of them didn’t start learning to code until the end of college or later. The sample wasn’t perfect (if anything, I suspect it actually overcounts computer science majors), but it’s probably a lot closer to the mark than idle speculation.

This makes sense: if you’re the kind of person who decides to study computer science and sticks with it, you probably have a talent or affinity for the inherent puzzle-solving of programming, and will be right at home working at a software company solving hard technical problems (where you’ll make a lot more money). Coding in the newsroom tends to be less about deep technical puzzles and more about storytelling and design, and attracts people who are interested in the world and just happen to use code as a tool while they figure it out. Think MacGyver, not Edison (the web: it’s paperclips all the way down).

Treating learning to code as an all-or-nothing proposition

A lot of people have unreasonable expectations about what learning to code actually looks like. Despite what the latest crop of “teach yourself to code” hucksters will tell you, you don’t get to go from zero to web developer in 4 weeks. Learning this stuff is a long, challenging, humbling process. There may be a few people who have such a supreme aptitude for it that they glide right through and never struggle, but I have yet to meet one. The frustrations Khazan describes felt very familiar to me, as I suspect they would to any developer. Ask the most hardcore coders you know and they’ll tell you that they too get stuck and then want to tear their hair out when they realize they wasted an afternoon over one lousy semicolon.

But here’s the thing: learning to code is not all or nothing. There seems to be this sense that deciding to learn to code is a radical act of self-redefinition, that you are embarking on a dramatic journey. If you think of it this way, and you think that you have to slog through for three years before you get any value out of it, I can understand why you would look at the investment required and say “no thanks.” But it really doesn’t work like that. There’s no blood oath, I promise.

Journalists and journalism students (and journalism professors) should quit thinking about “learning to code” in the abstract. Instead, think about the stories you want to tell, and to the extent there are ways that code would help you tell them, learn what you need for the situation. Different journalists will benefit, or not benefit, in different ways. Don’t sit down with a big boring book and an online course and declare you’re going to learn Python. You’ll probably get stuck, get bored, and give up. Set out to build something you like, or explore some data you care about, and figure out what you need to learn to make that happen. And don’t go it alone; ask your developers for help, or find a community of other learners to collaborate and commiserate with.

Learning to code is not like learning calculus, with some big fixed corpus of knowledge you need to absorb. It’s more like learning to be handy around the house. You start off knowing nothing, and then as needs come up you learn bits and pieces without a grand plan, weekend by weekend, with plenty of hammered thumbs and structurally unsound carpentry. Slowly but surely, those bits and pieces coalesce into something approaching expertise. You build up the confidence to be bold and take on problems you don’t yet have any idea how to solve. And everyone ends up learning a different subset of things, which is fun, because we can all help each other. Want to learn a little bit? A lot? SQL? JavaScript? Excel formulas? All of the above? None of the above? Great. It takes all kinds. But if you want to avoid picking up technical skills, do it because you’ve honestly evaluated the kind of journalist you want to be and found that they’re not going to help, not because you’ve just written them off as Things That Are Not In Your Job Description.

October 23, 2013 10:19 PM

October 17, 2013

Erika Owens

Celebrating investment in OpenNews and the journalism code community

Hack day participants

Today the Knight Foundation announced its renewed commitment to Knight-Mozilla OpenNews, with a $4 million grant. That's a whole lotta money. Big news. It's an investment not in a product, but in a community.

Earlier this year, the CEO of the Knight Foundation publicly asked for feedback about the Knight News Challenge and how to help their grantmaking have more impact. If I may be so bold: this is it. OpenNews has plans to build some amazing stuff, but more than that, this project is about acting as the connective tissue in the journalism code community. It exists to meet people where they are already at and support them in deepening their engagement, their skills, and their leadership in this community. It's rare that an organization has the chance to focus on building connections between people, institutions, and projects. But that's how we get to spend the next three years.

Journalism isn't the only industry plagued by silos and duplicative work. During editorial board meetings with the Notebook, it came up time and time again that programs had been tried in schools only to be abruptly canceled, whether they showed positive results or not. A new administration would take over and the cycle would start again - test idea, kill idea, forget idea happened. It was just so clear that it shouldn't have to be that way. There should be some way for programs to be tracked, supported, given room to thrive or fail, and when they worked, be replicated.

It seemed so simple.

But in an environment of overworked staff and underfunded institutions (sound familiar?) it's hard to find the time to think beyond the day to day, never mind to take a step back and plan, organize, and implement longer-view plans. OpenNews has been given the resources it needs to play that eagle-eye role in the journalism code community. With the luxuries of time, money, skilled staff, and a highly engaged, international community, OpenNews has the opportunity to show what can happen when you invest in community, in building connections, in supporting the development of leaders.

I just gave a session at the Online News Association conference about journalism hack days, one of the areas of community support we'll be able to continue and expand with this grant. It was a chance to share some of the amazing examples of what hack days can create as well as the cultural change it can foster. Several people in the session were interested in planning hack days and were eager to learn more about how to build energy and interest in journalism hacking. We've already supported hack days all over the world, and I'm looking forward to helping even more people plan events in their local community that offer opportunities for creating connections and code that will ensure journalism thrives on the web.

I'm tremendously excited about the next three years. We're gonna do intensive evaluation of our work and share it back with the journalism community and, I hope, the nonprofit community more broadly as well. It's rare to get the chance to do something that is new and necessary, and that you know has the power to improve how we support vital civic institutions. I'm awed. I'm ready. Let's show what ground-up community support can do for journalism.

October 17, 2013 08:41 PM

Dan Sinker

OpenNews: Ascent Stage

“How can we help?” When I first joined OpenNews (at the time it was called the Knight-Mozilla News Technology Partnership–a mouthful to be sure), I asked that question a lot. If I was in a room with news developers, it was one of the first things out of my mouth. If I was sending e-mails, it was toplined. If you had a beer with me those first few months, I asked it. If we went on a walk, I asked it. If we passed in an airport, I asked it.

When the answers came–varied and honest and clear–they helped to transform the program, turning us from simply a fellowship program that placed technologists in newsrooms into a program that also helped support the nascent journalism code community through initiatives like Source (one year old yesterday) and our journalism Hack Day sponsorships (more than 40 in 20 countries since Spring 2012).

And now, today, I can’t believe I get to announce that we’re transforming even more. Thanks to a significant grant from the Knight Foundation, OpenNews will be expanding our work helping to strengthen the community creating code in journalism through 2016.

The core work we’re doing is continuing:

But we’re also doing a bunch that’s new:

This list is just the start. With three years of runway, we’ll be taking off in all sorts of new directions as well.

Everything we’re doing–new and old, on this list and still-to-come–comes from talking, collaborating, and building with the incredible community of newsroom coders, civic hackers, and open-source contributors we’ve met through the work we’re doing at OpenNews. It’s a vibrant, growing community that is not only transforming journalism, but also the web itself.

We’re incredibly lucky to call this community home and to be able to help it thrive. The next three years are going to be amazing.

Let’s do this.

October 17, 2013 01:00 PM

October 09, 2013

Brian Abelson

Whither the Pageview Apocalypse?

This is the first in a series of two posts about pageviews. This post will deal with some of the theoretical baggage tied up in the metric, while the second will detail some research I’ve conducted on the correlates of pageviews.


“Wild, dark times are rumbling toward us, and the prophet who wishes to write a new apocalypse will have to invent entirely new beasts, and beasts so terrible that the ancient animal symbols of St. John will seem like cooing doves and cupids in comparison.” – Heinrich Heine, Lutetia; or, ‘Paris’, Augsburg Gazette, 1842


The pageview is dead. Jeff Jarvis presided over its wake in 2007, soberly preparing us for a brave new world of Flash, AJAX, and embeddable widgets in which a page was no longer just a page. Chartbeat announced its death as early as 2012 and most recently in a sponsored post for the Online News Association’s upcoming conference. My current position as a 2013 Knight-Mozilla OpenNews Fellow was granted a messianic status when some equated the fellowship to a violent crusade against the metric.


Yet, when I open Google Analytics, I am presented with a time series chart of pageviews. Most newsrooms I visit include a ‘big board’ with a list of the top ten articles by pageviews.** And when I sit in analytics meetings, I regularly hear the metric tossed around as a means of benchmarking one article against another. If, as I’ve been led to believe, this is a post-pageview world, then we must be living in a zombie apocalypse as I’m relentlessly haunted by the metric’s lifeless corpse.


What then of the pageview apocalypse and the prophets who giddily proclaim it? To what ends are these revelations leading us? What strategic aims and benefits are these claims predicated upon?


In Jacques Derrida’s 1982 essay, Of an Apocalyptic Tone Newly Adopted in Philosophy, he writes of a tendency prevalent in academia in which scholars paradoxically announce the ‘death’ or ‘end’ of their fields. This same tendency can be located in art and culture when critics bemoan the death of Hip-Hop or Punk. The last decade of the News Industry has often resembled the final minutes of Reservoir Dogs, with publications announcing the demise of their counterparts only to be shot down themselves in the following frame. There is even an online newspaper dedicated to the death of newspapers.


For Derrida, though, such apocalyptic declarations are intriguing not for the end they depict, but for the transformative visions embedded within their rhetoric. He grounds his discussion in etymology: ‘apocalypse’ is derived from the Greek apokalupsis, which translates to “reveal” or “uncover.” In the Hebrew Bible, the equivalent word gala is used over a hundred times, saying in effect “disclosure, uncovering, unveiling, the veil lifted from about the thing,” most often in reference to the sex and/or genitalia of a man or woman, but also in reference to their sensory organs (eyes, ears, mouth). ‘Apocalypse’, then, literally means the act of uncovering: the removing of clothes, the shifting of hair, or the unveiling of eyes to reveal a secret or unknowable existence just beyond the surface. In this manner, an apocalypse is “essentially a contemplation,” or a meditation upon a veiled state, which is structured by a desire of a particular disclosure or revelation to its thought process.


So what is being revealed when the death of the pageview is proclaimed? Our first clue comes from the coroners. More often than not, they are the stewards of Web Analytics, an industry that was largely built upon measuring pageviews. From Chartbeat, to KISSmetrics, to WebTrends, and Nielsen, it seems like every analytics company has shared in the apocalyptic glee. Mixpanel, a particularly brazen analytics startup, even purchased billboard space along Highway 101 to announce the death of the pageview:


Remind you of anything?




Why then are so many analytics companies taking up arms against themselves? What is the meaning of this cargo cult of counter-analytics? Here, we can use the common structure of the above billboards as our guide:


  1. Loudly announce the end.
  2. Suggest a counter-action.


In almost every case, when we are told the pageview is dead, we are given a list of metrics that will take its place. This is the revelation. Instead of pageviews, we’re advised, we should be quantifying engagement, trying out A/B tests, conducting funnel analyses, or deploying click/event tracking on our sites (conveniently enough, these tools have often just been added to the next iteration of their platforms). And it’s not that these metrics are useless - they can be of great value when designed and deployed correctly - it’s that, in lieu of a critical assessment of how and why pageview-centric platforms failed us, we are instead told that our egotism led us to pay attention to the wrong things in the first place.


Yet, having experimented with many “actionable” rather than “vanity” metrics, I can tell you that their results are often just as murky and misleading. Engagement is a moving target; A/B tests, when poorly designed, often produce inconclusive results; event tracking, while incredibly powerful, does not readily enable comparisons across varied contexts. And, even when these tools are utilized to their full potential, it can be very difficult to translate their insights into action. The fact of the matter is that there are no silver bullets, no secrets to be revealed just beyond the pageview. All there is is hard work, open dialogue, and relentless experimentation to find what works in your particular context. After all, we’re talking about measuring the complex behaviors of millions of people.


Still, many will try and seduce you into believing otherwise. This act of seduction, Derrida explains, is the principal strategy of apocalypticism:

“the subject of [apocalyptic] discourse can have an interest in forgoing its own interest, can forgo everything in order to place yet its death on your shoulders and make you inherit in advance its corpse, that is, its soul, the subject hoping thus to arrive at its end through the end” (52).

In this powerful, if enigmatic, passage, we begin to understand the true motivations of apocalyptic prophets. Doomsayers do not merely seek acknowledgement of an end-to-come or one that has already passed; they are more concerned with seducing you into accepting the terms on which their continued existence, their vested interests, and their vision of ‘the end’ are all equally possible. It’s not that analytics platforms are flawed, they say, it’s simply that you’re not paying attention to the right parts; it’s not that insights are difficult, they promise, it’s that you’ve been going about finding them in the wrong way.


So when we are told that the ‘pageview is dead’, what we are actually being told is that ‘analytics platforms are dying’; that the current paradigm is fundamentally flawed and that the companies responsible are scrambling to convince us otherwise. And, rather than accepting their share of the blame, they cite our inherent egotism - our blindly narcissistic desire for validation - in a plot to absolve themselves of guilt and justify their importance. In this vision of the apocalypse, Babylon is inhabited by the users of metrics, not their marketers or makers.


“This is the way the world ends. Not with a bang but a whimper.” - T.S. Eliot, The Hollow Men, 1925

While the big data bubble has inflated quickly, it will not simply ‘pop’. So, instead of worrying about whether we’re measuring the wrong things, or using the wrong tools or software, or falling behind the competition, let’s take a deep breath, ignore the doomsayers, and do the best we can with what we have right now. And, if after a while, that’s still not working, then perhaps we should reassess precisely why, in what manner, and by whom we were convinced that analytics would solve our problems in the first place.



Correction: The original version of this essay contained the sentence, “When I glance at the Chartbeat dashboard looming above my desk, I see a list of the top five articles by pageviews on the New York Times’ site.” In fact, Chartbeat does not measure “pageviews”, but “concurrent visitors.” Read more about it here.

October 09, 2013 04:00 AM

September 13, 2013

Friedrich Lindenberg

Why German elections don't make sense

There's only a week left on the German federal campaign, so I've decided to take a closer look at the mechanics of the actual election by implementing the complete tabulation process in a JavaScript library, btw13.js (see it in action).

Tally of the 2013 German elections, based on btw13.js's calculations of seat allocations.

Our electoral system is notoriously complex, with 299 districts each electing a direct representative, alongside party representatives from one of 16 state lists. Theoretically, this should leave us with a total of 598 members of the Bundestag (MdBs). To ensure proportionality between the state and district votes, however, additional seats are created in some states, based on the outcome of the vote. This has led not only to a total of 620 MdBs in the current session, but also to situations in which additional votes in a particular state can cost a party seats in parliament.

Of course, more votes should mean more seats, so in 2012 the German constitutional court declared the rules used in the 2009 election unconstitutional. This resulted in a lot of backroom negotiations in parliament, leading to an even more complex electoral law for this year's election. The most politically attractive solution to negative vote weights turned out to be simple: more parliamentarians.

Basically, the new system works by first calculating the minimal number of seats that each party should get in each state, then adding them up and finding a seat constellation in the federal parliament which has the proportions of the federal vote, but at least as many seats as each party has been allocated in the states. Finally, the seats then get distributed down to the states, where one of the parties can somehow still end up with fewer seats than they've been allocated in the first step.
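The middle step of that description can be sketched in a few lines. This is only an illustration, not the code in btw13.js: it assumes a Sainte-Laguë (highest averages) divisor method for the proportional distribution, which the text above does not name, and the function names `sainte_lague` and `federal_seats` are made up for this example. It also skips the first step (state-level minimums) and the final distribution back down to the states.

```python
from fractions import Fraction


def sainte_lague(votes, seats):
    """Distribute `seats` among parties in proportion to `votes`,
    using the highest-averages method with divisors 1, 3, 5, ...
    (an assumed method; the post does not say which one is used)."""
    allocation = {party: 0 for party in votes}
    for _ in range(seats):
        # The next seat goes to the party with the largest quotient.
        winner = max(
            votes,
            key=lambda p: Fraction(votes[p], 2 * allocation[p] + 1),
        )
        allocation[winner] += 1
    return allocation


def federal_seats(votes, minimum_seats, base_size=598):
    """Grow the parliament one seat at a time until a proportional
    distribution gives every party at least its guaranteed minimum."""
    size = max(base_size, sum(minimum_seats.values()))
    while True:
        allocation = sainte_lague(votes, size)
        if all(allocation[p] >= minimum_seats[p] for p in votes):
            return allocation
        size += 1
```

With toy numbers, `federal_seats({"A": 500, "B": 300, "C": 200}, {"A": 5, "B": 3, "C": 3}, base_size=10)` has to grow the house from 10 to 13 seats before party C reaches its minimum of three, which is the "more parliamentarians" effect in miniature.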

All of this is really complex and weird, so I decided to learn about it by writing it up as an algorithm. The resulting script is a nice way to parse the data feed released by the German elections authority on election evening, and to turn it into a tabulation of state and federal seat distributions. Which, by election evening, I'm hoping to turn into something similar to the NPR Big Board.

Sample data released by the German elections authority contains an extreme case of the possible election results. It shows a constellation - just a few points off current polling - in which we end up with 810 seats in the Bundestag. That's more than all of the US Congress, and more than the bloated, 28-country EU Parliament with its 766 seats.

One way to fight unemployment.

September 13, 2013 12:00 AM

September 12, 2013

Friedrich Lindenberg

Twindle: lessons learned from data mining Twitter

Twitter is the new voxpop: a quick way for news organisations to show what the people think, without the actual hassle of talking to any. Over the last few months, I've spent some time recording a subset of German Twitter traffic as a way to track public engagement on political issues during the current campaign season. While the resulting application didn’t end up getting launched in time for the elections next week, the process of building the necessary tools was a great learning experience. In this post, I want to take a look at the lessons we’ve learned about tracking the Twitter status stream.

Topics of the campaign in Germany, generated by applying regular expressions to Twindle's database of tweets.

Our tool: twindle

When we started the project, I was convinced that there would be some off-the-shelf toolkit for grabbing and storing data off the streaming API. But while there are plenty of library bindings for different programming languages, we didn't find a ready-made script to store the data. That's why we're thrilled to release twindle, the tracking software we've built, as an open source set of tools.

Using twindle is simple: editors can enter search terms into a Google Docs spreadsheet, and a few moments later the software will incorporate these new searches into its configuration and begin tracking all related messages.

The core software behind this is based on node.js, as an asynchronous system seemed a good match for the streaming service (although, eventually, node did crash at one point, losing over 4.5mn unstored status messages). Our analytics tools, on the other hand, are based on Python, which ensures easy re-use of a wide range of machine learning libraries. Statuses themselves are stored in a Postgres database, while bundled copies of the recorded data are uploaded to S3 for batch analysis. Violating the golden hipster developer rule ("social media data can only survive in NoSQL technology no older than six months") has given us a lot of flexibility to quickly test a query, while still processing a few hundred thousand statuses every day without major issues.

After going through a couple of iterations, twindle is now performing reliably and quickly. Still, the streaming API is an odd thing, and we've had to learn a few lessons on our way.

Lesson 1: Build a dragnet

As we started, our plan was to focus on keyword tracking - essentially curating a wide range of topical searches which we would monitor. While this produces a good signal/noise ratio, it also results in virtually zero ability to detect upcoming trends organically or to backtrack on an emergent topic.

At DataHarvest, Gavin Sheridan of Storyful offered their solution to this problem: building a dragnet. After initiating a few keyword searches, a tool can identify multipliers and track them directly.

To that end, twindle will score users by how many of their messages have been relevant to our keyword searches and follow all future updates of those who score highly on this metric. This way, we are identifying a dynamic group of about 80,000 users who become a survey panel in addition to any pre-defined search terms.
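As a rough sketch of that scoring step (not twindle's actual code): count how many keyword-matched tweets each user has produced and keep everyone above a cut-off. The function name `select_panel` and the threshold of three matches are assumptions for illustration; the `user` and `id` keys follow the layout of Twitter's tweet JSON.

```python
from collections import Counter


def select_panel(matched_tweets, min_matches=3):
    """Return the ids of users with at least `min_matches` tweets
    that matched one of the keyword searches.

    `min_matches` is a hypothetical threshold, not twindle's value."""
    scores = Counter(tweet["user"]["id"] for tweet in matched_tweets)
    return {user_id for user_id, score in scores.items() if score >= min_matches}
```

The resulting set of ids would then be handed to the stream as accounts to follow, in addition to the keyword list.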

Lesson 2: Count your retweets

This one should be obvious, but it really didn't occur to me before it had cost us two weeks' worth of data. Since the streaming API forwards messages at the precise moment when they are created, their retweet count will always be zero. In order to identify retweets one therefore has to count them by hand via a field which is only set for retweeted messages. I'm fairly sure that this method will not yield a perfect retweet count, although the few samples I've taken have been accurate.
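A minimal sketch of counting by hand, assuming the field in question is `retweeted_status`, which in Twitter's tweet JSON embeds the original message inside each retweet. The function name `count_retweets` is made up for this example.

```python
from collections import Counter


def count_retweets(statuses):
    """Tally retweets per original tweet id from a batch of streamed statuses.

    Assumes each retweet carries the original message under the
    `retweeted_status` key; plain tweets do not have that key."""
    counts = Counter()
    for status in statuses:
        original = status.get("retweeted_status")
        if original is not None:
            counts[original["id"]] += 1
    return counts
```

Retweets that arrive while the tracker is offline, or that fall outside the tracked keywords and users, are never seen, which is one reason the tally can undercount.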

Network of interactions between parliamentarians and other politicians during the last month of the campaign.

Lesson 3: Store in a bucket

The conversion from Twitter's JSON format into our relational data schema is somewhat lossy, so we decided to also keep a copy of the original format. While we initially stored this in the database, the additional data put unnecessary strain on the server.

Uploading bundles of tweets to S3 as plain files reduced that pressure while establishing a flexible archive. The S3 bucket holding this data can be accessed directly through Elastic Map Reduce jobs, giving us a secondary data mining mechanism capable of large-scale aggregation.

Lesson 4: Queue up

After Twitter kindly granted us elevated access to their API in mid-June, the initial node.js application became less responsive. Since not losing any messages is a priority, we uncoupled the stream reader from the database backend through a message queue, creating an additional buffer in between the two layers. This solution has worked great and also allows us to tune and restart the database without missing out on any data.
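The idea can be illustrated with an in-process buffer. This is a simplified stand-in, not twindle's setup: the post does not name the message queue used, and a real deployment would need an external broker so that the buffer survives while the database is restarted. `run_pipeline`, `stream` and `store` are hypothetical names.

```python
import queue
import threading


def run_pipeline(stream, store, buffer_size=10000):
    """Read statuses from `stream` and hand them to `store` via a buffer,
    so a slow database write never stalls the stream reader directly."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def writer():
        while True:
            status = buffer.get()
            if status is done:
                break
            store(status)

    worker = threading.Thread(target=writer)
    worker.start()
    for status in stream:
        buffer.put(status)  # blocks only when the buffer is full
    buffer.put(done)
    worker.join()
```

Calling `run_pipeline(statuses, save_to_database)` keeps the reader and the writer in separate threads; the reader only waits once `buffer_size` unsaved statuses have piled up.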

Lesson 5: Make sense of it all

Twitter data in itself is incredibly messy. Even with keyword and language filters, most tweets will still be about someone's cat, food or just random spam. One man's hashtag for a liberal party is another's scathing insult.

Twindle includes a set of analytical scripts which help to classify, geocode and aggregate the collected messages. These are fairly simple for now, but we're hoping to prop them up and to include patches from other people as the project evolves.
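To give a flavour of the simplest kind of classification, here is a sketch of regular-expression topic tagging. The topic names and patterns are invented for this example and are not taken from twindle's scripts.

```python
import re

# Hypothetical topics and patterns, for illustration only.
TOPICS = {
    "surveillance": re.compile(r"\b(nsa|prism|datenschutz)\b", re.IGNORECASE),
    "energy": re.compile(r"\b(energiewende|strompreis\w*)\b", re.IGNORECASE),
}


def classify(text):
    """Return the sorted list of topics whose pattern occurs in `text`."""
    return sorted(topic for topic, pattern in TOPICS.items() if pattern.search(text))
```

A tweet can match several topics or none at all; patterns this crude will also catch plenty of cats and spam, so the output is a starting point for aggregation rather than a clean label.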

Developing a community-driven set of easy-to-use, well-documented and well-tested data mining scripts for Twitter would make a great resource. The evolved Twitter API may be tricky and full of subtle issues, but running queries against the collected data should be fun and productive.

This makes twindle an invitation to collaborators. We'd love to see further scripts and tools added to the code base and to hear about other people's use cases and deployments.

September 12, 2013 12:00 AM

September 03, 2013

Erika Owens

The amazing, overwhelming Knight-Mozilla Fellowship applicants

2014 applicants

It was great fun watching the applications to the Knight-Mozilla Fellowship roll in. People applied from all over the world. We got 102 more applications than last year. More women applied. More awe-inspiringly qualified people applied. It's been a humbling and exciting experience. So, how'd we get to this point of a wealth of amazing applicants, and where do we go next in the process?

Rundown of the numbers

  • Applicants for 2014: 265
  • Applicants for 2013: 163
  • Percentage increase in applicants: 62%
  • Women applicants in 2014: ~50, 20% of total applicants
  • Women applicants in 2013: ~10, 5% of total applicants
  • Example geographic distribution, 2014:
    • Argentina - 29
    • US - 88
    • India - 15
    • Kenya - 6
    • Germany - 7
  • Example geographic distribution, 2013:
    • Argentina - 9
    • US - 47
    • India - 6
    • Kenya - 3
    • Germany - 3
  • How'd people hear about the Fellowship? (question only asked 2014)
    • Twitter - 47
    • Friend/colleague - 35
    • List serv - 20
    • Current Fellows - 16
    • In-person event - 16
    • La Nacion - 11
    • Facebook - 11
    • Mozilla - 9
    • Misc people, blogs, modes of communication - 2-4 each

Getting more applicants

This was the second year we conducted a traditional application process. We had a good template to work from: the 2012 Knight-Mozilla Fellows had already completed their Fellowship when the application opened, and the 2013 cohort was months into their Fellowship. Applicants had a much better idea of what to expect from the Fellowship and we had a better idea of the types of concerns people had. We conducted several Q+As on our community calls and spruced up the Fellowship section of the OpenNews site with a more extensive FAQ and description of the growing community of Fellows.

As the referrals show, the Knight-Mozilla Fellows themselves have been an important way that new applicants find out about the Fellowship. Many applicants mentioned meeting Fellows at in-person events or following them on Twitter. These personal connections are built throughout the year, and then with a focused two-month application window and individual recruitment, we heard from many more applicants this year. By having a solid plan for recruitment and a distributed network of allies and ambassadors, the applicant pool grew in both size and in quality and level of experience of the applicants.

The focused two months of promotion and personalized outreach resulted in a steady stream of applications, and then a flurry right before the deadline: 95 applications this year arrived in the last two days, with 18 coming in the final hour alone. Last year the final days were also busy, with 58 applications in the last two days. Interestingly, six of the eight people who became Fellows applied within two days of the deadline. And one applied within 11 seconds. It will be interesting to see if this last-minute trend of Fellows continues, and to learn what we can do to help nudge more people to submit that application.

Getting more women applicants

We made an explicit effort to recruit more women applicants as well as applicants who represent all of the types of diversity of the communities we serve. And, it turns out that when you make that effort, people actually apply.

I asked people for feedback and advice over the past year about how to recruit women in tech, and one of the things I kept hearing was the need to give people lots of opportunities to get "asked" to apply. This is good advice in general and we were sure to add explicit and implicit asks throughout our marketing materials. Beyond that, person-to-person outreach was really key here. I posted to women-in-tech list servs like Tech Lady Mafia and Ada Camp and asked for help spreading the word, which people did via Twitter, forwarding to other list servs, and personal recommendations and support. I also emailed individuals specifically asking them to recommend women and people in other under-represented groups in technology for the Fellowship. This explicit ask, rather than a general call for recommendations, is something I read about as being helpful in getting a diverse array of event speakers and that was definitely true here as well.

I would also attribute the increase here to the same dynamic at play with the overall increase in applicants: year-round relationship building. Targeted follow up over the summer helped push people to click that submit button, but the real work is in getting to know people and communities all year so the first communication wasn't an out-of-nowhere ask.

Maintaining a balance of geographic distribution

It was extremely encouraging to see that the geographic distribution of applicants held steady from 2013 to 2014. In 2014, there will no longer be a Fellowship news partner in Europe, but it is clear that it did not deter applicants from that region. In fact, we got more applicants from Germany this year even though last year two news partners were in Germany. Again, this can largely be attributed to the existing networks we have in Europe and the presence of so many Fellows and alumni.

One noteworthy difference is the increase in the number of applicants from Argentina. This is also attributable to the presence of a Knight-Mozilla Fellow in the region as well as the strong promotion and community building efforts of La Nacion. Similarly, the networks of Ushahidi and Internews Kenya resulted in more applicants from Kenya and throughout Africa this year. And even as a larger proportion of news partners will be in the U.S. next year, the share of American applicants rose by only about 5 percentage points (from 47 of 163 to 88 of 265).

OpenNews works with organizations all over the world to support hack days and that year-round work was reflected in personal referrals from hack day organizers and numerous examples of applicants finding out about the Fellowship via in-person events. Many people also keep up with us via Twitter, and the online and offline opportunities for engagement help reinforce one another. Plus, each interaction is another chance to offer another "ask" to applicants to finally push them to go ahead and apply.

What next

It's going to be a busy September. By the second week of September, we will notify all applicants of their application status. At that point semifinalists will move forward to an interview round. We'll follow up with people not selected as Fellows about all the other ways they should stay connected to OpenNews. After interviews with semifinalists, we'll move to a finalist stage of interviews with news partners. Fellows will be notified of their status by early October. Public announcement of the 2014 Knight-Mozilla Fellows will take place at the Mozilla Festival in London on October 25-27.

September 03, 2013 02:22 PM

August 30, 2013

Friedrich Lindenberg

Why civic hackers should hang out in a newsroom

In Spiegel's snack bar, the sixties are going strong.

Coming out of a successful hackday, you usually need two things: a good night's sleep and ten months of time to grow the things you've worked on into a real product.

While sleep is easy to come by, finding the time and the environment to work on crazy ideas is much harder. This is why, after visiting open government-related hackdays for three years, I'd begun to dread these events: I'd meet interesting people and begin an exciting project with them, but then wouldn't find the time to follow up.

Things felt very different in January, when I flew to Boston to join the 2013 OpenNews fellows for a hackday at the MIT Media Lab. Collaborating on our projects that week, we knew that we had the backing to grow things and to get them out into the real world. We're still working on these tools now: a hacked-up collection of scripts to search data for news stories has turned into a real software project, datawi.re.

Pushing a project forward means talking to the users of your tools, or finding out how you can build an audience for the information that you're trying to share. Newsrooms are great places for both of these things: reporters and editors are demanding and critical customers. And, needless to say, publishing a nicely prepared dataset on a high-traffic news site is a quick and effective way of getting it out there.

For example, I've learned a lot by pitching tools developed in the OpenNews community to my host organisation. DataWrapper is now in regular use on the site, while more advanced tools like Overview, Poderopedia and OpenSpending have spawned debates about the types of data gathering and analysis activities Spiegel and Spiegel Online should engage in.

Learning about these technologies is just one aspect of the community that surrounds the fellowship. During a two-week visit to the US, Annabel and I managed to not only visit the Guardian's US offices, ProPublica, the New York Times and Google NYC, but we also got to attend MIT's Civic Media conference, an incredible melting pot of the worlds of civic technology and news.

Visiting Brian and his newsroom graphics rendering farm at the New York Times.

Five months into my fellowship, I'm certain that the best place to learn how to engineer great civic applications is in a newsroom. Working to create narratives that feed into a news cycle, address a wide audience and tell a clear story is an amazing challenge for any technologist. Being involved in journalistic projects as a fellow puts you in the center of a three-way interaction between reporters, the data at the core of your story and the technologies used in its presentation.

At the same time, being a fellow gives me the independence to experiment with new technologies and to go outside of the organisation. During the past few months, I've met people from all across the globe and learned about their ideas and work. This is a great time to be involved in this discussion, and the fellowship will put you right at the center of it.

Apply, and join us.

p.s. Have I mentioned how fun all of this is? I'm writing this post in a circus tent at OHM2013 near Amsterdam, surrounded by 3000 international hackers discussing the future of privacy, home-made technology and code.

August 30, 2013 07:00 AM

August 16, 2013

Dan Sinker

OpenNews: Eleven Seconds

Eleven seconds. When he hit “submit form” on his Knight-Mozilla Fellowship application, that was all the time that was left between Brian Abelson getting a Fellowship at the New York Times and, well, not. Reflecting back on it now, Brian remarks that it was “incredible that I was that close to missing this life-changing experience.”

Eleven seconds.

We’re down to the wire on applying to become a 2014 Knight-Mozilla Fellow—the deadline is Saturday August 17 at midnight (technically Saturday at 11:59:59pm), and if you’re worried that it’s already too late, remember Brian Abelson’s story. Because he only had eleven seconds to spare—as last second as anything gets—and he became a Fellow.

So what took him so long? “The main thing that held me back was the fact that I had no easy way to share the projects I had worked on,” Brian says, echoing a similar concern we’ve heard from other applicants. Plus, he says he looked at current Fellows’ websites and code on Github and “I felt really intimidated.”

That feeling of intimidation is natural—and is one we’ve heard repeated by every person that has been awarded a Fellowship. Current Fellow Noah Veltman explains it this way: “The crazy thing is, I almost didn’t apply. I didn’t even think I was a candidate. I had never studied computer science, I just tinkered with code in my spare time because I had fun projects I wanted to try.”

For Brian, the struggle with “imposter syndrome” (as well as setting up a Tumblr to showcase his work) ate up most of his time on the final Saturday to apply. “It actually took me so long to complete everything that I had to bail on one of my best friend’s birthdays to finish. She actually told me, ‘I’ll only forgive you if you get the fellowship.’”

He got it. With eleven seconds to spare. So can you.

If you love to code, whether you’re a “tinkerer” like Noah or a seasoned developer looking for meaningful challenges, it is not too late to apply to become a 2014 Knight-Mozilla Fellow. We designed our application form to be quick, with five short questions and some links to projects you’ve made, so you can get it done between now and midnight Saturday. But you have to apply.

"When I finally pressed ‘submit,’" Brian remembers, "I felt totally dejected. Not only had I just jeopardized a friendship, but I had done it for a fellowship I didn’t think I had a chance at getting." Six months into his Fellowship year at the New York Times, Brian is still surprised at it all. "Given all this, I guess you can understand how shocked I was (and still am) that this happened to me."

It can happen to you too. But give it more than eleven seconds. Apply now.

August 16, 2013 10:18 PM

Annabel Church

Apply now and get one year free*

So, tonight is the last night to get your application in. Tomorrow, applications close to become a Knight-Mozilla Fellow in 2014. Maybe you still don’t know why you would apply. Why you should join us on a ride of a lifetime. Why you shouldn’t be afraid of what happens after.

The 2013 Knight-Mozilla Fellows at the MIT Media Lab. Photo by Laurian Gridinoc

I spent a few days at OHM recently, sleeping in a tent, woken by the sun at an absurd time after listening to the heavy beats of breakcore late into the night. Fibre was laid into the ground, and Data Toilets had been set up as access points to provide internet out to the distant trees. People hacked on hardware, software, locks, rings, beer and ham and cheese toasties.

In that green field I hacked too, and when OHM was over I went home to Berlin to hack. The fellowship has allowed me the privilege of meeting those who are in the news technology community from all over the world, to observe, hack and make journalism as a part of this ever growing community, regardless of which field they are sitting in.

My fellowship feels a bit like an extended OHM, a summer camp, surrounded by a tight-knit community in a world that sometimes seems like magic. I think you should apply and join the community. It is not too late.

If you are a coder, seething with passion and an ability to make and create. If you want a year freely exploring a newsroom and hacking on journalism. If you want to join a community of passionate people at the intersection of technology and journalism. This is the place for you.

* Nearly a year. A bit like summer camp.

August 16, 2013 12:00 AM

August 15, 2013

Dan Sinker

OpenNews: Looking vs Leaping

I went to a water park with my son this weekend. He’s an analytical kid and, even though he’s been talking about hitting the big slides all summer, once we got there he put the brakes on pretty fast. They were too fast, too tall, too crazy. We spent about 25 minutes just watching a fast tube slide, talking it all through, before he finally agreed to get in line.

"You know how I am," he explained, "I like to look before I leap.”

The line wound around, and then there was a long climb up a tall hill to the top of the slide. We got the raft into the water, and he freaked out. Flat-out refused to get in. A line forming behind us, I picked him up, put him in the tube, and down we went. By the time we hit bottom he yelled “LET’S DO IT AGAIN!”

We picked up the tube and I turned to him and said, “That’s why sometimes you just gotta leap.”

With 48 hours left to apply to become a 2014 Knight-Mozilla Fellow, the time for looking is rapidly coming to a close. It’s time to leap.

For the last two months, we’ve been looking for people who love to code—developers, civic hackers, journo-coders, data crunchers, stats geeks, and more—to join us at OpenNews as Knight-Mozilla Fellows, where you’ll spend 10 months creating open-source code, hacking around the globe, working in some of the world’s best newsrooms, and helping to build out journalism’s codebase on the open web. We’ve been looking for two months. There are only two days left. Leap.

Our five fellows will spend the ten months of their fellowship embedded in some of the best news organizations in the world: The New York Times, ProPublica, the Texas Tribune, La Nacion, and (in a joint fellowship) Ushahidi and Internews Kenya. There, you’ll have the opportunity to develop next-generation tools that are tempered in the real-world fires of breaking news. You’ll be in the room when news breaks and you’ll write code to react to it. You’ll create libraries and tools that will shape reporting on the world around you. You’ll write code that makes a difference. Leap.

In addition to working with the incredible colleagues at your newsroom hosts, you’ll also be part of a cohort of fellows—five total in 2014—who will be your collaborators, your troublemakers, and your friends during this adventure. Over your ten months, you’ll have ample opportunity to code together, travel together, and collaborate on ideas and experiments. You’ll make connections that will ripple out past your fellowship year and into the life that grows beyond it. Leap.

We want you to be able to focus on doing amazing work, not making rent, and so during your time as a Knight-Mozilla Fellow, you’ll be compensated well, with a full stipend and additional supplements for yourself, your partner, and your children. You’ll have the financial support to research, travel, experiment, and build projects you care deeply about. You’ll have the time to dive deep into problemsets, to craft code that truly matters. Leap.

And, most importantly, you’ll join a growing community of journalist-programmers who are helping to redefine what journalism means on the internet and helping to craft code that is transforming the way we understand the world around us.

If you want to do bleeding-edge visualizations, leap.

If you want to scrape and analyze data, leap.

If you want to build applications that help people learn about the world they live in, leap.

If you want to speak truth to power, strengthen civic engagement, and engage communities? Leap.

The time for looking is done. If you want to become a 2014 Knight-Mozilla Fellow, it’s time to leap.

August 15, 2013 09:29 PM

August 14, 2013

Dan Sinker

OpenNews: The Knight-Mozilla Globetrotters

With just three days left to apply to become a 2014 Knight-Mozilla Fellow, I wanted to take a moment to talk about the globetrotting our fellows get up to. While Knight-Mozilla Fellows work out of some of the best newsrooms in the world, they also spend a fair amount of time travelling to conferences, hack events, festivals, or even just to get together and hack.

This is a big part of the “choose your own adventure” aspect of the Knight-Mozilla Fellowships. We want our fellows to be able to dive deep into journalistic problemsets and to join the global community of people writing code to solve them. That means going to where folks are gathering to engage with them, run workshops, show off things you’ve made and collaborate on code together. As a result, our Fellows travel widely. Here’s just a fraction of the places our Knight-Mozilla Fellows have gone during their Fellowship year:

In addition to these trips (and many more not on the list), we bring all our Fellows to “tentpole” events, like a group onboarding trip at the start of the Fellowship (last year’s was at the MIT Media Lab), the Mozilla Festival in London, the Knight-MIT Civic Media Conference, and the Hacks/Hackers Media Party in Buenos Aires.

Want to see the world while you change it? Apply to become a 2014 Knight-Mozilla Fellow. But don’t delay! Just three days remain to apply. At midnight, Eastern time on Saturday August 17th, the opportunity will close. So apply now!

August 14, 2013 08:21 PM

August 13, 2013

Dan Sinker

OpenNews: With four days left to apply, a look at Fellowship benefits.

We’re counting down the days to the end of the 2014 Knight-Mozilla Fellowship search. As of today, just four days remain—the window closes at midnight Eastern, this Saturday August 17th. If you’re a developer who wants to make a difference in the world by spending 10 months creating open-source code for journalism, apply today.

We’ve spent a lot of time over the last few weeks talking about the myriad of opportunities available in the newsroom for people who love to code, but today I wanted to focus on the benefits available to our Knight-Mozilla Fellows.

We want our Fellows to have the "best year" of their lives, and that means making sure that they’re taken care of during their time as Knight-Mozilla Fellows. To that end, we offer a series of great benefits for our five Knight-Mozilla Fellows:

Our Fellowship benefits are extensive, and it’s worth checking out the full listing and cost breakdowns over on the OpenNews site. The 10 months you will spend as a Knight-Mozilla Fellow should be all about creating, experimenting, and collaboration, not about struggling to make rent or how to cover the cost of daycare, and so we’ve worked hard to make sure our benefits help to make your year the best one possible.

Do you love to code? Do you want to make a difference in journalism and beyond? Do you want to see the world while you change it? Act now. There is very little time left to apply to become a 2014 Knight-Mozilla Fellow—Saturday the 17th at Midnight EST, the opportunity ends. Make the move and apply today.

August 13, 2013 05:51 PM

August 12, 2013

Dan Sinker

OpenNews: Five days left to apply to become a 2014 Knight-Mozilla Fellow

After a two-month search, it’s come down to this: As of today, there are just *five* short days left to apply to become a 2014 Knight-Mozilla Fellow. The Knight-Mozilla Fellowships offer a once-in-a-lifetime opportunity for people who love to code to spend 10 months making a difference by creating new ideas and open-source tools to transform journalism on the web. Al Shaw, a news apps developer at ProPublica, puts the opportunity succinctly: “If what you’re interested in is changing the world and making useful software that tells a story and kicks some ass, please join us.” He’s right. Apply today.

Over the last month, we’ve asked other newsroom developers, our fellows (past and present), and our news partners to blog about writing code in the newsroom and the opportunities of the Knight-Mozilla Fellowships. As we start this final five-day push before the end of our 2014 Fellowship search, here’s a roundup of what people have been saying this month:

There’s a lot to read in all these pieces and a lot to think about. But if you love to code and want to become a 2014 Knight-Mozilla Fellow, don’t think for too long: Only five days remain to apply. Once midnight Eastern Time hits on Saturday August 17, the opportunity will be over. Apply today.

August 12, 2013 05:14 PM

August 09, 2013

Dan Sinker

OpenNews: Our 2014 News Partners make the case

This is it. There’s only one week remaining to apply to become a 2014 Knight-Mozilla Fellow. If you love to code, want to make a difference in journalism, and want to see the world while you do it, you only have until August 17th to apply.

All this week our six amazing news partners who host our fellows in their newsrooms for the 10 months the fellowships run have been writing about why they’re involved in the program and what fellows can expect in working with them.

Aron Pilhofer at the New York Times wants their fellow to spend their time at the Times “building an easy-to-use, easy-to-deploy suite of tools to help journalists work with original source documents.” This toolkit will be informed by a deep dive into newsroom collaboration, Pilhofer explains:

The fellow will be attached jointly to the two teams at The Times most involved in solving the documents-to-data dilemma: Interactive News and the computer-assisted reporting team. He or she will spend 10 months working on real stories with real reporters and editors, the end goal of which will be to develop and, ultimately, release the document toolkit that real people can understand and use.

In Nairobi, Kenya we’re offering a unique joint fellowship, where one of our Knight-Mozilla Fellows will spend time at both Ushahidi and Internews Kenya going deep on data.

Eva Constantaras at Internews Kenya explains the opportunity for a Knight-Mozilla Fellow in Kenya:

Open data has a whole different definition in Kenya and developers have a chance to change the way the Kenyan media reports the news by encouraging data-driven instead of politically-driven journalism. Access is a huge issue: from gathering data held by traditional tribal leaders, to reluctant county administrations to NGOs that have been waiting for someone to come along and make use of the years of data they have collected on everything from female genital mutilation to the impact of progressive farming techniques on food security.

Heather Leson at Ushahidi explains the opportunity for a fellow similarly, though Ushahidi’s focus remains global:

One of our core goals is to connect citizen voices to action. Sometimes this makes us professional tinkers, but it also means that you can dive in into data and maps from around the world on various topics from crisis response, election monitoring or civil society communities like anti-corruption or anti-harassment or environmental action.

Back in New York City, at the nonprofit investigative news outlet ProPublica, news app developer Al Shaw explains “why you should be our 2014 OpenNews Fellow (and join the epic team of awesomeness)”:

The ProPublica news apps desk isn’t a service desk. As a colleague of ours put it, we’re not the deli counter slicing meats to order for the rest of the newsroom. We work side by side with traditional reporters, and often write stories as well as code. We use our telephones as much as we do the command line. We answer to editors, and all our software needs to tell a story. We develop on deadline, meaning no long development cycles or Gantt charts.

At the Texas Tribune, a local reporting powerhouse, Director of Technology Travis Swicegood makes his pitch for a Fellow by focusing on the local opportunities in Austin, Texas:

Our News Apps team — and our fellow — will be in the middle of the biggest Texas story of the year, making election results available throughout the state via our interactive election coverage. Think brackets, scoreboards, campaign finance, and whatever else we can come up with.

Data has been in the Trib’s DNA since day one — and we’ll be reinforcing that in 2014. Apply today to join us for one hell of a ride.

Did I mention we have the world’s best brisket?

Finally, we jump down to Buenos Aires, Argentina for our fellowship at La Nacion. In their Spanish-language appeal for developers to join them, the La Nacion Data team emphasizes the importance of open-source development in the Fellow’s work. The team put together a video with their fellow, Manuel Aristaran, to discuss his work and the opportunity at La Nacion:

As Al Shaw at ProPublica puts it: “If what you’re interested in is changing the world and making useful software that tells a story and kicks some ass, please join us”

The time is now. One week remains: if you love to code and want to change the world, apply now to become a 2014 Knight-Mozilla Fellow. If you have last-minute questions about the fellowship, we have updated our FAQ, we’ll be hosting another live Q&A on our OpenNews Community Call on Wednesday August 14, and you can always drop us a line or ping us on Twitter. Don’t hesitate to be in touch.

But more than anything: Apply now.

August 09, 2013 03:09 PM

August 08, 2013

Friedrich Lindenberg

ReGENESIS: German statistics as raw data

A choropleth map to indicate the availability of high-quality, machine readable statistical data in ReGENESIS.

One of the first tasks I was given by Spiegel Online was to make a set of simple maps to display basic statistics about Germany - things like population, unemployment or insolvencies. As Germany's statistical data are collected in a system called GENESIS, I thought that this would be trivial. I'd just have to write a script to grab the tables once a month, convert them to JSON and thus update the maps.

Unfortunately, while the GENESIS interface offers downloads, they are both hard to access (through an arcane and untested SOAP interface) and hard to parse. Essentially, the tables are reports which have been manually laid out, and getting out a predictable data series requires you to pretty much write a bespoke parser for each table.

So I decided to solve this issue for others as well and make ReGENESIS, a service and toolkit to provide clean and well-structured data from the German statistical services.

This was inspired by some great examples of similar projects in other countries: Census.IRE.org provides a lot of structured data around the US census, and the CensusReporter project is now thinking this through a lot further. I'm also really impressed by the work that Brian and the @csvsoundsystem have been doing on treasury.io, a convenient data source with a ScraperWiki-based SQL query endpoint and client bindings for a variety of languages.

ReGENESIS is powered by a collection of Python scripts available on GitHub. The scripts will first scrape bulk data exports from the official site and store them locally. These are then processed and loaded into a database, retaining a rich set of metadata as well as the actual observations. Then, the database contents are dumped to CSV file extracts, two for each dataset:

Next, Flask helps render a simple user interface to flat files to represent the metadata. Finally, the entire site is uploaded to Amazon S3 so that no server is required to serve any of the content. This makes ReGENESIS easy to maintain: all I need to do is run the extractors once a week to make sure that we're offering the latest data.
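The load-then-dump part of this pipeline can be sketched roughly as follows. This is a minimal illustration, not the actual ReGENESIS code: it swaps in SQLite from the Python standard library for whatever database the scripts really use, and the table layout, function names and sample values are all made up for the example.

```python
import csv
import io
import sqlite3

# Made-up observations standing in for a scraped bulk export.
ROWS = [
    ("population", "region_a", 2011, 100.0),
    ("population", "region_b", 2011, 250.0),
    ("unemployment", "region_a", 2011, 7.5),
]


def load(conn, rows):
    """Store (dataset, region, year, value) tuples in one table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observation ("
        "dataset TEXT, region TEXT, year INTEGER, value REAL)"
    )
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def dump_csv(conn, dataset):
    """Return the observations of one dataset as CSV text with a header row."""
    cur = conn.execute(
        "SELECT region, year, value FROM observation "
        "WHERE dataset = ? ORDER BY region, year",
        (dataset,),
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return out.getvalue()


conn = sqlite3.connect(":memory:")
load(conn, ROWS)
print(dump_csv(conn, "population"))
```

Because the output is plain CSV text, the same function could write one file per dataset into a directory that is then uploaded as static content.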

Not really related, but that TV show was a lot of fun.

What's next?

Obviously, ReGENESIS is in a very early prototype stage and a lot of the use cases and usability haven't really been ironed out yet. Beyond that, there are plenty of ideas for the future.

Go federal: At the moment, I'm only importing data from the Regionalstatistik portal which publishes statistics from state level authorities. The much larger GENESIS database operated by the federal statistical office has its bulk export function locked down and requires a EUR 500 annual subscription. Maybe this could be an opportunity for an open data kickstarter?

Have an API: ReGENESIS holds some fairly large tables, and in order to pull them into interactive graphics or other client applications it would be nice to serve filtered and aggregated versions instead of the full data. I'm somewhat reluctant to run a server for this (something like Stefan Urbanek's cubes), but most of the hosted data API tools I've checked out so far are either too expensive or very limited in terms of capacity.

Rank notifications: when I pitched ReGENESIS at a data journalism meetup earlier this year, one request was to ease access to local statistics for reporters at regional papers. This could, for example, be done through email alerts which notify journalists when the relative rank of their regions on any of the major statistics sees significant change.

Map it out: just before this release, I was contacted by Felix, one of the StadtLandCode grantees. He's been working on getting regional statistics for a while and has done a lot of mapping work to generate customized maps from the data. As ReGENESIS gives him the data in the form he needs, we've agreed to cooperate on integrating his GeoJSON map layers with the service.
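To make the rank notification idea above a bit more concrete, here is a small sketch of how a change in relative rank could be detected between two releases of the same statistic. Nothing like this exists in ReGENESIS yet; the function names, the threshold and the sample figures are assumptions for illustration only.

```python
def ranks(values):
    """Map each region to its rank, where 1 is the highest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {region: position + 1 for position, region in enumerate(ordered)}


def rank_changes(previous, current, threshold=2):
    """Return regions whose rank moved by at least `threshold` places.

    Positive numbers mean the region climbed, negative that it dropped.
    Regions missing from either release are ignored.
    """
    before, after = ranks(previous), ranks(current)
    changes = {}
    for region, new_rank in after.items():
        if region in before:
            delta = before[region] - new_rank
            if abs(delta) >= threshold:
                changes[region] = delta
    return changes


# Made-up figures for two consecutive releases of one statistic.
last_year = {"a": 10, "b": 8, "c": 6, "d": 4}
this_year = {"a": 10, "b": 3, "c": 6, "d": 9}
print(rank_changes(last_year, this_year))
```

An alerting job would then only need to look up which reporters subscribed to the regions in the returned dictionary. Ties are broken by input order here, which a real implementation would probably want to handle more carefully.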

Of course, I can't do all of this on my own. That's why I'm releasing this early: for you to get on board now and to try it out, to contribute your use cases and, of course, your code!

August 08, 2013 12:00 AM

August 05, 2013

Dan Sinker

OpenNews: Knight-Mozilla Fellowships: "What do you mean, I get to do whatever I want?"

The 2013 Knight-Mozilla Fellows at the MIT Media Lab. Photo by Laurian Gridinoc

The question in the title of this post is posed at the top of Knight-Mozilla Fellow Stijn Debrouwere’s blog post about his time as a Knight-Mozilla Fellow. Every Knight-Mozilla Fellow experiences their 10 months as a fellow differently, in part because the mandate we give our fellows is as broad as Stijn says: We want our fellows to choose their own adventure, to create their own pathways, and to do the groundbreaking work that they want to do. Sound good? This can be you next year if you apply to become a 2014 Knight-Mozilla Fellow. For two weeks, our current fellows have been blogging about their experiences as Knight-Mozilla Fellows. Last week, I rounded up the posts of four of our fellows (and one alum), and this week three more wrote about their experiences, plus two alums.

How does a Knight-Mozilla Fellowship work? Stijn, who is spending his Fellowship year at the Guardian in London, explains in his blog post:

Here’s how an OpenNews fellowship like mine works: work on whatever with whoever, learn anything, and talk about it wherever. It’s the sweetest gig ever invented, but it’s also a bit of a brain melt.

For Mike Tigas, who has been working at ProPublica in New York, the diverse makeup of his fellow Fellows has made for unique collaborations:

We have backgrounds in programming, statistics, censorship research, cybersecurity, satellite communications, and we’re from nearly every corner of the earth. That range of backgrounds and skills has honestly made collaboration fun and at times surprising (in a good way).

For Mike, those diverse backgrounds have meant that “we fellows inspire one another, we inspire other people, and we in turn, are inspired by a lot of the people we get a chance to interact with through the course of our fellowship work.”

Friedrich Lindenberg, working in Hamburg with Spiegel Online, joined the Fellowship from a background of civic hacking. It’s the opportunities to hone the skills he came to the Fellowship with that’s been important to him:

Five months into my fellowship, I’m certain that the best place to learn how to engineer great civic applications is in a newsroom. Working to create narratives that feed into a news cycle, address a wide audience and tell a clear story is an amazing challenge for any technologist.

What a fellow does during their Fellowship year continues to resonate after their 10 months conclude, as 2012 Knight-Mozilla Fellowship alumni Nicola Hughes, Laurian Gridinoc, and Dan Schultz write:

Everyone’s fellowship experience is unique, and the impact from it hits each person differently as well. But for everyone, becoming a Knight-Mozilla Fellow is a life-altering experience.

And now, it’s your turn: We’re looking for five people who love to code and want to make a difference to join us as 2014 Knight-Mozilla Fellows. The application is simple (just a few questions and a bunch of links to your best stuff), but there are only two short weeks left to apply: The application window closes August 17th. What are you waiting for?

August 05, 2013 07:54 PM

Dan Schultz

OpenNews Applicants: Be Warned

Being a Knight-Mozilla Fellow ruined my life. My fellowship ended three months ago; I still don't have a job, my wife and I haven't spoken in days, and none of my friends take me seriously. There is only one piece of advice that I have for anybody considering applying: ignore all the obvious reasons why this fellowship is a great opportunity and run away.

Run like the wind.

Being an Alumnus

As you approach the end of your fellowship you are going to ask yourself many questions. Will Dan Sinker still love me when I'm old? Is it true that on your last day they brand your inner thigh with a hot iron that says "PROPERTY OF MOZILLA"? Where did I leave my FitBit?

The biggest one is going to be "where the hell should I go from here?" I'll give an example of what a fellow's immediate future can be by describing my current status as a functioning adult.

It's difficult to say what I do for a living. When asked, I usually give up and declare that I am a freelancer. In reality I'm…

1: A Cofounder

I spent this week in San Francisco for the orientation of Mozilla's accelerator program, WebFWD. I'm here as one of three founders of Hyperaudio Inc., a nonprofit organization formed on behalf of my fellow fellow, Mark Boas.

Together, with a few others–including yet another 2012 Fellow, Laurian Gridinoc–we will spend the next year taking Mark's baby and turning it into a sustainable nonprofit ecosystem for remixable, transcribed video and audio.

2: A Teacher

There is a letter from Syracuse University’s Newhouse School sitting on my doorstep right now which offers a part time, remote faculty position. It is very likely that I will spend the next academic year mentoring students and creating a new set of resources to help them learn "how to make almost anything on the web."

3: An Innovator

Last month I worked with an amazing team at an OpenNews hackathon to build CivOmega. CivOmega makes it possible for people to ask questions about their government and get answers powered by open datasets and APIs. This month I'm in the running with 2013 Knight-Mozilla Fellow Mike Tigas to get funding to turn it into a real, contributor-ready open source project.

4: A Greybeard

Last Friday I was in Miami to serve as a judge for the Knight Community Information Challenge. I read many applications from around the country that pitched ideas about how they want to solve a major community issue with digital tools. The month before that I spoke on a panel about newsroom innovation at the MIT-Knight Center Media Conference.

If nothing else, being a Knight-Mozilla Fellow means you can trick otherwise reputable organizations like The Knight Foundation into thinking you know what you're talking about.

5: An Architect

I work part time to help startups build out their technology. This involves spending a few hours a week managing a team of developers and playing the role of architect and tech lead. Not every startup has to do with my immediate interests, but this is a nice way to keep things fresh. For instance, last month I helped make a button that rich people can press to give themselves more money.

Usually just mentioning my relationship with Mozilla is enough to cause people to swoon and faint, but sometimes I decide to go with vague threats instead. "I know some very important people on the internet. If you don't hire us, life could get very 'difficult'."

6: A Fellow (again…?)

In addition to being a Knight-Mozilla Fellow for life, #km4lyfe, I'll be a remote 2013 RJI Fellow starting in September. My project is an effort to flesh out my good ol' thesis project, Truth Goggles, an automated bullshit detector for the internet.

7: A Trainer

The 2012 fellows have started a collective brand organization called Shape Journalism. It's a loose group of makers who are willing to help media organizations by training, building, or just offering advice. For example, last week Nicola Hughes, Mark Boas and I started laying out plans for a week-long data viz training we're expecting to run in November.

8: An Advisor

Have a crazy idea related to journalism, new media, or technology? Apparently I'm the guy to talk to to get feedback! But seriously so many people have reached out to pitch ideas, and it has been wonderful to get to help out.

There are so many people getting into this space, and being a Knight-Mozilla fellow is eerily similar to being a leader.

9: A Hired Gun

Organizations reach out to me fairly regularly to help them build out a prototype, apply something I have made in the past to their mission, or otherwise write some code. It's always awesome to get to work on something you love and get paid at the same time.

10: A Hobbyist

The best part about not having a job that provides health insurance is that you can do whatever you want in between other work. This means learning new skills (like professional-level soundscaping) but it also means getting to continue to make things.

For instance I'm working on a forum that lets groups of people talk to each other in a closed community without isolating them. Basically you can share threads between forums (and be part of lots of communities), so you can have conversations spread to the most relevant places without getting inundated with the anonymous jackasses that we lovingly call “the general public.”

The Moral of the Story

Dan Sinker wants YOU to join Open News.

Poster masterfully created by Lyla Duey.

While it is technically true I don't have a job, I am here very much by choice. Being a fellow has set me up with a network of amazing people, who I still work with closely to build awesome things and participate in some badass events.

By the time you complete your fellowship you will be an unstoppable force of raw digital power. You will be oxymoronically established as both an outsider and an insider (so your perspective is priceless), and you will have had 10 months to show off what you can do. Following your passion at that point is as easy as breathing, unless you're a fish.

If your dream is a startup, you will come out of this with mentors, collaborators, and understanding. If you want to teach, you have an impressive set of experiences to show off. If you want a full time job, my other fellows have shown that you can absolutely do that too.

But honestly, seriously, not kidding, what are you waiting for.

August 05, 2013 04:01 AM

Friedrich Lindenberg

Data Management in News Organisations

Over the last few months, I've had the pleasure of joining a few discussions regarding the long-term management of data inside Spiegel. The company is already running DIGAS, a massive, hand-curated archive of news content from the past decades, including articles published not just by Spiegel but also by many other German and international news outlets.

Can the same thing be created for data? A well-stocked data warehouse filled with the outputs of previous investigations would certainly make a great asset for journalists and researchers, but the diversity of topics and fast pace of the environment make this hard to achieve.

What problem needs to be solved?

Like anyone working with diverse datasets, I've built up the misc folder, random bits of data without much context.

A data management approach for a news organisation might include a variety of goals, many of which are, of course, similar to what the larger (open) data community is facing:

This, of course, is not a good list to start off with - it's just a set of related concerns to keep in mind.

What are people doing now?

Luckily, instead of having to start from scratch, we can already see some of the strategies used by other media organisations and their data teams.

My information on the Guardian data blog's approach to this may be a bit outdated, as most of the people I spoke to no longer work there. The team's key tool was Google Spreadsheets, with every story linking to its source data somewhere in the article. Often, a GSpread table would even be embedded into the article.

Lisa also mentioned that they kept a meta-sheet internally, a list of all the spreadsheets and the stories they related to. This may be somewhat redundant, however, given that all the sheets are linked up from the article anyway and the whole blog is a SEO treasure trove - the way to find things is probably through Google.

This meta-sheet is also something that I've seen used by the FarmSubsidy project to coordinate the acquisition and reporting on European farm subsidy data. Their tables list the source location, state and last change for each year's batch of data for each member state.

One of my favourite approaches was demoed to me by Martin Stabe at the Financial Times: all of their graphics and data reporting staff have access to a shared MySQL database. When a new dataset is collected, it becomes another table in this database. Over time, this has led them to create a massive collection of tables which are easy to query and can be joined up for ad-hoc relationships.

The obvious downside of this approach is that it has a fairly high barrier to entry: even with tools like phpMyAdmin, a relational database is probably not every reporter's idea of a comfortable working environment. The FT's solution to this appears to be teamwork, with reporters and members of the data team collaborating on projects where needed.
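As a rough idea of what such an ad-hoc join between two independently collected tables looks like, here is a small sketch. It is not the FT's setup: it uses an in-memory SQLite database from the Python standard library in place of a shared MySQL server, and the table names, columns and figures are invented for the example.

```python
import sqlite3

# In-memory SQLite stands in for the shared database server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE population (region TEXT, year INTEGER, people INTEGER);
    CREATE TABLE insolvencies (region TEXT, year INTEGER, cases INTEGER);
    INSERT INTO population VALUES ('region_a', 2012, 1000);
    INSERT INTO population VALUES ('region_b', 2012, 4000);
    INSERT INTO insolvencies VALUES ('region_a', 2012, 5);
    INSERT INTO insolvencies VALUES ('region_b', 2012, 8);
    """
)


def insolvencies_per_1000(conn, year):
    """Join the two datasets on region and year and compute a rate."""
    return conn.execute(
        "SELECT p.region, 1000.0 * i.cases / p.people AS rate "
        "FROM population p JOIN insolvencies i "
        "ON p.region = i.region AND p.year = i.year "
        "WHERE p.year = ? ORDER BY rate DESC",
        (year,),
    ).fetchall()


print(insolvencies_per_1000(conn, 2012))
```

The join only works because both tables happen to share a region key and a year column, which hints at why some agreement on identifiers matters once many datasets live side by side.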

Platforms to the rescue?

While these approaches are very pragmatic, it's doubtful how well they scale both on a human and a technical level. Of course, there are a few efforts to develop platforms to assist journalists in managing their data assets.

Probably the most relevant software project to data management in a newsroom is the Panda Project, another Knight Challenge winning project from 2011. Panda tries to capture all the little spreadsheets and tables that would result from a newsroom's daily operations and gives them a permanent home.

The developers put a lot of emphasis on ease of use, so the effort required to upload data would be minimal. At the same time, they chose to base much of the platform around full-text search. This may be useful in order to find names of companies and people in the data, but for statistical and other measurement data, the search makes no sense. Since the only other option is to download the entire dataset, Panda simply lacks a good way to handle numeric data.

CKAN, the OKFN's widely-used data catalogue, is another possible solution to keep track of datasets. Since its 2.0 release, the platform features an attractive user interface and decent data previews. My concern about using CKAN for data management (as opposed to its publication) is that unlike tools like GitHub, CKAN doesn't integrate into a workflow but rather relies on the user to fill out forms to document their actions.

Interestingly, this concern was rejected by folks at Spiegel, pointing out that form-filling is something that can be enforced via policy within the organisation. I'm not entirely convinced of this, so it seems like there may be an interesting experiment to be had here.

While it may be a bit of a long shot, I'm also trying to find out if the next version of the Investigative Dashboard may be able to play such a role in newsrooms.

Finally, there is an entire category of business intelligence tools (like Pentaho or its commercial competitors) that could help manage a complex data warehouse. I'm not sure how these tools fit in well with an incremental approach to building a data architecture at Spiegel though.

What is to be done?

For the moment, Spiegel's documentation unit has set up a massive storage server running Postgres as the core data system. Reporters and researchers will be given access to this database using a visual database client, and I'm busy importing a few core datasets based on public statistical databases.

Once this is running and has been adopted by some of the research staff, I'm keen to actually run the CKAN experiment and to see if this very structured approach will succeed. At the same time, I'm hoping that other people may be aware of better solutions to this challenge - so: how does your news organisation manage its data?

August 05, 2013 12:00 AM

August 02, 2013

Erika Owens

Deepening engagement with the journalism tech community, as a Fellow

This year the eight Knight-Mozilla Fellows are roughly half folks who were not previously involved with journalism development and half people who have deepened their engagement with this community via the Fellowship. Last week, we heard from the first group, Fellows who took varied paths to their news organizations. This week we learned what it's like for people who have worked in newsrooms to experience those spaces as a Knight-Mozilla Fellow.

What do you mean, I get to do whatever I want? Stijn Debrouwere began the week by expressing his awe at the very premise of the Fellowship and the space it allows for exploration.

"Energy is good, though. Here’s how an OpenNews fellowship like mine works: work on whatever with whoever, learn anything, and talk about it wherever. It’s the sweetest gig ever invented, but it’s also a bit of a brain melt. When the 2013 fellows got started the most commonly asked question was “So I get to do whatever I want? WHAT DOES THAT EVEN MEAN?” (First world problems, tell me about them.)

So what have I been doing these past few months as a fellow? For starters, all sorts of funky little side projects. I’ve written a tool to do literate programming in Python, wrote my own blogging engine, spent a week researching how reproducible research can help journalism. Learned about design thinking from IDEO and about programmable media from Mark Boas."

On Fellowship: Mike Tigas meditated on the role of the group of Fellows in his experience of the Fellowship, which differed from the years he spent as a news dev at a small newspaper.

"A lot of the best parts of the fellowship are about inspiration and turning that inspiration into something great. We fellows inspire one another, we inspire other people, and we in turn, are inspired by a lot of the people we get a chance to interact with through the course of our fellowship work."

Why Civic Hackers Should Hang Out in a Newsroom: Friedrich Lindenberg described the overlap between the civic hacking and journalism worlds, which is a growing community that the Fellows got to really connect with at the Knight-MIT Civic Media Conference in June. But as Friedrich tells it, the real action is in a newsroom:

"Five months into my fellowship, I'm certain that the best place to learn how to engineer great civic applications is in a newsroom. Working to create narratives that feed into a news cycle, address a wide audience and tell a clear story is an amazing challenge for any technologist. Being involved in journalistic projects as a fellow puts you in the center of a three-way interaction between reporters, the data at the core of your story and the technologies used in its presentation."

Annabel Church also joined the Fellowship after working in news development and heard about it from her colleague at the Guardian, 2012 Knight-Mozilla Fellow Nicola Hughes. Looking back now, Nicola explains how being a developer journalist has shaped her current work:

"Even though I can no longer do whatever I want I still have immensely more creative freedom and self-governance than anyone else with ‘journalist’ in their job title. Because I am self-taught, because I don’t need any instruments in a news organisation to produce everything they can, because I hunt for ideas on GitHub and Source and from DataMiners; I cannot be managed. And the digital team at the Times Newspaper Limited are ok with that.

I see myself as a disruptive force and have been told to “keep on doing that”. I am working with a young, creative and determined team. We have huge challenges ahead of us and that makes us a team. We are a motley crew with the most diverse skills ever seen at News Corp. So my time as a fellow has carried on in spirit. You can’t undo a realisation. So I will always go where I can code stories."

2012 Fellows Dan Schultz and Laurian Gridinoc also shared their perspectives on the Fellowship. For Laurian, looking back nearly a year later, the tl;dr is:

"I had an awesome year being a Knight-Mozilla Fellow, I worked with the amazing BBC News Specials team, went to all the relevant conferences and hackathons in US and Europe, learned a lot, made amazing friends. In the end it helped me find out where I wanted to be."

And Dan, hyperbolic as ever, detailed his many identities as a Knight-Mozilla Fellowship alumnus and concluded:

"By the time you complete your fellowship you will be an unstoppable force of raw digital power. You will be oxymoronically established as both an outsider and an insider (so your perspective is priceless), and you will have had 10 months to show off what you can do. Following your passion at that point is as easy as breathing, unless you're a fish.

If your dream is a startup, you will come out of this with mentors, collaborators, and understanding. If you want to teach, you have an impressive set of experiences to show off. If you want a full time job, my other fellows have shown that you can absolutely do that too."

News developers get to do amazing work. Knight-Mozilla Fellows get to experience the excitement of newsroom development, but as Mike described, the Fellowship is an opportunity to connect with a community beyond the day to day. To build relationships with a community of Fellows. To hack with civic hackers inside newsrooms and at events like the Knight-MIT Civic Media Conference. To spend the time necessary to really think and plan and analyze some of the big questions in journalism right now. To get to spend 10 months crafting your answer to whatever you want to do to contribute to this group and to help us all better understand our world.

It's an incredible opportunity and we're looking for five more people to join this Fellowship community in 2014. The application is due August 17. Apply today.

August 02, 2013 10:56 PM

Nicola Hughes

What A Difference A Year Makes

OpenNews Fellows 2012. From left: Mark Boas (Al Jazeera), Cole Gillespie (Zeit Online), Nicola Hughes (The Guardian), Dan Schultz (Boston Globe), and Laurian Gridinoc (BBC)

Now that the OpenNews fellowship programme is beginning to recruit the third round of fellows, I thought it ripe to reflect on my time as a fellow, an alumna and the only alumna to still be in the news industry.

If you are interested in applying for the fellowship then you can read about what the current fellows are doing: Noah Veltman wrote how you should not worry about your coding skills as so many newsrooms are learning as well as practicing; Sonya Song says it’s a great opportunity for graduates; Manuel Aristaran enumerates the many perks of being part of the OpenNews community; Brian Abelson writes about gaining knowledge and working for whimsy; and Stijn Debrouwere lets you know you get to do whatever you want!

To get an idea of what I did and with whom I worked at The Guardian here is a video:

I am now a Data Journalist at the Times Newspaper Limited. I have the role and the job title I wanted when I decided to learn to code. Most importantly I consider myself a programmer. It took me a long time to get it into my head. I was at a D3 workshop recently and I realised I was more knowledgeable and more capable than most of the programmers who have ‘developer’ in their job title. It came as a shock, but I think it has allowed me to take even longer strides in my data journalism.

Even though I can no longer do whatever I want I still have immensely more creative freedom and self-governance than anyone else with ‘journalist’ in their job title. Because I am self-taught, because I don’t need any instruments in a news organisation to produce everything they can, because I hunt for ideas on GitHub and Source and from DataMiners; I cannot be managed. And the digital team at the Times Newspaper Limited are ok with that.

I see myself as a disruptive force and have been told to “keep on doing that”. I am working with a young, creative and determined team. We have huge challenges ahead of us and that makes us a team. We are a motley crew with the most diverse skills ever seen at News Corp. So my time as a fellow has carried on in spirit. You can’t undo a realisation. So I will always go where I can code stories.

Dan Sinker announcing the first ever OpenNews Fellows

And with me will always be Mark, Laurian, Dan and Cole. I was recently asked to describe the best team I ever worked with. I said the OpenNews fellows. Even though we weren’t in the same newsroom and in most cases not even in the same country, I still feel they were my fellows. We hacked, we taught, we drank whiskey together.

So to the next round of fellows I say: work open, make news, cherish the fellows.

August 02, 2013 06:58 PM

Friedrich Lindenberg

Why civic hackers should hang out in a newsroom

In Spiegel's snack bar, the sixties are going strong.

Coming out of a successful hackday, you usually need two things: a good night's sleep and ten months of time to grow the things you've worked on into a real product.

While sleep is easy to come by, finding the time and the environment to work on crazy ideas is much harder. This is why, after visiting open government-related hackdays for three years, I'd begun to dread these events: I'd meet interesting people and begin an exciting project with them, but then wouldn't find the time to follow up.

Things felt very different in January, when I flew to Boston to join the 2013 OpenNews fellows for a hackday at the MIT Media Lab. Collaborating on our projects that week, we knew that we had the backing to grow things and to get them out into the real world. We're still working on these tools now: a hacked-up collection of scripts to search data for news stories has turned into a real software project, datawi.re.

Pushing a project forward means talking to the users of your tools, or finding out how you can build an audience for the information that you're trying to share. Newsrooms are great places for both of these things: reporters and editors are demanding and critical customers. And, needless to say, publishing a nicely prepared dataset on a high-traffic news site is a quick and effective way of getting it out there.

For example, I've learned a lot by pitching tools developed in the OpenNews community to my host organisation. Datawrapper is now in regular use on the site, while more advanced tools like Overview, Poderopedia and OpenSpending have spawned debates about the types of data gathering and analysis activities Spiegel and Spiegel Online should engage in.

Learning about these technologies is just one aspect of the community that surrounds the fellowship. During a two-week visit to the US, Annabel and I managed to not only visit the Guardian's US offices, ProPublica, the New York Times and Google NYC, but we also got to attend MIT's Civic Media conference, an incredible melting pot of the worlds of civic technology and news.

Visiting Brian and his newsroom graphics rendering farm at the New York Times.

Five months into my fellowship, I'm certain that the best place to learn how to engineer great civic applications is in a newsroom. Working to create narratives that feed into a news cycle, address a wide audience and tell a clear story is an amazing challenge for any technologist. Being involved in journalistic projects as a fellow puts you in the center of a three-way interaction between reporters, the data at the core of your story and the technologies used in its presentation.

At the same time, being a fellow gives me the independence to experiment with new technologies and to go outside of the organisation. During the past few months, I've met people from all across the globe and learned about their ideas and work. This is a great time to be involved in this discussion, and the fellowship will put you right at the center of it.

Apply, and join us.

p.s. Have I mentioned how fun all of this is? I'm writing this post in a circus tent at OHM2013 near Amsterdam, surrounded by 3000 international hackers discussing the future of privacy, home-made technology and code.

August 02, 2013 12:00 AM

August 01, 2013

Mike Tigas

On Fellowship

I’m just about halfway through my OpenNews fellowship at ProPublica.

Plenty has been said about why developers should work in (and would enjoy) the newsroom environment. I had a hand in one piece and many others have chimed in with wonderful stories to that end. As someone who was considering applying for this fellowship last year, I didn’t need these beatitudes, so to speak. I’d been a developer in news for a couple years and I was already hooked on the satisfaction that came from building things that could make a difference in people’s lives — not to mention the interesting people, the pace of work, the variety, etc. (…if you’re a developer who hasn’t yet worked with journalism or civic hacking, then you should read all of those posts, though.)

So I’ve been thinking: what do I know about this now that could’ve enticed me even more a year ago? I very nearly didn’t apply, since I was putting it off and putting it off and putting it off…


Well, it’s a fellowship. Right… but being an OpenNews Fellow doesn’t feel so much like the cut-and-dry “…person appointed to a position granting a stipend and allowing for advanced study or research” or “member of a group having common characteristics.” (Hat-tip to Webster.) No, no, these don’t quite capture the sense of the team and community that you become a part of.

Aside from the easy joke (sorry, Annabel), I’ve heard us cheekily described as “the Wu-Tang of journalism” or a “journalism Justice League.” We obviously have fun with it and I’m not sure what other fellowships are like, but in this case I like to consider this fellowship to mean an “epic team of awesome,” or some (more eloquent) approximation. We’re not just here to do research and better ourselves and the newsrooms we work in — we’re here to foster and advance the overall journalism/code community.

We have backgrounds in programming, statistics, censorship research, cybersecurity, satellite communications, and we’re from nearly every corner of the earth. That range of backgrounds and skills has honestly made collaboration fun and at times surprising (in a good way).

We had a fellowship question-and-answer session in our weekly community call yesterday. Responding to a question about highs and lows in our experiences thus far, a resounding “high point” for many of us was “any time the fellows get together.” (And this is no slight to the hosting news organizations and the day-to-day work we do and the adventures each of us has encountered. Hell, I work with an amazing and inspiring team that I once idolized and never fathomed I’d be working with.)


Another lesson (and high point) from my first few months: the community of civic-minded programmers and data people is huge now. It’s not just journalists anymore: there are people who want to help improve government, health, education, help make history more accessible, and on and on, outside of the guise of news media. There are communities of people who want to work together to do great things for the world, and these formerly-disparate communities are starting to mingle and work together in interesting ways. I’ve had the opportunity to visit the MIT Center for Civic Media a couple times and been blown away at the variety and scope of projects there, from free speech to gender to robots to accountability in developing nations. I’ve met the excellent people at the New York Public Library Labs team and had a chance to hack on historic geodata alongside geo developers that created the state of the art.

A lot of the best parts of the fellowship are about inspiration and turning that inspiration into something great. We fellows inspire one another, we inspire other people, and we in turn, are inspired by a lot of the people we get a chance to interact with through the course of our fellowship work.


Maybe what I should have said is less “fellowship as epic team” and more “fellowship as epic community.” (Or…)

The Knight-Mozilla OpenNews program exists at a great time in the journalism/civic code world. There’s an ecosystem of tools and datasets out there, and there’s a whole world of people that are in this space or interested in it. People are starting to get it that transparency and open tools and sharing and collaboration — especially when it comes to data about our governments and our environments — can foster real change and real impact. And it’s part of our mandate to get out there and work in the intersection of these communities — journalism, open source, civic impact — to make all of them better.

I mean, Source, the OpenNews site about “Journalism code and the people who make it”? Well whad’ya know, the site has appeal far beyond journalism — with posts (on crowdsourcing, hardware hacking, data visualization, and so on…) often landing on Hacker News and being discussed far from the walls of journalism institutions.


If working in this amazing community, with these amazing people sounds like something you want to be a part of, you should apply to become an OpenNews fellow. You should join the community. Hop into IRC. Listen in on our community calls. Find a civic-oriented hackday. Build some tools and share them with the world. Figure out how to get an obscure dataset into something usable, and share that knowledge so the next person can do it, too. Get involved and talk to people in the community; you’ll be amazed at how many in the community are not only brilliant developers/journalists/evangelists, but they’re wonderful and nice people, too.

There’s a whole community of people doing epic things to make the world a better place, and you can — and should! — be a part of it, too. It sounds daunting, sure, but the amount of inspiration and support you can find from colleagues and neighbors is equally great.

And seriously, if you’re a developer that wants to work on amazing things and get inspired and work with inspiring people — don’t be afraid to apply for the fellowship. I’m glad I wasn’t.

August 01, 2013 04:50 PM

July 29, 2013

Dan Sinker

OpenNews: Fellowships Five Ways

Our 2014 Knight-Mozilla Fellowship search is in full swing—August 17 is the last day to apply. Being a Knight-Mozilla Fellow is a unique experience—you travel the world writing code that matters, collaborate with some of the best news organizations on the planet, and build lasting friendships with your fellow fellows. We invite our Knight-Mozilla Fellows to choose their own adventure—to create an experience that is singular to them, and that helps move journalism forward in ways that they truly want to engage. As a result, everyone’s fellowship experience is different and capturing them in a simple pitch can be hard, so last week and this week we’ve asked our current and former fellows to write about their experience.

Noah Veltman, who is spending his fellowship year at the BBC after moving from the Bay Area to London, hits on the moment “when the awesome craziness of my OpenNews fellowship sank in”:

I was on my way home after my first day at BBC headquarters, looking around the subway car, and I realized that fully half of the passengers were reading BBC News on their phones. Whoa. Since then, I’ve been in the newsroom when Pope Benedict resigned, when Margaret Thatcher died, and when the bombs went off in Boston. I learn so much every day that my head is spinning by lunchtime. My seven co-fellows routinely blow my mind with their work. I’ve met so many brilliant people around the world who are not just redefining how we do the news, but doing it as a team, one big journalism Justice League. I love this job.

Sonya Song, who joined the Boston Globe as a Fellow while pursuing her PhD in Media Studies from the University of Michigan, writes about the opportunities “that you may not easily find in academia”, including the “freedom and flexibility to follow where your curiosity leads.”

Although Fellows often offer a helping hand to our hosts, we are not obliged to commit to any task in the newsrooms, because all our funding is from OpenNews. This independence lets us pursue our own interests without being bound by routine work that regular employees have to undertake. Meanwhile, we are encouraged to work with other Fellows and organizations. Right now, I am working with two other Fellows, Stijn Debrouwere and Brian Abelson, on measuring news impact and contributing my knowledge to ProPublica on a project related to Internet policy. We don’t only collaborate remotely and virtually, but we also reunite in person on different continents, to put our heads together and hack on something.

Down in Buenos Aires, Manuel Aristaran writes about his move from a team building Argentina’s first cubesat satellite to the offices of La Nacion. The ability to collaborate across newsrooms and borders was appealing to Manuel. "As an OpenNews fellow, you’ll be part of a large network of journalists and programmers interested in the media and eager to share experiences and collaborate on a diverse array of projects," he writes. For Manuel, that meant the tool he created, Tabula, saw almost instant adoption:

That was the case of Tabula, an open source tool that was created out of the combination of a personal project and previous work by ProPublica, one of the OpenNews media partners this year and next.

I started the project earlier this year and soon after decided to join forces with ProPublica, who was working on a similar tool. After announcing it on Source (the official web site for the OpenNews program), Tabula was adopted by editors and activists around the world that need to use the data sets trapped within PDF files.

Brian Abelson, who has been spending his Fellowship year doing a deep dive analyzing news metrics at the New York Times, writes about his Fellowship experience after applying at “literally the last second.”

Since then, I’ve undergone a transformation that is no less than miraculous. In my five-plus months as a fellow I’ve dove deep into the technical and intellectual challenges of impact measurement, reading as much as I could find on the topic, experimenting with the creation of metrics for News Apps, speaking at conferences, and conversing with the brightest minds in the field. I have been continually humbled at the many people working on this problem for no other reason than they think it’s the right thing to do. I’ve also found support in the many innovators and brainiacs I work with at the New York Times and the seven incredible people I’ve shared this journey with.

In this time, I’ve gone from a novice coder with some knowledge of stats to someone who regularly writes map-reduce jobs over terabytes of data (trust me, if you’re a data nerd, the New York Times is your perverse playground). The freedom of the fellowship has also allowed me to pursue more whimsical projects like building haikubots, experimenting with data sonification, and writing oh-so-many twitter trolls. I’ve also had the privilege of working with my friends in csv soundsystem to build treasury.io - a daily data feed for the U.S. Treasury.

Finally, 2012 Fellowship Alum Mark Boas, who has leveraged one of his fellowship projects into the startup Hyperaudio (which is part of Mozilla’s WebFWD accelerator class this summer), looks back on his year and writes:

It’s hard to imagine the adrenalin rush you get when your interactive goes live in front of the world, a world that is actually watching, but it makes you want to do it over and over again.

There you have it: Five different paths along a Fellowship year. The freedom to experiment, to try new things, to collaborate; they are all baked into the design of the Knight-Mozilla Fellowships. Do you love to code? Do you want to find out why Manuel Aristaran writes that “Being a Knight-Mozilla Fellow is the best thing that can happen to you”? Then apply today. The days to apply are running out: Get your application in before August 17.

July 29, 2013 06:33 PM

Stijn Debrouwere

What do you mean, I get to do whatever I want?

There’s this crazy boost of energy you get from moving to a different city and starting a different job. I joined the Guardian’s vocal choir just a couple of weeks ago. Did sewing lessons and can now mend torn pants. Figured out how to do reverse bridge and leg curls on a yoga ball, just because. Taught myself enough Spanish to read El País and enough Tyke and Manc to understand my Northern English housemates.

Energy is good, though. Here’s how an OpenNews fellowship like mine works: work on whatever with whoever, learn anything, and talk about it wherever. It’s the sweetest gig ever invented, but it’s also a bit of a brain melt. When the 2013 fellows got started the most commonly asked question was “So I get to do whatever I want? WHAT DOES THAT EVEN MEAN?” (First world problems, tell me about them.)

So what have I been doing these past few months as a fellow? For starters, all sorts of funky little side projects. I’ve written a tool to do literate programming in Python, written my own blogging engine, and spent a week researching how reproducible research can help journalism. Learned about design thinking from IDEO and about programmable media from Mark Boas.

But there’s really one dream that I’m trying to turn into something real this year: track every big news website out there, so we can answer all kinds of wild questions about how the news industry really works.

I want to be able to see how news websites are different (or secretly the same), how news has changed over the years and how readers have responded. Are we less interested in international news than we used to be? What’s the ideal length for a piece of breaking news? What technologies does the New York Times use, what technologies did they use five years ago? How fast do different news websites load? What was the BBC’s homepage like two months ago? Do people come to news sites through the front door, through search or through social? How much of journalism is still original writing anyway, how much is PR?

You know how professors in media studies will often have armies of students cut out articles from newspapers and put them in tidy little categorized piles to answer questions like “How often does the work of female journalists reach the front page?” It’s really messy and cumbersome but right now it’s the only way we have to get a sense of what the media is really like. What we need is the grown-up version, with less unpaid labor and more computers. We need it, and we’re building it.

Various people have been doing bits and pieces of this kind of data collection and news analysis. And now I get to add my own bits and pieces, something I’ve wanted to do for years, so, yeah, I guess I’m quite chuffed.

We’re starting really small and are tackling the easiest questions first: which news organizations produce the longest and which ones the shortest articles, that sort of thing. But who knows, eventually we might even be able to tackle the question of how to measure the impact of journalism.
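
As a rough illustration of that first question, here is a minimal sketch in Python. It is not the project's actual code: the outlet names and article texts are made up, and a real version would read from whatever crawler or archive ends up feeding the analysis. It simply counts words per article and compares the median per outlet.

```python
from statistics import median

# Hypothetical sample data: (outlet, article text) pairs.
# Real input would come from a crawler or an article archive.
articles = [
    ("Outlet A", "Short breaking news item about a local event."),
    ("Outlet A", "A slightly longer follow-up story with a few more details added."),
    ("Outlet B", "A long feature " + "word " * 50),
]


def median_length_by_outlet(articles):
    """Return {outlet: median word count} for (outlet, text) pairs."""
    counts = {}
    for outlet, text in articles:
        # Whitespace splitting is a crude but serviceable word count.
        counts.setdefault(outlet, []).append(len(text.split()))
    return {outlet: median(lengths) for outlet, lengths in counts.items()}


def rank_outlets(articles):
    """Outlets sorted from longest to shortest median article length."""
    medians = median_length_by_outlet(articles)
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for outlet, length in rank_outlets(articles):
        print(outlet, length)
```

The median is used rather than the mean so that one enormous long read does not dominate an outlet's figure; that is a design choice for this sketch, not something the project has settled on.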

But enough about me and more about you. I bet you have an idea like that. Something you’ve wanted to do for years that could make the news industry a little bit smarter and more au courant. Or you’ve always wanted to do something meaningful with your coding chops, not write the next enterprise back office whatever. You know what you want to do but you’re not doing it. You want to do something that matters but you’re not. Frustrating, yes?

There’s something you can do. Apply to become a 2014 OpenNews fellow. It’s the best decision I ever made, and it will be the best decision you’ve ever made too.

July 29, 2013 12:00 AM

July 26, 2013

Erika Owens

Joining the journalism development community, as a Fellow

Last week, news devs gave an inspiring and entertaining array of responses as to "why develop in the newsroom?" This week, Knight-Mozilla Fellows shared their stories. They told how they were introduced to journalism development via their participation in the Fellowship. These Fellows represent some of the many different paths people take to join what can feel like "one big journalism Justice League."

Stop the imposter syndrome madness. Noah Veltman kicked off the week by imploring us all to stop seeing the world as coders and muggles. This toxic coder/non-coder binary almost prevented him from participating in the Fellowship.

"The crazy thing is, I almost didn’t apply. I didn’t even think I was a candidate. I had never studied computer science, I just tinkered with code in my spare time because I had fun projects I wanted to try. I Googled for examples and wrote lots of really ugly code. But I never considered myself a developer. This fiction became increasingly ridiculous, as it went from “Well, sure, I know some HTML but I’m not a developer or anything” to “Well, sure, I know some JavaScript and I can use a webserver but I’m not a developer or anything” to “Well, sure, I know Python and PHP and some C and Java and I spend all day on the command line, but I’m not a developer or anything.”"

Grad students in the real world. Sonya Song described the Fellowship as "the best reward" for her various intellectual adventures. Sonya is part of a growing group of graduate students who use technology and data analysis in their academic studies and are applying those skills within journalism. And why might grad students want to work outside of academia? Sonya laid out a convincing case:

"Through my work experience and academic training, I gained a better understanding of how people consume media content, how they behave on the Internet, and how content providers could better cater to consumers’ demands and therefore develop sustainable business models. I have been able to contribute my expertise on these topics to the Boston Globe, my newsroom host. With the support of the staff here, I conducted an empirical study of the Boston Globe's Facebook Page. When I presented my findings to colleagues, some people responded, “Thank you for sharing your findings. We didn’t know those things!” It is thrilling and satisfying to find truths and share knowledge in a practical setting."

"The best thing that can happen to you." (en español) Manuel Aristarán was part of a team that was building a satellite when he applied to the Fellowship. A satellite. For space research. Outer. Space. He was already applying his in-demand tech skills in a super cool way, but he also wanted to work with public information and journalism. He made that transition from outer space to journalism via the Fellowship.

"In my five months as an OpenNews fellow, I worked with journalists, data analysts, and designers of Argentina’s daily La Nación and the wonderful team of La Nación DATA on many projects that had great impact. Among them:

  • Proyecto Censo (Census Project), a site to display the Argentine census data for 2001 and 2010.
  • An investigative work into the tragic flooding of La Plata city.
  • Training on D3.js and information extraction for journalists, programmers, and designers from the daily."

From stats to SQL. Brian Abelson's path to the Fellowship includes echoes of each of his "fellow fellows'" stories. He also came to coding from a non-traditional path and grappled with imposter syndrome. He also applied as a graduate student. He also came to the Fellowship from another type of science: social science and statistics. And now he's at the New York Times.

"In this time, I’ve gone from a novice coder with some knowledge of stats to someone who regularly writes map-reduce jobs over terabytes of data (trust me, if you’re a data nerd, the New York Times is your perverse playground). The freedom of the fellowship has also allowed me to pursue more whimsical projects like building haikubots, experimenting with data sonification, and writing oh-so-many twitter trolls. I’ve also had the privilege of working with my friends in csv soundsystem to build treasury.io - a daily data feed for the U.S. Treasury."

"Do it again, but better!" And Mark Boas, a Knight-Mozilla Fellow from 2012 closed the week with a rousing piece about what it is like to do development in a news organization.

"I was lucky enough to be given several opportunities by Al Jazeera (my OpenNews partner) to create interactives for aljazeera.com and I was both excited and terrified about being given the chance to succeed or fail in public. Thankfully, it was always a team effort - ideas were discussed, feasibility assessed, minimal viable products presented and that all important deadline established and we were off to the races!

The most amazing thing though, was that I got to have a real say. If there was a certain type of medium I wanted to try out – I could try it. And the very best thing was that I could try it out on thousands of people."

So, OpenNews is also doing it again. Applications are open for the 2014 cohort of Fellows. It's time to join this amazing community of Fellows. Bring your civic hacking expertise or itch-scratching side projects or love for numbers. Don't let the false developer/non-developer divide hold you back. Apply now.

Coming up next week: posts from the Knight-Mozilla Fellows who were engaged with journalism development prior to the Fellowship.

July 26, 2013 06:14 PM

Mark Boas

Do it again, but better!

Rob and Nick Carter - Blue to Yellow

 “Why develop in the newsroom?” That’s the question doing the rounds of the news-developer community just now, partly because it’s that time of year where the Knight-Mozilla OpenNews program makes its final push to attract fellows for 2014.

I’ve been asked to chime in and I wanted to focus on something that I found very exciting as one of the initial group of OpenNews fellows, way back in 2012.

In their lifetimes most designers and developers will get an opportunity to have their work seen or used by thousands of people. I have to admit I’m one of those developers who gets a kick out of that. I suppose it’s the desire to be useful or make useful things that drives me the most.

However there are relatively few developers who will work on a site big enough to have their work seen by thousands, perhaps millions – in one day! This is a big part of the reason why developing in the newsroom is exciting.

In his post “Code, the newsroom, and self-doubt,” Noah Veltman, a 2013 OpenNews fellow with the BBC, opens: “I can pinpoint the exact moment when the awesome craziness of my OpenNews fellowship sank in. I was on my way home after my first day at BBC headquarters, looking around the subway car, and I realized that fully half of the passengers were reading BBC News on their phones. Whoa.”

Whoa indeed!

If you are one of the millions that reads the news online, you’ve probably noticed that news articles – or at least a significant number of them – are becoming increasingly interactive. Good ‘interactives’ are the balance of medium and message. The message is all important of course but the way we get the message across, let’s just say, is one of life’s challenges.

I was lucky enough to be given several opportunities by Al Jazeera (my OpenNews partner) to create interactives for aljazeera.com and I was both excited and terrified about being given the chance to succeed or fail in public. Thankfully, it was always a team effort - ideas were discussed, feasibility assessed, minimal viable products presented and that all important deadline established and we were off to the races!

The most amazing thing though, was that I got to have a real say. If there was a certain type of medium I wanted to try out – I could try it. And the very best thing was that I could try it out on thousands of people.

It’s hard to imagine the adrenalin rush you get when your interactive goes live in front of the world, a world that is actually watching, but it makes you want to do it over and over again. In the periods of calm you can assess what the world made of your work – I’ve found it’s best to do this in an ego-less way; after all, negative feedback can be the most constructive.

Personally, I got to try out a loose concept called Hyperaudio which links text to the spoken part of media. Since then we’ve set up Hyperaudio Incorporated as a nonprofit - and now I’m happy to say that we have attracted funding and that’s in no small part due to the technology being used and proven in the wild.

So why develop in the newsroom? Because you get to try new things out on the world stage and find out how people feel about them. Then you get to do it again, but better!

 

Mark Boas

July 26, 2013 09:30 AM

July 25, 2013

Brian Abelson

One year after 'Finding the right metric for news'

This is the first in a series of two posts about pageviews. This post will deal with some of the theoretical baggage tied up in the metric, while the second will detail some research I’ve conducted on the correlates of pageviews. A year ago today, Aron Pilhofer, head of Interactive News at the New York Times, wrote a blog post that changed my life. In it he reflected on the impoverished status of newsroom analytics, soberly claiming:

“…the benchmarks we use now are so ill suited. They are the simplistic, one-dimensional metrics we all know: pageviews, time on site, uniques. We use them largely because they are there and because they are easy.”

Developing suitable metrics for measuring impact, he argued, was the key to journalism’s survival in a digital environment and the perfect issue for a Mozilla-Knight OpenNews Fellow to address over a yearlong fellowship.

At the time I read this I was a grad student struggling through the process of translating my questions into maths and code. While I had completed some cool projects, it had been three months since I had copy-and-pasted my way through a 5000-line script because I was scared of SQL databases and six since I first opened that ‘scary program called Terminal’ on my MacBook.

So when I read Aron’s eventual pitch - “If you’re an analytics nerd, a news junkie and think it would be neat to spend some time using The New York Times newsroom as your laboratory, we’d like to hear from you” - I was both thrilled and horrified. How could I - neither a hack nor a hacker - compete with the plethora of geniuses that would no doubt apply for such an irresistible position? Like Noah Veltman, my remarkable friend and ‘fellow fellow’, “I had a serious case of imposter syndrome”. This deep sense of self-doubt (and perhaps a little procrastination) led me to mull over the application to literally the last second.

Since then, I’ve undergone a transformation that is no less than miraculous. In my five-plus months as a fellow I’ve dived deep into the technical and intellectual challenges of impact measurement, reading as much as I could find on the topic, experimenting with the creation of metrics for News Apps, speaking at conferences, and conversing with the brightest minds in the field. I have been continually humbled by the many people working on this problem for no other reason than they think it’s the right thing to do. I’ve also found support in the many innovators and brainiacs I work with at the New York Times and the seven incredible people I’ve shared this journey with.

In this time, I’ve gone from a novice coder with some knowledge of stats to someone who regularly writes map-reduce jobs over terabytes of data (trust me, if you’re a data nerd, the New York Times is your perverse playground). The freedom of the fellowship has also allowed me to pursue more whimsical projects like building haikubots, experimenting with data sonification, and writing oh-so-many Twitter trolls. I’ve also had the privilege of working with my friends in csv soundsystem to build treasury.io - a daily data feed for the U.S. Treasury.

And after all of this I can say that while my initial fears of technical incompetency weren’t completely unfounded, I was perhaps afraid of the wrong things. To return to the question that launched this crazy adventure in the first place - “what if we measured journalism by its impact?” - I’d be remiss to not complicate the current conceptualization of the problem. This perspective has been deeply informed by my interactions with James Robinson, my friend and mentor at the Times who, in sharing his vast experience in news analytics, has often served as my version of Gene Wilder in Willy Wonka & the Chocolate Factory.

What I think we’ve learned in our experimentation with news metrics is that, more than anything, our work is less about programming than it is about proselytizing. The challenge of changing the approach to metrics in the newsroom is nothing less than that of sparking social and cultural change. The question we must be asking, then, is the painfully meta one of “how do we measure the impact of impact measurement?” If the tools, methodologies, and metrics we develop are difficult to use, implement, or understand, then the journalists and editors we’re trying to influence will fall back on their well-honed instincts. The key is not to prove whether a story or news organization has ‘made an impact’ but to help journalists make data-driven decisions that resonate with their broader goals. This challenge, I think, is more anthropological than statistical, more collaborative than code-based.

So while I think I’ve done a lot to tackle the difficult set of questions I’ve been tasked with, in many ways I’ve failed miserably. In my final five months I hope I can do more to build and write things that help more people. But there’ll always be more work. So if you know a bit of code or maths and think insights trump data, then apply to become a 2014 OpenNews Fellow and pick up where I and my other fellows have left off.

July 25, 2013 04:00 AM

July 24, 2013

Manuel Aristarán

Being a Knight-Mozilla Fellow is the best thing that can happen to you

[Versión en español: "Ser becario Knight-Mozilla OpenNews es lo mejor que te puede pasar"] It’s been five months since I became one of the eight fellows in the OpenNews program organized by the Knight Foundation and the Mozilla Foundation. I still have five months left, but I can say it has been the best year of my professional life. And I want to convince all of you to apply to become a fellow next year.  

How did I get here?

In 2012, I lived in the beautiful city of Bariloche, in southern Argentina, where I worked as part of the team that designed, built, and successfully launched CUBEBUG-1 “Captain Beto”, the first Argentine cubesat. Besides space, I’ve always been interested in public information, open data, information visualization, the web and journalism. A tweet by @LNData convinced me to fill out the form. And after two agonizing months of interviews with the OpenNews team, I was selected as one of eight fellows for the 2013 OpenNews program. Almost without realizing it, I went from a clean room to a newsroom.

The Knight-Mozilla OpenNews fellowship is the best thing that can happen to you.

Being a programmer is a privilege. Our profession is one of the most sought after and nothing indicates that this will change in the near future. In particular, journalism needs us. Among other things, it needs us to: If that isn’t enough, as an OpenNews fellow, the result of your work will be free and open to the contributions of a great community. In my five months as an OpenNews fellow, I worked with journalists, data analysts, and designers of Argentina’s daily La Nación and the wonderful team of La Nación DATA on many projects that had great impact. Among them:  

Being part of a community

As an OpenNews fellow, you’ll be part of a large network of journalists and programmers interested in the media and eager to share experiences and collaborate on a diverse array of projects. That was the case with Tabula, an open source tool that grew out of the combination of a personal project and previous work by ProPublica, one of the OpenNews media partners this year and next. Tabula is a tool that tries to solve one of the pervasive problems in data journalism: the extraction of information trapped in PDF files. Either out of ignorance or a desire to block access to information, some agencies publish their information in this format, which is poorly suited to the exchange of structured data. I started the project earlier this year and soon after decided to join forces with ProPublica, which was working on a similar tool. After it was announced on Source (the official web site for the OpenNews program), Tabula was adopted by editors and activists around the world who need to use the data sets trapped within PDF files. Some instances:

Still not convinced?

Apart from working with journalists and building open source tools, OpenNews fellows travel everywhere. During the first five months of my fellowship I was lucky to go to:  

It is time to apply

This year, the media outlets participating in the program are The New York Times, La Nación, The Texas Tribune, ProPublica and Ushahidi + Internews Kenya. Next year’s five fellows will spend 10 months working in these newsrooms. If you like journalism, data, information visualization, open source software, and want to spend 10 months working on all that, as you travel all over the place and connect with a generous community, do not hesitate. Take a deep breath and fill out the application. It’s the best thing that can happen to you.

July 24, 2013 04:47 PM

Being a Knight-Mozilla OpenNews fellow is the best thing that can happen to you

[English version: "Being a Knight-Mozilla Fellow is the best thing that can happen to you"] For five months I have been one of the eight 2013 fellows of the OpenNews program organized by the Knight Foundation and the Mozilla Foundation. I still have five months ahead of me, but I can already say that it has been the best year of my professional life. And I want to convince you to apply to become fellows next year.

How I got here

In 2012 I was living in the beautiful city of Bariloche, in southern Argentina, where I worked on the team that designed, built, and successfully launched CUBEBUG-1 “Capitán Beto”, the first Argentine cubesat. But besides space, I have always been interested in public data, information visualization, the web, and journalism. A tweet from @LNData convinced me to fill out the application form for the OpenNews fellowship, and after two agonizing months of interviews I was selected as one of the eight 2013 fellows of the OpenNews program. Almost without realizing it, I went from a satellite to a newsroom.

Programming in the newsroom

Being a programmer is a privilege: our profession is one of the most in demand, and nothing indicates that this will change in the near future. In particular, journalism needs us. Among other things, it needs us to: As if that were not enough, as an OpenNews fellow, the result of your work will be free and open to the contributions of a great community. In my five months as an OpenNews fellow, I have worked alongside the journalists, data analysts, and designers of La Nación and the fantastic La Nación DATA team on many high-impact projects. Among others:

Being part of a community

As an OpenNews fellow, you will be part of a large network of journalists and programmers who are interested in the media and eager to share experiences and collaborate on a wide variety of projects. That was the case with Tabula, which grew out of the combination of a personal project and previous work by ProPublica (one of the OpenNews media partners for this year and next). Tabula is a tool that tries to solve one of the pervasive problems in data journalism: the extraction of information trapped in PDF files. Whether out of ignorance or an intention to obstruct access, some agencies publish their information in this format, which is poorly suited to the exchange of structured data. I started the project early this year, and shortly afterwards we decided to join forces with ProPublica, which was working on a similar tool. After it was announced on Source (the official site of the OpenNews program), Tabula was adopted by newsrooms and activists around the world who need access to data trapped in tables inside PDF files. Some cases:

Haven’t I convinced you yet?

In addition to working alongside journalists and building open source tools, we OpenNews fellows travel everywhere. During the first five months of my fellowship I was lucky enough to visit

It is time to apply

This year, the media outlets participating in the program are The New York Times, La Nación, The Texas Tribune, ProPublica, and Ushahidi + Internews Kenya. Next year’s five fellows will spend 10 months working in those newsrooms. If you like journalism, data, information visualization, and open source software, and want to spend 10 months working on all of that while you travel everywhere and connect with a generous community, do not hesitate: take a deep breath and fill out the application form.

July 24, 2013 04:46 PM

July 23, 2013

Sonya Song

Why I Love OpenNews Fellowship and Why it's a Great Opportunity for Graduate Students

Knight Blog: Knight-Mozilla fellows strive for global impact in journalism. Photo credit: Knight-Mozilla Fellows
I have always been an intellectual drifter and the OpenNews fellowship has been the best reward for my adventures. When I first saw the post calling for 2013 applicants, I was so surprised and also excited to know such an opportunity was being created for people just like me.

My background is highly mixed. I am currently a doctoral candidate in an interdisciplinary program at Michigan State University. At MSU, I have been studying media economics with my advisor Steve Wildman, a world-renowned scholar and Chief Economist at the FCC, along with courses in psychology, communication, and large-scale data analysis. Prior to MSU, I studied computer science and journalism and worked in both industries.

If you are also a graduate student, you will probably enjoy the Knight-Mozilla fellowship just like I do because it provides opportunities that you may not easily find in academia.

You'll get to work on real-world problems. Through my work experience and academic training, I gained a better understanding of how people consume media content, how they behave on the Internet, and how content providers could better cater to consumers’ demands and therefore develop sustainable business models. I have been able to contribute my expertise on these topics to the Boston Globe, my newsroom host. With the support of the staff here, I conducted an empirical study of the Boston Globe's Facebook Page. When I presented my findings to colleagues, some people responded, “Thank you for sharing your findings. We didn’t know those things!” It is thrilling and satisfying to find truths and share knowledge in a practical setting.

Often as a graduate student, you may only have the privilege and support to work on problems like this during summer internships. The Knight-Mozilla Fellowship offers even more because it lasts beyond a summer and allows you to fully immerse yourself in a world-class newsroom. If you hold a similar belief that research should work toward real-world impact, definitely apply for this fellowship program and aim to make an impact on the world.

You'll have the support you need to work quickly. As you may have experienced, funding is an issue for a number of universities. You will be surprised how much support you can get from this Fellowship: a generous research budget and travel funding are among them. Moreover, a frustrating side of academia is that research results may take months or years to get published. In contrast, as OpenNews Fellows, we can organize our own seminar or attend a workshop to reach out to a larger audience. With this support, we are able to give a louder shout to the world about what amazing things we have created or found.

You'll have the freedom and flexibility to follow where your curiosity leads.  Although Fellows often offer a helping hand to our hosts, we are not obliged to commit to any task in the newsrooms, because all our funding is from OpenNews. This independence lets us pursue our own interests without being bound by routine work that regular employees have to undertake. Meanwhile, we are encouraged to work with other Fellows and organizations. Right now, I am working with two other Fellows, Stijn Debrouwere and Brian Abelson, on measuring news impact and contributing my knowledge to ProPublica on a project related to Internet policy. We don't only collaborate remotely and virtually, but we also reunite in person on different continents, to put our heads together and hack on something.

Certainly, there are more opportunities and privileges for you to discover in this fellowship program. If you are an adventurer like me, I encourage you to step out of your ivory tower and join us to explore this fast-spinning world where technology meets news.


July 23, 2013 12:29 PM

July 22, 2013

Noah Veltman

Code, the newsroom, and self-doubt

I can pinpoint the exact moment when the awesome craziness of my OpenNews fellowship sank in. I was on my way home after my first day at BBC headquarters, looking around the subway car, and I realized that fully half of the passengers were reading BBC News on their phones. Whoa. Since then, I’ve been in the newsroom when Pope Benedict resigned, when Margaret Thatcher died, and when the bombs went off in Boston. I learn so much every day that my head is spinning by lunchtime. My seven co-fellows routinely blow my mind with their work. I’ve met so many brilliant people around the world who are not just redefining how we do the news, but doing it as a team, one big journalism Justice League. I love this job.

The crazy thing is, I almost didn’t apply. I didn’t even think I was a candidate. I had never studied computer science, I just tinkered with code in my spare time because I had fun projects I wanted to try. I Googled for examples and wrote lots of really ugly code.  But I never considered myself a developer. This fiction became increasingly ridiculous, as it went from “Well, sure, I know some HTML but I’m not a developer or anything” to “Well, sure, I know some JavaScript and I can use a webserver but I’m not a developer or anything” to “Well, sure, I know Python and PHP and some C and Java and I spend all day on the command line, but I’m not a developer or anything.”

I had a serious case of imposter syndrome, and I know I’m not alone. Yesterday, Larry Buchanan, who is using D3 to develop awesome interactive graphics for The New Yorker like the NCAA Money Bracket, asked the Twitter hivemind for help working with some messy data. I lent a hand, and this was his response:

We’ve got to stop this madness.

There is no line where you suddenly cross over from non-coder to coder, or from fake developer to real developer.  There’s no high priesthood. You start learning, and then you just keep going. This is how I put it when speaking at the BBC’s recent Data Day:

[Image from the BBC Data Day talk]

The notion that code is this hyperspecialized thing, scary punctuation soup on a dark screen, something that someone else does, is wrong, and it’s toxic.

There are people all over the world who don’t consider how code might help them do their job, because they think it’s a big leap. It’s not. It’s thousands of tiny steps, and everyone takes them in a different direction. A little bit of code goes a long way.

People who do flirt with the idea of learning to code often get discouraged quickly. They get stuck, they get frustrated, and they look at the cool things that “real developers” are doing and decide that will never be them, so why bother? Well guess what? We were all that person. We are all STILL that person. We all get stuck. We’re all figuring it out as we go along. Welcome to the club.

People who are already doing great things with code are reluctant to teach others and share their work because they think it’s too basic or too sloppy to be useful to anyone else. It’s not true. Take your Code of Dorian Gray out of the attic.  You have much more to teach us than you realize.  

What I love most about coding in the newsroom is that the artificial divide between coders and everyone else is weak and getting weaker. Every day brilliant, passionate reporters and designers are waking up to the ways that code can help them find and tell stories, and developers are getting better at thinking as journalists. Philosophy majors are writing Rails apps and Java developers are doing investigative reporting. That blending is what makes events like NICAR and MozFest so wonderful. People with different experiences and skills come together to learn from each other and nobody gives a shit what it says on your business card. It’s not separate tribes, it’s one big family.

The newsroom is a great place to blow up this wall because we rarely get too wrapped up in code for its own sake. There are plenty of true computer scientists in the world who get their satisfaction cracking tough coding puzzles, and they don’t care whether it’s for a bank or a government or a hydroelectric dam. God bless those people (the world needs them), but I’m usually not one of them, and most of my newsroom colleagues aren’t either. We’re here because we want to make things that teach people about the world they live in. We care about the best way to tell a story, and about what it means to our audience; we care less about whether we had the perfect algorithm under the hood. Developing on deadline will do that to you. Like Lorne Michaels said about Saturday Night Live, “the show doesn’t go on because it’s ready; it goes on because it’s 11:30.”

It’s an exciting time to be coding in a newsroom.  There’s a righteous community of journocoders who are changing the game every single day. And we’re recruiting. The OpenNews program is looking for five new fellows next year. Like telling stories? Like making an impact?  Like using code to do it? You’re crazy if you don’t apply. Come join the Justice League. Show us what you can do.

July 22, 2013 09:08 AM

July 19, 2013

Dan Sinker

OpenNews: Why Develop in the Newsroom (part 2)

The journalism development community is filled with incredible people. The members of this community, broadly described as “news developers,” come from different backgrounds, have different areas of expertise, and impact journalism in different ways. But all of them are fascinating, and all offer compelling arguments for why other talented coders should join them in building new forms of journalism on the web.

This week, we’ve been asking some of the leaders of this community to explain why they do what they do. This is part two of a roundup of posts from folks answering the simple question: “Why Develop in the Newsroom?” Part one was posted on Wednesday. We’re doing this because we’re looking for people who love to code to join the Knight-Mozilla OpenNews project as 2014 Fellows. As a Knight-Mozilla Fellow, you’ll spend 10 months in some of the best newsrooms in the world working on your own answer to the question we’ve asked people to answer this week.

Brian Boyer, who heads the News Applications team at National Public Radio, explains that “after nearly a decade in startups and corporate life, I grew tired of the products I was making.” He entered journalism and, he explains, the work spoke to him:

In my previous life I enjoyed the problem solving, the creativity, the process of making software. Programming is a fun and challenging career and getting to solve puzzles every day is pretty neat. But to what end? Making software to make money for people who have money to make software?

The day’s story could be a safety database or a voting guide or a warning system for floods – and for each, we ask “Who are our users? What are their needs?” We make software with a purpose, for people. The work you do in a newsroom helps people live their lives and participate in society.

Tasneem Raja, interactive editor at Mother Jones, writes about moving from traditional journalism to news applications because “I wanted to learn new tools and techniques for getting them online and in public, analyzing them at scale, and publishing the stories inside in whatever format would get the point across best.”

Ted Han, the developer on the DocumentCloud project, gives an account of using DocumentCloud to help get leaked documents onto the internet. News applications like DocumentCloud are the start of something new, Han says:

Projects like DocumentCloud are part of an ecosystem of tools which have never before existed in journalism. Devs on these projects can help make journalists more powerful and dangerous, and information better accessible to citizens. While a vanguard has already preceded us (people like Simon Willison and Adrian Holovaty of Django fame, David Nolen and Mike Bostock of the NYT, or DocumentCloud’s own emeritus Jeremy Ashkenas), there are so many more ways that developers can help improve the news and the civic sphere. Our forebears have blazed a trail, but there’s still a civilization to build.

Ben Welsh, a database producer at the Los Angeles Times, writes about the opportunities available in journalism in an appeal straight to computer programmers:

Computer programmers, I believe you have it in you. You’re a curious, inventive, free-thinking lot. And there’s a way you can apply yourself that is more morally ambitious than a ridiculously violent video game or an empty money chase.

That is speaking truth to power. And it’s a career path that, with your skills, is so open to you today it might be shocking.

Greg Linch, who manages data and tech projects at the Washington Post, breaks down the “near limitless opportunities to solve interesting problems” in news. “Whether they’re hard journalistic problems or even hard computer science problems,” he says, “I’m ridiculously excited just thinking of all the possibilities.”

Ryan Pitts, who ran web development at The Spokesman-Review and now works on data tools for CensusReporter (and is the developer of our own project, Source), tells an amazing story of the night code he wrote helped an elderly woman survive a snowstorm:

It still breaks my heart in all sorts of ways. I’m not so good at building things with my hands and I’m too spooked to change the oil in my own car, but this one time I made a thing on the internet that brought two human beings together, and it made both of their lives better.

I still chase after that feeling. I get to do that every day.

Jacqui Maher, assistant editor of interactive news at the New York Times, gives a bullet-pointed list of the challenges she thinks engage developers in the newsroom. In the end, she makes the case as simply as I’ve ever seen: “Develop in the newsroom because you like challenges that come with rewards. Do it because you’ll love it.”

Loving what you do is important; passion counts. Developing in the newsroom is a chance for people that love to code to make a real impact on the world. And in 2014 we are looking for five people to spend ten months doing exactly that. As Knight-Mozilla Fellows you get to engage in the real challenges of journalism, get to spend time in the best newsrooms in the world, and get to write open-source code that makes an impact on the news and the world at large. Are you ready? Apply today.

PS. In New York City and curious to learn more about the fellowships? We’ll be holding an infosession next Friday, July 26 in Brooklyn. Come join us.

July 19, 2013 09:57 PM

July 18, 2013

Erika Owens

Open source and OpenNews in NYC

OpenNews open source science fair

Open source development can often involve much solo toiling, but next week in New York there will be two in-person opportunities to learn about awesome open source projects.

  • Open Source Science Fair 2.0: The New York Times is hosting a science fair of dozens of open source projects looking for contributors. The event will include lightning talks and presentations from Dan Sinker, Hilary Mason, and Haleigh Sheehan. The two NYC-based Knight-Mozilla Fellows, Brian Abelson and Mike Tigas, will also be exhibiting projects during the fair. It's a great chance to visit a news organization and learn more about open source contributions as well as the intersection between open source and journalism.
    Stop by on Thursday, July 25 from 6:15-10:30pm at The New York Times Building  (242 W 41st St, New York, NY 10036)

  • Knight-Mozilla Fellowship info session: There's a month left to apply to become a 2014 Knight-Mozilla Fellow. This is a chance to ask questions about the Fellowship in person and learn more about the open source work created by Fellows and the rest of the journalism tech community. Come say hi and share a drink with the OpenNews crew.
    Stop by on Friday, July 26 from 5-7pm at Building on Bond (112 Bond St, Brooklyn, NY 11217)

It has been a really exciting week in the Fellowship search process, where we've heard from some of the best news developers in the business about why they do what they do. Their posts are inspiring and hilarious, and they make a compelling case for why if you love to code and you want to see the immediate impact of your work, you should join them in the newsroom. If these stories have you intrigued and they've raised some questions for you, please stop by next Friday. If you can't make it then, email me and let's set up a call or grab coffee.

Journalists can be a quirky, brilliant bunch. If that sounds like your kind of people, come learn about open source development and the newsroom.

July 18, 2013 04:07 PM

July 17, 2013

Dan Sinker

OpenNews: Why Develop in the Newsroom (part 1)

One month from today, our search for our 2014 Knight-Mozilla Fellows comes to a close. In the thirty days between now and then, news will break that will change the way you understand the world around you. It may be local, it may be national, it may have global reach. As with most news, you don’t know it’s coming until it’s upon you, and the way it breaks is unpredictable. That immediacy and unpredictability pose a challenge for the talented developers that work in newsrooms, but they wake up every day ready to face it.

As we talk with people about the fellowships, one question keeps coming up: I’m a talented coder—why develop in the newsroom? So this week, one month out from our deadline to apply, we’ve posed that very question to people who do this for a living. Each answer is unique, and worth a full read on its own—but a few important callouts are featured here.

For Derek Willis, who works on political and election-related applications and APIs at the New York Times, journalism offered “a beautiful opportunity, an underbelly filled with ever-changing stories and challenges, and a chance to make an impact beyond the web.” Willis continues:

If you’re interested in contributing to our shared civic life, where we learn about the issues that define us and our future, there are few better places to be. We are not campaigners in the usual sense, but our mission is a better-informed and active citizenry, and newsrooms have a built-in platform for driving that effort. We do things that are not popular in the conventional sense but are necessary for a free society or shed light on an important issue. Newsrooms are about war and peace, laughter and pain and every aspect of our world.

Miranda Mulligan, who heads up the KnightLab at Northwestern University, sees opportunity for innovation in news today: “There are few moments in time more innovative, entrepreneurial and exciting than right now in the news industry,” she writes.

Chris Keller, a data journalist and technologist for Southern California Public Radio, has found that newsroom development has taught him lessons beyond the code itself:

These lessons have very little to do with classes, functions and syntax and everything to do with helping to reinforce the core mission of journalism: hold those in power accountable, help people make sense of the world around them and celebrate their place in it.

Alan Palazzolo, who codes at the local news organization MinnPost, came into the newsroom after being a more traditional developer. It was the growing journalism-code community’s commitment to open-source that attracted him: “One of the biggest things that drew me to the newsroom was seeing some of the amazing code that folks were producing in newsrooms and putting up on Github,” Palazzolo explains.

Michelle Minkoff, a journalist-programmer at the Associated Press, writes an extensive list of reasons to develop in the newsroom. One is the colleagues you’ll find in the newsroom itself:

Journalists are some of the most interesting personalities I know — people who literally learn, and impart that knowledge to others, for a living. The subject matter is always shifting, and never boring, so the people aren’t boring. Your colleagues will provide fascinating conversation, and stretch your mind. They’ll embrace your eccentricities if you embrace theirs. Life is more fun this way.

Tiff Fehr, a UX engineer at the New York Times, compares newsrooms to startups and finds that “lack of diversity in tech startups is a ongoing challenge.” Working in news, Fehr writes, “means being part of an industry that currently does a better job discussing, valuing and hiring diverse backgrounds and people.”

Finally, the entire development team at ProPublica wrote a lengthy and GIF-laden argument for coding in the newsroom. Every one of their points makes a clear case, but the most powerful is the simplest:

Why not use your powers for good? Make apps that hold doctors accountable, show inequality in schools and reverse-engineer political targeting. Help readers make sure the nursing home they’re considering doesn’t have years and years of deficiencies. Or help voters look up whether their representative is for or against SOPA and PIPA. Let other journalists and researchers easily see how nonprofits spent their money.

These are just a few of the reasons to write code in the newsroom. We’ll excerpt more pieces as the week progresses.

As a 2014 Knight-Mozilla Fellow, as you spend 10 months at some of the greatest news organizations in the world, writing code, hacking with collaborators, and helping to push journalism forward, you’ll answer this question in your own way. So apply today—on August 17th, the window to apply closes for good.

PS. If you’re in New York City next Friday, we’re hosting an in-person infosession about the fellowships and you should totally come.

July 17, 2013 10:08 PM

July 11, 2013

Mike Tigas

Things in the pipeline

I’ve been a terrible blogger lately, but I’m going to be catching up on writing very soon. (Yes, I always say that.) I’ve got about as many “blog posts yet to write” as I have projects — and I’ve got an interesting plethora of those as of late.

…Actually, let’s talk about some of those for a minute. I plan on discussing a few of these at length over the next couple weeks (as big news is on the horizon for some), but here’s a collection of some things I’ve been working on since I last wrote:

I’ve probably forgotten something in there… But you’ll hear back from me soon enough.

(Are you a developer-ish person? Do these projects sound interesting to you? You should check out the Knight-Mozilla OpenNews Project, get involved with the gang, and apply to be a 2014 fellow. Seriously, check it out.)

July 11, 2013 04:20 PM

July 08, 2013

Manuel Aristarán

Knight-Mozilla OpenNews Fellowships 2014: Applications Open

Applications are open for the 2014 OpenNews fellowship, sponsored by the Knight Foundation and the Mozilla Foundation. This is the third year of the program, in which I am a 2013 fellow. If you are interested in spending a year working at the intersection of journalism+programming+data+visualization, traveling to interesting conferences and sharing experiences with journalists and technologists from news organizations around the world, I recommend you apply! According to Dan Sinker (the program's director): “It’s 10 months to write open-source code with impact, travel the world, engage in dynamic communities, and create tools and projects that help the world learn more about itself.” The newsrooms taking part next year are: The New York Times (NYC), ProPublica (NYC), The Texas Tribune (Austin, Texas), La Nación (Buenos Aires), Ushahidi+Internews Kenya (Nairobi). If you have any questions, just ask me.

July 08, 2013 01:02 AM

July 01, 2013

Friedrich Lindenberg

Notes: Crunching text documents for fun and knowledge

One exciting development at Spiegel is the recent introduction of a weekly data journalism workshop that brings together reporters, fact checkers and designers from both the print and online sections of the organisation.

This week's workshop will focus on dealing with large collections of documents, so I took some time on Monday to experiment with a few different text mining components. My goal was to find usable tools that include an accessible interface rather than pure APIs and libraries. My interest in this was greatly enhanced after meeting Jonathan Stray at the Civic Media Conference last week and learning about his Overview project.

As a working dataset, I chose the parliamentary documents - bills, transcripts and other business - of the German Bundestag; a set of about 22,000 PDF files covering a wide range of topics and formal structures. Using a German dataset made for an additional challenge, as many linguistics toolkits support only Spanish, French and, of course, English.

Content extraction with Tika

After downloading, I hoped to use Apache Tika to mass convert the documents to plain text for further processing. While Tika supports a wide range of different formats, it appears to be focussed on converting individual files rather than crawling folders. Its user interface is fun to play with, but I'm not sure it has any real world applications. And while Tika has a server mode, it's based on piping data in via raw TCP/IP. I was unable to have it convert any documents. Starting (nay, booting) Tika for each document seemed like a waste of time.

Stanbol to the rescue

My dilemma was eventually solved by Apache Stanbol, which fellow fellow Manuel had recommended I should try out. This project seems to have the goal of using linked data to glue up as many natural language processing libraries as they can fit into a single Java container. As part of this software smörgåsbord, the maintainers have included a REST API for Tika which can return either a document's plain text or its metadata.

While hardly a non-techie solution, this allowed me to script up a CSV file containing each document's title, text, source URL and modification date. I'm still hoping to try this type of bulk conversion out on a set of documents in more diverse formats, but I'm very optimistic about Tika's ability to crack open some Word documents.
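
A minimal sketch of that kind of script, in Python, might look like the following. The extraction call and the URL lookup are deliberately left as caller-supplied functions (`extract_text` and `source_url` are hypothetical names, not part of Tika or Stanbol), and the title is simply guessed from the file name, which is an assumption made for illustration.

```python
import csv
import os
from datetime import datetime, timezone

FIELDS = ["title", "text", "source_url", "modified"]


def document_rows(paths, extract_text, source_url):
    """Yield one row (as a dict) per document path.

    extract_text and source_url are caller-supplied callables. They are
    hypothetical stand-ins for the text extraction service and for
    whatever maps a local file back to the URL it was downloaded from.
    """
    for path in paths:
        # File modification time, expressed in UTC.
        modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        yield {
            # Assumption: the file name without its extension is the title.
            "title": os.path.splitext(os.path.basename(path))[0],
            "text": extract_text(path),
            "source_url": source_url(path),
            "modified": modified.isoformat(),
        }


def write_csv(rows, out):
    """Write rows to the open text file `out`; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

Keeping the extraction step behind a plain function means the same loop works whether the text comes from a REST service, a command-line tool or a cache on disk. The `csv` module takes care of quoting, so document text containing commas, quotes or line breaks survives the round trip.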

Stanbol also integrates a wide range of other language processing and entity extraction tools via a set of configurable processing pipelines. I'm not sure the benefits of a REST API on top of these services really make up for the additional integration work required by its RDF output format.

Jigsaw: Entities, visualized

Jigsaw is a visual analytics tool developed by researchers at Georgia Tech, which I'd heard about from Sebastian. While the software allows imports from a range of formats, its scalability seems to be quite limited. I had to shrink my document set down to about a hundred Bundestag documents to achieve an acceptable level of responsiveness. This may be related to document size, however, as I later had a much better experience using a set of 1000 Spiegel Online news stories.

The Jigsaw interface is the type of thing that will make you want to tear out your own eyeballs, but there is a set of tutorial videos which help to alleviate the pain. Once you get the hang of it, though, the package turns out to be fairly useful with a broad variety of visual methods for slicing, dicing and sorting the document set.

Entity extraction underlies much of Jigsaw's functionality, so the lack of support for the German language really comes to bear on this tool. Still, it supports a variety of extractors, including Reuters' OpenCalais web service. Even for English documents, I didn't see any support for the normalization of extracted entities, so "Edward Snowden", "Mr. Snowden" and "Edward J Snowden" remain separate.
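
To make the normalization problem concrete, here is a deliberately naive sketch in Python. It is not something Jigsaw or OpenCalais offers; it just groups person mentions by their last token after dropping a small, assumed list of honorifics, and picks the longest variant as the canonical name. The function names and the honorific list are made up for illustration.

```python
from collections import defaultdict

# Assumed, incomplete list of honorifics to ignore when comparing names.
HONORIFICS = {"mr", "mrs", "ms", "dr", "prof"}


def surname_key(mention):
    """Return the lowercased last token of a mention, ignoring honorifics."""
    tokens = [t.strip(".,").lower() for t in mention.split()]
    tokens = [t for t in tokens if t and t not in HONORIFICS]
    return tokens[-1] if tokens else ""


def group_mentions(mentions):
    """Group mentions sharing a surname; key each group by its longest variant."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[surname_key(mention)].append(mention)
    return {max(variants, key=len): variants for variants in groups.values()}


mentions = ["Edward Snowden", "Mr. Snowden", "Edward J Snowden"]
print(group_mentions(mentions))
```

A heuristic this crude will happily merge two different people who share a surname, which is exactly why proper entity resolution is a hard problem and why its absence in a tool is noticeable.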

The different views of Jigsaw - graphs, time lines, and various clever listings - are well thought out, but on the whole, it remains a research tool that would require some productization before being ready for day-to-day use.

Pretend it's not programming

KNIME is the most comprehensive data and text processing tool I looked at, which is probably also its weakness where journalists are concerned. The tool, while certainly a fully-fledged data workflow editor, seems to be based on the belief that the hard part about programming is learning the syntax. What the point and click interface enables is essentially coding, even though it comes in the shape of menus, tabs and dropdowns.

Still, I enjoyed the tool's documentation sidebar, which gives nice primers on the individual processing nodes, including some statistical methods.

Overview

As mentioned above, I was especially interested in Overview. Like OpenSpending, the project was a winner of the Knight News Challenge in 2011, funded to build out some experimental tools used for the WikiLeaks cables inside the AP. Made for the newsroom, Overview directly integrates with DocumentCloud and features a simple and clean web interface.

Unlike Jigsaw, Overview makes no use of entity extraction and relies entirely on term frequencies in documents. Documents are visually clustered by showing characteristic terms for document groups in a tree structure. While this provides a neat way to dissect a document set, it is also the only means of navigation. In Boston, Jonathan mentioned they were about to add a second view to support time-based analysis. Still, this is a far cry from the variety of visual facets provided by Jigsaw or Nuix.

Overview's frequency-based approach is quite prone to highlighting the specific lingo used in a set of documents. The Bundestag dataset, for example, clustered mostly around terms such as "paragraph", "article", "commission" and "decision". These terms are probably fairly distinctive, but they are hardly topical. The result for German Spiegel Online articles was even worse: Overview generated an almost perfect stop list for the language.
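
The effect is easy to reproduce with a few lines of Python. The sketch below is not Overview's algorithm; it only counts in how many documents each term appears (its document frequency) and reports the terms found in most of them, assuming a simple regex tokenizer and an arbitrary 80% cut-off. The three sample sentences are invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def document_frequency(docs):
    """Count the number of documents in which each term occurs at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def corpus_lingo(docs, min_share=0.8):
    """Return, sorted, the terms that occur in at least min_share of the documents."""
    df = document_frequency(docs)
    threshold = min_share * len(docs)
    return sorted(term for term, n in df.items() if n >= threshold)


docs = [
    "the commission adopts the decision on energy",
    "decision of the commission on pensions",
    "the commission takes a decision on transport",
]
print(corpus_lingo(docs))  # ['commission', 'decision', 'on', 'the']
```

The terms that survive the cut are a mix of ordinary stop words and procedural vocabulary, while the words that say what each document is about ("energy", "pensions", "transport") appear only once. Without a language-specific stop list or some down-weighting of corpus-wide terms, the boilerplate is what dominates.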

Uploading only the English-language, international section of Spiegel, on the other hand, gave me a fairly decent overview of recent political debates.

Summary

Looking at the state of these tools, it's clear that there is no silver bullet. While Overview looks likely to become a great tool to handle documents at a large scale, it doesn't yet offer the necessary range of visual analytics. Jigsaw has the right tools, but does not seem to scale very well.

Further alternatives would have been Nuix, which has been advertised quite heavily by the people involved in OffshoreLeaks, but seems rather expensive. DocumentCloud is starting to offer some rudimentary entity and timeline-based analysis views out of the box, while Solr continues to be a great solution for full-text search and faceting.

German language support, however, continues to be the biggest issue for all of these open source tools. The only freely available entity extractor appears to be a branch of Stanford NER which hasn't been integrated into any of the tools mentioned in this post. While Germany has a number of top-notch computer linguistics faculties, none of them seems to feel the need to open up their resources to the public. Let's talk about open access.

July 01, 2013 12:00 AM