Mere Words? Google’s Library Project Speaks Volumes

Google's Online Library Project: 5 million+ volumes and growing.
A recent article in Science magazine presents fascinating sociological findings drawn from an analysis of the content of the growing number of books in Google’s digital library.

Google has amassed a database of some 2 billion words and phrases from more than 5 million books published over the past 200 years. Much of the news coverage of this project has focused on intense criticism from publishers and authors who are concerned about copyright protections and about Google’s alleged knowledge “power grab.”

But a more interesting and useful result of Google’s library project is that linguists can now use this trove to measure trends based on the language in the books and on the people and concepts referenced in them.

By analyzing the digitized text of the books in Google’s database in relation to when they were published, the researchers found that they can measure all sorts of trends – such as changing tastes in foods, ebbs and flows in relations between countries, and the role of religion in the world.

For example, references to “sausage” peaked in the 1940s and have dropped off dramatically since then, whereas references to “sushi” began to appear in significant volume in the 1980s.

It’s also interesting to see how references to certain “personalities” grow or decline over the decades. Revolutionary leader Che Guevara was covered widely in the 1960s but has receded since then, whereas Hollywood actress Marilyn Monroe has seen a slow, steady increase in references even decades following her death.

References to “God” have declined steadily since their peak in the 1840s, which likely comes as no surprise. More interestingly, references to “men” far outpaced references to “women” all through the 1800s and 1900s … until the 1980s, when the two reached parity. And by 2000, references to women had surpassed those to men.

When it comes to emotional concepts, the researchers found that references to terms like “empathy” and “self esteem” have exploded since the 1940s and 1950s … while references to “will power,” “self control” and “prudence” have all declined.

Commenting on the importance of this academic research, Mark Liberman, a computational linguist at the University of Pennsylvania, said, “We see patterns in space, time and cultural context on a scale a million times greater than in the past.”

Google’s digital database is only a small fraction of all the volumes published since the invention of the printing press, a figure that has been estimated at ~129 million. But Google’s 5 million+ books (roughly 4% of that total) give us a far more precise view of trends than has ever been possible before.

An interesting ancillary finding of the research is the sheer number of completely new words that have come into use in the English language: more than 500,000 new English words have made their “debut” since 1950.

Google is making this data available at a time when it continues to face criticism about its online library endeavor. The initiative has faced copyright disputes, lawsuits and charges that Google is attempting to create an “information monopoly” (some of which have been at least partially settled). But over the long haul, I think it’s a pretty safe bet that people will view the pluses as outweighing the minuses in Google’s library project.

College education in America: What the hey, let’s party!

Just in time for the upcoming school year, a new book has hit the stores that launches a fierce attack on college education in America today. As the father of one recent college grad and another daughter just beginning her sophomore year, I was drawn to The Five-Year Party: How Colleges Have Given Up on Educating Your Child, by Craig Brandon (ISBN-13: 978-1935251804).

Brandon is a former education reporter and college writing instructor. What’s his main beef? That college administrators have taken advantage of government loan largesse and other programs to create a campus environment that’s hardly conducive to the disciplined intellectual labor of learning. The way Brandon sees it, college administrators are more interested in students’ pocketbooks than their intellects.

Brandon trains most of his firepower on liberal arts colleges, many of which he characterizes as “education-free zones” where quaint traditional notions of learning – like attending classes and doing assigned homework – have gone by the board. He cites statistics showing that only ~30% of students enrolled in liberal arts institutions graduate in four years … and that fully 60% take six years or more to earn their undergraduate degrees.

The book outlines the conditions that contribute to these sorry statistics. Extensive student loan and grant programs mean that few if any students ever pay the “book rate” tuition at a private college or university. This has made it easier for institutions to raise tuition rates far in excess of the inflation rate.

Brandon claims it has also led school administrators to tolerate – even abet – the extra years students spend on campus. After all, those extra years mean more money to pay officials’ lucrative salaries … not to mention bankrolling the country-club-like student centers and new athletic facilities that seem to be on every college’s wish list.

And during that extended time on campus, it’s “party on!” Never mind the lower educational standards … it’s easier and far more lucrative for colleges to give students what they want rather than what they need to build meaningful careers afterward.

And what about the instructors? They may well lament the decline in educational standards. But they’ve found out the hard way that enforcing rigorous standards in the classroom invites a flurry of negative reviews on student evaluation forms (which are easily accessible online) – reviews that are often linked to tenure and promotion decisions. It’s easier to go with the flow, provide reasonably entertaining lectures … and give decent grades to all but the worst performers.

But what does this mean for those students who decide to work hard during their college years and to graduate on time? They may end up with a degree that’s devalued in the eyes of employers.

Moreover, the general decline in the value of a college degree affects even those schools that have tried hardest to maintain the traditional rigors of education that once characterized nearly all liberal arts schools – practices such as requiring students to take extensive coursework in subjects that go beyond their chosen field of study.

Colleges like Davidson in North Carolina, Hillsdale in Michigan and Rhodes in Tennessee may make studying and achieving top grades a huge challenge for their students … but their regional reputations mean that those degrees don’t carry much cachet beyond a 300-mile radius of the school.

Meanwhile, students who attend some of the nation’s better-known Ivy League universities or “near Ivy” institutions sail on through, cafeteria-style, taking only the coursework that is easiest or of greatest interest to them.

Who’s the bigger chump then?

It’s a bit painful to read The Five-Year Party … and hard to finish it without feeling pretty depressed about the state of liberal arts education in America. Besides, does anyone know of a liberal arts school or university that has actually gotten a good handle on controlling its spending? I can’t think of one.

This book makes it easier to recognize the merits of America’s community colleges, which help kids start their higher education in ways that let them explore different areas of interest without the distractions of the “party hearty” campus atmosphere – or breaking the family bank, for that matter. Institutions like Chesapeake College in my home area on Maryland’s Eastern Shore are doing yeoman’s work in this regard, and they deserve more recognition for it.

Not All is Well in the World of Wikipedia

A few months ago, I blogged about “Wikipedians” – the hordes of people around the world who write and edit for the online encyclopedia sensation. In the “free information” world of the Internet, in just a few short years Wikipedia has effectively knocked more traditional encyclopedias like World Book and Britannica off their perch as the arbiters and purveyors of broad knowledge.

With its self-described goal of being “the sum of all human knowledge,” Wikipedia has become the world’s fifth most popular web site, attracting more than 325 million visits per month – a 20% increase in traffic from a year ago. It has achieved this success even amid well-publicized incidents of “rogue” information appearing in Wikipedia articles, whether through honest error or the deliberate insertion of false material; Ted Kennedy’s Wikipedia entry reporting the senator’s death months before it actually occurred is but one recent example.

Indeed, one of the strengths of Wikipedia’s information model has been crowd-sourcing, with legions of editors keeping an eagle eye on Wikipedia entries to police them and aggressively remove incorrect, uncited or otherwise suspect information.

But just last week, The Wall Street Journal published a front-page story reporting that Wikipedia is losing its volunteers at a much steeper rate than ever before. Writers and editors have been leaving Wikipedia faster than new ones have been joining, and the net decline has become particularly pronounced in 2009, amounting to a net loss of ~25,000 volunteer editors every month.

What’s causing this? A few reasons that have been suggested are:

As Wikipedia has grown, the number of topics not yet covered has diminished, making for fewer opportunities for new writers and editors to come on board with entries or to improve on existing articles.

Well-publicized problems with the veracity of some articles have caused Wikipedia to tighten its editorial and submission guidelines. The not-for-profit Wikimedia Foundation has adopted a plethora of rules (spelled out over dozens of web pages) that make it much harder for editors who are less familiar with the software to post new stories successfully, causing some articles to be removed within mere hours of being posted. Burnout is bound to run higher when writers must continually debate and defend the content and format of their articles.

A reduction in the “passion” factor – Wikipedians are a bit of a different breed, driven as much by altruism as by a generous dose of ego (“I’m better than the rest of you”) when it comes to knowledge. They’ve even created their own special social world, with the Wikimedia Foundation hosting annual get-togethers of volunteers in such exotic locales as Buenos Aires, where they can collectively bask in the aura of their shared “specialness.” But, people being people, over time the thrill abates for all but the most passionate contributors.

The downturn in the economy probably doesn’t help, either. More free time might be available to devote to Wikipedia for those newly out of work … but in such times, more attention and effort is understandably going to be placed elsewhere.

Recognizing the need to reverse recent trends, the Wikimedia Foundation is working on streamlined instructions to help novice editors contribute articles that comply with the submission guidelines. It is also actively recruiting writers and editors from the scientific community, historically a weak spot in Wikipedia’s coverage.

Whether these steps will be enough to make a difference is an open question. Recruiting more scientists for what is essentially unpaid work with little or no peer recognition has been (and likely will continue to be) a tall order. And the jury is still out on the effectiveness of the new streamlined submission guidelines, because they haven’t been put into practice yet.

Still, there’s no denying that Wikipedia has had a major impact on the way people research information … and it has accomplished this over the span of only a few short years.

Who are ‘Wikipedians’ … and what makes them tick?

By now, most web surfers have had first-hand experience with Wikipedia, the online encyclopedia to which anyone can contribute. As a quick resource for gaining knowledge, it’s hard to beat; it’s fast, and it’s comprehensive.

Speaking personally, fewer than 10% of my queries on Wikipedia come back empty. So I find it a great resource for getting a quick handle on almost any topic.

Of course, it would be unwise to consider Wikipedia an unimpeachable resource, because its content is not vetted in the traditional way. Volunteers author the articles, and it’s up to the community of Wikipedia readers to call out and correct errors or omissions in the entries.

Just who are these volunteers, and what motivates them to devote time to Wikipedia? As it turns out, while there’s no pay, there’s a strong mixture of altruism and ego gratification associated with being a Wikipedia contributor.

This is borne out in an international research survey conducted jointly by the Wikimedia Foundation (the not-for-profit group that operates Wikipedia) and United Nations University’s MERIT tech research project. A whopping 175,000 responses were collected. Of these, approximately one third reported that they contribute Wikipedia content in addition to consuming it.

So what makes these people want to become Wikipedia contributors? Three-fourths of them agreed with the statement that they “like the idea of sharing knowledge and want to contribute to it.” Two-thirds also reported that they “saw an error I wanted to fix.” Nearly half would contribute more often “if I knew there were specific topic areas that needed my help.”

The survey also uncovered some fascinating demographic statistics regarding Wikipedia contributors. It comes as no real surprise that the median age of Wikipedia contributors is in the mid- to upper-twenties … or that one-fourth of them have post-graduate degrees.

But the gender breakdown is curious. In fact, the survey found that only 13% of Wikipedia contributors are women – a startling finding. And even among those who have edited others’ entries rather than contributing full articles of their own, only 31% are women. The researchers steered clear of suggesting any reasons for the gender skew, which was more than likely a cop-out.

And what are the reasons why people don’t contribute to Wikipedia? Predictably, “time constraints” were cited by many respondents. Another factor cited was not knowing how to create or edit the Wikipedia pages. But a substantial percentage (~25%) cited being fearful of making a mistake and “getting in trouble” for it.

So one takeaway from the survey is that it takes certain traits to be a Wikipedia contributor – like being a self-proclaimed “informed expert” looking for validation, affirmation and recognition … or even being narcissistic?

Come to think of it, perhaps it’s not so surprising after all that Wikipedia contributors are over 85% male!