The Big Dig: Scraping and Scooping the Web

I’ve blogged before about how the Internet is making people’s lives pretty much an open book.

Most people who are online are pretty aware of how their reputation can be affected by their Facebook or MySpace pages and other public or quasi-public online information. But The Wall Street Journal has been publishing a series of stories showing that digital snooping has become far more pervasive than that.

The series is titled “What They Know” … and it’s well worth checking out. The most recent article appeared on the front page of the October 12, 2010 edition of the WSJ, and focuses on the phenomenon of “data scraping.”

For those who aren’t familiar with the term, “scraping” is a method by which sophisticated software is used to access and scoop up information that has been posted anonymously on sites that are supposed to be closed to prying eyes. One scraped site cited in the WSJ article is PatientsLikeMe, which hosts message boards and forums dealing with mental disorders, depression and other issues that most people would prefer to keep private.

People who post on discussion forums like these do so using pseudonyms, and the identity of the posters is carefully guarded by the host sites.

But it turns out that these sites are no match for the sophisticated IT capabilities of companies like Nielsen and PeekYou, which are in the business of matching psychographics as well as demographics to individual people for purposes of serving up relevant advertising … and goodness knows what else.

Think of it as the “lifestyle” direct mail lists of yesteryear – but now on steroids.

PeekYou has applied for a patent on a system that matches real people to the pseudonyms they use on forums, blogs, Twitter and other social media outlets. Taking a “peek” at the company’s patent application reveals the great lengths to which its systems go to ferret out and cross-analyze small, innocuous bits of information that, taken together, produce the “needle in a haystack” match to the actual individual:

 Birthday match
 Age match
 First name match
 Nickname match
 Middle name match
 Middle initial match
 Gender match
 E-mail address match
 Phone number match
 Physical address match
 Username match
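To make the idea of cross-analyzing small bits of information a little more concrete, here is a minimal sketch of weighted, attribute-by-attribute record matching. It is not PeekYou’s method, which the patent application describes only in general terms. The field names, the weights and the 6.0 threshold are all hypothetical, chosen so that rarer attributes (an e-mail address or phone number) count for more than common ones (gender). The sample records are invented for illustration.

```python
# Hypothetical sketch of attribute-based record linkage.
# Field names, weights and threshold are illustrative assumptions only.

# Rarer attributes carry more evidence than common ones.
WEIGHTS = {
    "birthday": 3.0,
    "age": 1.0,
    "first_name": 1.5,
    "nickname": 1.0,
    "middle_name": 2.0,
    "middle_initial": 0.5,
    "gender": 0.25,
    "email": 5.0,
    "phone": 5.0,
    "address": 4.0,
    "username": 3.0,
}


def normalize(value):
    """Lower-case and trim strings so 'QuietRiver72 ' and 'quietriver72' compare equal."""
    return value.strip().lower() if isinstance(value, str) else value


def match_score(pseudonymous, candidate):
    """Add up the weight of every attribute present in both records that agrees."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        a, b = pseudonymous.get(field), candidate.get(field)
        if a is not None and b is not None and normalize(a) == normalize(b):
            score += weight
    return score


def best_match(pseudonymous, candidates, threshold=6.0):
    """Return the top-scoring candidate if it clears the threshold, otherwise None."""
    scored = [(match_score(pseudonymous, c), c) for c in candidates]
    if not scored:
        return None
    top_score, top = max(scored, key=lambda pair: pair[0])
    return top if top_score >= threshold else None


# Invented example: a forum poster known only by a few stray details.
forum_poster = {"username": "quietriver72", "birthday": "1971-04-02", "gender": "F"}
people = [
    {"name": "Person A", "username": "QuietRiver72", "birthday": "1971-04-02", "gender": "F"},
    {"name": "Person B", "username": "loudlake", "birthday": "1971-04-02", "gender": "F"},
]

match = best_match(forum_poster, people)
print(match["name"] if match else "no confident match")
```

No single field identifies anyone here. It is the combination of a reused username, a birthday and a gender that pushes Person A over the threshold while Person B falls short.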

When you consider that the same kinds of powerful computers that process search engine queries are also processing millions or billions of bits of information, instantly testing and slotting them according to relational patterns … it’s not hard to understand how, over time, eerily accurate portraits of individuals can be drawn. These portraits reflect not only a person’s demographics but also a host of psychographic and behavioral traits such as:

 Shopping habits
 Recreational pursuits
 Personal finance profile
 Health information
 Political leanings
 Hobbies and interests
 Spirituality/religiosity
 Sexual preference or sexual proclivities

The WSJ articles detail how web sites are attempting to stay one step ahead of the “scrapers” by employing software that alerts them to suspicious “bot” activity on forums and other password-protected areas. It’s often a losing battle … and is that particularly surprising?
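The WSJ pieces don’t say how these alert systems work, but one common and simple signal is request rate: a script pulls pages far faster and more steadily than a human reader. The sketch below flags a client once it makes more than a set number of requests inside a sliding time window. It is an illustration, not any particular site’s software. The `RequestRateMonitor` class and its limits (30 requests per 60 seconds by default) are hypothetical, and the timestamps for each client are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flag clients that exceed a request limit within a sliding time window.

    Hypothetical sketch: the default limit and window length are illustrative,
    and each client's timestamps are assumed to be non-decreasing.
    """

    def __init__(self, max_requests=30, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client id -> timestamps of that client's requests still inside the window
        self._history = defaultdict(deque)

    def record(self, client_id, timestamp):
        """Record one request and return True if the client now looks like a bot."""
        times = self._history[client_id]
        times.append(timestamp)
        # Discard requests that have slid out of the window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_requests


# A scraper fetching one page per second vs. a person reading a page every 30 seconds.
monitor = RequestRateMonitor(max_requests=5, window_seconds=10.0)
scraper_flags = [monitor.record("scraper", float(t)) for t in range(6)]
human_flags = [monitor.record("human", 30.0 * t) for t in range(10)]
print(scraper_flags, any(human_flags))
```

It also shows why the battle is so often lost: a scraper that simply slows down, or spreads its requests across many accounts and addresses, never trips a per-client rate limit like this one.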

These days, probably not even the Orthodox monks at Mount Athos are safe!

Smartphones surge … and phone apps follow right behind.

Media survey firm Nielsen is reporting that as of the end of 2009, about one in five wireless subscribers in the U.S. owned a smartphone. That’s up significantly from the ~14% who owned them at the end of 2008, and adoption is only expected to accelerate in the coming months.

So what’s going on with phone apps, now that a larger chunk of the population is able to download and use them? Nielsen is seeing about 15% of mobile subscribers downloading at least one app in a 30-day period.

Perhaps not surprisingly, those who own iPhones are more apt to download apps than people who own Android phones, Palms or BlackBerrys. Far more apps have been developed for the iPhone, although Android is feverishly trying to catch up.

Which apps are most popular? It goes without saying that games – free and paid – are quite popular. But the four most popular apps are Facebook, Google Maps, the Weather Channel and Pandora.

And where are the news apps in all this? Not even on the radar screen, it turns out.

… Seems people are getting more than enough news blasted out to them 24/7/365 without needing to sign up for a special app to deliver more of it — thank you very much.

The Mobile Web: Great Promise + Growth Pains

It’s clear that the mobile web is a big growth segment these days. Proof of that is found in recent Nielsen statistics, which have charted ~34% annual growth of the U.S. mobile web audience, now numbering some 57 million visitors using a mobile device to visit web sites (as of late summer 2009).

And now, a new forecast by the Gartner research firm projects that mobile phones will overtake PCs as the most common web access devices worldwide … as early as 2013. It estimates that the total number of smartphones and/or browser-enhanced phones will be ~1.82 billion, compared to ~1.78 billion PCs by then.

Gartner’s forecast is even more aggressive than Morgan Stanley’s prediction that the mobile web will outstrip the desktop web by 2015.

So, what’s the problem?

Well … consumer studies also show that web surfing on mobile phones continues to be a frustrating experience for many users. In a recent survey of ~1,000 mobile web users, web application firm Compuware/Gomez found that two out of every three mobile web users report having problems when accessing web sites on their phones.

Because people are so used to fast broadband connections – both at home and at work – it’s only natural that their expectations for the mobile web are similarly high. To illustrate this, Gomez found that more than half of mobile phone users are willing to wait just 6 to 10 seconds for a site to load before moving on.

And what happens after they give up? Sixty percent say they’d be less likely to visit the site again. More importantly, ~40% report that they’d head over to a competing site. As for what would happen if the mobile web experience were as fast and reliable as on a PC, more than 80% of the respondents in the Gomez study claim they would access web sites more often from their phones.

For marketers, this means that to maximize their success in the mobile world, they should reformat web sites to fit the small form factor of handheld devices. Gartner also notes that “context” will be king of the hill in mobile, more so than “search,” because it delivers a personalized user experience. New functionalities such as Google’s “Near Me Now” are providing information on businesses, dining and other services in the vicinity of a mobile user’s location. These and other innovations are opening up whole new dimensions to “seeking and finding” in the mobile web world.

Finally, PBS Gets on the Nielsen Bandwagon

It took three or four decades, but the PBS network has finally signed up for full Nielsen demographic ratings for its TV programs. Now, for the first time, marketers will be able to access and review full demo data on who’s watching what on the Public Broadcasting Service, information that has long been crucial in deciding where best to promote products on broadcast TV.

And it’s about time. For far too long, advertisers could rely only on educated guesswork to weigh the effectiveness of promoting their products and services on PBS’s leading programming fare.

Of course, PBS doesn’t present advertising the same way other networks do, because its programming is ostensibly commercial-free. But in recent years it has offered sponsorship deals to major advertisers in the form of comprehensive messaging that is broadcast before and after the shows air.

Indeed, veteran watchers of PBS programming have noticed more extensive promo messages that have gotten awfully close to out-and-out commercials – even though they aren’t ads in the “traditional” sense.

And up until now, PBS has not officially released any extensive demographic data, making promotional efforts more of a “crap shoot” for advertisers than anything else.

But now PBS has signed up with Nielsen for full demos. The new rating service began on PBS with the Ken Burns series on national parks earlier in the year. According to Nielsen, that documentary scored an overall household average audience rating of 3.5, with an average of 5.5 million viewers tuned in per episode. And the internals provided some interesting clues as to the age, income and educational characteristics of viewers — older, more affluent, and better educated.

Which programs are on tap for Nielsen demo ratings going forward? PBS staples, of course: Masterpiece Theatre, Antiques Roadshow, NOVA, Nature and Frontline. They’ll all have weekly demographic rating information, along with several of PBS’s famed kids programs including Sesame Street and Sid The Science Kid.

What’s a little ironic about the latest news is that, after all these years, PBS has finally gotten on the Nielsen bandwagon … just when broadcast TV audience stats are mattering less and less. The ever-growing non-TV alternatives provided by the Internet have seen to that. On top of that, the overall audience for PBS programming has been shrinking.

Social Media and the Internet: Click … or Clique?

All of the hype about social marketing and social media might make you believe that people are flocking to this new form of communications in droves.

Well, if you think this … you’re right. And now we have the stats to prove it. The Nielsen Company has just released web statistics for August showing that time spent on social networks and blogging sites accounted for ~17% of all time spent on the Internet.

That figure is nearly triple the share of time spent on social networks and blogging sites in August 2008, just one short year earlier. Since there is a ceiling on the total amount of time anyone can spend online in a given day, the increased attention on social media is coming at the expense of the more traditional use of the web as an informational tool.

This is not to say that text and video content don’t remain central to the online experience, because that is clearly the case. But the ability consumers have now to use platforms like Facebook and blogging sites to “connect, communicate and share” is what’s driving much of the continuing growth of the web and online engagement.

Because of this new emphasis, is it any wonder more online advertising dollars are chasing social media than ever before? Nielsen pegs advertising on social media sites as representing ~15% of total online ad spending in August 2009. That’s more than double its proportion a year earlier.

Along with the shift in online ad revenues to social media sites, we’re also experiencing a major change in clickthrough behavior as it pertains to online display ads. Research published recently by comScore shows that the percentage of people who clicked on one or more display ads during a monthly period of Internet interaction – in this case March 2009 – was only ~16%.

How does that result compare with earlier surveys? It’s dramatically lower, and dropping. Just two years earlier, in 2007, ~32% of people online clicked on at least one online display ad over a month-long period, twice today’s proportion.

What’s more, the comScore analysis reveals that a very small portion of viewers accounts for the vast majority of clickthrough activity. Specifically, only ~8% of people are responsible for ~85% of all clicks. Of course, we can be sure that the robust clickthrough behavior of these 8% translates into equally robust product sales … NOT!
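A quick back-of-the-envelope calculation shows just how lopsided that is. Assuming the ~8% and ~85% figures describe the same population, and that everyone else accounts for the remaining ~15% of clicks, the average heavy clicker produces 0.85 / 0.08 ≈ 10.6 “shares” of clicks against 0.15 / 0.92 ≈ 0.16 for everyone else. The small helper below is just that arithmetic written out. The function name is ours, not comScore’s.

```python
def per_capita_click_ratio(heavy_user_share, heavy_click_share):
    """How many times more clicks the average heavy clicker makes than everyone else.

    Both arguments are fractions of the same population (users and clicks);
    the remaining users are assumed to account for the remaining clicks.
    """
    heavy_rate = heavy_click_share / heavy_user_share
    light_rate = (1 - heavy_click_share) / (1 - heavy_user_share)
    return heavy_rate / light_rate


# comScore's figures as reported: ~8% of people produce ~85% of clicks.
ratio = per_capita_click_ratio(0.08, 0.85)
print(f"{ratio:.0f}x")
```

In other words, a typical member of that small group clicks roughly 65 times as often as everyone else, which is one more reason to be skeptical that raw click counts reflect a broad, buying audience.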

Clearly, any company that is attempting to promote products and services over the Internet needs to carefully study the composition of its market and the behavior of its online audience targets before making extensive online advertising program commitments.

The reality is that, given dynamics like the behaviors noted above, an online promo effort is more likely to fail than succeed unless a dispassionate review of the situation is done beforehand and a practical, realistic program is put into place.

But that’s so unlike many of the web advertising programs we’ve seen implemented up to now, which could best be characterized as: “Throw a bunch of advertising at the web and hope some of it sticks.”