The Big Dig: Scraping and Scooping the Web

I’ve blogged before about how the Internet is making people’s lives pretty much an open book.

Most people who are online are well aware of how their reputations can be affected by their Facebook or MySpace pages and other public or quasi-public online information. But The Wall Street Journal has been publishing a series of stories showing that digital snooping has become far more pervasive than that.

The series is titled “What They Know” … and it’s well worth checking out. The most recent article appeared on the front page of the October 12, 2010, edition of the WSJ and focuses on the phenomenon of “data scraping.”

For those who aren’t familiar with the term, “scraping” is the use of automated software to harvest information from websites in bulk. In the cases the WSJ describes, that includes information posted anonymously on sites that are supposed to be closed to prying eyes. One example cited in the article is PatientsLikeMe, which hosts message boards and forums dealing with mental disorders, depression and other issues that most people would prefer to keep private.

People who post on discussion forums like these do so using pseudonyms, and the identity of the posters is carefully guarded by the host sites.

But it turns out that these sites are little match for the sophisticated IT capabilities of companies like Nielsen and PeekYou, which are in the business of matching psychographics as well as demographics to individual people for the purpose of serving up relevant advertising, and goodness knows what else.

Think of it as the “lifestyle” direct mail lists of yesteryear – but now on steroids.

PeekYou has applied for a patent on a system that matches real people to the pseudonyms they use on forums, blogs, Twitter and other social media outlets. Taking a “peek” at the company’s patent application reveals the great lengths to which its systems go to ferret out and cross-analyze small, innocuous bits of information that, taken together, produce the “needle in a haystack” match to the actual individual:

 Birthday match
 Age match
 First name match
 Nickname match
 Middle name match
 Middle initial match
 Gender match
 e-Mail address match
 Phone number match
 Physical address match
 Username match
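To make the idea concrete, here is a minimal sketch of attribute-based record matching, the general technique the list above describes. This is not PeekYou’s method. The field names, the weights and the match threshold are all hypothetical, chosen only to show how several individually weak matches can add up to a confident one.

```python
# Hypothetical attribute weights: distinctive identifiers (e-mail, phone,
# username) count for more than common ones (gender, middle initial).
# These numbers are illustrative assumptions, not PeekYou's.
WEIGHTS = {
    "birthday": 3.0,
    "age": 1.0,
    "first_name": 1.0,
    "nickname": 1.5,
    "middle_name": 2.0,
    "middle_initial": 0.5,
    "gender": 0.25,
    "email": 5.0,
    "phone": 5.0,
    "address": 4.0,
    "username": 4.0,
}

# Hypothetical cut-off at or above which two records are treated as the same person.
MATCH_THRESHOLD = 8.0


def normalize(value):
    """Lower-case and trim a value so 'BlueJay77 ' and 'bluejay77' compare equal."""
    return str(value).strip().lower()


def match_score(profile, candidate):
    """Add up the weight of every attribute both records share with equal values."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        a, b = profile.get(field), candidate.get(field)
        if a is not None and b is not None and normalize(a) == normalize(b):
            score += weight
    return score


def best_match(profile, candidates):
    """Return (candidate, score) for the top candidate if it clears the threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: match_score(profile, c))
    score = match_score(profile, best)
    return (best, score) if score >= MATCH_THRESHOLD else None


# Usage: a pseudonymous forum profile vs. two made-up real-world records.
forum_user = {"username": "bluejay77", "age": 33, "gender": "F", "birthday": "07-14"}
people = [
    {"name": "Person A", "username": "BlueJay77", "age": 33, "gender": "f", "birthday": "07-14"},
    {"name": "Person B", "username": "redhawk", "age": 33, "gender": "F"},
]

result = best_match(forum_user, people)
if result:
    print(result[0]["name"], result[1])
```

No single field identifies Person A here. The username, birthday, age and gender together clear the threshold, while Person B shares only age and gender and falls far short.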

When you consider that the same kind of powerful computers used to process search engine queries can churn through millions or billions of bits of information, testing and sorting them almost instantly by relational patterns … it’s not hard to understand how, over time, eerily accurate portraits of individuals can be drawn. Those portraits reflect not only a person’s “demographics” but also a host of psychographic and behavioral attributes such as:

 Shopping habits
 Recreational pursuits
 Personal finance profile
 Health information
 Political leanings
 Hobbies and interests
 Spirituality/religiosity
 Sexual preference or sexual proclivities

The WSJ articles detail how web sites are attempting to stay one step ahead of the “scrapers” by employing software that alerts them to suspicious “bot” activity on forums and other password-protected areas. It’s often a losing battle … and is that particularly surprising?

These days, probably not even the Orthodox monks at Mount Athos are safe!

Newspapers Turn on Each Other

Last week, the Associated Press reported that U.S. newspaper advertising revenues declined dramatically in 2009, bringing ad receipts to the lowest level recorded in nearly 25 years.

In fact, newspaper publishers’ total advertising revenues last year came in below $28 billion, down $10 billion from 2008. According to the Newspaper Association of America, annual ad revenues have now fallen by nearly $22 billion (a whopping 44%) since 2006.
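A quick back-of-the-envelope check suggests the figures hang together. The sketch below uses only the rounded numbers quoted above, so its outputs are approximations and not the NAA’s exact totals. Adding the $22 billion drop to the 2009 total gives the same 2006 baseline as dividing that drop by 44%.

```python
# Rounded figures quoted above, in billions of U.S. dollars.
REVENUE_2009 = 28.0         # "below $28 billion" (rounded to 28 here)
DROP_SINCE_2008 = 10.0      # "down $10 billion from 2008"
DROP_SINCE_2006 = 22.0      # "nearly $22 billion ... since 2006"
PCT_DROP_SINCE_2006 = 0.44  # "a whopping 44%"

implied_2008 = REVENUE_2009 + DROP_SINCE_2008                # roughly 38
implied_2006_by_sum = REVENUE_2009 + DROP_SINCE_2006         # roughly 50
implied_2006_by_pct = DROP_SINCE_2006 / PCT_DROP_SINCE_2006  # 22 / 0.44, also roughly 50

print(f"Implied 2008 total: ~${implied_2008:.0f}B")
print(f"Implied 2006 total: ~${implied_2006_by_sum:.0f}B (2009 + drop) "
      f"vs ~${implied_2006_by_pct:.0f}B (drop / 44%)")
```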

And now, amid this toxic environment, comes word that The Wall Street Journal has declared all-out war on The New York Times for local advertising. In mid-April, the Journal, which until now has focused almost exclusively on national and international news, is set to introduce a New York-focused section as part of its paper. Outside observers believe this could put as much as 20% of the New York Times’ retail advertising revenues at risk.

And this isn’t a minor foray on the part of the WSJ, either. It will spend upwards of $15 million to produce the new 12-page section, which will cover local business, real estate, sports and cultural events. The outlay includes salaries for about 35 editorial writers, surely one of the few recent instances of new editorial jobs actually being created.

The WSJ move couldn’t come at a worse time for the Times, which has experienced sharper ad revenue declines than the industry average. The Times is responding with a major trade marketing campaign of its own, touting its audience strength among female readers and “high culture” aficionados.

But just how effective this countermove will be is debatable, as recent moves by the paper haven’t exactly telegraphed a continuing commitment to the local news scene. In the last few years alone, the Times has consolidated weekly sections covering specific regions of the New York metro area (Long Island, Westchester, Northern New Jersey), as well as axing its stand-alone “City” and “Metro” sections.

Over the coming months, it’ll be interesting to see how well the WSJ’s new local section performs, and whether it lands a major blow against its rival.

Either way, the vision of two venerable newspapers engaged in a Herculean struggle, fighting over an ever-shrinking advertising pie, isn’t exactly a pretty sight.

It reminds me of the famous scenes in Disney’s Fantasia of huge dinosaurs furiously going after one another, even as the world’s changing ecosystem is rendering the entire species extinct.

It’s official: Clickthrough advertising effectiveness on mobile devices is somewhere south of atrocious.

As usage of the Internet on mobile devices like the Apple iPhone has become more prevalent, many businesses have been wondering how important it is for them to cater to these users through the creation of web sites that are optimized for mobile display.

Although creating a mobile version of a web site doesn’t have to be a major undertaking, it is “yet another task” to add to the marketer’s never-ending to-do list. So, just how important is it?

Chitika, Inc., a Massachusetts-based online advertising network, has analyzed the behavior of “mobilists” and turned up some interesting findings about how they view and act on advertising. After tracking more than 92 million ad impressions served to browsers, Chitika found that mobile internet users clicked through at a far lower rate than those viewing ads on desktop machines.

How much lower? The overall clickthrough rate for mobilists was 0.48%, compared with 0.84% for non-mobile users. In other words, mobile users clicked at not much more than half the desktop rate, which puts mobile about as far in the basement as you can go.
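For anyone who wants to check the arithmetic, here is a small sketch using the two clickthrough rates from the Chitika study. The one-million-impression campaign is a hypothetical figure for illustration only. The study’s 92 million impressions covered both groups, and the split between them isn’t given here.

```python
# Clickthrough rates quoted from the Chitika study above, in percent.
MOBILE_CTR = 0.48
DESKTOP_CTR = 0.84


def relative_gap(low, high):
    """Fraction by which `low` falls short of `high`."""
    return (high - low) / high


def clicks(impressions, ctr_percent):
    """Expected clicks for a number of impressions at a CTR given in percent."""
    return impressions * ctr_percent / 100


gap = relative_gap(MOBILE_CTR, DESKTOP_CTR)  # about 0.43: mobile is roughly 43% lower
ratio = DESKTOP_CTR / MOBILE_CTR             # about 1.75: desktop is roughly 1.75x mobile

# Hypothetical campaign size, not a figure from the study.
impressions = 1_000_000
print(f"Mobile CTR is {gap:.0%} lower; desktop is {ratio:.2f}x mobile")
print(f"Per {impressions:,} impressions: ~{clicks(impressions, MOBILE_CTR):,.0f} mobile clicks "
      f"vs ~{clicks(impressions, DESKTOP_CTR):,.0f} desktop clicks")
```

Put another way, an advertiser buying the same number of impressions would, at these rates, get only about 4,800 clicks on mobile for every 8,400 on the desktop.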

But why are the numbers so abysmal? Several factors are likely at work. First, consider why people reach for their mobile devices: usually because they want to know something immediately … and at times like those, folks are less inclined to get sidetracked by clicking on advertising links. By contrast, the “immediacy” factor with non-mobile devices often isn’t as acute.

Also, consider load times, which are considerably slower on mobile devices. For that reason, mobile web content tends to be less informationally rich and less visually compelling, which decreases its “stopping” power.

What this means for advertisers is that the key to succeeding in the mobile space is catching consumers at just the right moment, not merely catching them at some moment. Easy enough in theory … but would anyone care to volunteer to put this into practice? Best of luck to you.

From the perspective of the media purveyors, the Chitika findings certainly won’t make it any easier to attract additional advertising revenue in the mobile sector. Perhaps that’s why The Wall Street Journal announced last week that, beginning in November, it will charge a weekly fee for access to its content on mobile devices, and that WSJ subscribers and non-subscribers alike will have to pay it.

It’s further proof that relying on display advertising revenue simply isn’t cutting it as a practical business model in the mobile environment.