From More Than 4,500 Sources, Just a Dozen Account for Most Google News Stories?

[UPDATE: Many of Google’s senior engineers were attending the Search Engine World conference in San Jose, California, when this posting appeared. Within ten days of this posting, Google appeared to have adjusted its news algorithm. Was that a coincidence or a result of publicity from this posting? Only they know. The choices of sources that Google News’ algorithms now use appear more journalistically balanced, although questions are still being raised.]

For some unofficial Web sites that we’re launching at several eastern U.S. universities this autumn, we had to find feeds of unusual categories of news stories, the quirky types of stories that are popular on campus. Our initial inclination was to use Google News, but we analyzed it and were surprised by its predominant choices of news sources.

Although Google spiders more than 4,500 news sources, only about a dozen account for the vast majority of stories on Google News. And two of those dozen predominant sources are owned and operated by the U.S. and Chinese governments. [UPDATE: Here is an example.]

For instance, here is an analysis of the sources of the top two stories on the main Google News page one day last month:

Reuters    175 stories    18% of all
New York Times    80 stories    8% of all
Voice of America    67 stories    7% of all
Xinhua    67 stories    7% of all
Bloomberg    61 stories    6% of all
Washington Post    61 stories    6% of all
ABC News    49 stories    5% of all
Boston Globe    26 stories    2% of all
CNN    22 stories    2% of all
San Francisco Chronicle    17 stories    1% of all
CNN International    17 stories    1% of all
Christian Science Monitor    15 stories    1% of all
Toronto Star    13 stories    1% of all
Seattle Post-Intelligencer    13 stories    1% of all
United Press International    12 stories    1% of all
USA Today    10 stories    1% of all
Houston Chronicle    10 stories    1% of all
FOX News    10 stories    1% of all
Newsday    10 stories    1% of all
The Globe and Mail    9 stories    0% of all

top 5 sources are 48%
top 10 sources are 66%
top 25 sources are 83%
top 100 sources are 98%
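The per-source percentages and top-N totals above come from simple tallying. Here is a minimal sketch of that arithmetic in Python. It assumes the story-to-source listings have already been collected (the collection step is not shown), and the function names and sample data are hypothetical, not the data behind this post. The sketch truncates shares rather than rounding them. That is only an assumption about how the tables were produced, though it would explain The Globe and Mail’s 9 stories showing as 0%.

```python
from collections import Counter


def source_shares(counts: dict[str, int]) -> list[tuple[str, int, int]]:
    """Return (source, stories, whole-percent share), largest source first.

    Shares are truncated toward zero rather than rounded; this is an
    assumption about how the tables in this post were produced.
    """
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(src, n, n * 100 // total) for src, n in ranked]


def top_n_share(counts: dict[str, int], n: int) -> float:
    """Fraction of all stories contributed by the n sources with the most stories."""
    total = sum(counts.values())
    largest = sorted(counts.values(), reverse=True)[:n]
    return sum(largest) / total


# Hypothetical listings (NOT the data in this post): one entry per story,
# naming the source Google News attributed it to.
listings = ["Source A"] * 50 + ["Source B"] * 30 + ["Source C"] * 15 + ["Source D"] * 5
counts = Counter(listings)

for src, n, pct in source_shares(counts):
    print(f"{src}    {n} stories    {pct}% of all")
print(f"top 2 sources are {top_n_share(counts, 2):.0%}")
```

With these made-up counts, the last line prints `top 2 sources are 80%`. The "top 5" and "top 25" figures above are the same calculation with a larger n.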

From more than 4,500 sources, is it plausible that 48 percent of stories should come from only five? Or that Xinhua and the Voice of America, the official news services of the People’s Republic of China and the United States of America respectively, should be the third and fourth most prevalent sources on Google News? Neither seems plausible, but the data show that this is how Google News is operating.

The situation isn’t much better when all of Google News’ categorical news pages (top stories plus all 8 of the news sections) are analyzed. For instance, here’s a typical snapshot:

Reuters    1058 stories    8% of all
New York Times    646 stories    5% of all
Xinhua    482 stories    3% of all
Washington Post    469 stories    3% of all
Voice of America    396 stories    3% of all
ABC News    373 stories    3% of all
[source name missing]    322 stories    2% of all
[source name missing]    278 stories    2% of all
[source name missing]    242 stories    1% of all
[source name missing]    240 stories    1% of all
USA Today    203 stories    1% of all
International Herald Tribune    180 stories    1% of all
[source name missing]    173 stories    1% of all
[source name missing]    157 stories    1% of all
[source name missing]    145 stories    1% of all
[source name missing]    139 stories    1% of all
CNN    138 stories    1% of all
Seattle Post-Intelligencer    137 stories    1% of all
[source name missing]    132 stories    1% of all
Houston Chronicle    125 stories    1% of all

top 25 sources are 54%
top 100 sources are 80%

Sources might shift position in those rankings as the news changes day by day, but not by much, and the top 25 tend to stay in the top 25 of Google News.
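One hypothetical way to put a number on "the top 25 tend to stay in the top 25" is to compare two days’ snapshots and measure how many of one day’s top-N sources reappear in the other day’s top N. The sketch below assumes each snapshot is a Python dict mapping source name to story count. The function names and sample data are made up for illustration and are not the method used for this post.

```python
def top_n(counts: dict[str, int], n: int) -> set[str]:
    """Names of the n sources with the most stories; ties fall in sort order."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {src for src, _ in ranked[:n]}


def top_n_overlap(day1: dict[str, int], day2: dict[str, int], n: int) -> float:
    """Fraction of day1's top-n sources that are also in day2's top-n."""
    first = top_n(day1, n)
    if not first:
        return 0.0  # empty snapshot: nothing to compare
    return len(first & top_n(day2, n)) / len(first)


# Hypothetical snapshots (NOT the post's data): source name -> story count.
monday = {"Source A": 40, "Source B": 30, "Source C": 20, "Source D": 10}
tuesday = {"Source A": 35, "Source C": 33, "Source D": 22, "Source B": 10}
print(f"top-3 overlap: {top_n_overlap(monday, tuesday, 3):.0%}")
```

With these made-up snapshots, two of Monday’s top three (Source A and Source C) remain in Tuesday’s top three, so the script prints `top-3 overlap: 67%`. Running the same comparison with n=25 across many daily snapshots would show how much the rankings actually move.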

23 Replies to “From More Than 4,500 Sources, Just a Dozen Account for Most Google News Stories?”

  1. From More Than 7,000 Sources, Just a Dozen Account for Most Google News Stories?

    For some unofficial Web sites that we’re launching at several eastern U.S. universities this autumn, we had to find feeds of unusual categories of news stories, the quirky types of stories that are popular on campus. Our initial inclination was Google …

  2. Actually, I’d like to see more hard data from more than one day. Did you actually count stories for a sample of days over a month?

    But it doesn’t surprise me that Google has a hierarchy of stories from a certain number of media outlets. That’s the way most news services work. I think if you drilled down, you’d be surprised at how much content is actually repetition of the same AP stories anyway.

    I’m actually far more surprised at the prevalence of Reuters than the appearance of the Chinese or U.S. news outlets.

  3. I gave only one day as an example (actually two examples, each from a different day), but I have run the analysis on many random days during the past six weeks, with the same basic results. Looking at Google News today, things are still roughly the same. There are a few ‘wild card’ sources (Moscow News, Albawaba Middle East News, etc.), but the majority of sources are still the same 10 to 20 or fewer.

  4. All the News that is Fit to Google

    It’s not clear why Vin Crosbie is surprised about this: But when I analyzed its choices of news sources, I was surprised by the results. Although Google spiders more than 7,000 news sources, only about a dozen sources account for the vast majority of sto…

  5. Google News, narrow base

    Google News, narrow base: Vin Crosbie actually went out and counted syndication-vs-reporting on some of the “Top Stories” at Google… Reuters writing was 18% of all newspaper articles listed… NYT 8%, VOA and Xinhua 7%, Bloomberg and WaPo 6%, ABC…

  6. I have also noticed a lack of weighting toward sources in close proximity to the story. The assumption should be, within reason, that a local story will be reported more thoroughly to a local audience by those familiar with local circumstances.

  7. “I had to find feeds of unusual categories of news stories, the quirky types of stories that are popular on campus. My initial inclination was to use Google News” Well, I think it would be a wonderful and useful idea to publish that list of quirky feeds! :-)

  8. These results don’t surprise me. Google may be looking at 7000 sites, but only a small percentage of those sites consistently produce original reporting. If, say, one-third of the sites carry the same Xinhua story, then Xinhua’s numbers will obviously go up.

    Reuters is considered a premier, perhaps the premier, news agency. It has a global presence and reach, an excellent reputation in the industry, and generates thousands of stories daily that are, in turn, carried by thousands of sources. Other news agencies have a similar presence. This means that in many cases other news sources with fewer resources will not generate their own content about an event but will simply use Reuters or a competitor.

    In other words, there are more Reuters and Xinhua hits because there’s a lot of Reuters and Xinhua copy to hit. Understanding how the numbers fall out after that — e.g., how the VOA gets in there — seems to me dependent on understanding what Google’s aggregator is really doing.

  9. Bill, I can understand why Reuters ranks high (disclaimer: I’m a former Reuters executive): Reuters news is available on hundreds of news Web sites in major countries. However, the places where reports from China’s Xinhua news agency are prominently used are mainly Third World countries (Paraguay, Myanmar, Sudan, etc.) that have few news Web sites (indeed, few Web sites of any kind). Logically, then, Xinhua’s news should not rank highly in Google News’ source polling.

    For example, at this moment (17:49 GMT on 3 August 2004) the lead story on Google News is a Xinhua story datelined Washington, D.C., and headlined ‘Information leading to US terror alert was years old: reports’. Is it really logical that the most authoritative source of that story in the U.S. capital is the Red Chinese news agency? Likewise, is it logical that this Xinhua story might be the most linked-to story that Google can find about that topic, particularly when that story is being prominently reported by all of the major U.S. news sources today?

    No, something is amiss in Google News’ algorithms. The same few sources among 7,000 tend to be used, and some of those few sources are rather curious choices.

  10. Vin, I agree that something is amiss with Google’s algorithms. I didn’t mean to suggest otherwise.

    Logically, news sources that produce more copy carried on more sites (I overlooked that factor) ought to produce higher numbers. So, why VOA? Why not AFP? As you ask, why Xinhua? (Is Google trolling hundreds of domestic Chinese sites? If memory serves, the domestic Xinhua feed is not the same as Xinhua’s international feed.)

    Google’s tendency to highlight less-than-authoritative sources is the reason I don’t use it. It would be nice if Google would explain how stories are selected and ranked, but even better would be enabling me to set filters based on source categories, geography, ownership, affiliation, etc.

  11. Google news – some of the news that’s fit to read

    Digital Deliverance: From More Than 7,000 Sources, Just a Dozen Account for Most Google News Stories?…

  12. Doesn’t Google News index 4,500 sources, not 7,000 as mentioned here? Quote: “Search and browse 4,500 news sources updated continuously.”

  13. Philipp, quite correct. I’ve picked up the ‘7,000’ figure from discussions and postings elsewhere this past week about MSN Newsbot and Topix. I’ve corrected the figure.

  14. Frequency of Source Appearance in Google News

    Digital Deliverance looked at how often sources appear on the front page of Google News. Very interesting breakdown….

  15. I suspect three things are happening.

    First, some formats are more PageRank-friendly than others. If stories are passed along with a link to the original source embedded, then a given article and its originator are more likely to be cited in a way that Google can understand.

    Second, however PageRank-friendly your formats are, their effect is multiplied by the size of your syndication network. The Washington Post has content syndicated in hundreds of other newspapers and over wire services, for example. Same with VOA and Xinhua. They aren’t just a publication but a distribution network too.

    Last, I don’t know if Google News looks at traffic. Probably not; it’s computationally expensive. However… If it does measure traffic to articles or news sites (perhaps via advertising or other external stats), you’d expect online traffic to be affected by other broadcast media. Of the top 20, how many are affiliated with television or radio networks? How many drive traffic via email bulletins or with journalists making guest appearances on radio and television news shows? Again, this would be a case of the top 1% of the power curve getting the biggest share of the audience.

    What’s interesting to me is that Google News manages not to be completely dominated by the power law, since half of the sources aren’t in the top 20.

    Also, you may want to investigate how the sources cited follow the sun. I’ve observed that European sources dominate for 8 hours when Europe wakes up, China and India dominate when they wake up, and the US dominates when it wakes up. No hard evidence, just the fruit of insomnia.

    Great work, by the way. Thanks!

  16. Google is also a very conservative Republican outlet. Conservative stories linger; liberal stories disappear. They also put up pictures of President Bush or his bunch next to stories that have nothing to do with them.

  17. Interesting: Igor’s comment about Google’s being “very conservative”. I’ve thought the opposite, from experience [before seeing your stats]. Those top media names are hyper-liberal in my opinion. If they want a balanced slant, FOX News, now off the list, should appear more often!

  18. Does Google News Have A Conservative Bias?

    Balancing Act: How News Portals Serve Up Political Stories Source: Online Journalism Review JD Lasica takes a look at political coverage at Google News and Yahoo News. From the article, “Google News uses computer algorithms to identify top stories while Y…
