Decrypting The Google Books Project

> Big Brother G. is watching you

Last week, a French/German documentary about Google Books entitled “Le Livre selon Google” (“The Book According to Google”) was broadcast. It analyzes the pros and cons of the massive digitization of books from all over the world that the Big G started over ten years ago. As this short film truly caught my attention, here is my own recap of it, with some guidelines to help you understand the stakes of this fascinating topic.

“Le Livre selon Google” is available in full on YouTube.

PS: I am opening a new TOPIC category called “Digital Culture” to host posts about various aspects of the impact of the Internet: sociology, technological evolutions, legal issues, etc.

1. Context

In 2002, Google set out to scan the whole of world literature. To get this science-fiction-worthy initiative started, the Californian firm signed agreements with the libraries of prestigious universities such as Harvard and Oxford. More than 10 million books have since been scanned and turned into digital files inside Google's gigantic memory…

The “Google Books” project provoked strong disagreements among the stakeholders involved: the Big G on one side, authors and small publishers on the other. As you will see, this is not only about money but also about broader social issues such as the knowledge of humanity, and even our private lives…

2. The “World Brain” myth


  • Gathering world knowledge: one might accuse Google of grandstanding here, but it “only” aims to make humanity's oldest dream come true: absolute knowledge, the memory of the world, what H.G. Wells called the “World Brain”. This has been attempted several times in history: the Library of Alexandria in Egypt, Diderot's Encyclopédie, etc. Today we finally have the technology to make this ambition concrete, thanks to computers powerful enough to record and manage such a huge amount of data. Indeed, Google created a cheaper, faster way of digitizing books without damaging them.

  • Improving Google's search engine: scanning books is not free, as Google spends between $30 and $100 per book. The reason Google does it for free for universities and co. is that the collected data will be used to improve its core business, its powerful search engine. Indeed, having millions of books in hundreds of languages in its database will help improve the semantic capabilities of its search engine, which could undeniably become a huge advantage over its competitors!

  • Developing AI (Artificial Intelligence): humans have long dreamed of artificial intelligence, and with this project Google aims to make computers understand human language, including humor, irony, poetry, shades of meaning, etc. To give you an idea, in 2011 IBM's computer Watson won a game of Jeopardy! against two human champions. In other words, Watson was able to understand the questions, retrieve and rank information from the huge body of text stored in its own memory (it was not connected to the Internet during the game), and give a concise answer in human language. Wow! Yet Google draws a subtle distinction by saying that Google Books is about “assisted” intelligence, not “artificial” intelligence: computers cannot have thoughts of their own.


  • Instant access to data: books become available anywhere, anytime and to anyone, thanks to the Internet's worldwide network.

  • Longevity of books: digitization removes the risk of losing priceless, unique ancient works, protects them from disasters such as fires or floods, and ensures the survival of books that are no longer available or no longer published.

  • Improved efficiency: the Google Books project would also provide effective management of all this literary data, making it possible to find the information needed quickly and thus to improve productivity in specialized fields.


  • “Knowledge is power”: the fear is that Google's monopoly over all the works written since the invention of writing could lead to the firm's overwhelming domination. According to some accounts, Julius Caesar had the Library of Alexandria burned, allegedly fearing it gave too much power to the Egyptian monarchy! Allowing a single company to exploit such a massive quantity of data could amount to holding knowledge hostage.

  • Commercial use: what if Google started charging for access to these digitized books? If the Big G held a monopoly on such a collection, it could decide to make us pay for this content… For now, though, the firm makes public-domain works fully available and shows only excerpts of copyrighted ones. Google has no right to sell a copyrighted book in digital form without the rights holders' consent, and most of its revenue (80%) comes from advertising.

  • Rewriting information: this fear is not new, but it has grown with the rise of digital technologies. Think of George Orwell's “1984”, where articles of The Times are revised before being archived, or, more recently, of “La Ballade de Lila K” by Blandine Le Callet, where in the 22nd century the government bans books and erases parts of past newspapers before digitizing and redistributing them. If Google is allowed to scan books and provide access to them, laws will probably be needed to prevent it from applying its own censorship, which could be quite dangerous…

  • Privacy: as the platform through which users access books, Google could know who reads what, for how long, etc. As usual with Google, respect for privacy is a sensitive topic, and we don't know what use it could make of such information!


Technology gives rise to many hopes as well as fears, from utopia to dystopia. We should keep in mind that what is at stake in huge digital projects such as Google Books is who controls the data in question, and how they use it… Let's not fall into “digital paranoia”; instead, let's remain aware of these issues and try to develop a coherent position. Technology should be used as a tool; it should never be considered an end in itself!

3. The issue of copyright

In 2005, the Authors Guild and the Association of American Publishers sued Google for digitizing copyrighted books and distributing extracts of them online. In 2008, the parties announced an agreement favorable to Google's virtual library project, known as the “Google Books Settlement”, which would have given the firm a de facto monopoly. But many authors and publishers kept fighting the project, and after a long legal battle that rallied opponents from several countries, a U.S. federal court finally rejected the settlement in 2011.

Google had indeed started digitizing copyrighted books, invoking the notion of “fair use”: it argued that as long as it made no commercial use of these contents, its actions were lawful. What's more, Google claimed to publish only short extracts, along with a link to buy the book in question online. But writers objected that extracts lead to a loss of context: isolated fragments do not carry the same meaning as the whole work, so Google should publish nothing more than snippets of two or three lines.

Moreover, the 2008 settlement gave the Big G the right to sell books that were out of print but still under copyright, including so-called “orphan works” whose rights holders cannot be identified or located: this is where the contested monopoly that angered writers and publishers came from.

The question is: why would the courts make an exception to copyright for such a powerful firm? Such an infringement of copyright would send the wrong message to smaller players in the book industry, who lack the money and/or power to fight Google… The Google Books Settlement was rejected notably to avoid giving Google a dangerous competitive advantage.

And today? Google Books is the largest literary corpus in the world: in 2012, more than 20 million works in 400 different languages were available online. Many books can also be bought on the Google Play Store and read on your mobile devices, in what was billed as “the world's largest eBook store”.


Copyright remains a sensitive topic in Internet culture today: the Internet often encourages the expectation that content should be free, hence the problem of illegal downloading (music, movies…), which is far from being solved. We still have to define the limits and scope of copyright on the Internet. As a result, the future of digital books remains unclear, raising many questions about the place of writers, publishers and even readers…
