Making Books Available

It's all over the web now – the Internet Archive has opened up over 1.6 million books for the OLPC XO laptops and, in general, any machine running Sugar. Before going into anything else, it makes sense to give “opening up” a more specific meaning here – it involves two main objectives completed at the Internet Archive end:

  • Making sure that the books are readable on the XO, keeping in mind its relatively low-end hardware specs and disk-space limitations
  • Ensuring that the books are available via a standardized catalog format, so that one can find, browse and download books easily using a tool more tuned for the purpose (think of feed-readers versus blog-entries in a web-page)

Now that the books are available (not just from the Internet Archive, but from a number of other sources as well), the next step is to figure out the best ways to actually get these books to XO and Sugar users. The major constraining factor is bandwidth: we have deployments with zero or very limited Internet connectivity, and these are perhaps the deployments that need access to these books the most.
I spent most of this week implementing a feature in the Get Books activity that allows books to be distributed via what has been jokingly called a sneaker-net (or sandalnet/chappalnet, if you prefer those forms of footwear). The idea is very simple – at a centralized location with Internet access, choose a few thousand books (a typical book is a few hundred KB or less), put them on a USB pen-drive and add an OPDS catalog to the mix. Make copies of the drive, and send them to the schools without connectivity. The latest version of Get Books recognizes the drive and lets the student browse through the collection, search for books, and add whatever she wants to the Sugar Journal. Once a book is in the Journal, it can be shared among all the students using the Journal object transfer support in Sugar, or via the Read Activity directly. So essentially, you get a Library on a Stick, with thousands of books, something which, till now, in its physical form, has been largely restricted to better equipped (and usually richer) schools.
Of course, even larger collections can be distributed if a School Server (XS) is present in the mix (since the school server can have a larger disk), and support for distribution via the XS will hopefully appear within the next few releases of Get Books.
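
To make the "add an OPDS catalog to the mix" step a little more concrete, here is a minimal sketch of how such a catalog could be generated for a directory of books on the pen-drive. It is only an illustration: the catalog file name, the urn-style ids and the use of the file name as the book title are assumptions of mine, not what Get Books actually expects. An OPDS catalog is an Atom feed, so the sketch only needs the standard library.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import quote

ATOM_NS = "http://www.w3.org/2005/Atom"
ACQUISITION_REL = "http://opds-spec.org/acquisition"
MIME_TYPES = {".epub": "application/epub+zip", ".pdf": "application/pdf"}


def build_catalog(book_dir, catalog_title="Library on a Stick"):
    """Return an Atom/OPDS feed element listing the books found in book_dir."""
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    feed = ET.Element("{%s}feed" % ATOM_NS)
    # Hypothetical id scheme, used here only to satisfy Atom's required <id>.
    ET.SubElement(feed, "{%s}id" % ATOM_NS).text = "urn:x-sneakernet:catalog"
    ET.SubElement(feed, "{%s}title" % ATOM_NS).text = catalog_title
    ET.SubElement(feed, "{%s}updated" % ATOM_NS).text = now

    for name in sorted(os.listdir(book_dir)):
        stem, ext = os.path.splitext(name)
        mime = MIME_TYPES.get(ext.lower())
        if mime is None:
            continue  # not a book format we know about
        entry = ET.SubElement(feed, "{%s}entry" % ATOM_NS)
        ET.SubElement(entry, "{%s}id" % ATOM_NS).text = (
            "urn:x-sneakernet:" + quote(name))
        # Assumption: the file name (minus extension) stands in for the title.
        ET.SubElement(entry, "{%s}title" % ATOM_NS).text = stem
        ET.SubElement(entry, "{%s}updated" % ATOM_NS).text = now
        # The href is relative, so the catalog keeps working wherever
        # the drive happens to be mounted.
        ET.SubElement(entry, "{%s}link" % ATOM_NS,
                      {"rel": ACQUISITION_REL, "type": mime,
                       "href": quote(name)})
    return feed


def write_catalog(book_dir, catalog_name="catalog.xml"):
    """Write the catalog next to the books and return its path."""
    path = os.path.join(book_dir, catalog_name)
    ET.ElementTree(build_catalog(book_dir)).write(
        path, encoding="utf-8", xml_declaration=True)
    return path
```

Running `write_catalog("/path/to/drive/books")` once at the central location would be enough; real catalogs would also carry authors, languages and summaries, which this sketch leaves out.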

Braindump on ebooks

The inspiration for this post comes from a talk by Alan Kay, entitled Beyond the Printing Press: Computers as Learning Environments for All Children. You can view the video recording of the talk here


The development version of the Read Activity now ships with Epub support. This makes me excited for quite a few reasons. Of course, the most obvious one is the fast growth and adoption of Epub as a standard for e-books. However, there is more to it…
Books, once again (after Gutenberg’s time), are changing. Gutenberg brought in the transition from hand-written books to large-scale print – and now we see yet another shift, where books are transitioning from ink, paper and the printing press to bits stored inside a variety of devices. Towards the beginning of the printing press revolution, there was a strong desire and tendency to mimic the “old” format as much as possible, in terms of look and feel. Gutenberg and his associates even hand-drew illuminated decoration on the Gutenberg Bibles, to retain the similarity to the older, handwritten copies of the Bible. In what seems to be an almost eerie repetition, today, in the ebook, we see a strong desire to mimic the traditional book as much as possible (e.g. ebook readers trying to retain the older “UI” paradigm, efforts to make ebooks retain the formatting niceties of traditional books, etc.). This is not unusual, or wrong. We are used to the traditional book, and it is important to make the transition as smooth as possible.
However, what makes me really excited at this stage is something else: the potential new things we could do with ebooks, things that would not have been possible with books in the old format. This weekend, I made some changes to an Epub file, and extended the Read Activity a bit to come up with a few such things:

  • Audio-visual content inside books: This is almost obvious – with the transition to books which are read on devices having audio/video capabilities, the next logical step is to embed these into books.



    (Video from the Internet Archive, text from Wikipedia)
  • An interactive shell inside a book: An interactive Python shell inside a book teaching Python, so that small examples and snippets can be tried out inside the book, right away.



    (Text from How to Think like a Computer Scientist, Python edition)
  • A full blown, interactive environment inside books: A book on digital logic can have a small sandboxing area, where readers could connect the various virtual components together, and see what happens.



    (Text from Wikipedia and the Lorem Ipsum generator, demo from the Etoys project)

Of course, this is just a proof of concept, and most Epub readers will probably simply ignore the interactive content. Moreover, there may be security issues with such books as well (the idea of having a Python shell inside a book will make many nervous) – but I think this is where Bitfrost and its software implementation, Rainbow (which is essentially an isolation shell), come in.
There is another kind of “interaction” which I have not covered in the above screencasts – and this is something which is already available in traditional ink and paper books, especially text-books. Ebooks need to support “exercises” like fill-in-the-blanks, multiple-choice questions, etc. There is an urgent need to support this, and it should be done in a standardized way. The local storage standard associated with HTML5 seems to be a possible way forward, though there may well be better ways to do this (especially if we want teachers to be able to remotely check and evaluate exercises done on e-textbooks).

Read and Epub and beyond

For the past few weeks, I have been spending most of my time implementing Epub support for Sugar’s Read activity. Epub is gaining increasing acceptance: a few weeks back, Project Gutenberg started distributing much of their material in the format, and Google and Sony also seem to have started distributing a large chunk of public domain books as Epubs.

Today I finally reached the stage where the work could be tested on an actual XO, and here’s how it looks:
Read opening an Epub file on an XO

The rendering is done using WebkitGTK (the Python bindings) and I was a bit concerned about the possible performance issues on the XO-1 (which has a relatively ancient processor, slow filesystem access, only 256 MB of RAM and no swap). The biggest worry was the loading time, since loading involves pre-rendering the entire book to gather metrics for pagination (most Epub books I have come across do not have clearly defined page-breaks, so those have to be figured out). To my surprise (and relief), the load time turned out to be quite acceptable.
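
To illustrate what "gathering metrics for pagination" can boil down to, here is a small sketch of the arithmetic, assuming the rendering widget has already reported the rendered height (in pixels) of each chapter. The function name and the one-fresh-page-per-chapter rule are my assumptions for the example, not a description of the actual code in Read.

```python
import math


def paginate(chapter_heights, viewport_height):
    """Map page numbers to (chapter_index, scroll_offset) pairs.

    chapter_heights: rendered height of each chapter, in pixels
    viewport_height: height of the visible reading area, in pixels
    """
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    pages = []
    for index, height in enumerate(chapter_heights):
        # Each chapter starts on a fresh page; an empty chapter still gets one.
        count = max(1, math.ceil(height / viewport_height))
        for page in range(count):
            pages.append((index, page * viewport_height))
    return pages
```

With this table in hand, "go to page n" becomes a lookup followed by a scroll, and the total page count is just `len(pages)`. The expensive part on the XO-1 is measuring the heights, not this bookkeeping.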

Right now, the viewer supports a very limited subset of the Epub standard (and works only with XHTML-based Epubs), but so far it has managed to handle all the files I have tested it with. The viewer is a standalone widget, which should make it possible to reuse the work to develop an Epub reader for GNOME as well.

Once the Epub support in Read reaches an acceptable state, the plan is to start implementing support for the draft Open Publication Distribution System spec, which allows ebook distributors to distribute e-books via XML catalogues. It makes sense to support this in Read, as well as in the school server, to ease the e-book distribution process. For example, if we have a large e-book collection for a particular deployment, it may not make sense to put all of the books on every individual laptop – instead, allowing the user to browse/search the catalogue and download books as and when required would probably be a better option.
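
As a rough idea of what the browse/search side could look like, here is a sketch that searches an OPDS-style Atom catalogue by title and returns the download links. The sample catalogue, the function name and the substring matching are all made up for the example; the draft spec has more to it (navigation feeds, multiple formats per entry) than this covers.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ACQUISITION_REL = "http://opds-spec.org/acquisition"

# A tiny, made-up catalogue used only to exercise the function below.
SAMPLE_CATALOG = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example catalogue</title>
  <entry>
    <title>Alice's Adventures in Wonderland</title>
    <link rel="http://opds-spec.org/acquisition"
          type="application/epub+zip" href="alice.epub"/>
  </entry>
  <entry>
    <title>Through the Looking-Glass</title>
    <link rel="http://opds-spec.org/acquisition"
          type="application/epub+zip" href="looking-glass.epub"/>
  </entry>
</feed>"""


def search_catalog(xml_text, query):
    """Return (title, href) pairs for entries whose title contains query."""
    root = ET.fromstring(xml_text)
    needle = query.lower()
    results = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        if needle not in title.lower():
            continue
        for link in entry.findall(ATOM + "link"):
            # Only acquisition links point at the book file itself.
            if link.get("rel", "").startswith(ACQUISITION_REL):
                results.append((title, link.get("href")))
                break
    return results
```

For instance, `search_catalog(SAMPLE_CATALOG, "alice")` returns the single Alice entry with its `alice.epub` link, and only that one file would then need to be downloaded.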

Updates..

This blog has not seen much activity in a while, so here goes:

  • Bought an HCL touch-screen based netbook. It’s somewhat ancient hardware, but most of the stuff works out of the box (except for the webcam, which does not even show up in lshal or lsusb). The touchscreen required a binary driver – a Free/Open Source version seems to exist, but I could not manage to calibrate the screen with the FOSS driver variant.
    [Update: The webcam works - I had to press Fn-F5 to enable it. It is turned off by default to conserve battery.]
  • Taught myself (this was long overdue – but at least now I can admit that I did not know what I used not to know) how to properly write Python extensions in C. I started out with bindings for Hunspell (I’m reading up a bit on morphology nowadays, and finding it to be tremendously entertaining). There was a Python extension for Hunspell already, but it did not compile for me, and that pushed me to figure out how to do this myself. One thing led to another, and so, as of now, there are (in progress) extensions for handling:
    • Hunspell. Usage instructions here
    • libgettext-po. This should be faster than the existing pure-Python PO file parsers out there (maybe at some point I could make Pootle/Translate Toolkit use this, and make the work of OLPC/Sugarlabs translation team members somewhat less frustrating).
    • XKB. I must admit that I took a shortcut for this, and this extension is actually based on the awesome libxklavier. The final plan is to develop a Sugar extension for managing the keyboard options and layouts using this extension. The code in the main git repository, though fairly complete in terms of what is required for Sugar at the moment, is not implemented via (py)gobject. Implementing the pygobject-based wrapper is turning out to be a bit more complicated than I initially thought, but some code for that is also available in this repository (it is somewhat easier now, since I know (at least most of) what is happening under the hood).
  • Released a newer version of the FBReader activity, which is much improved in terms of usability (e.g. the response to the game keys while the XO-1 is in tablet mode is much smoother, and all the keys do something useful). People seem to be happy with the new release.
  • Coming back to the present, right now, among other things, I’m working on a few interesting (and important) enhancements for the book-reader(s). Some of them include support for long keypresses (eg: pressing the “square” game key for two seconds will show the table of contents), notification of critical power events (I realized to my horror during dogfooding, that in tablet mode, while the book reader is open in full screen, there is no way to tell how much battery-charge is left), etc. The bookmark support feature that I came up with a few months back needs a bit of polish, but I think I can make this show up in the next release of Read.
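
The long keypress idea mentioned in the last item can be sketched as a small classifier that timestamps the press and decides on release. This is just an illustration under my own assumptions (class and method names are invented, and the two-second threshold is taken from the “square” key example); in an activity the two methods would be driven by the toolkit's key press and key release events.

```python
import time

LONG_PRESS_SECONDS = 2.0  # threshold from the "square" game key example


class KeyPressClassifier:
    """Classify a key press as "short" or "long" when the key is released."""

    def __init__(self, threshold=LONG_PRESS_SECONDS, clock=time.monotonic):
        self._threshold = threshold
        self._clock = clock        # injectable so the logic can be tested
        self._pressed_at = {}      # key name -> time of the first press event

    def press(self, key):
        # Auto-repeat delivers repeated press events while a key is held,
        # so only the first timestamp is kept.
        self._pressed_at.setdefault(key, self._clock())

    def release(self, key):
        started = self._pressed_at.pop(key, None)
        if started is None:
            return None  # release without a matching press
        held = self._clock() - started
        return "long" if held >= self._threshold else "short"
```

A handler could then show the table of contents when `release("square")` returns `"long"` and keep the key's normal action for `"short"`. Deciding on release keeps the short action from firing on the way to a long press, at the cost of the long action only triggering once the key is let go.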

16th February, 2009

  • Pootle migration: We are moving the OLPC/Sugarlabs Pootle instance to a newer dedicated server, which should speed it up considerably. This has also given me an opportunity to fine-tune and polish our l10n workflow – things should be a bit easier and smoother (and faster) for translators. I also managed to gather some interesting data from the log and user registration files. It turns out that we have more than 1000 translators registered with the system, of whom about half have actively contributed translations in the past year. I’m not sure what the user statistics for other Pootle installations are like, but it seems that we are one of the larger users of Pootle out there.
  • Read hacking: I have also been spending some time hacking on Read. While Mr Super Awesome Tomeu has been pushing our Evince patches upstream, I have been working on a few interesting features for Read (we have moved to Gitorious, which is so cool):
    • Support for books from the Universal Library: Many of the scanned children’s books from the Universal Library Project are too graphics-heavy for the XO hardware to handle in PDF form. However, it looks like the project also stores each book as a zip file, with each scanned page archived inside as an individual JPEG – which, in other words, is very similar to the comic book archive format which Evince (Read’s backend) supports quite nicely. More importantly, this format seems to have fewer performance issues on the XO hardware (compared to graphics-heavy PDF files). So I have been making sure that Read also handles this format gracefully.
      Book from the Universal Library in Read
    • Bookmarks support: This has been one of the oft-requested features for Read, apart from annotations. The original design specs for Read already gave me ideas on how the UI should look, so with some amount of coding, I have bookmark support which mostly works :-) . I am also trying to do the implementation in such a way that it would be easy to add support for sharing of bookmarks later on. If anyone is interested in doing a project, contact me (hint.. hint ;-) )
      Bookmarks in Read

    Code for the above lives in the sayamindu-sandbox branch of Read’s Git repository. I plan to take a stab at annotations during the next few weeks – I have some ideas which, with some luck, may work. I also have some plans about a saner full-screen/ebook mode for Read – let’s see if I get the time to implement those as well.

  • This came up in one of the mailing lists a few days back. Serves as a reminder as to why the work we all do is so relevant and so important.

14th October, 2008

  • There might be a Barcamp Kolkata soon:
    Barcamp Kolkata Logo
  • Got Table of Contents support working in Read Activity
    ToC Support in Read
  • Wrote a small PDF viewer tool with support for the Journal which is then used by mozplugger to show PDF files within Browse. (You can put the file in your journal if you like it)
    PDF inside Browse
  • Infoslicer is awesome. Here’s a YouTube video demo of it.