From EduTechWiki
ePub is a popular open e-book standard.
“".epub" is the file extension of an XML format for reflowable digital books and publications. ".epub" is composed of three open standards, the Open Publication Structure (OPS), Open Packaging Format (OPF) and Open Container Format (OCF), produced by the IDPF.” (IDPF, retrieved 22:38, 26 February 2009 (UTC))
Notice: Contents of this page should be updated to Epub 3.x. Some software links also may be missing - Daniel K. Schneider (talk) 22:31, 27 September 2015 (CEST)
ePub can be authored and read with a growing range of software. Since it is an open standard, it is supported by various vendors and publishers (see e.g. Tim O'Reilly Unplugged: The Kindle 2 And Transforming Industries).
Versions:
This article mostly describes ePub 2.0.
ePub formats are defined with Relax NG but also rely on other standards. ePub 2.x was finalised in 2010 and should work with any reader (even an "old" one). Since 2011, the recommended standard is ePub 3.x.
The ePub 2.0 Specification comes in three standards that cover two parts:
ePub contents may be DRM controlled, but must not ...
ePub 2.0.1 uses:
The *.epub zip file by example:
If we create an ePub version of this page, we get a file called xxx.epub. This *.epub file is an OCF zip file. Here is the structure:
This packaging structure follows a philosophy quite similar to that of the IMS Content Packaging standard: a zip file includes a central XML file (content.opf) that defines the organization (the "spine") and the metadata, plus all the assets needed for rendering.
The MIME type, i.e. the contents of the mimetype file, is application/epub+zip.
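The packaging rules can be illustrated with Python's standard zipfile module. This is a minimal sketch, not a converter: it assumes you already have the content.opf and XHTML files as strings, and the OEBPS/ folder name is only the convention used in this page's examples. What it shows is that the mimetype file must be the first entry of the zip and must be stored uncompressed, so that a reader can recognize the file type from its first bytes.

```python
import zipfile

# Minimal container.xml pointing to the package file. "OEBPS/content.opf"
# is an assumption that mirrors the examples on this page.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

def write_epub(target, files):
    """Write an OCF zip to `target` (a path or binary file object).

    `files` maps archive names such as "OEBPS/content.opf" to str or bytes.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype file must come first and be stored uncompressed,
        # so that "application/epub+zip" can be read at a fixed offset.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # META-INF/container.xml tells the reader where the .opf file is.
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        # Everything else (OPF, XHTML, CSS, images) may be compressed.
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

Usage would be something like write_epub("book.epub", {"OEBPS/content.opf": opf_text, "OEBPS/content1.xhtml": page}), where opf_text and page are your own strings.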
Let us now describe some of its files:
The content.opf file describes and organizes the various content elements of the ePub package. It also provides metadata about the publication, fallback mechanisms for when unsupported extensions are used, and a table of contents.
For example, the file generated for this page by an automatic online converter looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<package xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata>
    <dc:title>EPub</dc:title>
    <dc:identifier id="bookid">web2fb2_200904221954_3347363837</dc:identifier>
    <dc:language>En</dc:language>
    <dc:creator>Daniel K. Schneider</dc:creator>
    <dc:type>reference</dc:type>
  </metadata>
  <manifest>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="content1" href="content1.xhtml" media-type="application/xhtml+xml"/>
    <item id="i0ced13f269" href="i0ced13f269" media-type="image/png"/>
    <item id="ib166f0f69c" href="ib166f0f69c" media-type="image/png"/>
    <item id="i7fa52f212a" href="i7fa52f212a" media-type="image/png"/>
    <item id="i33954a4ae2" href="i33954a4ae2" media-type="image/png"/>
    <item id="i8be224f209" href="i8be224f209" media-type="image/png"/>
  </manifest>
  <spine>
    <itemref idref="content1"/>
  </spine>
</package>
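To see how a reader would use this file, here is a small Python sketch (standard library only, my own illustration rather than part of any ePub tool) that maps each spine itemref to its manifest href and thereby gets the linear reading order. It assumes the OPF 2.0 namespace declared on the package element above and refuses spine entries that point to an undeclared id.

```python
import xml.etree.ElementTree as ET

# OPF 2.0 namespace, as declared on the <package> element above.
OPF = {"opf": "http://www.idpf.org/2007/opf"}

def reading_order(opf_xml):
    """Return the hrefs of the spine's itemrefs, in linear reading order."""
    root = ET.fromstring(opf_xml)
    hrefs = {item.get("id"): item.get("href")
             for item in root.findall("opf:manifest/opf:item", OPF)}
    order = []
    for ref in root.findall("opf:spine/opf:itemref", OPF):
        idref = ref.get("idref")
        if idref not in hrefs:
            raise ValueError(f"spine refers to unknown manifest id {idref!r}")
        order.append(hrefs[idref])
    return order
```

For the content.opf shown above, this yields ["content1.xhtml"]: the whole page ends up as a single spine item.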
The manifest (as in IMS Content Packaging) must list all files that are part of the publication, in any order. According to the specification, it has a structure like the one below: each item must have an id, an href to a resource and a media-type. In addition, one can define fallback items.
<manifest>
  <item id="intro" href="introduction.html"
        media-type="application/xhtml+xml" />
  <item id="c1" href="chapter-1.html"
        media-type="application/xhtml+xml" />
  <item id="c2" href="chapter-2.html"
        media-type="application/xhtml+xml" />
  <item id="toc" href="contents.xml"
        media-type="application/xhtml+xml"
        fallback="fall1" />
  <item id="oview" href="arch.png"
        media-type="image/png" />
  <item id="fall1" fallback="fall2"
        href="SomeDoc.pdf"
        media-type="application/pdf" />
</manifest>
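Fallbacks form chains (toc → fall1 → fall2 in the excerpt above), so a broken link anywhere in the chain matters. The Python sketch below is a hypothetical checker, not part of the specification: it follows every item's fallback chain and reports ids that point to an undeclared item or loop back on themselves. It matches elements by local name, so it also works on namespace-less fragments like this one. Run on the excerpt, it reports that fall2 is referenced but not declared in it.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def check_fallbacks(manifest_xml):
    """Follow each item's fallback chain; return {start_id: problem}."""
    root = ET.fromstring(manifest_xml)
    items = {el.get("id"): el for el in root.iter() if local_name(el.tag) == "item"}
    problems = {}
    for start, el in items.items():
        seen = [start]
        nxt = el.get("fallback")
        while nxt is not None:
            if nxt in seen:  # the chain loops back on itself
                problems[start] = f"cycle at {nxt!r}"
                break
            if nxt not in items:  # the chain points to an undeclared id
                problems[start] = f"missing fallback {nxt!r}"
                break
            seen.append(nxt)
            nxt = items[nxt].get("fallback")
    return problems
```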
“Following manifest, there must be one and only one spine element, which contains one or more itemref elements. Each itemref references an OPS Content Document designated in the manifest. The order of the itemref elements organizes the associated OPS Content Documents into the linear reading order of the publication.” (Open Packaging Format (OPF) 2.0 specification). This spine (one could translate this to "parts") can include three different kinds of files:
Spine elements refer to resources defined in the manifest, and the spine can point to a table of contents through its toc attribute. A simple example looks like this:
<manifest>
  <item id="intro"
        href="intro.html"
        media-type="application/xhtml+xml" />
  <item id="chap1"
        href="chap1.html"
        media-type="application/xhtml+xml" />
  <item id="chap2"
        href="chap2.dtb"
        media-type="application/x-dtbook+xml" />
  <item id="chap3"
        href="chap3.html"
        media-type="application/xhtml+xml" />
  <item id="f1"
        href="fig1.jpg"
        media-type="image/jpeg" />
  <!-- ...... other multimedia assets here .... -->
  <item id="toc_item"
        href="toc.ncx"
        media-type="application/x-dtbncx+xml" />
</manifest>
<spine toc="toc_item">
  <itemref idref="intro" />
  <itemref idref="chap1" />
  <itemref idref="chap2" />
  <itemref idref="chap3" />
</spine>
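The toc attribute of the spine holds the id of the NCX item, not a file name. A minimal Python sketch of how a tool could resolve it (my own illustration): look up the id in the manifest and check that its media-type is application/x-dtbncx+xml. Since the example is a fragment (no enclosing package element), the sketch matches elements by local name so it works with or without the OPF namespace.

```python
import xml.etree.ElementTree as ET

NCX_TYPE = "application/x-dtbncx+xml"

def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def find_ncx(opf_xml):
    """Return the href of the NCX file named by <spine toc="...">, or None
    if the spine has no toc attribute."""
    root = ET.fromstring(opf_xml)
    items = {el.get("id"): el for el in root.iter() if local_name(el.tag) == "item"}
    spine = next((el for el in root.iter() if local_name(el.tag) == "spine"), None)
    if spine is None or spine.get("toc") is None:
        return None
    item = items.get(spine.get("toc"))
    if item is None or item.get("media-type") != NCX_TYPE:
        raise ValueError("spine toc does not name an NCX item in the manifest")
    return item.get("href")
```

With the example above wrapped in a package element, this returns "toc.ncx".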
Metadata are defined using Dublin Core, plus optional user-defined elements. Three of them are mandatory: title, identifier and language.
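A quick way to check the three mandatory elements is sketched below in Python. It assumes the Dublin Core namespace URI used in the content.opf example above and only tests presence and non-empty text; it does not validate identifier schemes or language codes.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
REQUIRED = ("title", "identifier", "language")

def missing_metadata(opf_xml):
    """Return the mandatory Dublin Core elements that are absent or empty."""
    root = ET.fromstring(opf_xml)
    missing = []
    for name in REQUIRED:
        el = root.find(f".//{DC}{name}")
        if el is None or not (el.text or "").strip():
            missing.append(name)
    return missing
```

For the converter-generated content.opf above, the result is an empty list.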
The XHTML files can include various formats, e.g. binary pictures, SVG and inline XML. All these formats can be styled with a subset of CSS2.
The Open Packaging Format (OPF) 2.0 v0.9871.0 is defined by a Relax NG schema.
All valid OCF Containers must include a directory called META-INF at the root level of the container file system. This directory contains the files specified below that describe the contents, metadata, signatures, encryption, rights and other information about the contained publication. (OCF 1.0 specification, retrieved 19:17, 22 April 2009 (UTC)).
In the simple case, the container.xml file just tells where to find the content.opf file. In our example it looks like this:
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
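This is the first file a reading system has to look at. As a minimal Python sketch (standard library only; the function name is my own), locating the package file inside an .epub amounts to reading META-INF/container.xml from the zip and picking the rootfile whose media-type is application/oebps-package+xml:

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_TYPE = "application/oebps-package+xml"

def opf_path(epub):
    """Return the full-path of the OPF package file listed in
    META-INF/container.xml (`epub` is a path or binary file object)."""
    with zipfile.ZipFile(epub) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    for rootfile in root.iter(CONTAINER_NS + "rootfile"):
        if rootfile.get("media-type") == OPF_TYPE:
            return rootfile.get("full-path")
    raise ValueError("container.xml lists no OPF rootfile")
```

For the container.xml above, opf_path("xxx.epub") would return "OEBPS/content.opf".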
Another example, taken from the OCF 1.0 specification, shows that one could also include an alternative rendition, e.g. a PDF file:
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/My Crazy Life.opf"
              media-type="application/oebps-package+xml" />
    <rootfile full-path="PDF/My Crazy Life.pdf"
              media-type="application/pdf" />
  </rootfiles>
</container>
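With several rootfiles, a client has to choose a rendition it can handle. The Python sketch below shows one plausible policy, which is my assumption and not something the OCF text prescribes: take the first rootfile, in document order, whose media-type is in the client's (hypothetical) set of supported types.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"

def pick_rootfile(container_xml, supported):
    """Return the full-path of the first rootfile whose media-type is in
    `supported` (a hypothetical capability set of the client), or None."""
    root = ET.fromstring(container_xml)
    for rootfile in root.iter(CONTAINER_NS + "rootfile"):
        if rootfile.get("media-type") in supported:
            return rootfile.get("full-path")
    return None
```

With the example above, a PDF-only client passing {"application/pdf"} would get "PDF/My Crazy Life.pdf", while an ePub reader gets the .opf file.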
In addition to container.xml, there can be five other files:
These META-INF files are formally specified with a small Relax NG schema:
OPS uses a set of XHTML 1.1 modules with some additional restrictions, i.e. OPS content is always XHTML compatible, but not the other way round.
| XHTML 1.1 Module Name | Elements (non-normative) |
|---|---|
| Structure | body, head, html, title |
| Text | abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var |
| Hypertext | a |
| List | dl, dt, dd, ol, ul, li |
| Object | object, param |
| Presentation | b, big, hr, i, small, sub, sup, tt |
| Edit | del, ins |
| Bidirectional Text | bdo |
| Table | caption, col, colgroup, table, tbody, td, tfoot, th, thead, tr |
| Image | img |
| Client-Side Image Map | area, map |
| Meta-Information | meta |
| Style Sheet | style |
| Style Attribute (deprecated) | style attribute |
| Link | link |
| Base | base |
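The module table translates directly into an element whitelist. The Python sketch below is a rough pre-check of my own, not a validator: it reports XHTML elements that do not appear in the table above. It assumes the document uses the XHTML namespace, skips other namespaces (e.g. inline SVG), and ignores attributes and content-model rules.

```python
import xml.etree.ElementTree as ET

# Element names copied from the XHTML 1.1 module table above (non-normative).
OPS_ELEMENTS = set("""
body head html title abbr acronym address blockquote br cite code dfn div em
h1 h2 h3 h4 h5 h6 kbd p pre q samp span strong var a dl dt dd ol ul li object
param b big hr i small sub sup tt del ins bdo caption col colgroup table tbody
td tfoot th thead tr img area map meta style link base
""".split())

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

def non_ops_elements(xhtml):
    """Return XHTML element names not in the table, in document order."""
    found = []
    for el in ET.fromstring(xhtml).iter():
        if not el.tag.startswith(XHTML_NS):
            continue  # skip other namespaces, e.g. inline SVG
        name = el.tag[len(XHTML_NS):]
        if name not in OPS_ELEMENTS and name not in found:
            found.append(name)
    return found
```

For instance, leftover font, u or center tags from an HTML page converted with tidy would show up in the result.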
Remark: ePub can also use DTBook (the DAISY/NISO standard) for markup.
Readers must support SVG 1.1. SVG animation and scripting features are not supported and must not be used by publication authors; a Reading System should not render such content. CSS styling of SVG must be fully supported.
SVG content can be referenced from XHTML img and object elements, but it can also be embedded within XHTML (probably in the standard way, using namespaces).
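Since animation and scripting must not be used, an author may want to scan SVG files before packaging. The Python sketch below flags the SVG 1.1 animation elements (animate, animateColor, animateMotion, animateTransform, set) and the script element. Treating every on* attribute as event-handler scripting is my own simplification.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
# SVG 1.1 animation and scripting elements that OPS content must not use.
FORBIDDEN = {"script", "animate", "animateColor", "animateMotion",
             "animateTransform", "set"}

def svg_problems(svg_text):
    """List forbidden SVG elements and on* event attributes, in document order."""
    problems = []
    for el in ET.fromstring(svg_text).iter():
        name = el.tag.rsplit("}", 1)[-1]
        if el.tag.startswith(SVG_NS) and name in FORBIDDEN:
            problems.append(name)
        # Event-handler attributes (onload, onclick, ...) imply scripting.
        problems.extend(f"{name}@{attr}" for attr in el.attrib
                        if attr.startswith("on"))
    return problems
```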
You may also use your own XML, both inline within XHTML and out-of-line as separate documents. Both can have fallbacks (used when a client cannot render the content).
OPS style sheets use CSS2 in the XML tradition, i.e. selectors and attribute names are case-sensitive. Again, as for XHTML, there are some restrictions.
ePub 3.0 was introduced in 2011, and version 3.0.1 was published in 2014.
According to the official documentation, "EPUB 3's base content format is now based on the XML serialization of HTML5 (XHTML5) [ContentDocs30], whereas EPUB2 supported two basic content types: a profile of XHTML 1.1 and DTBook [OPS2] (a semantically-enhanced markup focused on accessibility concerns) [...] the EPUB 3 XHTML Content Document definition includes both extensions to and restrictions on its HTML5 base."
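Because the content rules differ between the two versions, a tool that handles both first needs to know which one a package follows. Both use the same OPF namespace, and the version attribute of the package root element ("2.0" in the content.opf example above) tells them apart. A minimal Python sketch:

```python
import xml.etree.ElementTree as ET

OPF_PACKAGE = "{http://www.idpf.org/2007/opf}package"

def package_version(opf_xml):
    """Return the version attribute of the OPF <package> root, e.g. "2.0"."""
    root = ET.fromstring(opf_xml)
    if root.tag != OPF_PACKAGE:
        raise ValueError("not an OPF package document")
    return root.get("version")
```

A caller could then branch on package_version(text).startswith("3") to choose between XHTML 1.1/DTBook and XHTML5 handling.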
In summary:
(1) One solution for authoring new ePub content is to use an XHTML authoring tool. To create the ePub archive, there are two options so far:
(2) Another solution is to use an ePub editor (e.g. Sigil, introduced below).
(3) Many authors probably use a combination: produce content with any sort of XHTML editor, then use an authoring tool to fine-tune it.
Several tools can convert from one format to another. Mileage varies according to input, e.g. PDF is more difficult to convert than XHTML.
Most readers support several e-book formats, and several of them support ePub. See also: EPUB at mobileread.com and Wikipedia's Comparison of e-book formats.
There are several ePub-capable readers. Here are just some of them:
E-books make most sense when read on specialized hardware. Several brands can read ePub documents, e.g.
See e-book.
Read the Digital Editions Help first; there may be a solution that fits your needs without removing the DRM.
While I understand that publishers want to sell books (rather than see a few people download and share them), I still argue against DRM. I would not oppose watermarking, e.g. with a (verified) real name and email address.
- Daniel K. Schneider (talk) 17:43, 30 June 2014 (CEST)
Therefore, removing the DRM is just fine (and legal in most countries), unless you redistribute cracked books (which I don't do).
Solutions (careful! Some of this software may be dangerous):
EPUB DRM Removal - Download tool (not tested)
- Daniel K. Schneider 19:17, 22 April 2009. This section therefore needs to be updated!
I tried to convert MediaWiki content, i.e. what could be called a wiki book.
tidy -o flash_tutorials.xhtml -asxhtml some_flash_tutorials.html
The result was sort of OK: a "442 pages", 8 MB file.
<div class="thumb tleft">
  <div class="thumbinner" style="width: 182px;">
    <a href="http://edutechwiki.unige.ch/en/Image:Flash-cs3-tools-panel-items.png"
       class="image" title="Items of the Flash CS3 tools panel">
      <img height="480" border="0" width="180"
           alt="Items of the Flash CS3 tools panel"
           src="flash_tutorials_files/180px-Flash-cs3-tools-panel-items.png"
           class="thumbimage"/></a>
    <div class="thumbcaption">
      <div class="magnify">
        <a href="http://edutechwiki.unige.ch/en/Image:Flash-cs3-tools-panel-items.png"
           class="internal" title="Enlarge"/></div>
      Items of the Flash CS3 tools panel</div>
  </div>
</div>
Basically, I would have to clean up the XHTML to get a better result, i.e. remove wiki-specific markup that is not needed.
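A filtering script of this kind could start as small as the hypothetical Python sketch below (it is not what was used for the tests on this page). It removes the MediaWiki "magnify" divs, i.e. the "Enlarge" icon links seen in the thumbnail markup above. One detail matters: in that markup the caption text is the tail of the magnify div, so the sketch reattaches the tail instead of dropping it.

```python
import xml.etree.ElementTree as ET

# Keep XHTML as the default namespace when serializing
# (otherwise ElementTree writes "ns0:" prefixes).
ET.register_namespace("", "http://www.w3.org/1999/xhtml")

def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def remove_keeping_tail(parent, child):
    """Remove `child` but keep the text that followed it (its tail)."""
    idx = list(parent).index(child)
    if child.tail:
        if idx > 0:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)

def drop_magnify(xhtml):
    """Remove MediaWiki 'magnify' divs; return (cleaned markup, count)."""
    root = ET.fromstring(xhtml)
    removed = 0
    # ElementTree has no parent pointers, so scan each element's children.
    for parent in list(root.iter()):
        for child in list(parent):
            classes = (child.get("class") or "").split()
            if local_name(child.tag) == "div" and "magnify" in classes:
                remove_keeping_tail(parent, child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

Other wiki leftovers (edit links, absolute links back to the wiki) could be handled with further rules of the same shape.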
I then also tested eCub.
The result was also an 8.7 MB file. This software is a bit easier to use and the results were slightly better, but the links problem was the same. However, it cannot split a file into chapters, i.e. create a table of contents for a single big HTML file. Therefore, one has to import XHTML files one by one in order to get a chapter structure.
Conclusion: It is possible to create rather large e-books from MediaWiki pages. But for quality results there is manual work to be done (or filtering scripts to be written).