xHTMLaTeX - LaTeX printing of XHTML documents

(towards an XHTML-based document preparation system)


What is xHTMLaTeX?

The main purpose of xHTMLaTeX is to use XHTML as a document authoring language and to convert the resulting documents automatically to LaTeX in order to produce high-quality printouts. It can be seen as a simpler alternative to the DocBook markup-plus-stylesheet system.

Should I use xHTMLaTeX?

xHTMLaTeX is just the basis of a simple document preparation system. It does not try to compete with widely used ones such as MS Word, OpenOffice, etc.; its target is rather narrow. xHTMLaTeX is for you if:

What xHTMLaTeX is not

The core of xHTMLaTeX is a converter from XHTML to LaTeX. However, this converter is just an internal tool, not intended to be used on its own, so it will mostly fail if used to:

Overview

A document preparation system, also called a word processor, is a tool that helps create documents by editing their contents and generating a human-readable version of them, usually printed. The main concerns of such a system are therefore:

Separating contents and presentation

Early word processors were just text formatters. Documents were encoded as text fragments interspersed with formatting marks: line breaks, indentation, bold, font size, etc. This approach has evolved, and the current recommendation is to represent documents using only structural or semantic markup. The formatting process is then driven by a style sheet that associates specific typographical styles with specific structural marks.
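The idea of a style sheet that associates typographical styles with structural marks can be sketched in a few lines of Python. This is a hypothetical illustration, not xHTMLaTeX's actual mechanism: the element names, the two style sheets and the `render` function are assumptions chosen for the example. The content itself never changes; only the style sheet does.

```python
# Hypothetical illustration of separating content from presentation.
# The element names and LaTeX snippets are assumptions, not xHTMLaTeX rules.

# Content: only structural marks (what each fragment *is*), no formatting.
document = [
    ("h1", "Introduction"),
    ("p", "Contents and presentation are kept apart."),
    ("blockquote", "A quoted remark."),
]

# Two style sheets: each associates a typographical style with a structural mark.
plain_style = {
    "h1": r"\section{%s}",
    "p": r"%s",
    "blockquote": r"\begin{quote}%s\end{quote}",
}
sans_style = {
    "h1": r"\section*{\sffamily %s}",
    "p": r"\noindent %s",
    "blockquote": r"\begin{quote}\itshape %s\end{quote}",
}

def render(doc, style):
    """Format every fragment using the style sheet entry for its mark."""
    return "\n\n".join(style[tag] % text for tag, text in doc)

print(render(document, plain_style))
print(render(document, sans_style))
```

Switching from `plain_style` to `sans_style` changes every heading and quotation at once, without touching `document`.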

Separating contents and presentation has evident advantages:

In practice, most word processors use a mix of structural and formatting markup, and it is difficult to find tools that fully separate form and content. Naive users of widely used systems (MS Word, OpenOffice, web page creators, etc.) frequently use presentational markup instead of structural or semantic markup, thus losing some of the advantages of a modern computer-based document preparation system.

Document representation

xHTMLaTeX aims at a real separation between the representation of the document contents and its formatting style. To achieve this, the document representation is based on strict XHTML markup. This markup can be seen as a revised version of HTML 4.0 without presentational marks for font, color, borders, text styles, etc. The residual style markup that strict XHTML still allows, such as the 'style' attribute, is simply ignored by the XHTML to LaTeX converter.

An additional advantage of this approach is that the documents created are themselves web pages. They can be published directly on the web with the help of an appropriate CSS stylesheet.
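A minimal sketch of how strict XHTML might be mapped to LaTeX, assuming a hand-written table of element-to-LaTeX rules. It is not the actual xhtml2latex converter: `TAG_MAP`, `escape` and `convert` are hypothetical names, only a handful of elements are covered, and named entities such as `&nbsp;` (which need the XHTML DTD) are not handled. Attributes, including `style`, are never read, so residual presentational markup has no effect on the output.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical mapping (NOT the real xhtml2latex rules):
# each element becomes a (prefix, suffix) pair of LaTeX code.
TAG_MAP = {
    "h1": ("\\section{", "}\n\n"),
    "h2": ("\\subsection{", "}\n\n"),
    "p": ("", "\n\n"),
    "em": ("\\emph{", "}"),
    "strong": ("\\textbf{", "}"),
    "code": ("\\texttt{", "}"),
    "ul": ("\\begin{itemize}\n", "\\end{itemize}\n\n"),
    "li": ("\\item ", "\n"),
}

# Characters that are special in LaTeX and must be escaped in running text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

def escape(text):
    """Escape LaTeX special characters one at a time (None becomes '')."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text or "")

def convert(elem):
    """Recursively convert an element; its attributes are never consulted."""
    tag = elem.tag.replace(XHTML_NS, "")
    prefix, suffix = TAG_MAP.get(tag, ("", ""))  # unknown tags: content only
    parts = [prefix, escape(elem.text)]
    for child in elem:
        parts.append(convert(child))
        parts.append(escape(child.tail))  # text following the child element
    parts.append(suffix)
    return "".join(parts)

source = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Demo</title></head>
<body>
<h1>Overview</h1>
<p style="color: red">Costs rose by <strong>10%</strong> in <em>2010</em>.</p>
</body></html>"""

body = ET.fromstring(source).find(XHTML_NS + "body")
print(convert(body).strip())
```

The `style="color: red"` attribute disappears silently, while the structural `strong` and `em` marks survive as `\textbf` and `\emph`.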

The xHTMLaTeX toolchain

[Figure: The xHTMLaTeX toolchain]

The figure shows the overall organization of xHTMLaTeX. Documents can be edited with any XHTML editor and then converted to PDF in a series of steps. The tools specific to xHTMLaTeX are the xhtml2latex converter and the LaTeX style files that emulate some non-trivial HTML elements.
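The steps of the toolchain can be sketched as a small Python driver. This is an illustration under stated assumptions, not part of xHTMLaTeX: `to_latex` is a placeholder for the xhtml2latex step (whose actual command-line interface is not described here), the well-formedness check stands in for full validation, and XeLaTeX is run twice because LaTeX writes cross-reference data to an auxiliary file on the first pass and reads it back on the next.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

def check_well_formed(xhtml_text):
    """Step 1: reject input that is not well-formed XML.
    A minimal stand-in for full validation against the XHTML DTD."""
    ET.fromstring(xhtml_text)  # raises ET.ParseError on malformed input

def latex_commands(tex_path, runs=2):
    """Step 3: the XeLaTeX invocations. Repeated runs let LaTeX
    resolve cross-references recorded in the .aux file."""
    return [["xelatex", "-interaction=nonstopmode", tex_path.name]
            for _ in range(runs)]

def build(xhtml_path, to_latex):
    """Drive the pipeline. `to_latex` is a hypothetical callable
    (str -> str) standing in for the xhtml2latex converter."""
    xhtml_path = Path(xhtml_path)
    text = xhtml_path.read_text(encoding="utf-8")
    check_well_formed(text)
    tex_path = xhtml_path.with_suffix(".tex")
    tex_path.write_text(to_latex(text), encoding="utf-8")  # Step 2
    if shutil.which("xelatex") is None:
        return tex_path  # no TeX installation: stop after writing the .tex
    for cmd in latex_commands(tex_path):
        subprocess.run(cmd, cwd=tex_path.parent, check=True)
    return tex_path.with_suffix(".pdf")
```

For example, `build("manual.xhtml", my_converter)`, with `my_converter` being any hypothetical conversion function, would write `manual.tex` next to the source and, if `xelatex` is on the PATH, produce `manual.pdf`.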

Installing xHTMLaTeX

Requirements

Download

Windows/Linux installation

Document preparation

Markup language

XML editors

HTML editors

Validation

Converting to LaTeX

xhtml2latex

Document class selection

Tag mapping

Generating PDF

XeLaTeX

Required packages

Reference resolution

Customizing the output style

Style parameters

Graphical interface

User LaTeX code

Importing HTML content

Converting HTML to XHTML

Removing Microsoft extensions

Removing style markup

Repairing invalid markup


Copyright © 2010 Manuel Collado