XMLgawk: adding XML capabilities to gawk

What is gawk?

AWK is a simple, yet powerful language for processing text data line by line. Gawk is an extended GNU implementation of AWK.

Gawk reads and processes input text files one record at a time. A record is, by default, a line of text.
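As a minimal sketch of this record-by-record behaviour, the following shell snippet runs a one-line awk program on two lines of input. NR (record number) and NF (number of fields in the record) are standard awk built-in variables; the function name number_records and the sample input are made up for this illustration.

```shell
# number_records is a hypothetical helper name used only in this sketch.
# The awk action has no pattern, so it runs once for every input record
# (by default, every line).
number_records() {
    awk '{ print NR ": " NF " fields: " $0 }'
}

printf 'alpha beta\ngamma\n' | number_records
# prints:
# 1: 2 fields: alpha beta
# 2: 1 fields: gamma
```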

What is XMLgawk?

XMLgawk is an enhanced version of gawk that can process XML files as well as text files. XMLgawk uses the Expat parser to read input XML files. Each token parsed by Expat is passed to the user's code as an input record.

XMLgawk consists of two different parts. The XMLgawk core is a set of patches to the gawk sources for interfacing with Expat. The other part is a set of awk code libraries that facilitate XML token processing.

XMLgawk is being developed by Jürgen Kahrs (core) and Stefan Tramm (libraries).

Using XMLgawk

XMLgawk is used in exactly the same way as gawk. It is backward compatible with gawk, so the enhanced version can also be used for regular gawk processing of text files.

To process an XML input file, the new predefined variable XMLMODE must be set to "1". Once it is set, the next file processed is parsed as XML instead of plain text. XMLMODE is usually set in a BEGIN clause. To learn more about XMLgawk usage, please read the supplied documentation.
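A rough sketch of what this might look like in practice is given below. Only XMLMODE is taken from the text above. The variable XMLSTARTELEM (assumed to hold the element name when the current token is a start tag) comes from the XMLgawk documentation as I remember it and may differ between releases, so check the supplied documentation. The file name books.xml, its content, and the XMLGAWK environment variable used to locate the patched interpreter are hypothetical.

```shell
# Hypothetical sample document, created only for this illustration.
cat > books.xml <<'EOF'
<books><book id="1">AWK</book><book id="2">XML</book></books>
EOF

# XMLMODE is set in a BEGIN clause, as described above.
# XMLSTARTELEM is an assumption: it is expected to be non-empty only
# for start-tag tokens, so the second rule fires once per opening tag.
xml_program='
BEGIN        { XMLMODE = 1 }
XMLSTARTELEM { print "element:", XMLSTARTELEM }
'

# Assumption: the patched interpreter is installed as "gawk" unless the
# XMLGAWK environment variable names another executable.
interpreter="${XMLGAWK:-gawk}"

if command -v "$interpreter" >/dev/null 2>&1; then
    "$interpreter" "$xml_program" books.xml
else
    echo "$interpreter not found on PATH" >&2
fi
```

With an unpatched gawk the script still runs, but XMLMODE is then just an ordinary variable and XMLSTARTELEM is never set, so nothing is printed.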

Getting XMLgawk

XMLgawk is free software, distributed under the GPL. It can be downloaded from the developers' websites:

The main XMLgawk distribution is just a patch to the original gawk sources. Building and installing XMLgawk means getting the original gawk sources and the patch, applying the patch, and then compiling the patched sources with the same procedure as for the original gawk. In addition, XMLgawk requires both the Expat and iconv libraries, which must be installed in advance. Lucky users may find them already installed on their systems, especially iconv.

Some prebuilt binary distributions are also available. This site hosts binaries for Windows, based on DJGPP, Cygwin and MinGW. There are distribution archives for three different installation types:

Download the appropriate archive, depending on your platform and preference:

Detailed instructions for building XMLgawk on Windows are also available.

Getting help

Depending on the specific topic, questions related to XMLgawk can be posted on these newsgroups:

Please avoid cross-posting whenever possible. Please put a clear identifier (like [XMLgawk]) at the beginning of the subject line.

Other XMLgawk stuff

I've written some additional awk libraries to handle XML stuff:

This code is still experimental.


Copyright © 2004 Manuel Collado: mcollado@fi.upm.es
Last updated: 2004-11-08