CAMiLEON home page About CAMiLEON BBC Domesday CAMiLEON Reports Preservation Research



Implementing an SVG input module

The vector graphic Migration on Request tool currently supports the input of Draw and WMF formats, and the output of Draw, WMF and SVG formats. An SVG input module would allow more reversible migration tests to be performed.


What is needed?

There are three targets to meet in order to integrate an SVG input module into the Migration on Request tool.

  • a) Parsing the SVG/XML.

    Of course this applies to any application wishing to use SVG. Routines for doing this already exist, although they probably do not follow a C-- approach. We must be wary of reinventing the wheel.

    This is a labour intensive problem, and is discussed below.

  • b) Taking the parsed XML structure and extracting the content, storing it in the portable internal structure.

    This is where the main effort should be put, since this is specific to the MoR tool.

  • c) Providing backwards compatibility from SVG to Draw and WMF.

    The SVG format includes a lot of features that the others do not, such as opacity, animation, filters, masking, line markers, quadratic beziers, graduated fills, styles, named colours...

    Some of these, such as named colours, styles, etc, would be easy to convert. Others such as masking would be much more difficult.

    It might not be necessary to convert all the features from a new and complex format to an old and simple format, depending on how the MoR tool is to be used. If only SVG files containing simple features are to be read in, there is little point implementing the complicated ones. This circumstance would occur if the SVG input is only to be used for a reversible migration test of original WMF or Draw files via an SVG migration step.

    It is also worth considering the Mobile SVG Profiles: SVG Tiny and SVG Basic. These introduce constraints on content, attribute types, and properties.
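As an illustration of the "easy" end of target (c), the following is a minimal sketch of converting SVG colour values to plain RGB triples, which older formats such as Draw and WMF can store. The function name `to_rgb` and the small `NAMED_COLOURS` table are assumptions made for this example; the full SVG named-colour list is much longer, and nothing here is taken from the MoR tool itself.

```python
# Hypothetical subset of the SVG named-colour table (illustration only).
NAMED_COLOURS = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "navy": (0, 0, 128),
}


def to_rgb(value):
    """Convert an SVG colour string to an (r, g, b) tuple, or None for 'none'."""
    value = value.strip().lower()
    if value == "none":
        return None
    if value in NAMED_COLOURS:
        return NAMED_COLOURS[value]
    if value.startswith("#") and len(value) == 7:
        # Long hex form, e.g. "#0000ff"
        return tuple(int(value[i:i + 2], 16) for i in (1, 3, 5))
    if value.startswith("#") and len(value) == 4:
        # Short hex form, e.g. "#00f" means "#0000ff"
        return tuple(int(digit * 2, 16) for digit in value[1:])
    raise ValueError("unsupported colour value: " + value)
```

Features with no equivalent in the older formats (masking, filters, animation) would need a separate policy, such as dropping them and recording the loss.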


Parsing SVG documents

According to the W3C documentation: "In a Conforming SVG Interpreter, the XML parser must be able to parse and process all XML constructs ... complete support for the XML 1.0 specification ... complete support for inclusion of non-SVG namespaces within SVG content".

In order to provide a thorough and correct implementation of SVG input, an XML parser is required. This would have to be written with software longevity in mind; for example using C--.

There are a number of approaches that could be taken:

  • Use an existing XML parser, such as libxml.

    The disadvantage of this is that the code may not be written for longevity. The advantage is that little work would be required in the short term, and future work or bug fixes would be carried out by the open source community.

  • Take an existing XML parser and alter it to follow software longevity guidelines.

    Depending on the original source, this could be a lot of work. When the original is updated (or bug-fixed) these changes would have to be reflected in the altered version.

  • Write an XML parser from scratch.

    This would allow the greatest control over the code, but would require a lot of initial work. The responsibility of maintaining the code as XML develops has to be considered.


Using/adapting an existing XML parser

Using an existing library might be possible, although this would sacrifice the C-- aspects. Libxml, the XML parser developed for the GNOME project, is open source and written in C, which makes it a suitable candidate.

If an existing library is used it could be hidden behind wrapper functions, so that it could be replaced with a home-grown C-- version at a later date without having to modify any input modules.
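The wrapper idea can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python rather than C or C--, it uses the standard library's `xml.etree.ElementTree` as a stand-in for libxml, and the names `Node` and `parse_xml` are hypothetical. The point is that input modules see only `Node`, so the parser behind `parse_xml` can be swapped without touching them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """Parser-independent view of one XML element (hypothetical structure)."""
    name: str
    attributes: dict
    text: str = ""
    children: list = field(default_factory=list)


def _local_name(tag):
    # ElementTree writes namespaced names as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def _convert(element):
    return Node(
        name=_local_name(element.tag),
        attributes={_local_name(k): v for k, v in element.attrib.items()},
        text=(element.text or "").strip(),
        children=[_convert(child) for child in element],
    )


def parse_xml(source_text):
    """The only function input modules call; the library behind it is hidden."""
    return _convert(ET.fromstring(source_text))
```

An SVG input module would then walk the `Node` tree, for example `parse_xml(text).children`, and never import the underlying library directly.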


Implementing and using an SVG parser

A proper XML parser would be a large amount of work, if implemented from scratch. It would, however, be useful in the future if other XML formats are to be incorporated into the tool.

Most of the structure of an SVG file is reasonably accessible. If the tool were limited to interpreting simple files with simple methods of styling (i.e. specified as attributes of each element) there would be few problems.

However, there are more complicated issues to consider:

  • Non-standalone documents

    e.g. referring to external files or objects, which may be stored locally with the main document or externally (e.g. on a web site).

  • CSS

    Again this can be embedded in the main document or in a separate file. Code to parse the CSS, cascade it, and so on, would be required.

          <defs>
            <style type="text/css"><![CDATA[
              rect {
                fill: red;
                stroke: blue;
                stroke-width: 3
              }
            ]]></style>
          </defs>
          <rect x="200" y="100" width="600" height="300"/>
    
  • XSL

    This allows style sheets to be created, extended, modified.

          <!-- Add styling to all 'rect' elements -->
          <xsl:template match="rect">
            <xsl:copy>
              <xsl:copy-of select="@*"/>
              <xsl:attribute name="fill">red</xsl:attribute>
              <xsl:attribute name="stroke">blue</xsl:attribute>
              <xsl:attribute name="stroke-width">3</xsl:attribute>
            </xsl:copy>
          </xsl:template>
        
          <!-- default is to copy input element -->
          <xsl:template match="*|@*|text()">
            <xsl:copy>
              <xsl:apply-templates select="*|@*|text()"/>
            </xsl:copy>
          </xsl:template>
    

Storing this sort of information in a portable, cross-format, preservable way would require a lot of work. With WMF it was easier to lose some features (such as the create/select/delete object model for attributes), and a similar approach could be taken with these tricky SVG elements. It is not an impossible task to build into the tool a 'definitions' structure, which can represent rules and instructions such as those above. With enough detail this might allow reversible migration.
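One way to "lose" the style sheet while keeping its visible effect is to flatten the rules into per-element attributes at input time. The sketch below is a deliberately limited illustration: it handles only rules whose selector is a bare element name (as in the CSS example above), ignores the rest of the cascade, and assumes that style sheet declarations override attributes set directly on the element. The names `parse_simple_css` and `flatten_styles` are hypothetical.

```python
import re

# Matches "name { declarations }" where the selector is a bare element name.
RULE = re.compile(r"([A-Za-z][\w-]*)\s*\{([^}]*)\}")


def parse_simple_css(css_text):
    """Return {element name: {property: value}} for element-name rules only."""
    rules = {}
    for selector, body in RULE.findall(css_text):
        declarations = rules.setdefault(selector, {})
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
    return rules


def flatten_styles(element_name, attributes, rules):
    """Merge matching style rules into a copy of the element's attributes."""
    resolved = dict(attributes)
    # Assumption: declarations from the style sheet win over attributes.
    resolved.update(rules.get(element_name, {}))
    return resolved
```

Flattening in this way is not reversible, since the original rule is discarded; a 'definitions' structure would have to keep the rule itself alongside the resolved values.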


A simplified SVG parser

Looking at the example files provided with the SVG recommendation documents, such as this one:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN" 
  "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg width="12cm" height="4cm" viewBox="0 0 1200 400"
     xmlns="http://www.w3.org/2000/svg">
  <desc>Example rect01 - rectangle with sharp corners</desc>

  <!-- Show outline of canvas using 'rect' element -->
  <rect x="1" y="1" width="1198" height="398"
        fill="none" stroke="blue" stroke-width="2"/>

  <rect x="400" y="100" width="400" height="200"
        fill="yellow" stroke="navy" stroke-width="10"  />
</svg>

it seems that a 'simplified' interpreter could be written. This would not use the DTD to interpret the file; instead, the SVG definition would be hardwired into the code. This is obviously much more limiting, but would be easier in the short term.
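A minimal sketch of such a hardwired interpreter is given below. It assumes Python's standard `xml.etree.ElementTree` for the XML syntax layer and supports only top-level 'rect' elements styled by attributes; the table `SUPPORTED` and the function `read_simple_svg` are hypothetical names, and the default values are those the SVG specification gives for the corresponding properties.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Hardwired element definitions: attribute name -> default value.
# None marks an attribute with no usable default (the file must supply it).
SUPPORTED = {
    "rect": {
        "x": "0", "y": "0", "width": None, "height": None,
        "fill": "black", "stroke": "none", "stroke-width": "1",
    },
}


def read_simple_svg(svg_text):
    """Return (shapes, skipped): supported shapes and names of ignored elements."""
    root = ET.fromstring(svg_text)
    shapes, skipped = [], []
    for element in root:
        name = element.tag.replace(SVG_NS, "")
        if name not in SUPPORTED:
            skipped.append(name)
            continue
        shape = {"type": name}
        for attribute, default in SUPPORTED[name].items():
            shape[attribute] = element.get(attribute, default)
        shapes.append(shape)
    return shapes, skipped
```

Returning the skipped element names makes the limitation visible to the caller, so a migration test can report which parts of a file were not read.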

Many SVG files may be unreadable using a simplified interpreter. An examination of SVG files exported from applications in use (such as Corel Draw or Adobe Illustrator) could identify commonly used features of the format. This could be used to specify what should be supported by the simplified interpreter. Additional features could be added in the future.
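The examination of exported files could be partly automated. As a rough sketch, assuming the sample files have already been read into strings and again using Python's standard `xml.etree.ElementTree`, the hypothetical function `survey_features` below counts how often each element and each element/attribute pair occurs.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def survey_features(svg_documents):
    """Count element names and (element, attribute) pairs across documents."""
    elements, attributes = Counter(), Counter()
    for text in svg_documents:
        for element in ET.fromstring(text).iter():
            # Strip the "{namespace}" prefix ElementTree adds to names.
            name = element.tag.rsplit("}", 1)[-1]
            elements[name] += 1
            for attribute in element.attrib:
                attributes[(name, attribute.rsplit("}", 1)[-1])] += 1
    return elements, attributes
```

Calling `elements.most_common()` on the result would give a ranked list from which the supported subset could be chosen.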


Conclusion

Implementing an SVG input module would be possible, but labour intensive. A specialised SVG interpreter may be easier to develop in the short term, but a full XML parser would be more useful for implementing additional formats and migration tools. Much of the development time for a full parser could, however, be saved by using an existing open source XML parser.


Further reading

Return to index

Back to the vector graphic tool

