An Overview of XML

<Value_Pressure> 6.21535 </Value_Pressure>

XML is becoming increasingly important in the world of Industrial Automation. It’s a “Perfect Storm” of functionality and requirements. At the same time as there are more and more intelligent devices capable of producing data there is increasing demand for that data in database, analytic, archiving and other IT-like applications.

This article describes eXtensible Markup Language (XML) and how and why it is being used to drive these data applications on the factory floor.

Let’s Go Back in Time…

In the earliest days of computing there was a computer war of sorts. There were two big gorillas duking it out: Intel and Motorola. The battleground was how to represent 16-bit data in a computer’s memory.

We’ve advanced pretty far since those early days but at that time there wasn’t any web, there weren’t any HMIs (Human Machine Interfaces) and there weren’t even very many types of input devices. In some of those early computers people would hand toggle programs into the computer by setting switches.

The move from 8-bit to 16-bit was momentous, a quantum leap in technology. But the two companies differed in how to represent that data in memory. 8-bit data was organized by 8 data lines. One line was the 1st bit and the eighth line was the 8th bit. Everybody agreed on that. But what to do about 16-bit data? Was the first group of 8 bits the high part of the 16-bit value, or was the second group of 8 bits the high part? In the early days of microprocessors this was a very big deal.

Of course, they couldn’t agree, and the systems built with Motorola components used the first 8 bits as the high part of the 16-bit value while systems built with Intel used the second 8 bits. And as time went on, there were further disagreements about other data types: how many bits comprised a floating point number, how many bits in a real number and how to order ASCII characters in memory. The word “HELLO” is still encoded in some systems as E H L L _O, where underscore represents the ASCII Space character.
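To make the byte-order disagreement concrete, here is a minimal Python sketch using the standard struct module. Python and the value 0x1234 are my choices for illustration only; nothing here comes from the original Intel or Motorola systems.

```python
import struct

value = 0x1234  # arbitrary 16-bit example value

# Motorola convention (big-endian): the high byte is stored first.
big_endian = struct.pack(">H", value)

# Intel convention (little-endian): the low byte is stored first.
little_endian = struct.pack("<H", value)

print(big_endian.hex())     # 1234
print(little_endian.hex())  # 3412

# Reading the bytes with the wrong convention silently yields a different number.
misread = struct.unpack("<H", big_endian)[0]
print(hex(misread))         # 0x3412
```

The misread value is the whole problem in miniature: both systems hold the same two bytes, yet they disagree about the number.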

As the need to move data from one system to another grew, people started writing translators. If you knew that one system stored floating point values in 64 bits, with so many bits for the mantissa and so on, and that the other system used 32 bits with its own format, you could write a translator. Expensive? Yes. Time consuming? Absolutely. Effective? Sort of. Efficient? No way.

Obviously that wasn’t going to work for very long. So the idea came into being that everybody recognizes ASCII characters. When I walk in the bank with my paycheck and it has the six characters “$10.27” the teller recognizes those characters and deposits my ten dollars and twenty-seven cents.

So the thought was to use ASCII as everybody understands ASCII characters. If we just send a stream of ASCII characters like this from one system to another everybody can understand what we are trying to communicate. A 64-bit floating point on one system with the value “125.8904” is sent as eight ASCII characters and properly stored by the receiving system as a 32-bit floating point, its native floating point format.
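As a rough sketch of that exchange, the Python below sends the value as ASCII text and has the receiver store it in a 32-bit float. The little-endian layout chosen for the receiver is an assumption made only so the example runs.

```python
import struct

# Sender: a 64-bit float (Python's native float) rendered as ASCII text.
sent_text = "125.8904"
payload = sent_text.encode("ascii")   # eight bytes, one per character

# Receiver: parse the text, then store it in its own 32-bit float format.
received = float(payload.decode("ascii"))
stored_32bit = struct.pack("<f", received)         # 4 bytes in receiver memory
round_trip = struct.unpack("<f", stored_32bit)[0]

print(len(payload))                        # 8
print(abs(round_trip - 125.8904) < 1e-4)   # True
```

Neither side needs to know the other's binary layout. The price is that the 32-bit copy only approximates the original value, which is why the check uses a tolerance.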

And the data language for sending these ASCII characters from one system to another became known as eXtensible Markup Language. And XML was born!

Why XML in Automation?

Let’s face it – we really wouldn’t choose to move XML data around the factory floor if we had our druthers. I know what you’re thinking. Can’t we find something to use other than ASCII?

Well, I agree with you. ASCII certainly wouldn’t be my first choice. It’s expensive for automation devices in lots of ways. Typically, automation applications are low cost. Like really low cost. I/O vendors fight each other over pennies. It’s ruthless and cutthroat. Sending ASCII data means that you have to have a whole bunch of RAM to hold all those ASCII strings. You can’t send them until you build them and when you build them you have to have a place to store them. And RAM costs money: parts, size and assembly. It’s not free.

And there’s the ancillary cost of code to generate the XML files and processing power to move all those ASCII characters around. It’s kind of a nightmare for an automation device. But yet I’m advising you that XML is what you should plan on using. Why is that?

It’s simple. Like it or not, XML is the standard used by the IT folks worldwide. And it’s the IT folks and their standard that are being pushed down to the factory floor. The IT people use XML because they don’t care about RAM. They don’t care about processor bandwidth. They just upgrade to another platform when one runs out of gas.

All their standard offerings from Microsoft, for example, like Word, Excel and the rest, are XML centric. Office Open XML (also informally known as OOXML or OpenXML) is an XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Starting with Microsoft Office 2007, the Office Open XML file formats have become the default target file format for all Microsoft Office products.

Since the great preponderance of Microsoft applications, database engines, and analytics programs use XML to send and receive data, we automation guys don’t really have a choice but to play ball with the big boys. Especially in an area where we have so much to gain.

So What is XML – Quick Overview for XML Phobics

XML is a metamarkup language. That means that data in an XML document is surrounded by text markup that assigns tags to the data values. Each data value together with its distinguishing tag name is an XML element, the basic, defining unit of an XML document. The entire collection of elements forms the XML document.

Unlike any number of other document standards an XML document has no specific set of required tags. Instead the tags are defined by the document creator. A Chemist may create XML Elements for chemical names while a Lawyer may create XML Elements relating to a court case. Creators of XML documents invent them as they need them.

While the names for XML Elements have few restrictions, XML documents follow a very specific grammar. The grammar specifies where XML Elements can be placed, how child elements are specified, how child elements are associated with parent elements and how attributes are attached to elements. The grammar can be summarized like this:

  • XML documents must have a root element
  • XML elements must have a closing tag
  • XML tags are case sensitive
  • XML elements must be properly nested
  • XML attribute values must be quoted
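The rules above can be checked with any standard parser. Here is a minimal sketch using Python's built-in xml.etree.ElementTree module; the sample documents are made up, with one violation per rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Made-up samples: one good document and one violation of each rule.
samples = {
    "good":           '<oven id="1"><temp>22.5</temp></oven>',
    "no closing tag": "<oven><temp>22.5</oven>",
    "case mismatch":  "<Temp>22.5</temp>",
    "bad nesting":    "<oven><temp>22.5</oven></temp>",
    "unquoted attr":  "<oven id=1></oven>",
    "two roots":      "<a>1</a><b>2</b>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

Only the first sample is accepted. The parser rejects each of the others outright rather than trying to guess what the sender meant.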

XML documents have to be specified with enough precision to make it possible to easily develop parsers that can interpret a standard well-formed XML document and render the data values conveyed in the document. A well-formed XML document is a document that meets the XML specification and can be interpreted by a parser. Documents that don’t meet the standard are rejected by parsers.

The XML Elements can be restricted to a pre-defined Element set if the document is part of an application. For example, chemists exchanging chemical formulae may predefine a specific set of tags that communicate chemical composition. Documents that have elements not associated with that particular application are not valid and would be rejected by the parsers used by the chemists in their application.

The markups (Elements) allowed in a particular application are defined in an XML Schema. A Schema defines all the valid elements of a document and allows a generic parser to determine if an XML document is valid for a particular application. A document can be valid for one application (chemical composition) and invalid for another application (court case).

XML is sometimes confused with HTML, the descriptive language used for displaying web pages. The two are related and at the surface appear very similar. Though they have a similar syntax, they each have extremely different purposes. An HTML document is always used to communicate how data items should be displayed. It is all about screen location, formatting and data presentation. XML is simply for moving data from one system to another. It communicates no information on how to display data.

XML is also sometimes thought of as a programming language. It is not. There is no XML compiler that can read an XML document and generate executable code. An XML document by itself does nothing.

XML is certainly not a database or a way to store data. An XML device can form a document and send data but that data is not stored unless either the sender or the receiver stores the data. A Meter that is monitoring energy usage can provide an XML document to a requester with the current energy data, but that data is lost when the next iteration of data is generated, unless the requester or the sender saves each particular iteration of data.

Sometimes people think of XML as a communication protocol. It is not. A communication protocol is a specific set of characters that accomplish the movement of a series of data bytes from one system to another system. XML does not facilitate the transfer of information between two systems. Once there is a link with an appropriate communication protocol, an XML document can be sent across that link. XML is simply the content sent on that link and has nothing to do with the specifics of how those two systems manage moving content from the sender to the receiver.

How to use XML Files

XML documents are standard text documents that can be created and edited with any text editor or a word processing program like Microsoft Word. There are XML editors that understand the structure of an XML document, but while helpful, they are not required to create a valid, well-formed XML document. These editors assist you by identifying invalid and improperly structured Elements.

Once an XML document exists, it can be transported in any number of ways from a sender to a receiver. In many cases, a receiver can trigger the transmission of an XML document by simply referencing a URL for the XML document. For example, in the RTA Modbus to XML Gateway, a set of Modbus registers is encoded as XML. The current values for those registers can be received by simply referencing the following URL where 192.168.0.10 is the current TCP/IP Address of the device: .

Typing that URL into a browser initiates transmission of the web page and display of the XML document in the browser as shown in Figure 1.

Instead of using your browser and manually logging the XML data values by hand, you could reference that URL (XML file) from a number of common applications. In Windows, you can use Microsoft Word or Excel to display data in a tabular format. Lots of other programs or even applications you develop can easily receive and process an XML document. It really is a universal way to exchange data.
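For an application you develop yourself, the request-and-parse step might look like the Python sketch below. Everything device-specific is an assumption: the URL path, the register element, its address attribute and the UTF-8 encoding are hypothetical and are not taken from the gateway's actual output.

```python
import urllib.request
import xml.etree.ElementTree as ET

def fetch_xml(url, timeout=5.0):
    """Request the document from the device and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")  # encoding is an assumption

def register_values(xml_text):
    """Map each <register> element's (assumed) address attribute to its text."""
    root = ET.fromstring(xml_text)
    return {reg.get("address"): reg.text.strip() for reg in root.iter("register")}

# Hypothetical document layout, used here so the parsing step runs offline.
SAMPLE = """<registers>
  <register address="40001">1250</register>
  <register address="40002">17</register>
</registers>"""

values = register_values(SAMPLE)
print(values)

# Against a live device (hypothetical URL):
# values = register_values(fetch_xml("http://192.168.0.10/example.xml"))
```

Fetching and parsing are kept in separate functions so the parsing can be tested against a saved document before any device is on the network.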

Another mechanism often used to transfer XML documents is FTP (File Transfer Protocol). Some devices store a series of data files as XML in their local storage and make those documents available over FTP. FTP is both easy to use and commonly available on many different systems.

The Basics on XML for Newbies

XML Documents follow a very specific grammar. The basic unit of XML is an Element. An Element is formed by a start-tag, an ASCII string and an end-tag. All tags are enclosed in angle brackets like <…tag…>. End-tags signify that they are end-tags by preceding the tag name with a slash, such as </…tag…>. A few well-formed XML elements follow:

<name> Emily Wild </name>
<sentence> Where is the family dog? </sentence>
<temperature> 22.53 </temperature>

All of these are well-formed XML elements with a start-tag, a value and an end tag. Note that the surrounding whitespace before, after and in between the words is part of the data field. The receiver can elect to trim leading, trailing or embedded whitespaces or keep them. In a very simple system, any one of these could be the entire XML Document transmission from a sender to a receiver. There is no requirement that a lot of data be transferred.
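Here is a small Python sketch of the receiver's side of that choice, using the standard xml.etree.ElementTree parser on the temperature element shown above.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<temperature> 22.53 </temperature>")

raw = element.text            # whitespace is delivered as part of the data
trimmed = raw.strip()         # the receiver decides whether to trim it
reading = float(trimmed)      # and how to convert the text

print(repr(raw))      # ' 22.53 '
print(repr(trimmed))  # '22.53'
print(reading)        # 22.53
```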

XML imposes no restrictions on the element names other than the obvious ones. Element names are case sensitive and you can’t use special characters or white space. In addition, you can’t start your element names with a number or the letters “xml”.

A very important feature of XML Elements is that they can embed other elements:

<oven_status>
<temperature>22.5</temperature>
<mode>Cooling</mode>
<error_code>0</error_code>
</oven_status>

By nesting elements you can create very powerful relationships and communicate a lot of tabular information.
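As a rough illustration, a receiver could walk the nested oven_status element above and collect its children into a lookup table. This sketch uses Python's standard parser.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<oven_status>
<temperature>22.5</temperature>
<mode>Cooling</mode>
<error_code>0</error_code>
</oven_status>"""

root = ET.fromstring(DOCUMENT)

# Each child element becomes one key/value pair; every value arrives as text.
status = {child.tag: child.text for child in root}

print(root.tag)   # oven_status
print(status)     # {'temperature': '22.5', 'mode': 'Cooling', 'error_code': '0'}
```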

All XML documents contain an optional XML Declaration and a required root element. The XML Declaration, if it is included, must be the first line of the XML file. The XML Declaration does nothing more than identify the file as an XML document and the version number of XML supported, the type of character encoding in the file and if it can be processed as a standalone document. The Declaration is required to follow a very specific format:

      <?xml version="1.0" encoding="ASCII" standalone="yes"?>

The root element is “the parent” of all other elements in the document. You can think of an XML document as a tree of elements. The tree starts at the root element and branches to the lowest level of the tree. Any element can have sub elements so the tree is theoretically infinitely deep.

<root>
<child>
<subchild>…..</subchild>
</child>
</root>

The terms parent, child, and sibling are used to describe the relationships between elements. Parent elements have children. Children on the same level are called siblings (brothers or sisters).

A concept that often stymies those new to XML is Attributes. Attributes are additional information that can be added to an element. Attributes use the same name/value pairing that is used for Elements. Attributes are placed within the start-tag for the Element. See Figure 2 for an example of attributes.

Why do we need attributes you might ask? Isn’t the XML document in Figure 3 adequate?

Both accomplish the same thing, don’t they? The answer lies more in personal preferences than in anything else. Attributes or child-elements? It’s really just up to you and how you want to encode your data. There are few rules in XML.
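To illustrate the trade-off, here is a small Python sketch that carries the same data both ways. The documents and their element and attribute names are made up for this example and are not the ones in the figures.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; the element and attribute names are made up.
WITH_ATTRIBUTES = '<oven id="7" units="C"><temperature>22.5</temperature></oven>'
WITH_CHILDREN = (
    "<oven><id>7</id><units>C</units>"
    "<temperature>22.5</temperature></oven>"
)

a = ET.fromstring(WITH_ATTRIBUTES)
b = ET.fromstring(WITH_CHILDREN)

# Attributes are read from the start-tag of the element...
from_attributes = (a.get("id"), a.get("units"), a.findtext("temperature"))
# ...while child elements are read like any other nested element.
from_children = (b.findtext("id"), b.findtext("units"), b.findtext("temperature"))

print(from_attributes == from_children)  # True
```

The receiver ends up with the same three values either way. The only difference is which call it uses to get them.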

Advanced XML Concepts

For those of you who really want to know more than basic XML there are a few advanced concepts that you really need to understand.

Schemas

A sender who sends an XML file without any additional information forces the receiver to assume that every value is text. There is no way to know how to interpret and store a data value. Schemas solve this problem.

Schemas are the guidebook for an XML file. Just like a good guidebook on Madrid can tell you what you’re going to find when you get there, an XML Schema guides you through what you are going to find when you open the XML file.

Schemas describe all the parent-child element relationships and most importantly, the data types of all the elements. In most cases you are going to want to store the data for an XML element in the same data type that the creator of the file used. Using the schema you will know that the oven temperature is a Floating Point value while you’ll need a Double Integer to store the cycle count.

A Schema allows an XML file to be “validated”. Well-formedness is different from Validation. An XML file is well-formed if it meets all the basic XML rules for syntax. A valid file is not only well-formed, it also meets the relationship and data type restrictions imposed by a Schema. An XML file can be well-formed but not valid.
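The difference can be sketched in a few lines of Python. Python's standard library has no Schema validator, so this sketch swaps in a hand-written type table as a stand-in for a Schema. It is not real XSD validation, and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for a Schema: expected child elements and their data types.
# Element names are hypothetical.
OVEN_TYPES = {"temperature": float, "cycle_count": int}

def check(text):
    """Classify text as 'not well-formed', 'well-formed but not valid', or 'valid'."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "not well-formed"
    children = {child.tag: child.text for child in root}
    if set(children) != set(OVEN_TYPES):
        return "well-formed but not valid"
    try:
        for name, convert in OVEN_TYPES.items():
            convert(children[name])
    except (TypeError, ValueError):
        return "well-formed but not valid"
    return "valid"

good = "<oven><temperature>22.5</temperature><cycle_count>1042</cycle_count></oven>"
wrong_type = "<oven><temperature>hot</temperature><cycle_count>1042</cycle_count></oven>"
broken = "<oven><temperature>22.5</oven>"

print(check(good))        # valid
print(check(wrong_type))  # well-formed but not valid
print(check(broken))      # not well-formed
```

A real application would hand the Schema file to a validating parser instead of maintaining a table like this by hand.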

There are many file formats used to implement Schemas. One of the most commonly used is the XML Schema Definition Language (XSD). The .XSD file for the RTA Modbus to XML gateway is listed in Figure 4.

This is a relatively simple Schema as XSD Schemas go. More sophisticated schemas can have annotations, define minimum numbers of occurrences of elements, group elements to form complex types and specify element sequences. Table 1 specifies some of these XSD characteristics.

Besides the XML Schema Definition Language described above there are two other commonly used Schema Definition Languages: Document Type Definitions (DTDs) and RELAX NG.

Namespaces

Unlike programming languages XML Schemas don’t have a lot of restricted keywords and don’t put a lot of restrictions on the element tags. Since names are defined by the developer of the XML file, element names can be confusing to a parser. For example:

<table>
<style>Wood Grain Table</style>
<size>8 Foot</size>
</table>
<table>
<tr>
<td>Manufacturing Cell 1</td>
<td>Manufacturing Cell 2</td>
<td>Manufacturing Cell 3</td>
</tr>
</table>

In two different applications these XML file fragments present no problem. But combining them into the same file would terribly confuse a parser. A way to solve this is to attach a prefix to the elements like this:

<e:table>
<e:style>Wood Grain Table</e:style>
<e:size>8 Foot</e:size>
</e:table>

<w:table>
<w:tr>
<w:td>Manufacturing Cell 1</w:td>
<w:td>Manufacturing Cell 2</w:td>
<w:td>Manufacturing Cell 3</w:td>
</w:tr>
</w:table>

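A namespace defines the prefix. Namespaces are defined by an xmlns attribute placed in the start-tag of an element, which binds the prefix to a unique name (a URI).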
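Here is a minimal sketch of how a parser keeps the two kinds of table apart once the prefixes are bound to namespaces. The wrapping plant element and both example.com URIs are hypothetical, chosen only so the fragment forms one complete document.

```python
import xml.etree.ElementTree as ET

# The <plant> wrapper and both namespace URIs are hypothetical.
DOC = """<plant xmlns:e="http://example.com/furniture"
       xmlns:w="http://example.com/layout">
  <e:table>
    <e:style>Wood Grain Table</e:style>
    <e:size>8 Foot</e:size>
  </e:table>
  <w:table>
    <w:tr>
      <w:td>Manufacturing Cell 1</w:td>
      <w:td>Manufacturing Cell 2</w:td>
      <w:td>Manufacturing Cell 3</w:td>
    </w:tr>
  </w:table>
</plant>"""

# The receiver maps prefixes to the same URIs the document declares.
NS = {"e": "http://example.com/furniture", "w": "http://example.com/layout"}

root = ET.fromstring(DOC)
furniture_style = root.find("e:table/e:style", NS).text
cells = [td.text for td in root.findall("w:table/w:tr/w:td", NS)]

print(furniture_style)  # Wood Grain Table
print(cells)
```

What matters to the parser is the URI, not the prefix letters. A sender could use different prefixes bound to the same URIs and the lookups would still work.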

XML Document without a CSS file: http://www.w3schools.com/xml/cd_catalog.xml
XML Document with a CSS file: http://www.w3schools.com/xml/cd_catalog_with_css.xml

Use a Microsoft Office Program – It was a revelation to me a few years ago, but Microsoft Office extensively uses XML to store documents. In fact, you can pretty easily reference an XML document in an Excel spreadsheet and have a new row added to that spreadsheet at whatever data rate you want. It’s a really easy way of archiving data from an XML-enabled embedded device.

Load a Database – Many databases including SQL Server, Oracle and others are able to load XML documents. The specific procedures vary with the database but in general the database “triggers” the device to send an XML file by referencing a particular URL (web page) on the target device.

Build a Proprietary Application – Many integrators simply build application programs in Java, C++ or another language to receive and decode the XML file. They then display it, manipulate it, accumulate it or store it or pieces of it in a database.

FTP XML Documents From A Device – Some of the newer automation devices for monitoring and archiving automation data use local storage to save device data in files. These files can be Comma Separated Value (CSV) files or XML files. If you have a device with that kind of local storage you can move data from the remote device to your server using File Transfer Protocol (FTP). Once you have that data on your server you can open the files with a standard application like a Microsoft Office program, a database program, or a custom application, and process the data.
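As a rough sketch of the custom-application route, the Python below decodes the oven_status document shown earlier and stores it as one database row. SQLite, the in-memory database and the table layout are my assumptions for illustration; a production system would use whatever database it already has.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The oven_status document from earlier, as it might arrive from a device
# or from a file retrieved over FTP.
DOCUMENT = """<oven_status>
<temperature>22.5</temperature>
<mode>Cooling</mode>
<error_code>0</error_code>
</oven_status>"""

root = ET.fromstring(DOCUMENT)

db = sqlite3.connect(":memory:")   # in-memory database for the sketch
db.execute(
    "CREATE TABLE oven_status (temperature REAL, mode TEXT, error_code INTEGER)"
)
db.execute(
    "INSERT INTO oven_status VALUES (?, ?, ?)",
    (
        float(root.findtext("temperature")),   # text converted to a number
        root.findtext("mode"),
        int(root.findtext("error_code")),
    ),
)
db.commit()

rows = db.execute("SELECT temperature, mode, error_code FROM oven_status").fetchall()
print(rows)   # [(22.5, 'Cooling', 0)]
```

Note that the application, not XML, decides the data types here. Without a Schema, every value arrives as text.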

Enabling XML for Non XML-Enabled Devices

In the Automation world we use things for a long time. We really want forever but we’ll take 20 or 30 years. So, there’s a lot of stuff out on the factory floor that isn’t XML enabled. But it might have data that you need.

To begin with, there’s a bunch of controllers that have data you might want to display in a spreadsheet or pull into a database. Our company, Real Time Automation, has a product for moving user specified tags out of Rockwell PLCs and sending them out as XML documents. Those documents can then be delivered to Excel, databases, browsers or any other place capable of parsing an XML document.

Here are some of the key functional capabilities of this device:

  • PLC Tags to include in the XML document are user specified
  • Current data values can be retrieved by accessing the IP address of the device followed by the path to the current document, such as “10.1.1.16/current.xml”
  • PLC Tag data is stored locally in records in a series of files
  • A PLC Tag can be used as a trigger Tag to trigger a stored record or the record can be stored on a time cycle
  • A user can specify when new files are created based on number of records, time, time of day and other means
  • Files are available for FTP transfer in XML or CSV format

For more information on the 460ETCXML product please visit the RTA catalog web page at https://www.rtautomation.com/products/ and see the section on XML products.

RTA is equipping the entire line of protocols (Modbus RTU, Modbus TCP, EtherNet/IP, Profinet IO, DeviceNet and the rest) to archive embedded device data into files that can be delivered to a user over FTP.

XML is a Key Component of OPC Unified Architecture (OPC UA)

OPC UA is the replacement technology for OPC Classic. It uses XML as one of its two encoding mechanisms. An encoding mechanism is the way that the individual data bytes are formatted on the wire. Before I discuss how UA uses XML, let’s take a quick look at what OPC UA is.

Over the last thirty years business systems were largely built on open, standardized platforms where data can be easily shared (loosely coupled). Automation systems were largely built on closed, proprietary platforms where controlling the production process was the priority and the ability to easily share data was never a requirement (tightly coupled).

OPC Unified Architecture (OPC UA), which I refer to as UA, is the next generation OPC technology. UA is a secure, open, reliable mechanism for transferring information between automation systems and business systems. UA provides a very flexible and adaptable mechanism for moving data between enterprise-type systems and the types of controls, monitoring devices and sensors that interact with real world data.

Why a totally new communication architecture? Current technologies are limited and not well suited for today’s requirements to move data between enterprise/Internet systems and the systems that control real processes that generate and monitor live data. Some of these limitations include:

  • Platform dependence on Microsoft – current technology is built around DCOM (Distributed COM), an older communication technology that is being de-emphasized by Microsoft
  • Insufficient data models – Current technology lacks the ability to adequately represent the kinds of data, information and relationships between data items and systems that are important in today’s connected world
  • Inadequate security – Microsoft and DCOM are perceived by many users to lack the kind of security needed in a connected world with sophisticated threats from viruses and malware

UA is the first communication technology built specifically to live in that “no man’s land” where data must traverse firewalls, specialized platforms and security barriers to arrive at a place where that data can be turned into information. UA is designed to connect databases, analytic tools, Enterprise Resource Planning (ERP) systems and other enterprise systems with real-world data from low-end controllers, sensors, actuators and monitoring devices that interact with real processes that control and generate real-world data.

When UA is delivering factory floor information to IT-type devices, it is convenient to deliver data that the IT application can easily process. Since XML is that kind of universal standard, the OPC Foundation selected XML as one of UA’s encoding mechanisms.
