Getting Started with XML

This page will become a helpful place to get started in modding Civilization IV. It is laid out in a semi-logical format (as much as is possible in a wiki), starting from the basics of tweaking things using XML.

Getting Started With XML
A large part of the configuration of Civ4 is done through XML. All of the text (in different languages), unit stats, building costs and more are encoded in XML and are saved in plain text files, usually with the extension .xml.

That's all great, but what does XML look like, and how does it work? If you are comfortable with HTML, or even the VB code used in your forum posts, then learning XML should be no problem. It consists of tags like … that wrap around pieces of information to describe them. To see a good example of XML in use, look no further than your PMs! You can download PMs in XML format; there is an option in the menu where you delete them.

As an example, here is a PM that I sent to myself:

<?xml version="1.0" encoding="ISO-8859-1" ?> 
  <!-- Civilization Fanatics' Forums;http://forums.civfanatics.com --> 
  <!-- Private Message Dump for User Weasel Op; Oct 25, 2005 09:31 AM --> 
  <privatemessages> 
    <folder name="Inbox"> 
      <privatemessage> 
        <datestamp>2005-10-25 09:31</datestamp> 
        <title>XML in Civ4</title>
        <fromuser>Weasel Op</fromuser> 
        <touser /> 
        <message>Hey, did you know that Civ4 uses XML? :D</message> 
      </privatemessage> 
    </folder> 
  </privatemessages>

Now that probably looks awfully complicated, especially if you have never seen HTML, but don't worry. I'll explain how it works.
<?xml version="1.0" encoding="ISO-8859-1" ?> 
  <!-- Civilization Fanatics' Forums;http://forums.civfanatics.com --> 
  <!-- Private Message Dump for User Weasel Op; Oct 25, 2005 09:31 AM -->

That part just tells the program reading the file, and you the coder, what this is. The first line is the XML declaration: it informs the program that this is an XML file, written in version 1.0, and that it uses the ISO-8859-1 (Western European) character set. The next 2 lines are comments: notice the <!-- and -->. Everything between those is a comment and is ignored by the program. It's just there for you, so you know what this is.

Next comes the good stuff:

<privatemessages> 
    <folder name="Inbox">

The first tag that really does something is the <privatemessages> tag. This tag tells the computer that everything from this tag through the closing tag (a tag with a / before the name, in this case </privatemessages>) is an element. An element is a piece of information. It can contain other elements, text, or both. In this case, <privatemessages> begins an element that contains all of the private messages in the file. (There is only 1 here, but if I had saved my entire Inbox they would all be in there.)

Next is the <folder> tag. Notice that this tag has not only a name but also a description. This description is called an attribute. An attribute is a little piece of information included in an element's opening tag. It gives the element a characteristic, in this case the name "Inbox." This is an example of the flexibility of XML. "Inbox" could have been a separate element within "folder" like this:

<privatemessages> 
    <folder> 
      <Inbox>

Here, "Inbox" is an element within "folder." However, "Inbox" is better as an attribute because logically, "Inbox" is a specific folder, not a part of a folder.

Are you getting the hang of it yet? Let's move on. This is where things start getting interesting:

<privatemessage> 
    <datestamp>2005-10-25 09:31</datestamp> 
    <title>XML in Civ4</title> 
    <fromuser>Weasel Op</fromuser> 
    <touser /> 
    <message>Hey, did you know that Civ4 uses XML? :D</message> 
  </privatemessage>

That's a big chunk of code, but if you look closely it follows the same pattern. This is the PM itself, the actual information that I sent myself. Notice the <privatemessage> tag. This is NOT the same as <privatemessages>. That tag defined everything in it to be a group of private messages, but this tag defines a single PM. Look at the bottom line. See that closing tag? It has the same name, but with a slash before it: </privatemessage>. This is the closing tag for <privatemessage>, indicating the end of that element. Everything in between, including the tags themselves, is part of the element.

The second line gives us the date and time the message was sent. The <datestamp> tags wrap around it, indicating that it is also an element. Similarly, the next line is the title of the PM, and the line after that indicates who the message is from. Keep in mind that each of these elements is part of the <privatemessage> element, which is part of the <folder> element, which is in turn part of the <privatemessages> element.

The next line looks a little different. This tag combines the opening and closing tags into one. Since the element is empty, there is no need for two separate tags. Adding the slash at the end of the tag, after the name, is XML shorthand: it opens and closes the element with the same tag. The "touser" element is blank because this is my own PM. If it were addressed to someone else, it wouldn't be in my inbox!
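
As a small sketch of that shorthand, the two forms below mean the same thing to a program reading the file: an empty "touser" element. The spelled-out second form does not appear in the PM dump; it is shown here only for comparison.

```xml
<!-- Shorthand: one tag both opens and closes the element -->
<touser />

<!-- Long form: separate opening and closing tags with nothing in between -->
<touser></touser>
```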

The next element is the heart of it all. This is the actual text of the PM, enclosed in the <message> tags. Depending on the PM, this element could include several lines of text, VB code, etc. But the important thing to note is that it's all text. There is nothing magic about XML. There is no compiler, no functions, no hidden binary data. Everything is saved, sent, and edited in text format only. (The one wrinkle is that the characters < and & have a special meaning in XML, so inside text they are written as &lt; and &amp;.)

As I mentioned, the last line indicates the end of the <privatemessage> element.

Now let's look at the last piece of code:

</folder> 
</privatemessages>

By now you can probably guess what these do. </folder> closes the "folder" element. Notice that the attribute "Inbox" is not included here. The attribute is a part of the element, not an element itself. In fact it can't be closed because it was never opened. It's just a characteristic of the "folder" element. If there's no more "folder", it can't be an "Inbox". The final line closes the <privatemessages> element.
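
To illustrate, here is a pared-down sketch of the "folder" element with the PM inside it left out. The attribute appears once, in the opening tag, and the closing tag carries only the element name.

```xml
<folder name="Inbox">
  <!-- the private messages in this folder would go here -->
</folder>
<!-- Writing the closing tag as </folder name="Inbox"> would be an error -->
```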

Now you may be wondering, why are there 3 elements inside of each other like that? If you look back at the entire piece of code, you will see that the 3 major elements open and close together:

<privatemessages> 
  <folder name="Inbox"> 
    <privatemessage> 
    . . . 
    </privatemessage> 
  </folder> 
</privatemessages>

Why couldn't those all be one element? Actually, the only reason they appear that way is that I only saved 1 PM. If I had downloaded all of my PMs in one file, it would be different. <privatemessages> would still contain everything, because that is the root element. That just means that it is the primary, basic element of the file. The whole file is essentially one element, <privatemessages>. Inside <privatemessages> would be two <folder> elements: <folder name="Inbox"> and <folder name="Sent Items">. After the first </folder> tag, the second folder would then begin:
</folder> 
<folder name="Sent Items">

The same would apply to <privatemessage>. Each PM would be a separate element, and after each closed the next would begin, until the end of the folder.
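
Put together, a dump of both folders might look roughly like the sketch below. Only the first PM comes from the example above. The other two messages, with their dates, titles, user names and text, are invented placeholders, and the exact fields a real "Sent Items" dump fills in are an assumption.

```xml
<?xml version="1.0" encoding="ISO-8859-1" ?>
<privatemessages>
  <folder name="Inbox">
    <privatemessage>
      <datestamp>2005-10-25 09:31</datestamp>
      <title>XML in Civ4</title>
      <fromuser>Weasel Op</fromuser>
      <touser />
      <message>Hey, did you know that Civ4 uses XML? :D</message>
    </privatemessage>
    <!-- Placeholder: a second PM in the same folder -->
    <privatemessage>
      <datestamp>2005-10-26 14:02</datestamp>
      <title>Example title</title>
      <fromuser>ExampleUser</fromuser>
      <touser />
      <message>Example message text.</message>
    </privatemessage>
  </folder>
  <!-- The first folder has closed, so the second one begins -->
  <folder name="Sent Items">
    <!-- Placeholder: a PM sent to someone else -->
    <privatemessage>
      <datestamp>2005-10-27 08:15</datestamp>
      <title>Example reply</title>
      <fromuser>Weasel Op</fromuser>
      <touser>ExampleUser</touser>
      <message>Example reply text.</message>
    </privatemessage>
  </folder>
</privatemessages>
```

Each <privatemessage> closes before the next one opens, each <folder> closes before the next one opens, and the single root element <privatemessages> wraps them all.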

Now that wasn't too hard, was it? Once you understand the basic concepts of how XML works, it's a simple process.

Now I know you're probably wondering, how does that help me mod Civ4? PMs have nothing to do with it. That's right, but you will be surprised at how similar it will be.
