Parser Pandemonium

It’s easy to take high-level programming languages for granted: all our fancy, thousand-line catastrophes are happily fed through a compiler that reads and interprets the source, then spits out machine code (or byte code) that only a computer could understand. The thing is, by understanding how to do such a thing yourself (at least in a small way), you become a better and more knowledgeable programmer. Knowing how to write your own parser or interpreter in the programming language of your choice is also very valuable for a game developer. With your own interpreter you can create external resources in whatever format suits your project, without having to outsource the tools or bend existing ones to fit. While it’s always a good idea to use the things others have made (for example, libraries; why write your own when someone has already done it, and possibly better?), making your own interpreter gives you insight into how one works. Besides, an interpreter you didn’t write may be difficult to adjust to the input format that you are designing!

So, let’s get to the meat of it all: how to write it. I’ll be using the C++ standard library (the STL) in my examples, but feel free to use your own implementations (as long as they support the operations I’m going to demonstrate). Note that these are simply samples to study in order to understand the fundamentals of how a parser works. This is by no means a copy-paste tutorial that’ll work out of the box (especially since some of these examples come from my own personal work).

So say you have something like this for input:

value = 22

Essentially, what you are trying to do is get the programming equivalent:

int myvar = value;  // value would be 22

The first thing you have to do, of course, is actually obtain your input. You can use input streams or your own file reading methods if you so choose, but I prefer the good ol’ C library FILE type and its associated functions. In some basic code:

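#include <cstdio>
#include <string>

std::string mystoragestring;
FILE *mytextfile = fopen("mytext.txt", "r");	// "r" opens the file for reading

if(mytextfile != NULL)	// fopen() returns NULL if the file couldn't be opened
{
	int c;
	// fgetc() returns EOF once the end of the file is reached
	while((c = fgetc(mytextfile)) != EOF)
	{
		mystoragestring += (char)c;
	}
	fclose(mytextfile);
}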
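If you would rather go the input stream route mentioned above, a minimal sketch might look like the following. ReadWholeFile is a name I made up for illustration, and the approach (copying the stream buffer into a string stream) is just one common way of doing it, not the only one.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file at 'path' into 'contents'.
// Returns false if the file could not be opened.
bool ReadWholeFile(const std::string &path, std::string &contents)
{
	std::ifstream file(path.c_str());
	if (!file)
		return false;

	std::ostringstream buffer;
	buffer << file.rdbuf();	// copy everything the file stream holds
	contents = buffer.str();
	return true;
}
```

Calling ReadWholeFile("mytext.txt", mystoragestring) would then fill the storage string in one go, and the return value tells you whether the file was actually there.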

OK, so now you have mystoragestring, which contains the contents of your input. What next? Simple: you read the storage string character by character and assemble “tokens” from those characters. Each completed token should represent a word, a special character (such as =, +, etc.), or a self-contained string (“like this one”). Essentially, you do this:

#include <cctype>	// isspace
#include <cstdlib>	// atoi
#include <cstring>	// strcpy
#include <string>

class Parser // Design a class to encapsulate all your parsing actions
{
	int cursor;			// Position of the 'cursor' in the storage string
					// (essentially the array index of a C-string)

	std::string mystoragestring;	// This is really where mystoragestring should go

	int storagelen;			// When parsing in your storage string, record the length
					// here and check against it to prevent any buffer
					// overruns or errors in general (and so you know you're
					// at the end of the stored contents)

	char *ctoken;			// A C-style token holder for compatibility with C
					// conversion methods

public:
	// Store the input and start the cursor at the first character
	Parser(const std::string &text)
		: cursor(0), mystoragestring(text), storagelen((int)text.size()), ctoken(NULL) {}

	std::string GetToken();		// This reads in a "token"
	std::string PeekToken();	// I won't define this one in the example here, but this
					// basically does a GetToken() but doesn't modify the
					// cursor position, essentially allowing you to preview
					// the upcoming token

	int GetInt();			// This returns the integer equivalent of a
					// token's contents
};

std::string Parser::GetToken()
{
	std::string ret_token = ""; // The token we'll return when finished

	// Skip any white space sitting in front of the token
	while(cursor < storagelen && isspace((unsigned char)mystoragestring[cursor]))
	{
		cursor++;
	}

	// Check for a white space to stop reading at, OR stop reading if we are at
	// the end of the stored contents
	while(cursor < storagelen && !isspace((unsigned char)mystoragestring[cursor]))
	{
		ret_token += mystoragestring[cursor];
		cursor++;
	}

	return ret_token;
}
 
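int Parser::GetInt()
{
	// Get token
	std::string token = GetToken();

	// Allocate the C-style token's size just large enough to fit the token,
	// plus one char for the terminating '\0' that strcpy() writes
	ctoken = new char[token.size() + 1];

	// Dump the token we received into a C-style token
	strcpy(ctoken, token.c_str());

	// Convert to an integer value (atoi() returns 0 if the token isn't a number)
	int value = atoi(ctoken);

	// Free the C-style token so it doesn't leak on every call
	delete[] ctoken;
	ctoken = NULL;

	return value;
}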
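The GetToken() above stops only at white space, so an input like value=22 would come back as a single token. Here is a rough sketch of how the three kinds of token described earlier (words, special characters, and quoted strings) could be told apart, along with one possible PeekToken(). SketchParser and its helper names are made up for this illustration and are not part of the class above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the Parser class; all names here are illustrative.
class SketchParser
{
public:
	explicit SketchParser(const std::string &text) : text_(text), cursor_(0) {}

	// Returns the next token, or an empty string once the input is exhausted.
	// (In this sketch an empty quoted string "" looks the same as end of input.)
	std::string GetToken()
	{
		// Skip white space in front of the token
		while (cursor_ < text_.size() && IsSpace(text_[cursor_]))
			cursor_++;

		std::string token;
		if (cursor_ >= text_.size())
			return token;

		char c = text_[cursor_];
		if (c == '"')
		{
			// Self-contained string: keep everything up to the closing quote
			cursor_++;	// skip the opening quote
			while (cursor_ < text_.size() && text_[cursor_] != '"')
				token += text_[cursor_++];
			if (cursor_ < text_.size())
				cursor_++;	// skip the closing quote
		}
		else if (IsWordChar(c))
		{
			// Word or number: letters, digits and underscores
			while (cursor_ < text_.size() && IsWordChar(text_[cursor_]))
				token += text_[cursor_++];
		}
		else
		{
			// Anything else (=, +, etc.) becomes a one-character token
			token += c;
			cursor_++;
		}
		return token;
	}

	// Previews the next token by reading it and then rewinding the cursor.
	std::string PeekToken()
	{
		std::size_t saved = cursor_;
		std::string token = GetToken();
		cursor_ = saved;
		return token;
	}

private:
	static bool IsSpace(char c)
	{
		return std::isspace(static_cast<unsigned char>(c)) != 0;
	}
	static bool IsWordChar(char c)
	{
		return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
	}

	std::string text_;
	std::size_t cursor_;
};
```

Saving and restoring the cursor is about the simplest way to peek. A parser that peeks a lot might prefer to cache the upcoming token instead of reading it twice.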

So, with this in mind, what happens next? Well, you use the parser to read in the original example: keep reading tokens until one matches a specific keyword (value, in our case), make sure the next token is an “=” sign, and then set your variable to the token after that. If you implement some debug logging, or slowly step through the program with something like the MSVC debugger, you can follow each of these checks as it happens. You can obviously implement your own type checking on top of this if you please.
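To make that keyword, “=”, value sequence concrete, here is a small self-contained sketch. ReadIntSetting is a hypothetical helper, and std::istringstream stands in for the Parser class purely to keep the example short (its >> operator splits on white space, much like GetToken() is meant to).

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: looks for "<keyword> = <integer>" in the input and
// stores the integer in 'out'. Returns false if that pattern is not found.
bool ReadIntSetting(const std::string &input, const std::string &keyword, int &out)
{
	std::istringstream tokens(input);
	std::string token;

	// Read tokens until one matches the keyword
	while (tokens >> token)
	{
		if (token != keyword)
			continue;

		// The next token must be the "=" sign
		std::string equals;
		if (!(tokens >> equals) || equals != "=")
			return false;

		// The token after that holds the value
		std::string value;
		if (!(tokens >> value))
			return false;

		out = std::atoi(value.c_str());
		return true;
	}

	return false;	// the keyword never appeared
}
```

With the input from the start of the article, ReadIntSetting("value = 22", "value", myvar) would leave 22 in myvar. Note that the "=" has to be separated by white space here, since that is the only thing this sketch splits on.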

Remember that this sample is extremely bare bones. It’s best to make your parser able to interpret many different kinds of data types, allow dynamic navigation of the tokens and the storage space, and support whatever other robust features you need. But this example should give those new to writing such a piece of a program a general idea of what they need to do to get started.
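As one example of that kind of robustness: atoi() quietly returns 0 for a token that isn’t a number at all, so a stricter conversion is a natural first upgrade. The sketch below uses strtol() from the C library to reject malformed tokens. TokenToInt is a made-up name, and this is only one way you might approach it.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a token to an int, returning false when the
// token is not a complete, in-range integer.
bool TokenToInt(const std::string &token, int &out)
{
	if (token.empty())
		return false;

	errno = 0;
	char *end = NULL;
	long value = std::strtol(token.c_str(), &end, 10);

	// 'end' points at the first character strtol() could not use; anything
	// other than the terminating '\0' means the token had trailing junk
	if (end == token.c_str() || *end != '\0')
		return false;

	// Reject values that overflowed long or do not fit in an int
	if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
		return false;

	out = static_cast<int>(value);
	return true;
}
```

The same pattern carries over to other data types, for example strtod() for floating point values.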

Hope you enjoyed this little article. Feel free to comment on it if you have something to mention!
