Advanced Golang Tutorials: Dynamic JSON Parsing without Structs



Hello everyone,

In today's blog post, we are going to talk about a widely used format called JSON and parse it dynamically, on the fly. JSON (JavaScript Object Notation) is a data-interchange format commonly used to structure HTTP requests and responses. It's important to be able to extract data from this notation. For instance, let's say you are working on an HTTP server that handles JSON content, and you are expecting a variety of requests in different forms. Being able to parse JSON dynamically gives you the flexibility you need without defining a specific struct for each request body.

First of all, let's start with some sample JSON data to better visualize how to parse each and every property.

// JSON array that contains multiple objects
const jsonArray string = `[
	{
		"id": 1,
		"name": "oranges",
		"kind": "fruit",
		"amount": "3000kg",
		"origin": {
			"city": "Tacoma",
			"state": "Washington",
			"country": "USA",
			"suppliers": [ "Best Produce Co.", "FreshCo", "Walmart" ]
		}
	},
	{
		"id": 2,
		"name": "strawberries",
		"kind": "fruit",
		"amount": "1000kg",
		"origin": {
			"city": "Watsonville",
			"state": "California",
			"country": "USA",
			"suppliers": [ "Berry Berry Co.", "Greenery Co.", "Walgreens" ]
		}
	},
	{
		"id": 3,
		"name": "broccoli",
		"kind": "vegetable",
		"amount": "300kg",
		"origin": {
			"city": "Guadalajara",
			"state": "Jalisco",
			"country": "Mexico",
			"suppliers": [ "Delish Co." ]
		}
	}
]`

In the example above, we have a root-level JSON array with 3 objects in it that represent orders for various fruits and vegetables. Let's start with a bad example that manually parses this JSON starting from the top. The first type we expect is an array:

func main() {
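	// Slice of maps: each element will hold one decoded JSON object.
	var results []map[string]interface{}

	// Unmarshal the JSON into the slice and stop if it is malformed.
	if err := json.Unmarshal([]byte(jsonArray), &results); err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}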

	// For each array object:
	for _, result := range results {
		// If you know the type of the field, use a type assertion directly (unsafe: panics on a mismatch):
		id := result["id"].(float64)
		name := result["name"].(string)
		kind := result["kind"].(string) // Change this to a wrong type (e.g. int) and see the program panic.
		amount := result["amount"].(string)
		origin := result["origin"].(map[string]interface{})
		city := origin["city"].(string)
		state := origin["state"].(string)
		country := origin["country"].(string)
		suppliers := origin["suppliers"].([]interface{})

		fmt.Println("\nDirect Type Casting:")
		fmt.Println("\tid: ", id)
		fmt.Println("\tname: ", name)
		fmt.Println("\tkind: ", kind)
		fmt.Println("\tamount: ", amount)
		fmt.Println("\torigin: ")
		fmt.Println("\t\tcity: ", city)
		fmt.Println("\t\tstate: ", state)
		fmt.Println("\t\tcountry: ", country)
		fmt.Println("\t\tsuppliers: ")

		for index, v := range suppliers {
			fmt.Println("\t\t\t["+strconv.Itoa(index)+"]: ", v.(string))
		}
	}
}
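
The single-value assertions above panic as soon as a field has a different type than expected. As a side note, Go also has a two-value ("comma ok") form of the type assertion that reports a mismatch instead of panicking. Here is a minimal sketch of it; stringField is a hypothetical helper made up for this illustration, it is not part of the original program.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stringField returns the value stored under key if it exists and is a string.
// The second return value reports whether the lookup and the assertion succeeded.
// (Hypothetical helper, introduced only for this sketch.)
func stringField(obj map[string]interface{}, key string) (string, bool) {
	s, ok := obj[key].(string) // two-value form: never panics
	return s, ok
}

func main() {
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(`{"id": 1, "name": "oranges"}`), &obj); err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}

	if name, ok := stringField(obj, "name"); ok {
		fmt.Println("name:", name)
	}

	// "id" is a number, so the string assertion fails without panicking.
	if _, ok := stringField(obj, "id"); !ok {
		fmt.Println("id is not a string")
	}
}
```

This works well when you expect a specific type for a specific field. It still means checking every field one at a time.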

As you can see, manually parsing every single field and guessing the type of each property is cumbersome and error-prone. If an assertion gets the type wrong, the program panics. On an HTTP server, where the client controls the request body, this kind of crash can open the door to denial-of-service (DoS) attacks. To avoid the guesswork, we can use type switches:


		// But if you don't know the field types, you can use type switching to determine (safe):
		// Keep in mind that, since this is a map, the order is not guaranteed.
		fmt.Println("\nType Switching: ")
		for k := range result {
			switch t := result[k].(type) {
			case string:
				fmt.Println("\t"+k+": ", t) // t has type string
			case bool:
				fmt.Println("\t"+k+":", t) // t has type bool
			case float64:
				fmt.Println("\t"+k+":", t) // t has type float64
			case map[string]interface{}:
				// Nested object, continue type switching for this object as well.
				// You can use a function here for code reuse.
				fmt.Println("\t" + k + ":")
				for kNested := range t {
					switch tNested := t[kNested].(type) {
					case string:
						fmt.Println("\t\t"+kNested+": ", tNested) // tNested has type string
					case bool:
						fmt.Println("\t\t"+kNested+":", tNested) // tNested has type bool
					case float64:
						fmt.Println("\t\t"+kNested+":", tNested) // tNested has type float64
					case map[string]interface{}:
						// If there is another level of nesting, do the same here.
						// At this point you should be able to see the necessity for a function reuse.
					case []interface{}:
						// In case the nested object contains an array as one of the fields, continue parsing here.
						fmt.Println("\t\t" + kNested + ":")
						for index, v := range tNested {
							switch tNestedArrayField := v.(type) {
							case string:
								fmt.Println("\t\t\t["+strconv.Itoa(index)+"]: ", tNestedArrayField) // tNestedArrayField has type string
							case bool:
								fmt.Println("\t\t\t["+strconv.Itoa(index)+"]: ", tNestedArrayField) // tNestedArrayField has type bool
							case float64:
								fmt.Println("\t\t\t["+strconv.Itoa(index)+"]: ", tNestedArrayField) // tNestedArrayField has type float64
							case map[string]interface{}:
								// And so on...
							case []interface{}:
								// And so on...
							}
						}
					}
				}
			case []interface{}:
				// In case there is a nested array in this object, repeat the parsing here...
			}
		}

		fmt.Println("------------------------------")
	}
}

The program is a little better now, but still not great, since we had to manually add a type switch at every level of the JSON. This makes the program hard to maintain and less readable. If you haven't noticed it yet: the same type switch repeats at each nested level. Since that's the case, we can go ahead and place the repeating part into a function:


func handleJSONObject(object interface{}, key, indentation string) {
	switch t := object.(type) {
	case string:
		fmt.Println(indentation+key+": ", t) // t has type string
	case bool:
		fmt.Println(indentation+key+": ", t) // t has type bool
	case float64:
		fmt.Println(indentation+key+": ", t) // t has type float64 (encoding/json decodes every JSON number into float64 when the target is interface{})
	case map[string]interface{}:
		fmt.Println(indentation + key + ":")
		for k, v := range t {
			handleJSONObject(v, k, indentation+"\t")
		}
	case []interface{}:
		fmt.Println(indentation + key + ":")
		for index, v := range t {
			handleJSONObject(v, "["+strconv.Itoa(index)+"]", indentation+"\t")
		}
	}
}

Isn't recursion wonderful? We can achieve the same behaviour as the previous code block with this recursive function, and it scales much better. With the previous approach, as the JSON gets more deeply nested, you'd need to add more and more code to parse it. Recursion removes this maintenance nightmare and simplifies our lives. If we call this function from a main function:


func main() {
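	// Slice of maps: each element will hold one decoded JSON object.
	var results []map[string]interface{}

	// Unmarshal the JSON into the slice and stop if it is malformed.
	if err := json.Unmarshal([]byte(jsonArray), &results); err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}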

	// For each array object:
	for _, result := range results {
		// We don't know the field types here, so let the recursive type switch handle each value (safe).
		// Keep in mind that, since this is a map, the iteration order is not guaranteed.
		fmt.Println("\nType Switching: ")
		for k := range result {
			handleJSONObject(result[k], k, "\t")
		}

		fmt.Println("------------------------------")
	}
}
Go ahead and run it yourself and see the results: https://play.golang.org/p/OZsyA-RQoue. I tried to cover most of the types in the recursive function, but if you notice a missing type, feel free to send a PR to https://github.com/herrberk/go-parse-dynamic-json!
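
Speaking of missing types: one value that handleJSONObject silently skips is JSON null, which encoding/json decodes to a nil interface value. Below is a minimal sketch of a variant that handles it and also sorts the object keys, so the output order is the same on every run. The names describe and describeJSON, and the "discount" field in the sample input, are made up for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// describe walks a decoded JSON value and writes one line per key or element to sb.
func describe(sb *strings.Builder, value interface{}, key, indentation string) {
	switch t := value.(type) {
	case nil:
		// JSON null decodes to a nil interface value.
		fmt.Fprintln(sb, indentation+key+": null")
	case string, bool, float64:
		fmt.Fprintln(sb, indentation+key+":", t)
	case map[string]interface{}:
		fmt.Fprintln(sb, indentation+key+":")
		// Map iteration order is random, so collect and sort the keys first.
		keys := make([]string, 0, len(t))
		for k := range t {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			describe(sb, t[k], k, indentation+"\t")
		}
	case []interface{}:
		fmt.Fprintln(sb, indentation+key+":")
		for i, v := range t {
			describe(sb, v, "["+strconv.Itoa(i)+"]", indentation+"\t")
		}
	default:
		// Not expected for values produced by json.Unmarshal into interface{}.
		fmt.Fprintf(sb, "%s%s: unexpected type %T\n", indentation, key, t)
	}
}

// describeJSON decodes data and returns the indented description as a string.
func describeJSON(data string) (string, error) {
	var root interface{}
	if err := json.Unmarshal([]byte(data), &root); err != nil {
		return "", err
	}
	var sb strings.Builder
	describe(&sb, root, "root", "")
	return sb.String(), nil
}

func main() {
	// "discount" is a hypothetical field, added only to show a null value.
	out, err := describeJSON(`{"name": "oranges", "discount": null, "id": 1}`)
	if err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}
	fmt.Print(out)
}
```

Writing into a strings.Builder instead of printing directly is just a design choice for this sketch: it makes the result easy to compare in a test.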

I can almost hear you saying: what did we accomplish here? We just printed the JSON given to us to the console. Yes, we did; however, we managed to parse each and every field without knowing the fields or their types in advance. That is almost always the situation for HTTP servers, since the user request cannot be trusted.
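
One caveat: the main function above still assumes that the top level is an array of objects. If even that is unknown, you can decode into a plain interface{} and inspect what comes back. The following is a minimal sketch under that assumption; decodeAny and kindOf are hypothetical helper names. It also calls UseNumber on the decoder, which is optional and keeps numbers as json.Number instead of float64, so large integer ids don't lose precision.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// decodeAny decodes data without assuming whether the top-level value is
// an object, an array, or a scalar. Numbers are kept as json.Number.
func decodeAny(data []byte) (interface{}, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.UseNumber()
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	return v, nil
}

// kindOf names the JSON kind of a value returned by decodeAny.
func kindOf(v interface{}) string {
	switch v.(type) {
	case nil:
		return "null"
	case bool:
		return "boolean"
	case string:
		return "string"
	case json.Number:
		return "number"
	case map[string]interface{}:
		return "object"
	case []interface{}:
		return "array"
	default:
		return "unknown"
	}
}

func main() {
	// The last input is truncated on purpose to show the error path.
	inputs := []string{`[{"id": 1}]`, `{"id": 9007199254740993}`, `"oranges"`, `{"id":`}
	for _, in := range inputs {
		v, err := decodeAny([]byte(in))
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println(kindOf(v))
	}
}
```

Malformed input comes back as an ordinary error here, so the caller can reject the request instead of crashing.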

Hope you enjoyed reading as much as I enjoyed writing this post! Stay safe and keep coding!

Cheers,
Author:

Software Developer, Codemio Admin
