Advanced Golang Tutorials: Introduction to Concurrency



Hello folks,

After a long break, I have decided to put together a set of Advanced Go tutorials to help fellow gophers learn more about this powerful programming language. This post assumes that you have a working knowledge of Go and you are here to dig deeper. Today, we are going to delve into one of the most important topics in any programming language: concurrency.


Go is a great general-purpose language that also works well for low-level systems programming, and one of its primary strengths is its built-in concurrency model and tooling. Many programming languages rely on third-party libraries for concurrency; in Go, concurrency is a core part of the language's design, and that is what makes it stand out.

Goroutines

In Go, concurrency is built around goroutines. A goroutine is a function executing concurrently with other goroutines, and it can be thought of as a lightweight thread: the cost of creating a goroutine is tiny compared to creating an OS thread, so it's quite common for a Go application to have thousands of goroutines running at the same time. Goroutines communicate through a built-in mechanism called channels. Channels can be thought of as pipes through which goroutines exchange values, and sharing data through them (rather than sharing memory directly) helps prevent race conditions. Let's take a look at some examples and their outputs to better understand how goroutines work:


package main

import (
 "log"
 "time"
)

func print(s string) {
 for i := 0; i < 5; i++ {
  log.Println(s, i)
  time.Sleep(100 * time.Millisecond)
 }
}

func main() {
 // Let's create a boolean channel to prevent main func from terminating.
 waitc := make(chan bool)

 // This new goroutine will execute concurrently with the calling one.
 go print("goroutine")

 // You can also start a goroutine for an anonymous function call.
 go func(msg string) {
  log.Println(msg)
 }("Running")

 // Timer to end the program after 3 seconds.
 go func() {
  time.Sleep(time.Second * 3)
  log.Println("Exiting")
  waitc <- true
 }()

 // Regular function call.
 print("function call")

 <-waitc
}
In this example, we create a function called "print" and call it both in a goroutine and as a regular function call. A go statement returns immediately without waiting for the new goroutine to do any work, so the main goroutine continues straight on to the blocking print("function call") invocation. That is why "function call 0" is typically printed first, even though the goroutines are started earlier in the main function.

Unlike a regular function call, control does not wait for a goroutine to finish its execution. It returns immediately to the next line of code after the goroutine call, and any return values from the goroutine are ignored. The main goroutine must keep running for any other goroutine to continue running: if the main goroutine terminates, the program terminates along with all running goroutines. To prevent the main function from terminating, we created a channel called "waitc" in the example above and read from it at the end of the main function (basic sends and receives on channels block until the other side is ready). This lets us terminate the program whenever we want, and we did so using a timer in a separate goroutine. Here is a sample output (yours will likely differ):

2018/02/25 21:48:02 function call 0
2018/02/25 21:48:02 goroutine 0
2018/02/25 21:48:02 Running
2018/02/25 21:48:02 function call 1
2018/02/25 21:48:02 goroutine 1
2018/02/25 21:48:02 function call 2
2018/02/25 21:48:02 goroutine 2
2018/02/25 21:48:02 goroutine 3
2018/02/25 21:48:02 function call 3
2018/02/25 21:48:03 goroutine 4
2018/02/25 21:48:03 function call 4
2018/02/25 21:48:05 Exiting
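As an aside, the standard library also provides sync.WaitGroup, which is the idiomatic way to wait for a set of goroutines to finish when you don't otherwise need a channel. Here is a minimal sketch (the runWorkers helper is made up for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts n goroutines, waits for all of them with a
// sync.WaitGroup, and returns how many completed.
func runWorkers(n int) int {
	var wg sync.WaitGroup
	results := make(chan int, n) // buffered, so workers never block on send
	for i := 0; i < n; i++ {
		wg.Add(1) // register one more goroutine to wait for
		go func(id int) {
			defer wg.Done() // signal completion when this goroutine returns
			results <- id
		}(i)
	}
	wg.Wait() // blocks until every goroutine has called Done
	close(results)
	count := 0
	for range results {
		count++
	}
	return count
}

func main() {
	fmt.Println(runWorkers(5)) // prints 5
}
```

The Add/Done/Wait triple replaces the hand-rolled "waitc" channel when all you need is "wait until everyone is finished".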

Channels

Channels are a typed conduit through which you can send and receive values with the channel operator, <-. Sends and receives on a channel block by default and need to be handled with care. It's also possible to create buffered channels: sends to a buffered channel block only when the buffer is full, and receives block only when the buffer is empty. Let's take a look at this example to understand how channels work:

package main

import (
 "log"
)

func fibonacci(n int, c chan int) {
 x, y := 0, 1
 for i := 0; i < n; i++ {
  c <- x
  x, y = y, x+y
 }
 close(c)
}

func main() {
 // We create a buffered channel with capacity 10
 c := make(chan int, 10)
 go fibonacci(cap(c), c)
 // Range receives values from the channel repeatedly until it is closed.
 for i := range c {
  log.Println(i)
 }
}


Channels aren't like files; you don't usually need to close them. Closing is necessary when the receiver must be told there are no more values coming, such as to terminate a range loop like in our example above. Here is the output of the program:

2018/02/25 22:14:34 0
2018/02/25 22:14:34 1
2018/02/25 22:14:34 1
2018/02/25 22:14:34 2
2018/02/25 22:14:34 3
2018/02/25 22:14:34 5
2018/02/25 22:14:34 8
2018/02/25 22:14:34 13
2018/02/25 22:14:34 21
2018/02/25 22:14:34 34
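To make the blocking and closing rules concrete, here is a small sketch showing that sends to a buffered channel don't block until the buffer is full, that values come out in FIFO order, and that a receive from a closed, drained channel returns the zero value together with ok == false:

```go
package main

import "fmt"

func main() {
	// Capacity 2: the first two sends complete without any receiver.
	ch := make(chan string, 2)
	ch <- "first"
	ch <- "second"
	// A third send here would block forever (deadlock), since the
	// buffer is full and nothing is receiving yet.

	close(ch) // no more values will be sent

	// Buffered values can still be received after close, in FIFO order.
	fmt.Println(<-ch) // prints first
	fmt.Println(<-ch) // prints second

	// Once drained, a receive from a closed channel does not block; it
	// returns the zero value and ok == false.
	v, ok := <-ch
	fmt.Printf("%q %v\n", v, ok) // prints "" false
}
```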

Before we move on to the next section, I'd like to share a beautifully written post called Visualizing Concurrency. The author brilliantly puts together different examples of concurrency and provides a visual representation of each program's flow. I strongly recommend taking a look at it.

Locks

We've seen how goroutines use channels to communicate with each other, but what if we need different goroutines to modify a data structure or write to a stream concurrently? Do you see the problem here? If more than one goroutine attempts to update a data structure at the same time, they can overwrite each other's changes. How can we prevent this behaviour?

We can prevent this behaviour using mutexes. A mutex provides a locking mechanism that ensures only one goroutine runs the critical section of code at any point in time, preventing race conditions. Mutex is available in the sync package, and it defines two methods, Lock and Unlock. Let's see an example:

package main

import (
 "log"
 "sync"
 "time"
)

// SafeMap is safe to use concurrently.
type SafeMap struct {
 data map[string]string
 mu  sync.Mutex
}

// Write writes to the SafeMap.
func (s *SafeMap) Write(key string, value string) {
 // We lock so only one goroutine at a time can access the map
 s.mu.Lock()
 defer s.mu.Unlock()
 s.data[key] = value
}

// Read prints the current value of the map for the given key.
func (s *SafeMap) Read(key string) {
 s.mu.Lock()
 defer s.mu.Unlock()
 log.Println(s.data[key])
}

func main() {
 waitc := make(chan bool)

 s := SafeMap{data: make(map[string]string)}

 go s.Write("index1", "value1")
 go s.Write("index2", "value2")
 go s.Write("index3", "value3")
 go s.Write("index4", "value4")
 go s.Write("index5", "value5")

 time.Sleep(time.Second * 1)

 for k := range s.data {
  go s.Read(k)
 }

 go func() {
  time.Sleep(time.Second * 3)
  waitc <- true
 }()

 <-waitc
}
In this example, we created a SafeMap implementation that allows multiple goroutines to write to and read from it safely using a mutex. If you would like to see the race condition that occurs when there is no locking, simply comment out the Lock() and Unlock() lines in the "Write" method; running the program with Go's race detector enabled (go run -race) will report it. If we would like to improve this example a bit, we can use an RWMutex. An RWMutex is preferable to a Mutex when reads greatly outnumber writes, because it allows any number of readers to hold the lock at the same time while writers still get exclusive access. In the example above we write to the SafeMap using the "Write" method and read from it using the "Read" method, so we can definitely use an RWMutex instead of a Mutex:

type SafeMap struct {
 data map[string]string
 mu  sync.RWMutex    // RWMutex allows many concurrent readers, unlike Mutex
}

// Notice how the methods start with R when using RWMutex for read operations.
func (s *SafeMap) Read(key string) {
 s.mu.RLock()
 defer s.mu.RUnlock()
 log.Println(s.data[key])
}

Defer

On a final note, I'd like to talk about the defer statement. Defer is a powerful control flow mechanism that postpones a function call until the surrounding function returns. This is useful in many scenarios, like the examples above, but some cases require caution, such as starting a server after a defer statement. Starting a server is usually a blocking call, so the surrounding function may never return; the deferred call then never runs, which can cause subtle issues such as leaked resources. Here is an example of this misuse of defer:

func main() {
  db := NewDB("database")
  // This deferred Close never runs: ListenAndServe below blocks
  // forever, so main never returns.
  defer db.Close()
  http.HandleFunc("/", handler)
  http.ListenAndServe(":8080", nil)
}
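Two more defer behaviours are worth knowing when reasoning about code like this: deferred calls run in last-in-first-out order when the surrounding function returns, and their arguments are evaluated at the point of the defer statement, not when the call finally runs. A quick sketch:

```go
package main

import "fmt"

// trace records the order in which things actually execute. It uses a
// named return value so the deferred appends are visible to the caller.
func trace() (out []string) {
	i := 1
	// The argument is evaluated now: this defer captures i == 1,
	// even though i changes afterwards.
	defer func(n int) { out = append(out, fmt.Sprintf("deferred with i=%d", n)) }(i)
	i = 2
	defer func() { out = append(out, "second defer runs first") }()
	out = append(out, "function body")
	// Deferred calls now run in LIFO order before trace returns.
	return
}

func main() {
	for _, s := range trace() {
		fmt.Println(s)
	}
	// prints:
	// function body
	// second defer runs first
	// deferred with i=1
}
```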
I hope this post helps you understand the fundamentals of Go's concurrency features. Please play with the examples and try to understand the behaviour of the code. Feel free to drop a line below if you have any questions or comments. Cheers and happy coding!
Author:

Software Developer, Codemio Admin
