Hypercipient

Reading Files With Go

Whether you program for work or as a hobby, you will very likely need to read the contents of a file stored on the filesystem where your program runs. Most programming languages offer several ways to do this, and Go is no exception. Which one you choose depends on the specific use case, but some common scenarios make one approach better than another. This guide details the different ways to read a file and the circumstances under which each is appropriate. Where a method is clearly the wrong choice, that is also explained for less experienced programmers.

The Ultra Basic

The most basic way to read a file is to use os.ReadFile. This function reads the entire contents of the file into a byte slice. Let’s say you have a file named author.json containing the following JSON.

{
  "id": "12345",
  "name": "Donald Knuth"
}

The following code unmarshals the file into a struct. The contents of the file are loaded with os.ReadFile.

package main

import (
	"encoding/json"
	"log"
	"os"
)

type author struct {
	Id   string `json:"id"`
	Name string `json:"name"`
}

func main() {

	var author author
	filename := "author.json"

	authorJson, err := os.ReadFile(filename)

	if err != nil {
		log.Printf("Could not read file %s: %v", filename, err)
		os.Exit(1)
	}

	if err := json.Unmarshal(authorJson, &author); err != nil {
		log.Printf("invalid json")
		os.Exit(1)
	}

	log.Println("id: ", author.Id)
	log.Println("name: ", author.Name)
}

This works as long as you can be sure the input file fits into your system’s available memory. That is not always the case, so let’s explore a couple of other methods to read the contents of a file.

Reading a File Line-By-Line

Another common requirement is to read a file where each line represents a distinct item. For example, each line could contain a data point collected by some sensor. Let’s say you have a file called sensor-data.txt with the following contents.

12.34
42.67
23.55
33.87
53.22
error
30.10

Note the line that contains the text error. Perhaps the sensor temporarily failed during a reading and wrote error to the file rather than a numerical value. Let’s also assume you need to calculate the average of the values. This can be accomplished with os.Open, which returns a pointer to a File, and bufio.NewScanner, which takes a Reader and creates a new Scanner. Here is an example that demonstrates the use of these constructs.

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
)

func main() {

	reader, err := os.Open("sensor-data.txt")

	if err != nil {
		fmt.Println("Could not read input file")
		os.Exit(1)
	}

	defer reader.Close()

	lineReader := bufio.NewScanner(reader)
	sum := 0.0
	lines := 0

	for lineReader.Scan() {
		val, parseErr := strconv.ParseFloat(lineReader.Text(), 64)

		if parseErr != nil {
			fmt.Println("could not parse value ", lineReader.Text())
			continue
		}

		sum += val
		lines++
	}

	if readErr := lineReader.Err(); readErr != nil {
		fmt.Println("Could not read all of input file")
		os.Exit(1)
	}

	fmt.Printf("Average: %f\n", sum/float64(lines))
}

The first line of the main() function opens the file.

reader, err := os.Open("sensor-data.txt")

The next lines of interest create the Scanner and initialize the two values needed to calculate the average: sum and lines.

lineReader := bufio.NewScanner(reader)
sum := 0.0
lines := 0

The Scan() method is called within a for loop until it returns false. The value at the scanner’s current line is obtained via the method lineReader.Text() and is converted from a string to a float64.

for lineReader.Scan() {
	val, parseErr := strconv.ParseFloat(lineReader.Text(), 64)
	// ...
}

Because the sensor wrote error on one of the lines rather than a valid floating point number, the line is skipped and not added to the total number of data points. Otherwise the value of the line is added to the sum and the number of lines is incremented.

if parseErr != nil {
	fmt.Println("could not parse value ", lineReader.Text())
	continue
}

sum += val
lines++

The last thing to do before calculating the average is to check whether there were any errors while reading the file. If there were, the program exits with an error.

if readErr := lineReader.Err(); readErr != nil {
	fmt.Println("Could not read all of input file")
	os.Exit(1)
}

If the file was read completely without error, the average can be calculated and printed.

fmt.Printf("Average: %f\n", sum/float64(lines))

This approach works well when there is a file that is formatted in a way that it can be read one line at a time, and you need to perform an operation that is amenable to processing as such. Not everything that needs to be processed is plain text, however, in which case a different approach is required. One such example is explored in the next section.

Reading a Binary File

In some cases a binary file will need to be processed. Additionally, it is often necessary, or at least useful, to process the file without reading the entire contents into memory. One example is when a cryptographic hash, such as SHA-512, needs to be calculated. Like the second example, it might be impractical to read the entire contents into memory. Unlike the second example, binary files cannot be read line-by-line, so a different approach is required. Most programming languages provide this capability in a similar, idiomatic way: open a file, read a portion of it into a buffer (that is, an array of bytes), and process the buffer that was just read, continuing until there is nothing left to read. Go provides such functionality, as demonstrated in the following example.

First, create a binary file using the following command.

echo "create a sha512sum" | gzip > hello.txt.gz

This will create a file called hello.txt.gz with the compressed contents of create a sha512sum.

Here is the code to calculate the hash.

package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

func main() {

	file, fileErr := os.Open("hello.txt.gz")

	if fileErr != nil {
		fmt.Println("Could not open file")
		os.Exit(1)
	}

	defer file.Close()

	hash := sha512.New()

	// Normally this value is larger, something like 4096. A value of 2 is used
	// to illustrate multiple reads.
	buf := make([]byte, 2)

	for {
		n, err := file.Read(buf)

		if err == io.EOF {
			break
		}

		if err != nil {
			fmt.Println("error reading file:", err)
			break
		}

		hash.Write(buf[:n])
	}

	sum := hex.EncodeToString(hash.Sum(nil))
	fmt.Println(sum)
}

As before, the file is opened using os.Open. The hash and the buffer that holds the contents of each read are then created.

hash := sha512.New()

// Normally this value is larger, something like 4096. A value of 2 is used
// to illustrate multiple reads.
buf := make([]byte, 2)

Within the for loop, the contents of the file are read into the buffer.

n, err := file.Read(buf)

The returned values are the number of bytes read and any error encountered. The number of bytes indicates the range of the slice to write to the hash. If the end of the file is reached, break out of the loop. Otherwise write the bytes read from the file to the hash.

if err == io.EOF {
	break
}

hash.Write(buf[:n])

Once out of the loop, calculate the sum, convert it to hex and write the results.

sum := hex.EncodeToString(hash.Sum(nil))
fmt.Println(sum)

To see how the size of the buffer affects performance, experiment with different sizes using large files you may have on your system.

Conclusion

There are several ways to read files in Go. If you are new to Go but experienced in other languages, you have likely encountered these methods before. Regardless, it is critical that programmers know which method is appropriate for a given use case, as different methods have different performance characteristics. Some methods are inappropriate regardless of performance, and knowing which approach to use is essential to writing correct and safe software.
