Reading Files With Go
While programming for work or as a hobby, you will very likely need to read the contents of a file stored on the filesystem where your program runs. Most programming languages offer several ways to do this, and Go is no exception. Which way you choose depends on the specific use case, but some common scenarios make one approach better than another. This guide details the different ways to read a file in Go, the circumstances under which each method is appropriate, and the cases where a method is clearly the wrong choice.
The Ultra Basic
The most basic way to read a file is to use os.ReadFile. This function reads the entire contents of the file into a byte slice. Let's say you have a file containing some JSON named author.json with the following.
{
    "id": "12345",
    "name": "Donald Knuth"
}
The following code loads the file with os.ReadFile and unmarshals its contents into a struct.
package main

import (
    "encoding/json"
    "log"
    "os"
)

type author struct {
    Id   string `json:"id"`
    Name string `json:"name"`
}

func main() {
    var a author
    filename := "author.json"
    authorJson, err := os.ReadFile(filename)
    if err != nil {
        log.Printf("Could not read file %s", filename)
        os.Exit(1)
    }
    if err := json.Unmarshal(authorJson, &a); err != nil {
        log.Printf("invalid json")
        os.Exit(1)
    }
    log.Println("id: ", a.Id)
    log.Println("name: ", a.Name)
}
This works as long as you can be sure the input file fits into your system's available memory. That is not always the case, so let's explore a couple of other methods for reading the contents of a file.
Reading a File Line-By-Line
Another common requirement is to read a file where each line represents a distinct item. For example, each line could contain a data point collected by some sensor. Let's say you have a file called sensor-data.txt with the following contents.
12.34
42.67
23.55
33.87
53.22
error
30.10
Note the line that contains the text error. Perhaps the sensor temporarily failed when attempting a reading and wrote error to the file rather than a numerical value. Let's also assume you need to calculate the average of the values. This can be accomplished by using os.Open, which returns a pointer to a File, then bufio.NewScanner, which takes a Reader and creates a new Scanner. Here is an example that demonstrates the use of these constructs.
package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
)

func main() {
    reader, err := os.Open("sensor-data.txt")
    if err != nil {
        fmt.Println("Could not read input file")
        os.Exit(1)
    }
    defer reader.Close()
    lineReader := bufio.NewScanner(reader)
    sum := 0.0
    lines := 0
    for lineReader.Scan() {
        val, parseErr := strconv.ParseFloat(lineReader.Text(), 64)
        if parseErr != nil {
            fmt.Println("could not parse value ", lineReader.Text())
            continue
        }
        sum += val
        lines++
    }
    if readErr := lineReader.Err(); readErr != nil {
        fmt.Println("Could not read all of input file")
        os.Exit(1)
    }
    fmt.Printf("Average: %f\n", sum/float64(lines))
}
The first line of the main() function opens the file. Note that the deferred Close() comes after the error check; if os.Open fails, there is no open file to close.
reader, err := os.Open("sensor-data.txt")
The next lines of interest create the Scanner and initialize the values necessary to calculate the average, sum and lines.
lineReader := bufio.NewScanner(reader)
sum := 0.0
lines := 0
The Scan() method is called within a for loop until it returns false. The value at the scanner's current line is obtained via the lineReader.Text() method and is converted from a string to a float64.
for lineReader.Scan() {
    val, parseErr := strconv.ParseFloat(lineReader.Text(), 64)
}
Because the sensor wrote error on one of the lines rather than a valid floating point number, that line is skipped and not added to the total number of data points. Otherwise the value of the line is added to sum and the number of lines is incremented.
if parseErr != nil {
    fmt.Println("could not parse value ", lineReader.Text())
    continue
}
sum += val
lines++
The last thing to do before calculating the average is to check whether there were any errors reading the file. If there were, the program exits with an error.
if readErr := lineReader.Err(); readErr != nil {
    fmt.Println("Could not read all of input file")
    os.Exit(1)
}
If the file was read completely without error, the average can be calculated and printed.
fmt.Printf("Average: %f\n", sum/float64(lines))
This approach works well when a file is formatted so that it can be read one line at a time and the operation you need to perform is amenable to that kind of processing. Not everything that needs to be processed is plain text, however, in which case a different approach is required. One such example is explored in the next section.
Reading a Binary File
In some cases a binary file needs to be processed, and it is often necessary, or at least useful, to process the file without reading its entire contents into memory. One example is calculating a cryptographic hash, such as a sha512 sum. Like the second example, it might be impractical to read the entire contents into memory; unlike the second example, binary files cannot be read line-by-line, so a different approach is required. Most programming languages implement this capability in a similar, idiomatic way: open the file, read a portion of it into a buffer (an array of bytes), process the buffer just read, and repeat until there is nothing left to read. Go provides such functionality, as demonstrated in the following example.
First, create a binary file using the following command.

echo "create a sha512sum" | gzip > hello.txt.gz

This will create a file called hello.txt.gz with the compressed contents of create a sha512sum.
Here is the code to calculate the hash.
package main

import (
    "crypto/sha512"
    "encoding/hex"
    "fmt"
    "io"
    "os"
)

func main() {
    file, fileErr := os.Open("hello.txt.gz")
    if fileErr != nil {
        fmt.Println("Could not open file")
        os.Exit(1)
    }
    defer file.Close()
    hash := sha512.New()
    // Normally this value is larger, something like 4096. A value of 2 is used
    // to illustrate multiple reads.
    buf := make([]byte, 2)
    for {
        n, err := file.Read(buf)
        if n > 0 {
            hash.Write(buf[:n])
        }
        if err == io.EOF {
            break
        }
        if err != nil {
            fmt.Println("error reading file")
            os.Exit(1)
        }
    }
    sum := hex.EncodeToString(hash.Sum(nil))
    fmt.Println(sum)
}
As before, the file is opened using os.Open. The hash and the buffer that stores the contents of each read are then created.
hash := sha512.New()
// Normally this value is larger, something like 4096. A value of 2 is used
// to illustrate multiple reads.
buf := make([]byte, 2)
Within the for loop, the contents of the file are read into the buffer.
n, err := file.Read(buf)
The returned values are the number of bytes read and an error value. The number of bytes indicates the range of the slice to write to the hash. Because a Read is allowed to return data and io.EOF in the same call, any bytes read are written to the hash before the error is checked. When the end of the file is reached, break out of the loop.
if n > 0 {
    hash.Write(buf[:n])
}
if err == io.EOF {
    break
}
Once out of the loop, calculate the sum, convert it to hex and write the results.
sum := hex.EncodeToString(hash.Sum(nil))
fmt.Println(sum)
To see how the size of the buffer affects performance, experiment with different sizes using large files you may have on your system.
Conclusion
There are different ways to read files using Go. If you are new to Go but experienced in other languages, you have likely encountered these methods before. Regardless, it is critical that programmers know which method is appropriate for a given use case, as different methods have different performance characteristics. Some methods are inappropriate regardless of performance, and knowing which approach to use is essential to writing correct and safe software.