This is a series of posts about important things I learned the hard way.
Today I experienced multiple production problems:
- Memory leaks
- Pods constantly restarting (OOMKilled)
- No more available connections to database
- Client errors (deadline exceeded)
- And more.
The main problem: connections not being closed.
A gRPC server was not closing its database cursors, so every request kept its cursor, and with it a database connection, alive. After enough requests, some pods started logging errors about having no database connections available and panicking; others leaked memory until they were killed (OOMKilled).
Below are all the cases I can remember.
File descriptors
Close opened files.
package main

import (
	"log"
	"os"
)

func main() {
	file, err := os.Open("foo.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close() // release the file descriptor when main returns
}
HTTP requests
This is a common one: close the response body.
package main

import (
	"log"
	"net/http"
)

func main() {
	res, err := http.Get("http://google.com.br")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close() // close the response body even if you never read it
	// do something with the response body
}
Database cursors
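Close database cursors and connections. In Go's database/sql, an open sql.Rows keeps its connection out of the pool until it is closed (when Next returns false, the rows are closed automatically).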
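package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // assumed driver; registers the "mysql" driver name
)

func main() {
	// go-sql-driver/mysql expects a DSN of the form user:password@tcp(host:port)/dbname.
	db, err := sql.Open("mysql", "user:password@tcp(host:port)/database")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close() // close the connection pool when done

	rows, err := db.QueryContext(context.Background(), "SELECT * FROM my_table")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close() // returns the connection to the pool
	// do something with rows
}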
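One trap with defer: it runs when the surrounding function returns, not at the end of a loop iteration. If you run one query per item inside a loop and defer rows.Close() in the loop body, every cursor, and its connection, stays open until the whole function returns. A common fix is to move each iteration into its own function. This sketch uses a hypothetical fakeRows type that only counts open cursors, so it runs without a database.

```go
package main

import "fmt"

// fakeRows stands in for *sql.Rows; it only tracks how many are open.
type fakeRows struct{ open *int }

func openRows(open *int) *fakeRows {
	*open++
	return &fakeRows{open: open}
}

func (r *fakeRows) Close() error {
	*r.open--
	return nil
}

// leaky defers Close inside the loop: nothing is closed until it returns.
func leaky(n int, open *int) (peak int) {
	for i := 0; i < n; i++ {
		rows := openRows(open)
		defer rows.Close()
		if *open > peak {
			peak = *open
		}
	}
	return peak
}

// fixed runs each iteration in its own function, so Close runs per item.
func fixed(n int, open *int) (peak int) {
	for i := 0; i < n; i++ {
		func() {
			rows := openRows(open)
			defer rows.Close()
			if *open > peak {
				peak = *open
			}
		}()
	}
	return peak
}

func main() {
	var open int
	fmt.Println("leaky peak:", leaky(5, &open)) // leaky peak: 5
	fmt.Println("fixed peak:", fixed(5, &open)) // fixed peak: 1
	fmt.Println("still open:", open)            // still open: 0
}
```

Both versions eventually close everything; the difference is how many cursors, and therefore connections, are held at the same time.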
gRPC clients
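This one is a bit special: close the connection once you are done with it, unless you need to keep it alive. The same goes for the database: a grpc.ClientConn, like sql.DB, is designed to be long-lived and shared, so create it once, reuse it, and close it when you no longer need it.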
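package main

import (
	"example.com/project/hello" // hypothetical path to your generated hello package
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"log"
)

func main() {
	conn, err := grpc.Dial("localhost:9000", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("did not connect: %s", err)
	}
	defer conn.Close() // close the connection when you no longer need it
	c := hello.NewHelloServiceClient(conn)
	_ = c // call RPCs on c here
}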