Download source code:
https://github.com/challenai/wtfcache
observe our service #
When we run into performance issues, we need information about our server to help us improve it.
To know the current state of our servers, we need to collect some metrics.
In other words, we need to expose internal information of the server as metrics:
the count of keys, the size of our data, the count of requests, latencies, and so on.
We won't collect too much information about our server since this is a tutorial;
if you can collect 2 metrics, you can collect 200 metrics.
how to collect #
as an example, we decide to count the keys and compute the total storage size.
No matter what interface and protocol we expose, the internal implementation should be the same.
Therefore, we make the stat feature a shared one.
In this case, we only collect cache metrics, so we place them into the cache folder directly.
However, remember that we replaced the internal hashmap with sync.Map
to avoid thread-safety issues.
For the same reason, we use the atomic
package to synchronize our statistics here.
// stat.go
package cache

import "sync/atomic"

// Stat holds the cache-wide counters; all fields are updated atomically.
type Stat struct {
    // entries count
    keys int64
    // total size of all the entries, in bytes
    sz int64
}

// CountKeys returns the current number of entries.
func (ms *Stat) CountKeys() int64 {
    return atomic.LoadInt64(&ms.keys)
}

// GetSz returns the total size of all entries, in bytes.
func (ms *Stat) GetSz() int64 {
    return atomic.LoadInt64(&ms.sz)
}

// IncrKey adds num to the key count and returns the new count.
func (ms *Stat) IncrKey(num int64) int64 {
    return atomic.AddInt64(&ms.keys, num)
}

// IncrSz adds sz to the total size and returns the new total.
func (ms *Stat) IncrSz(sz int64) int64 {
    return atomic.AddInt64(&ms.sz, sz)
}
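As a quick sanity check, a throwaway sketch like the following (our own demo, using the Stat type above) shows that the counters stay consistent no matter how concurrent updates interleave:

import (
    "fmt"
    "sync"
)

func main() {
    var s Stat
    var wg sync.WaitGroup
    // 100 goroutines bump the counters concurrently
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            s.IncrKey(1)
            s.IncrSz(8)
        }()
    }
    wg.Wait()
    // always prints 100 800
    fmt.Println(s.CountKeys(), s.GetSz())
}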
update the counts when a set or delete operation happens #
When we set a key, there are two cases: add or update.
For an add operation, we increase the key count and add the size of the new entry.
For an update operation, we adjust the size by the difference between the existing value and the new one.
For a delete operation, we just check whether the key exists and, if it does, decrease the count and size.
func (mc *xxCache) Set(k string, v []byte) error {
    // test if the key exists (mc.data is the underlying sync.Map; field name assumed)
    v_, exist := mc.data.Load(k)
    if !exist {
        // add: one more key, plus the size of the new entry
        mc.Stat.IncrKey(1)
        mc.Stat.IncrSz(int64(len(k) + len(v)))
        // ...
        return nil
    }
    // update: adjust the size by the difference between the old and new values
    mc.Stat.IncrSz(int64(len(v) - len(v_.([]byte))))
    // ...
}
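The delete side is symmetric. Here is a minimal sketch, again assuming the entries live in a sync.Map field named data:

func (mc *xxCache) Delete(k string) {
    // only adjust the counters when the key actually existed
    if v_, exist := mc.data.LoadAndDelete(k); exist {
        mc.Stat.IncrKey(-1)
        mc.Stat.IncrSz(-int64(len(k) + len(v_.([]byte))))
    }
}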
However, no matter how we write these two modify operations, there is a race issue.
For example, if two requests to set the same new key arrive concurrently, both "check if the key exists" steps can report that the key doesn't exist, both requests increment the counter, and the key count becomes inaccurate.
There are different methods to handle this problem.
The easiest one is to lock the critical section so the check and the update happen as one step.
func (mc *xxCache) Set(k string, v []byte) error {
    // the lock serializes the whole check-then-update sequence
    // (mc.mu is a sync.Mutex guarding the cache; field name assumed)
    mc.mu.Lock()
    defer mc.mu.Unlock()
    // ...
    if exist {
        // ...
    }
}
Since it’s a high performance cache application, the cost of a lock in hot code area is not acceptable,
we can use cas
idea to store the original state, then we set the key if there’s no confliction, and retry if it conflicts.
// optimistic lock
func (mc *xxCache) Set(k string, v []byte) error {
    for {
        // remember the count we started from
        count := mc.Stat.CountKeys()
        // ...
        if exist {
            // ...
        }
        // publish the new count only if nobody changed it meanwhile;
        // otherwise loop and retry from the top
        if mc.Stat.CompareAndSetKeyCount(count, count+1) {
            return nil
        }
    }
}
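CompareAndSetKeyCount is not something we defined on Stat earlier; a sketch of how such a helper could be built on the standard library's atomic.CompareAndSwapInt64 might look like this:

// CompareAndSetKeyCount swaps the key count from old to new only if it
// still equals old, and reports whether the swap happened.
// (hypothetical helper, not part of the Stat type shown above)
func (ms *Stat) CompareAndSetKeyCount(old, new int64) bool {
    return atomic.CompareAndSwapInt64(&ms.keys, old, new)
}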
To keep the tutorial as understandable as possible, we don't actually use any lock.
In the real world, however, an application is not always perfectly correct, yet it still runs fine,
and a trade-off is at work here: between performance and the accuracy of the key count, we choose to sacrifice the accuracy of the key count,
because it's just a metric; 12.64k keys and 12.63k keys make no practical difference, and this trade-off improves performance and keeps the code easy to maintain.
At last, we can provide an HTTP endpoint to expose the metrics.
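For example, a minimal sketch with net/http could look like this (the /metrics path, the port, and the JSON shape are our choices, not fixed by the project):

import (
    "fmt"
    "net/http"
)

func serveMetrics(mc *xxCache) error {
    http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
        // read both counters atomically and render them as JSON
        w.Header().Set("Content-Type", "application/json")
        fmt.Fprintf(w, `{"keys": %d, "size": %d}`, mc.Stat.CountKeys(), mc.Stat.GetSz())
    })
    return http.ListenAndServe(":8080", nil)
}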