The filesystem cache (also known as the page cache) is a critical part of Unix and Unix-like operating systems. Because accessing RAM is so much faster than accessing a hard drive, every file that is read from a hard disk or solid-state drive is first read into memory managed by the filesystem cache. Once it is there, the file contents are provided to the application that performed the read. Because files that were accessed recently tend to be accessed again soon after, the operating system keeps the file contents in the filesystem cache in case they are needed again.
After you have installed vmprobe, you can begin to control the filesystem cache on Linux machines directly.
Going back to vmtouch, the term we use for accessing a file so as to force it to be loaded into the filesystem cache is "touching" the file (or portion of a file). Similarly, asking the operating system to remove a file (or portion of a file) from the filesystem cache to make room for more files is called "evicting".
vmprobe provides two sub-commands under the cache command: cache touch and cache evict. These commands allow you to touch and evict files and directories:
Bring path into memory:
$ vmprobe cache touch /path/to/touch
Kick path out of memory:
$ vmprobe cache evict /path/to/evict
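To make the two terms concrete, here is a minimal Python sketch of what touching and evicting a single file can look like at the system-call level. It is an illustration only and not a description of how vmprobe is implemented: the function names touch and evict and the CHUNK size are made up for this example, and the eviction relies on the advisory posix_fadvise call, which the kernel is free to ignore (for instance for dirty pages).

```python
import os

CHUNK = 1 << 20  # hypothetical read size: 1 MiB per read call


def touch(path: str) -> int:
    """Read the whole file so the kernel loads its pages into the page cache.

    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            total += len(chunk)
    return total


def evict(path: str) -> bool:
    """Advise the kernel that the file's cached pages are no longer needed.

    Returns False when posix_fadvise is not available on this platform.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset 0 with length 0 means "from the start to the end of the file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True
```

Reading is the simplest way to touch a file because every read goes through the filesystem cache. Eviction is only a hint, so it is worth checking the result with cache show rather than assuming the pages are gone.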
vmprobe provides a flexible cache show command:
$ vmprobe cache show manual.pdf
==== mincore ====
31M/359M (8.8%) ▁▁▁▅▁▁▁▁▁▁▁▁▁▁▄▃▁▁▁▁▁▄▁▁▁
Total: 31M/359M (8.8%)
In the above output, we see that a 359 megabyte file has about 9% of its pages, amounting to 31 megabytes, currently in memory. The mincore header indicates that these pages are resident in memory according to the mincore system call.
The Unicode bar characters indicate which parts of the file are actually in memory. The taller the bar, the more of the represented part of the file is in memory.
If you wish to see a more detailed break-down of this, you can specify the number of "buckets" with the --width N or -w N option:
$ vmprobe cache show -w 1000 manual.pdf
==== mincore ====
31M/359M (8.8%) ███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅██████████▆▁▁▄█████▂▁▃█████▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁ ▁▁▁▁ ▁▁▁ ▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃███████████████████████▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆███████████████▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂▄ ▁█ ▅ ▁█▂
Total: 31M/359M (8.8%)
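The bucketed display above can be approximated with a few lines of Python. This is a sketch under stated assumptions, not vmprobe's actual rendering code: the function name render_bars, the BARS lookup string, and the rounding rule (any resident page gets at least the lowest bar, an empty bucket is a space) are all choices made for this example.

```python
BARS = " ▁▂▃▄▅▆▇█"  # index 0 = nothing resident, index 8 = bucket fully resident


def render_bars(resident, width=25):
    """Render one bar character per bucket.

    resident: sequence of booleans, one per page of the file.
    width: number of buckets, playing the role of the --width option.
    """
    n = len(resident)
    if n == 0:
        return ""
    width = min(width, n)  # never use more buckets than there are pages
    out = []
    for b in range(width):
        start = b * n // width
        end = (b + 1) * n // width
        frac = sum(resident[start:end]) / (end - start)
        # Assumed rule: a bucket with any resident page shows at least "▁".
        level = 0 if frac == 0 else max(1, round(frac * 8))
        out.append(BARS[level])
    return "".join(out)


print(render_bars([True] * 4 + [False] * 4, width=2))  # prints "█ "
```

With a small width each character averages over many pages, which is why a larger width reveals solid runs of resident pages that look like low bars in the coarse view.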
In addition to displaying the mincore state, you can also inspect other Linux-specific properties of pages. These flags are described in more detail in the Linux kernel pagemap documentation:
$ vmprobe cache show -f active,mincore,referenced manual.pdf
==== active ====
20M/359M (5.7%) ▁▁▁▂▁▁▁▁▁▁▁▁▁▁▃▂▁▁▁▁▁▃▁▁▁
Total: 20M/359M (5.7%)
==== mincore ====
31M/359M (8.8%) ▁▁▁▅▁▁▁▁▁▁▁▁▁▁▄▃▁▁▁▁▁▄▁▁▁
Total: 31M/359M (8.8%)
==== referenced ====
17M/359M (4.8%) ▁▁▁▄▁▁▁▁▁▁▁▁▁▁▂▂▁▁▁▁▁▂▁▁▁
Total: 17M/359M (4.8%)
Note that in order for this to work, you need to be able to run sudo without a password, as you can by default on cloud providers such as AWS. To set this up, if your user is ec2-user, run sudo visudo and add the following line:
ec2-user ALL = NOPASSWD: ALL
There is a --refresh [N] or -r [N] option that will cause vmprobe to re-issue this command every N seconds until stopped with a control-c:
$ vmprobe cache show -r 0.5 manual.pdf
==== mincore ====
31M/359M (8.8%) ▁▁▁▅▁▁▁▁▁▁▁▁▁▁▄▃▁▁▁▁▁▄▁▁▁
Total: 31M/359M (8.8%)
==== mincore ====
31M/359M (8.8%) ▁▁▁▅▁▁▁▁▁▁▁▁▁▁▄▃▁▁▁▁▁▄▁▁▁
Total: 31M/359M (8.8%)
^C
This can be useful for monitoring the changes of a file over time.
If a directory is passed in, all files that are partially or fully in memory are displayed, sorted by the amount that is in memory.
For example, here is the output after running git log in a repository that was previously entirely paged out:
$ vmprobe cache show .
==== mincore ====
148K/264K (56.1%) ████▅ ▂ /.git/objects/pack/pack-ea799cc69c90b5ac41886febabd280d88ca88ca2.pack
24K/24K (100.0%) ▇ /vmtouch.c
12K/12K (100.0%) ▄ /.git/objects/pack/pack-ea799cc69c90b5ac41886febabd280d88ca88ca2.idx
4K/4K (100.0%) ▂ /.git/packed-refs
4K/4K (100.0%) ▂ /.gitignore
4K/4K (100.0%) ▂ /.git/refs/stash
4K/4K (100.0%) ▂ /.git/refs/remotes/origin/support-block-devices
4K/4K (100.0%) ▂ /.git/refs/remotes/origin/master
4K/4K (100.0%) ▂ /.git/refs/remotes/origin/HEAD
4K/4K (100.0%) ▂ /.git/refs/remotes/bucaneer/rangefix
4K/4K (100.0%) ▂ /.git/refs/remotes/bucaneer/master
4K/4K (100.0%) ▂ /.git/refs/remotes/bucaneer/hp-ux-support
4K/4K (100.0%) ▂ /.git/refs/heads/support-block-devices
4K/4K (100.0%) ▂ /.git/refs/heads/rangefix
4K/4K (100.0%) ▂ /.git/refs/heads/master
4K/4K (100.0%) ▂ /.git/HEAD
4K/4K (100.0%) ▂ /.git/objects/ed/a947c05a032f9c858ab74b939d6ecc58abbd17
4K/4K (100.0%) ▂ /.git/objects/e0/75205342dc78828e7d374d861e4a585cb21112
4K/4K (100.0%) ▂ /.git/objects/dc/006844c4ce4110d13195662f12a7897dd033a6
4K/4K (100.0%) ▂ /.git/objects/b8/965f6e633c0db6a54137889945980213cd3716
4K/4K (100.0%) ▂ /.git/objects/9d/ea9a62bc86d20a5b488a1e4c45b71a752de8f4
4K/4K (100.0%) ▂ /.git/objects/49/2e38f2b9f47ca50a20455684676f4aaba66bc7
4K/4K (100.0%) ▂ /.git/objects/26/2ef16e1d55814d8e9294811cf449d929fcc57c
4K/4K (100.0%) ▂ /.git/info/exclude
4K/4K (100.0%) ▂ /.git/index
4K/4K (100.0%) ▂ /.git/config
Total: 276K/1M (22.5%)
You can limit the files that are displayed to a minimum size in memory with the --min [size] or -m [size] option:
$ vmprobe cache show -m 10k .
==== mincore ====
148K/264K (56.1%) ████▅ ▂ /.git/objects/pack/pack-ea799cc69c90b5ac41886febabd280d88ca88ca2.pack
24K/24K (100.0%) ▇ /vmtouch.c
12K/12K (100.0%) ▄ /.git/objects/pack/pack-ea799cc69c90b5ac41886febabd280d88ca88ca2.idx
Total: 276K/1M (22.5%)
Or use the --num [N] or -n [N] option to limit the number of files displayed:
$ vmprobe cache show -n 2 .
==== mincore ====
148K/264K (56.1%) ████▅ ▂ /.git/objects/pack/pack-ea799cc69c90b5ac41886febabd280d88ca88ca2.pack
24K/24K (100.0%) ▇ /vmtouch.c
Total: 276K/1M (22.5%)
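The sorting and the two filters can be pictured with the following Python sketch. It assumes the per-file resident sizes are already known and held in a plain dictionary; the function name select_files, its parameters, and the alphabetical tie-break are hypothetical and only mirror the behaviour visible in the outputs above.

```python
def select_files(resident_bytes, min_size=0, num=None):
    """Pick the rows to display for a directory.

    resident_bytes: dict mapping path -> bytes currently in memory.
    min_size: smallest resident size to keep (the role of --min).
    num: maximum number of rows to keep (the role of --num), or None for all.
    """
    # Files with nothing in memory are not listed at all.
    rows = [(p, b) for p, b in resident_bytes.items() if b > 0 and b >= min_size]
    # Largest resident size first; ties broken by path (an assumption).
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows if num is None else rows[:num]


K = 1024
sample = {
    "/vmtouch.c": 24 * K,
    "/.git/packed-refs": 4 * K,
    "/.git/objects/pack/pack.idx": 12 * K,   # shortened, made-up path
    "/.git/objects/pack/pack.pack": 148 * K, # shortened, made-up path
    "/README": 0,                            # made-up file that is fully paged out
}
print(select_files(sample, min_size=10 * K))
print(select_files(sample, num=2))
```

In the outputs above the Total line is the same with and without the filters, so they appear to affect only which rows are printed, not the totals.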
If any additional arguments are passed to cache show, they will be interpreted as a command to run. The filesystem cache state will be displayed before and after the command.
For example, here we can see that running the md5sum command on our manual reads all of its pages in:
$ vmprobe cache show manual.pdf -- md5sum manual.pdf
==== mincore ====
31M/359M (8.8%) ▁▁▁▅▁▁▁▁▁▁▁▁▁▁▄▃▁▁▁▁▁▄▁▁▁
Total: 31M/359M (8.8%)
Running command 'md5sum manual.pdf'
882fae40954e2c1211f633ac91ddac42 manual.pdf
==== mincore ====
359M/359M (100.0%) █████████████████████████
Total: 359M/359M (100.0%)
Commands can be combined with a refresh rate to get an idea of the progress occurring throughout a command:
$ vmprobe cache evict manual.pdf
$ vmprobe cache show -r 0.2 manual.pdf -- md5sum manual.pdf
==== mincore ====
Total: 0K/359M (0.0%)
Running command 'md5sum manual.pdf'
==== mincore ====
49M/359M (13.8%) ███▄
Total: 49M/359M (13.8%)
==== mincore ====
100M/359M (28.1%) ███████▁
Total: 100M/359M (28.1%)
==== mincore ====
153M/359M (42.7%) ██████████▆
Total: 153M/359M (42.7%)
==== mincore ====
205M/359M (57.2%) ██████████████▃
Total: 205M/359M (57.2%)
==== mincore ====
258M/359M (71.9%) ██████████████████
Total: 258M/359M (71.9%)
==== mincore ====
310M/359M (86.5%) █████████████████████▅
Total: 310M/359M (86.5%)
882fae40954e2c1211f633ac91ddac42 manual.pdf
==== mincore ====
359M/359M (100.0%) █████████████████████████
Total: 359M/359M (100.0%)
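One thing you can do with a run like this is estimate how fast the command is pulling the file into memory. The Python sketch below does the arithmetic on the resident sizes printed above. It assumes the samples really are 0.2 seconds apart, which is only the nominal refresh interval, and the helper name read_rates is made up for this example.

```python
def read_rates(samples_mb, interval_s):
    """Return the growth rate (MB per second) between consecutive samples."""
    return [(b - a) / interval_s for a, b in zip(samples_mb, samples_mb[1:])]


# Resident sizes (in M) taken from the samples collected while md5sum was running.
samples = [49, 100, 153, 205, 258, 310]
rates = read_rates(samples, 0.2)
print([round(r) for r in rates])  # prints [255, 265, 260, 265, 260]
```

The steady rate of roughly 260M per second matches the evenly growing bars: md5sum reads the file sequentially at a constant pace.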
vmprobe allows you to export a raw snapshot with the --raw option to cache show. It will print the filesystem cache state as a special binary file to standard output. This can then be saved to a snapshot file:
$ vmprobe cache show --raw manual.pdf > 3-chapters.snapshot
Later on, to restore this virtual memory state, you can use the cache restore command, which reads a snapshot file from its standard input:
$ vmprobe cache restore manual.pdf < 3-chapters.snapshot
cache restore can also accept expressions (see below):
$ vmprobe cache restore manual.pdf [expression goes here]
In addition to displaying the current state to the screen and saving raw snapshots to files, vmprobe also allows you to save snapshots to a database on your filesystem.
By default this database is stored in your home directory in a directory called .vmprobe/. To initialize this directory, run the following command:
$ vmprobe db init
vmprobe db initialized: /home/doug/.vmprobe
Now you can store snapshots by using the --save or -s option to the cache show command:
$ vmprobe cache show manual.pdf --save
Probe id: jZn7ArGAj2vKV0n5X4vNbI
Entry: 1464884200626337
The "probe id" is a randomly generated identifier for this particular invocation of the show command, and the entry is a timestamp of the single sample that was collected. If you use --refresh and/or commands, you will get multiple samples grouped into the same probe:
$ vmprobe cache show -r 1 manual.pdf --save
Probe id: EQqbpemnlKokBUDDDQb2vL
Entry: 1464884323567012
Entry: 1464884324572236
Entry: 1464884325578871
Entry: 1464884326585498
^C
You can list the probes that have been saved to the DB with the db probes command:
$ vmprobe db probes
EQqbpemnlKokBUDDDQb2vL
jZn7ArGAj2vKV0n5X4vNbI
The --long or -l option will print more information for each probe:
$ vmprobe db probes -l
EQqbpemnlKokBUDDDQb2vL
Created: Thu Jun 2 12:18:43 2016 (2m ago)
Updated: Thu Jun 2 12:18:46 2016 (2m ago)
Params: path=/home/doug/manual.pdf refresh=1 type=cache
jZn7ArGAj2vKV0n5X4vNbI
Created: Thu Jun 2 12:16:40 2016 (4m ago)
Updated: Thu Jun 2 12:16:40 2016 (4m ago)
Params: path=/home/doug/manual.pdf type=cache
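The entry ids look like Unix timestamps in microseconds. That is an inference from comparing them with the Created times in the listing above, not something stated explicitly, so treat the following Python sketch as an assumption about the format. The function name entry_to_datetime is made up for this example.

```python
from datetime import datetime, timezone


def entry_to_datetime(entry: int) -> datetime:
    """Interpret an entry id as microseconds since the Unix epoch (assumed format)."""
    seconds, micros = divmod(entry, 1_000_000)
    # Integer arithmetic keeps the microsecond part exact.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)


print(entry_to_datetime(1464884200626337).isoformat())
# prints 2016-06-02T16:16:40.626337+00:00
```

The result, 16:16:40 UTC, lines up with the 12:16:40 Created time shown for that probe if the machine's local time zone was four hours behind UTC.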
Similarly, the entries associated with a probe can be listed with the db entries command (it also supports --long/-l):
$ vmprobe db entries EQqbpemnlKokBUDDDQb2vL
1464884326585498
1464884325578871
1464884324572236
1464884323567012
Later on, you can inspect the entries with the db show command:
$ vmprobe db show 1464884326585498
31M/359M (8.8%) ▁▁▁▅▁▁▁▁▁▁▁▁▁▁▄▃▁▁▁▁▁▄▁▁▁
The db show command in fact accepts expressions. An entry is a very simple form of expression, but there are much more sophisticated possibilities, as described in the next section.
An expression is a way to specify a snapshot as a composition of one or more snapshots in the database.
For example, let's load manual.pdf into memory with the md5sum command:
$ vmprobe cache show -r 0.1 -s manual.pdf -- md5sum manual.pdf
Probe id: 1Tar0d3xM6jDJF6QTEU0Yr
Entry: 1464894939589591
Running command 'md5sum manual.pdf'
Entry: 1464894939694158
Entry: 1464894939800251
Entry: 1464894939906655
Entry: 1464894940013178
Entry: 1464894940120339
Entry: 1464894940227235
Entry: 1464894940334272
Entry: 1464894940442320
Entry: 1464894940550102
Entry: 1464894940658847
Entry: 1464894940767649
Entry: 1464894940876587
Entry: 1464894940985945
882fae40954e2c1211f633ac91ddac42 manual.pdf
Entry: 146489494103779
Since md5sum starts reading at the start of the file and reads the whole way through, every entry will have more and more of the file loaded (assuming we have enough free memory to hold the file and our OS doesn't start freeing pages):
$ vmprobe db show 1464894939694158
23M/359M (6.7%) █▆
$ vmprobe db show 1464894939906655
77M/359M (21.5%) █████▃
$ vmprobe db show 1464894940550102
238M/359M (66.5%) ████████████████▅
We can combine these snapshots with boolean operators. For example, here we are computing all the pages that are in the snapshot 1464894940550102 but are not in snapshot 1464894939906655:
$ vmprobe db show '1464894940550102 - 1464894939906655'
161M/359M (45.0%) ▆██████████▅
And here we show how snapshots can be combined arbitrarily by adding in 1464894939694158:
$ vmprobe db show '(1464894940550102 - 1464894939906655) + 1464894939694158'
185M/359M (51.7%) █▆ ▆██████████▅
Note that cache restore also accepts expressions.
The following table describes the currently supported operators:
Operator | Name | Description |
---|---|---|
\| | Union/Addition | All pages present in either or both input snapshots. |
+ | Union/Addition | Identical to the \| operator. |
& | Intersection | Only pages that are present in both of the input snapshots. |
- | Subtraction | All pages present in the first (left-side) input snapshot, as long as they don't appear in the second (right-hand) input snapshot. |
^ | Delta | All pages that are different in the two input snapshots. |
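If you think of a snapshot as the set of page numbers that are in memory, the operators in the table map directly onto set operations. The Python sketch below models that idea. It is not vmprobe's implementation or storage format: the OPS table and the combine helper are made up for this example, and treating Delta as a symmetric difference is an interpretation of the description in the table.

```python
import operator

# Operator symbols from the table, mapped onto Python set operations.
OPS = {
    "|": operator.or_,   # union
    "+": operator.or_,   # same as |
    "&": operator.and_,  # intersection
    "-": operator.sub,   # in the left snapshot but not the right
    "^": operator.xor,   # in exactly one of the two (assumed meaning of Delta)
}


def combine(op, left, right):
    """Combine two snapshots, each modelled as a set of resident page numbers."""
    return OPS[op](left, right)


# Three made-up snapshots of a sequential read, each further along than the last.
early = frozenset(range(0, 10))
middle = frozenset(range(0, 30))
late = frozenset(range(0, 90))

# Equivalent in spirit to '(late - middle) + early' in an expression.
result = combine("+", combine("-", late, middle), early)
print(len(result))  # prints 70: pages 0-9 plus pages 30-89
```

This is the same shape as the combined example above: the subtraction leaves only the stretch read between two samples, and the addition puts the very start of the file back in.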
For long-running probes, you may wish to refer to the latest (or first) snapshot of the probe. In this case, simply specify the probe id and use the last or first method:
$ vmprobe db show 1Tar0d3xM6jDJF6QTEU0Yr.last
359M/359M (100.0%) █████████████████████████
Finally, if you capture more than one flag in an entry, you need to specify the flag you are interested in with the flag method:
$ vmprobe cache show -f active,mincore,referenced manual.pdf -s
Probe id: CeRqln5r0Gv7PbNG0kftxf
Entry: 1464895801826439
$ vmprobe db show '1464895801826439.flag(referenced)'
212M/359M (59.1%) ███▅ ▆██████████▅