Sinks: where collected data is sent
Chalk allows you to configure reports (what key data is collected), sinks (where reports are sent), and custom keys (entirely new data that reports can be set up to collect).
This guide is about sinks (where reports are sent).
The default_out sink
Reports are sent by default to the default_out sink which appends
entries (JSON-newline format) to ~/.local/chalk/chalk.log.
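Since the log is newline-delimited JSON, it can be read back one record per line. Here is a minimal Python sketch for doing so; the helper name read_reports is our own, and the code makes no assumption about which keys a report contains.

```python
import json

def read_reports(path):
    """Yield one parsed JSON value per non-empty line of a JSON-lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)

# Example usage (assumes the default log location mentioned above exists):
#   import os
#   for record in read_reports(os.path.expanduser("~/.local/chalk/chalk.log")):
#       print(type(record).__name__, len(record))
```

The same helper should work for any sink in this guide that writes the jsonl format, such as the file and rotating_log sinks.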
Unsubscribing from default_out
You can disable this by creating a .c4m file and unsubscribing from the "report" topic.
$ cat null.c4m
unsubscribe("report", "default_out")
And having Chalk load it.
chalk load ./null.c4m
This isn’t very useful on its own since now we’ll just not have reports written anywhere at all. But it’s useful if we want to disable the default and have reports go somewhere else.
The file sink type
We can send reports to another path by creating a new sink and
subscribing the "report" topic to it.
$ cat myfile.c4m
sink_config my_file_out {
  enabled: true
  sink: "file"
  filename: "/tmp/chalkdata.jsonl"
}
subscribe("report", "my_file_out")
If we chalk load ./myfile.c4m and that was the only config we had
loaded, Chalk reports would get sent both to the default
(~/.local/chalk/chalk.log) and to /tmp/chalkdata.jsonl.
File-specific sink parameters in detail
| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | yes | The file name for the output. |
| log_search_path | list[string] | no | An ordered list of directories for the file to live. |
| use_search_path | bool | no | Controls whether or not to use the log_search_path at all. Defaults to true. |
The log file consists of rows of JSON objects (the jsonl format).
The log_search_path is a list of paths that the system will march
down, trying to find a place where it can open the named sink,
skipping directories where there isn’t write permission. If no value
is provided, the default is ["/var/log/", "~/log/", "."].
If the filename parameter has a slash in it, it will always be tried
first, before the search path is checked.
If nothing in the search path can be opened (or no search path was given) and the file location is not writable, the system tries to write to a temporary file as a last resort.
If use_search_path is false, the system just looks at the filename
field; if it’s a relative path, it resolves it based on the current
working directory. In this mode, if the log file cannot be opened,
then the sink configuration will error when used.
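The lookup order described above can be sketched in Python. This is an illustration of the documented behavior, not Chalk's implementation: the helper names are hypothetical, and how a filename containing a slash is combined with search-path directories is our assumption (here, only its base name is used).

```python
import os
import tempfile

# Default taken from the text above.
DEFAULT_SEARCH_PATH = ["/var/log/", "~/log/", "."]

def _try_open(path):
    """Return an append-mode handle, or None if the path cannot be opened."""
    try:
        return open(os.path.expanduser(path), "a", encoding="utf-8")
    except OSError:
        return None

def open_log(filename, search_path=None, use_search_path=True):
    """Hypothetical helper mirroring the lookup order described above."""
    if not use_search_path:
        # Only the filename is considered; a relative path resolves against
        # the current working directory, and failure raises OSError.
        return open(os.path.expanduser(filename), "a", encoding="utf-8")

    candidates = []
    if "/" in filename:
        candidates.append(filename)  # tried before the search path
    dirs = DEFAULT_SEARCH_PATH if search_path is None else search_path
    # Assumption: directories are combined with the file's base name.
    candidates += [os.path.join(d, os.path.basename(filename)) for d in dirs]

    for candidate in candidates:
        handle = _try_open(candidate)
        if handle is not None:
            return handle

    # Last resort: a temporary file.
    return tempfile.NamedTemporaryFile(
        "w", suffix=".jsonl", delete=False, encoding="utf-8"
    )
```

With use_search_path set to False, the sketch raises on failure instead of falling back, which matches the "will error when used" behavior described above.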
The rotating_log sink type
If you’d like to write to files of bounded sizes you can use the
rotating_log sink type. This sink behaves like a ring buffer.
Here’s an example to show how it works (though you won’t ever want
sizes this small in the real world).
$ cat myrotatinglog.c4m
sink_config my_rotating_log_out {
  enabled: true
  sink: "rotating_log"
  filename: "chalkdata.jsonl"
  max: <<10kib>>
  log_search_path: ["/tmp/"]
}
subscribe("report", "my_rotating_log_out")
Rotating log-specific parameters in detail
| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | true | The name to use for the log file. |
| max | Size | true | The size at which truncation should occur. |
| log_search_path | list[string] | false | An ordered list of directories for the file to live. |
| truncation_amount | Size | false | The target size to which the log file should be truncated. |
When the file size reaches the max threshold (in bytes), it is
truncated, removing records until truncation_amount bytes of data
have been removed. If the truncation_amount field is not provided, it is
set to 25% of max.
The log file consists of rows of JSON objects (the jsonl
format). When we delete, we delete full records, from oldest to
newest. Since we delete full records, we may delete slightly more than
the truncation amount specified.
The deletion process guards against catastrophic failure by copying
undeleted data into a new, temporary log file, and swapping it into
the destination file once finished. As a result, you should assume you
need 2x the value of max available in terms of disk space.
max and truncation_amount should be Size objects (e.g., << 100mb >>).
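To make the truncation behavior concrete, here is a small Python sketch of the process described above: whole records are dropped from oldest to newest, the survivors are copied into a temporary file, and that file is swapped into place. The function names are hypothetical, and checking the size after each append is our assumption, not a statement about when Chalk performs the check.

```python
import os
import tempfile

def truncate_log(path, truncation_amount):
    """Drop whole records, oldest first, until at least
    `truncation_amount` bytes are gone. Returns the bytes removed."""
    removed = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Copy surviving records into a temporary file in the same directory,
    # so the final swap is a rename rather than a cross-device copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with open(path, "rb") as src, os.fdopen(fd, "wb") as dst:
        for line in src:
            if removed < truncation_amount:
                removed += len(line)  # skip this (oldest) record entirely
            else:
                dst.write(line)
    os.replace(tmp_path, path)  # swap the new file into place
    return removed

def append_record(path, record, max_size, truncation_amount=None):
    """Append one record (bytes), then truncate if the file reached max_size."""
    if truncation_amount is None:
        truncation_amount = max_size // 4  # the documented 25% default
    with open(path, "ab") as f:
        f.write(record.rstrip(b"\n") + b"\n")
    if os.path.getsize(path) >= max_size:
        truncate_log(path, truncation_amount)
```

Because the old file and its replacement briefly coexist, the sketch also shows why roughly 2x max of disk space should be available.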
The s3 sink type
We can also send reports to S3-compatible systems.
Let’s grab rclone, a single binary that will give us a local S3 API.
Create a directory for rclone to serve from, and a bucket within that directory.
mkdir ./data # Directory to serve from
mkdir ./data/myorg # A bucket
And run rclone with auth settings to mimic a production API.
rclone serve s3 --auth-key "mykey,mysecret" ./data
Now in another window create a config for an S3 sink.
$ cat mys3.c4m
sink_config my_s3_out {
  enabled: true
  sink: "s3"
  endpoint: "http://localhost:8080"
  uri: "s3://myorg/chalkdata/report.json"
  uid: "mykey"
  secret: "mysecret"
  region: "us-east-1"
}
subscribe("report", "my_s3_out")
Load it up in Chalk and Chalk a binary.
$ chalk load ./mys3.c4m
$ cp $(which cp) cp2
$ chalk insert ./cp2
$ ls ./data/myorg/chalkdata
1778607576222-BF1YCE5Y6PRQ9BDCD6PP1VB3P0-report.json
Amazon S3 vs S3-compatible APIs
If you are publishing to an S3-compatible API you must set the
endpoint field. Set it to the base URL of the service and keep uri
in the normal s3://bucket-name/object-path form.
The region field is still required by the AWS SigV4 signing
algorithm; for most S3-compatible stores any non-empty value
(e.g. "us-east-1") works.
AWS environment variables
While the S3 sink will not automatically read your existing AWS
environment variables, you can forward them within the Chalk config
with the env() builtin.
sink_config s3_sink_config {
  enabled: true
  sink: "s3"
  region: env("AWS_REGION")
  uri: env("AWS_S3_BUCKET_URI")
  uid: env("AWS_ACCESS_KEY_ID")
  secret: env("AWS_SECRET_ACCESS_KEY")
}
S3-specific sink parameters in detail
| Parameter | Type | Required | Description |
|---|---|---|---|
| uid | string | true | A valid AWS access key ID |
| secret | string | true | A valid AWS secret access key |
| token | string | false | AWS session token |
| uri | string | true | The URI for the bucket in s3: format; see below |
| region | string | true | The AWS region (or any non-empty string for S3-compatible stores) |
| extra | string | false | A prefix added to the object path within the bucket |
| endpoint | string | false | Custom S3-compatible endpoint URL (e.g. http://localhost:9000 for MinIO or http://localhost:8080 for rclone serve s3). When omitted, the standard AWS endpoint is used. |
To ensure uniqueness, each run of chalk constructs a unique object name. Here are the components:
- An integer consisting of the machine’s local time in ms
- A 26-character cryptographically random ID (using a base32 character set)
- The value of the extra field, if provided.
- Anything provided in the uri field after the host.
These items are separated by dashes.
The timestamp goes first to ensure files are listed in a sane order.
The user is responsible for making sure the last two values are valid. They are not checked, and the operation will fail if they are not valid.
Generally, you should not use dots in your bucket name, as this will thwart TLS protection of the connection.
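When processing uploaded objects later, it can help to split a name back into these components. The Python sketch below does so for the object name shown in the rclone example earlier. The function name is hypothetical, and treating the leading integer as milliseconds since the Unix epoch is our assumption.

```python
from datetime import datetime, timezone

def parse_object_name(name):
    """Split '<ms timestamp>-<26-char id>-<rest>' into its three parts."""
    # Split on the first two dashes only, since the remainder (the extra
    # field and the uri path) may itself contain dashes.
    timestamp_ms, random_id, rest = name.split("-", 2)
    if not timestamp_ms.isdigit() or len(random_id) != 26:
        raise ValueError(f"unexpected object name: {name!r}")
    # Assumption: the integer is milliseconds since the Unix epoch.
    created = datetime.fromtimestamp(int(timestamp_ms) / 1000, tz=timezone.utc)
    return created, random_id, rest

created, random_id, rest = parse_object_name(
    "1778607576222-BF1YCE5Y6PRQ9BDCD6PP1VB3P0-report.json"
)
print(created.isoformat(), random_id, rest)
```

Sorting object names lexically then also sorts them by creation time, as long as the timestamps have the same number of digits.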
The post HTTP sink type
We can also send reports via the HTTP POST method.
Let’s create a simple Python server that accepts POST requests at
/upload and writes the request body to disk with a unique name.
$ cat upload_server.py
import http.server as h, time
class H(h.BaseHTTPRequestHandler):
    def do_POST(self):
        # Only accept uploads at /upload.
        if self.path != "/upload":
            self.send_response(404); self.end_headers(); return
        # Read the request body and save it under a unique, timestamped name.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        with open(f"report-{time.time_ns()}.json", "wb") as f:
            f.write(body)
        self.send_response(200); self.end_headers()
h.HTTPServer(("127.0.0.1", 8000), H).serve_forever()
Now create a data directory for the server and start the server in the data directory.
rm -rf data && mkdir data
cd data && python3 ../upload_server.py
Now create a Chalk config.
$ cat myhttp.c4m
sink_config my_http_out {
  enabled: true
  sink: "post"
  uri: "http://127.0.0.1:8000/upload"
}
subscribe("report", "my_http_out")
Load the Chalk config and Chalk a binary.
$ chalk load ./myhttp.c4m
$ cp $(which cp) cp2
$ chalk insert ./cp2
$ ls data
report-1778609825252755000.json
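If no file shows up, it can be useful to check the endpoint independently of Chalk. Below is a minimal Python sketch that sends a JSON body with the standard library. The helper name post_json and the test payload are our own, and the request only approximates what the sink sends (a single JSON object with an application/json content type).

```python
import json
import urllib.request

def post_json(uri, payload, headers=None, timeout=5.0):
    """POST `payload` as a JSON body and return the HTTP status code."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(uri, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    # Optional extra headers, e.g. a bearer token for an authenticated API.
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status

# Example usage (with upload_server.py from above running):
#   post_json("http://127.0.0.1:8000/upload", {"hello": "chalk"})
```

A 200 status and a new report-*.json file in the data directory would suggest the server side is working, which narrows any remaining problem down to the Chalk configuration.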
HTTP-specific sink parameters in detail
| Parameter | Type | Required | Description |
|---|---|---|---|
| uri | string | true | The full URI to the endpoint to which the POST should be made. |
| content_type | string | false | The value to pass for the “content-type” header |
| headers | dict[string, string] | false | A dictionary of additional mime headers |
| disallow_http | bool | false | Do not allow HTTP connections, only HTTPS |
| timeout | Duration | false | Connection timeout in ms |
| pinned_cert_file | string | false | TLS certificate file |
| prefer_bundled_certs | bool | false | Whether to prefer chalk bundled root CA certs |
| auth | string | false | Auth configuration for the API |
The post will always be a single JSON object, and the default
content-type field will be application/json. Changing this value
doesn’t change what is posted; it is only there in case a particular
endpoint requires a different value.
If HTTPS is used, the connection will fail if the server doesn’t have
a valid certificate. Unless you provide a specific certificate via the
pinned_cert_file field, self-signed certificates will not be
considered valid.
The underlying TLS library requires certificates to live on the file system. However, you can embed your certificate in your configuration in PEM format, and use config builtin functions to write it to disk, if needed, before configuring the sink.
If additional headers need to be passed (for instance, a bearer
token), the headers field is converted directly to MIME. If you
wish to pass the raw MIME, you can use the mime_to_dict builtin.
For example, the default configuration uses the following sink
configuration:
sink_config my_https_config {
  enabled: true
  sink: "post"
  uri: env("CHALK_POST_URL")
  if env_exists("TLS_CERT_FILE") {
    pinned_cert_file: env("TLS_CERT_FILE")
  }
  if env_exists("CHALK_POST_HEADERS") {
    headers: mime_to_dict(env("CHALK_POST_HEADERS"))
  }
}
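To get a feel for what a value like CHALK_POST_HEADERS might hold, the Python sketch below parses raw MIME-style header lines into a dictionary. It only illustrates the general idea of such a conversion using Python's standard library; it is not Chalk's mime_to_dict builtin, and the header names and token are made up.

```python
from email.parser import HeaderParser

def mime_headers_to_dict(raw):
    """Parse 'Name: value' header lines into a dict of strings."""
    message = HeaderParser().parsestr(raw)
    return {name: value for name, value in message.items()}

# Hypothetical header block, one header per line.
raw = "Authorization: Bearer example-token\nX-Request-Source: chalk\n"
print(mime_headers_to_dict(raw))
```

Each header line becomes one key and value pair, which is the same shape as the dict[string, string] expected by the headers field.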