About

Chalk supports collecting a wide range of metadata. The type of data to be collected, the directives that define when collection will happen, and the specification of where the collected metadata will be getting sent to are defined in chalk configurations (configs). In this section we will cover the core components of a chalk config and how they come together.

Reports

Reports are at the core of chalk as its the mechanism using which chalk informs about all the metadata it collects. We ask chalk to collect metadata we care about, and that metadata always ends up in a report, which is always in JSON format.

Think of the report as a document or binary object that is sent to an output destination: it can be embedded in an artifact (e.g., injected in an executable), sent to a web endpoint, or stored to a local or remote filesystem.

A report might be getting emitted under different conditions – this is most often done during core chalk operations, such as an insert, exec, etc. – but reports can also be configured to be emitted periodically or when a condition is met.

The sections below discuss how exactly we can configure reports to be emitted and what data ends being part of a report.

Report Templates

The exact metadata that will be included in a report are defined in templates, which are collections of metadata keys (with optional conditions on when said metadata should be getting emitted). The same template can be re-used across many reports. However, each of the different reports making use of the template could have different trigger/generation conditions and different destinations.

Here is an excerpt from the template used by default for any metadata extracted upon a chalk insert operation:

report_template insertion_default {
  shortdoc: "The default template for insertion operations"
  [...]
  if not in_container() {
    key._OP_ALL_PS_INFO.use                   = false
  }
  key.CHALK_VERSION.use                       = true
  key.DATE_CHALKED.use                        = false
  key.TIME_CHALKED.use                        = false
  key.TZ_OFFSET_WHEN_CHALKED.use              = false
  key.DATETIME_WHEN_CHALKED.use               = false
  key.EARLIEST_VERSION.use                    = false
  [...]
  # Runtime host keys.
  key._ACTION_ID.use                          = true
  key._ARGV.use                               = true
  key._ENV.use                                = true
  key._TENANT_ID.use                          = true
  key._OPERATION.use                          = true
  key._TIMESTAMP.use                          = true
  [...]
}

We define a report template using the report_template type definition, followed by the template name (in this case insertion_default). Note that the template contains definitions about what metadata keys to export (set to true), and which to avoid (set to false) and under which conditions. For instance, if we are not within a docker container, _OP_ALL_PS_INFO metadata will not be emitted.

For all default report template definitions, see the base report templates file.

This guide will not cover individual metadata keys in depth. All you need to know is that we can define whether or not we care about a particular key inside a template.

Chalkmark Templates

Chalkmarks are always embedded in an artifact (e.g., an ELF file, a python script, or a docker container). We consider an artifact that has a chalk mark to be “chalked”, and chalk marks are included as part of a chalk report if reporting on a chalked artifact.

Contrary to regular reports, there are restrictions on what metadata can be included in a chalkmark. In particular, no metadata that is collected at runtime (such as network connections or currently running processes) can be included in chalkmarks.

Templates that define which keys are included in a chalk mark are of the type mark_template. For instance, here is the “minimal” mark_template which comes as a built-in with chalk:

mark_template minimal {
  shortdoc: "Used for minimal chalk marks."
  doc: """
This template is intended for when you're durably recording artifact
information, and want to keep just enough information in the mark to
facilitate other people being able to validate the mark.

This is the default for `docker` chalk marks.
"""
  key.DATETIME_WHEN_CHALKED.use               = true
  key.CHALK_PTR.use                           = true
  key.SIGNATURE.use                           = true
  key.INJECTOR_PUBLIC_KEY.use                 = true
  key.$CHALK_CONFIG.use                       = true
  key.$CHALK_IMPLEMENTATION_NAME.use          = true
  key.$CHALK_LOAD_COUNT.use                   = true
  key.$CHALK_PUBLIC_KEY.use                   = true
  key.$CHALK_ENCRYPTED_PRIVATE_KEY.use        = true
  key.$CHALK_ATTESTATION_TOKEN.use            = true
}

For all default chalk mark template definitions, see the base chalk templates file.

In chalk, metadata keys that start with an _ denote that the metadata is collected at runtime. For instance, _TIMESTAMP corresponds to the timestamp at the time of the chalk operation (the time at which chalk insert or chalk docker build was run). These keys will show up in chalk reports but they will never appear in the embedded chalk marks.

Metadata keys starting with $ denote keys that are used by chalk internally. These keys must be embedded in the chalk mark for certain chalk features, such as attestation, to work properly.

The report templates and mark templates associated with supported chalk operations can be viewed here.

Chalk Configurations

A chalk configuration is a collection of specifications that define when reports are to be created (what will be the condition for publishing the reports) and where reports are to be sent (what will be the sinks for the reports). Moreover, they contain information on what templates are to be used for the different reports.

Sinks

A report can be sent to one or more destinations, known as output sinks, such as the local filesystem, an S3 bucket, or an API. For instance, the following snippet defines a sink named log_file_sink, which denotes that reports sent to it will be getting stored in local disk at ~/test_sink.log:

sink_config log_file_sink {
  sink: "file"
  filename: "~/test_sink.log"
}

Note that simply loading a configuration with log_file_sink defined will not write any chalk reports to ~/test_sink.log on chalk operations. To push output to any sink, the sink must subscribe to the types of reports that it wants to monitor.

Virtually all output in Chalk is handled through a ‘pub-sub’ (publish-subscribe) model. Chalk actions “publish” data to “topics”, then sinks listen to (“subscribe”) those topics. For instance, to send all reports to your newly created log_file_sink you may specify

subscribe("report", "log_file_sink")

Chalk comes with a set of sinks already configured for both chalkmarks and reports, and different chalk operations send data to different sinks by default. In particular, note that chalk reports sent to terminal will often be an abbreviated version of the full report that is written to a log file or pushed to s3.

For a full list of what sinks are available by default, see here.

Beyond this document, there’s an extensive amount of reference material for users:

Name	What it is
Writing Custom Configs	An guide on customizing configs.
Collecting Custom Keys	An guide on collecting custom metadata keys
Frequently Asked Questions	Frequently asked questions about configuration.

Using Chalk Exec Reports Collecting Custom Metadata Keys