We believe empowering engineers drives innovation.

An Introduction to Osquery

By Michael Ell
January 5, 2024

An Introduction to Osquery

What is osquery?

Osquery is an incredibly powerful and flexible endpoint detection and observability tool. It exposes an OS as a relational database that can be queried using a SQL-like query language. Using this there is a huge amount of security, compliance, and even sysadmin tasks that can be empowered by it.

How do I run osquery?

Osquery generally supports two modes of operation, an interactive shell mode (as the executable osqueryi) or as a daemonized service (as the executable osqueryd). In interactive mode the user can directly query the host from the command line, it is generally used for troubleshooting and developing queries.

Osquery’s daemonized mode (as the executable osqueryd) is the much more interesting mode of operation, in this form osquery will call to enroll with a server and can run scheduled or ad-hoc queries, as well as log the results of those back to the server (or directly out to S3 or Firehose) helping to tie it into a SIEM and/or SOC directly.

To get started on a Mac you can easily install osquery using homebrew with brew install --cask osquery and then launch osqueryi, for Linux and Windows installation you can refer to the packages and installation instructions provided by the Linux Foundation @ osquery.io

How do I use osquery?

Osquery relies on queries being written in an SQL-like language, it uses a variant of the SQLite3 dialect of SQL which makes it relatively easy for developers and engineers to write queries with even minimal SQL experience. A query to get some hardware information from a host may look something like this:

SELECT
   system_info.hostname as hostname,
   system_info.cpu_type as cpu_type,
   system_info.cpu_brand as cpu_brand,
   system_info.cpu_logical_cores as core_count,
   system_info.physical_memory / 1073741824 as physical_memory_gb,
   mounts.blocks_free * mounts.blocks_size / 1073741824 as free_space_gb
FROM
   system_info,
   mounts
WHERE
   mounts.path = "/";

In this case we are calling two of osquery’s tables the system_info and mounts tables, selecting the specific columns we want to see in the output as well as aliasing those column names for readability, filtering one table with a where statement, even doing some in-line arithmetic to present some readable data! All standard SQL operations working as expected and when running the query in osqueryi we get a result that looks like this:

| hostname         | cpu_type | cpu_brand | core_count | physical_memory_gb | free_space_gb |
+------------------+----------+-----------+------------+--------------------+---------------+
| michaelell-rearc | arm64e   | Apple M1  | 8          | 8                  | 105           |
+------------------+----------+-----------+------------+--------------------+---------------+

In osqueryd form it outputs to the logger as a json array with each row it’s own object in the array containing key-value pairs for each column, we’ll see an example of that in a later article where we deploy fleet and osquery in a lab.

What is a table in osquery and how do I find out about them?

A table in osquery is a function (or functions) declared in osquery’s spec, most of them they rely on API calls provided by the OS but some tables do implement their own logic. If we examine the code for system_info used in the query we created above, we can see that much of the data is pulled from OSX’s sysctl.

The osquery website has an excellent schema documentation that tracks the changes in tables between versions and filters by OS, as of the latest release (5.10.2) there are over 270 tables supported across Linux, MacOS, and Windows.

From within osqueryi as well you can list all tables supported on your platform, with .tables to list the tables and either .schema system_info or .types select * from system_info to list the columns and their SQL types.

Osquery is also extensible and supports a robust system for loading extensions which can provide addition tables beyond the standard schema. Osquery Extensions use the Thrift API, so any language that supports the Thrift API can be used to create new tables, custom loggers, custom configuration managers, or any other osquery feature an engineer could need.

YARA and Query costs

Some tables in Osquery like the yara table can be end up being quite costly to invoke and should be used sparingly. The yara table extends osquery’s detection capabilities by adding a YARA rules engine into the query, YARA rules can be used to further inspect the contents of files found using a query. When writing a query they can be defined in-line with the query, referenced to a URL, or local file path.

SELECT 
    * 
FROM 
    yara 
WHERE 
    path LIKE '/tmp/%%' 
    AND sigrule = 'rule text { strings: $txt = "some_malicious_content" condition: $txt}'
    AND count > 0;

The wildcard search term of %% allows for osquery to traverse recursively and finds our target file.

+-----------------------+---------+-------+-----------+---------+---------+------+
| path                  | matches | count | sig_group | sigfile | strings | tags |
+-----------------------+---------+-------+-----------+---------+---------+------+
| /tmp/sketchy_file.txt | text    | 1     |           |         |         |      |
+-----------------------+---------+-------+-----------+---------+---------+------+

If this is used to inspect files though the entire file will be loaded into the host’s memory which may cause undesirable performance issues. When running costly queries it is best practice to try and limit the scope and thus the performance impact of a query on a host. The above query takes several seconds to run and uses a relatively small amount of the host’s resources, changing to path LIKE '/%%' will now cause osquery to attempt to recursively traverse the entire host’s filesystem consuming considerably more resources and taking several minutes to several hours to complete.

Osquery when running in osqueryd mode supports a feature called Watchdog will spawn an additional osqueryd process that monitors the resource consumption of the main osqueryd process and will stop queries that exceed it’s configured bounds of CPU and memory usage. Watchdog will also denylist a query if it kills it twice within 24 hours which will prevent it’s execution for the next 24 hours. Osquery indicates which scheduled queries have been denylisted in it’s osquery_schedule table along with the most recent performance metrics for the query.

How do I write a Query well?

Well written osquery queries follow the same guidelines that well written SQL queries do. They should return only the expected columns, so avoid using the * argument in select statements, this both aids in readability and reduces the size of the content returned by the logger plugin. Queries, especially those interacting with the host’s filesystem, should be well scoped and avoid using wild cards and should use WHERE statements to narrow. Don’t be afraid to alias columns to give a clearer and more concise column heading to clarify the units returned.

Using these guidelines let’s re-write the YARA query above into a cleaner query. First we’ll change the SELECT * to SELECT path, count, since the other columns don’t provide any useful information to us in this case. Next we’ll change our path to argument to '/tmp/%.txt' to prevent un-needed path traversal and to limit to only files with the specified extension, this keeps the YARA rules engine from processing files that in this case it doesn’t need to.

SELECT 
    path,
    count 
FROM 
    yara 
WHERE 
    path LIKE '/tmp/%.txt' 
    AND sigrule = 'rule text { strings: $txt = "some_malicious_content" condition: $txt}'
    AND count > 0;

In the end we get this cleaner and smaller result that returns much more quickly than our first query did.

+-----------------------+-------+
| path                  | count |
+-----------------------+-------+
| /tmp/sketchy_file.txt | 1     |
+-----------------------+-------+

Where to go from here

Osquery is a really powerful tool for discovering information about your hosts, but we’ve only examined how to write queries and how to interact with the osqueryi command line. In a future article we’ll dive into how to configure and run osquery at scale across many hosts using fleet.