# Advanced Log Filtering

Filter logs with regular expressions

# Querying the logs

## Understanding field references

The log tables are queried with a subset of BigQuery SQL syntax. They all have three columns: `event_message`, `timestamp`, and `metadata`.

| column        | description                 |
| ------------- | --------------------------- |
| timestamp     | time event was recorded     |
| event_message | the log's message           |
| metadata      | information about the event |

The `metadata` column is an array of JSON objects that stores important details about each recorded event. For example, in the Postgres table, the `metadata.parsed.error_severity` field indicates the error level of an event. To work with its values, you need to `unnest` them using a `cross join`.

This approach is commonly used with JSON and array columns, so it might look a bit unfamiliar if you're not used to working with these data types.

```sql
select
  event_message,
  parsed.error_severity,
  parsed.user_name
from
  postgres_logs
  -- extract first layer
  cross join unnest(postgres_logs.metadata) as metadata
  -- extract second layer
  cross join unnest(metadata.parsed) as parsed;
```

## Expanding results

Logs returned by queries may be difficult to read in table format. A row can be double-clicked to expand the results into more readable JSON:

![Expanding log results](/docs/img/guides/platform/expanded-log-results.png)

## Filtering with [regular expressions](https://en.wikipedia.org/wiki/Regular_expression)

The Logs use BigQuery Style regular expressions with the [regexp_contains function](https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_contains). In its most basic form, it will check if a string is present in a specified column.

```sql
select
  cast(timestamp as datetime) as timestamp,
  event_message,
  metadata
from postgres_logs
where regexp_contains(event_message, 'is present');
```

There are multiple operators that you should consider using:

### Find messages that start with a phrase

`^` only looks for values at the start of a string

```sql
-- find only messages that start with connection
regexp_contains(event_message, '^connection')
```

### Find messages that end with a phrase:

`$` only looks for values at the end of the string

```sql
-- find only messages that ends with port=12345
regexp_contains(event_message, '$port=12345')
```

### Ignore case sensitivity:

`(?i)` ignores capitalization for all proceeding characters

```sql
-- find all event_messages with the word "connection"
regexp_contains(event_message, '(?i)COnnecTion')
```

### Wildcards:

`.` can represent any string of characters

```sql
-- find event_messages like "hello<anything>world"
regexp_contains(event_message, 'hello.world')
```

### Alphanumeric ranges:

`[1-9a-zA-Z]` finds any strings with only numbers and letters

```sql
-- find event_messages that contain a number between 1 and 5 (inclusive)
regexp_contains(event_message, '[1-5]')
```

### Repeated values:

`x*` zero or more x
`x+` one or more x
`x?` zero or one x
`x{4,}` four or more x
`x{3}` exactly 3 x

```sql
-- find event_messages that contains any sequence of 3 digits
regexp_contains(event_message, '[0-9]{3}')
```

### Escaping reserved characters:

`\.` interpreted as period `.` instead of as a wildcard

```sql
-- escapes .
regexp_contains(event_message, 'hello world\.')
```

### `or` statements:

`x|y` any string with `x` or `y` present

```sql
-- find event_messages that have the word 'started' followed by either the word "host" or "authenticated"
regexp_contains(event_message, 'started host|authenticated')
```

### `and`/`or`/`not` statements in SQL:

`and`, `or`, and `not` are all native terms in SQL and can be used in conjunction with regular expressions to filter results

```sql
select
  cast(timestamp as datetime) as timestamp,
  event_message,
  metadata
from postgres_logs
where
  (regexp_contains(event_message, 'connection') and regexp_contains(event_message, 'host'))
  or not regexp_contains(event_message, 'received');
```

### Filtering and unnesting example

**Filter for Postgres**

```sql
select
  cast(postgres_logs.timestamp as datetime) as timestamp,
  parsed.error_severity,
  parsed.user_name,
  event_message
from
  postgres_logs
  cross join unnest(metadata) as metadata
  cross join unnest(metadata.parsed) as parsed
where regexp_contains(parsed.error_severity, 'ERROR|FATAL|PANIC')
order by timestamp desc
limit 100;
```

## Limitations

### Log tables cannot be joined together

Each product table operates independently without the ability to join with other log tables. This may change in the future.

### The `with` keyword and subqueries are not supported

The parser does not yet support `with` and subquery statements.

### The `ilike` and `similar to` keywords are not supported

Although `like` and other comparison operators can be used, `ilike` and `similar to` are incompatible with BigQuery's variant of SQL. `regexp_contains` can be used as an alternative.

### The wildcard operator `*` to select columns is not supported

The log parser is not able to parse the `*` operator for column selection. Instead, you can access all fields from the `metadata` column:

```sql
select
  cast(postgres_logs.timestamp as datetime) as timestamp,
  event_message,
  metadata
from
  <log_table_name>
order by timestamp desc
limit 100;
```