Plausible unable to start after stopping containers

After moving my Plausible installation to a new server (copying all volumes over, adjusting the paths, and running docker-compose up -d again), Plausible won’t start anymore.

I first see some output from the db and then an error from the plausible container:

plausible_1  | Running migrations for Elixir.Plausible.ClickhouseRepo
plausible_1  | 16:51:46.592 [error] Clickhousex.Protocol (#PID<0.245.0>) failed to connect: ** (ErlangError) Erlang error: :econnrefused
plausible_1  | 16:51:47.177 [error] Could not create schema migrations table. This error usually happens due to the following:
plausible_1  | 
plausible_1  |   * The database does not exist
plausible_1  |   * The "schema_migrations" table, which Ecto uses for managing
plausible_1  |     migrations, was defined by another library
plausible_1  |   * There is a deadlock while migrating (such as using concurrent
plausible_1  |     indexes with a migration_lock)
plausible_1  | 
plausible_1  | To fix the first issue, run "mix ecto.create".
plausible_1  | 
plausible_1  | To address the second, you can run "mix ecto.drop" followed by
plausible_1  | "mix ecto.create". Alternatively you may configure Ecto to use
plausible_1  | another table for managing migrations:
plausible_1  | 
plausible_1  |     config :plausible, Plausible.ClickhouseRepo,
plausible_1  |       migration_source: "some_other_table_for_schema_migrations"
plausible_1  | 
plausible_1  | The full error report is shown below.
plausible_1  | 
plausible_1  | ** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2382ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
plausible_1  | 
plausible_1  |   1. By tracking down slow queries and making sure they are running fast enough
plausible_1  |   2. Increasing the pool_size (albeit it increases resource consumption)
plausible_1  |   3. Allow requests to wait longer by increasing :queue_target and :queue_interval
plausible_1  | 
plausible_1  | See DBConnection.start_link/2 for more information
plausible_1  | 
plausible_1  |     (ecto_sql 3.4.4) lib/ecto/adapters/sql.ex:593: Ecto.Adapters.SQL.raise_sql_call_error/1
plausible_1  |     (elixir 1.10.3) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
plausible_1  |     (ecto_sql 3.4.4) lib/ecto/adapters/sql.ex:686: Ecto.Adapters.SQL.execute_ddl/4
plausible_1  |     (ecto_sql 3.4.4) lib/ecto/migrator.ex:641: Ecto.Migrator.verbose_schema_migration/3
plausible_1  |     (ecto_sql 3.4.4) lib/ecto/migrator.ex:484: Ecto.Migrator.lock_for_migrations/4
plausible_1  |     (ecto_sql 3.4.4) lib/ecto/migrator.ex:406: Ecto.Migrator.run/4
plausible_1  |     (ecto_sql 3.4.4) lib/ecto/migrator.ex:149: Ecto.Migrator.with_repo/3
plausible_1  |     (plausible 0.0.1) lib/plausible_release.ex:116: Plausible.Release.run_migrations_for/1

After the error, the container just exits.

Is this an issue with Postgres or ClickHouse?

My docker-compose.yml (based on the official one):

version: "3.3"
services:
  db:
    image: postgres:12
    command: ["postgres", "-c", "log_statement=all", "-c", "log_destination=stderr"]
    volumes:
      - /var/apps/plausible/db:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=plausible
      - POSTGRES_USER=postgres
    restart: unless-stopped
  events-db:
    image: yandex/clickhouse-server
    restart: unless-stopped
    volumes:
      - /var/apps/plausible/events-db:/var/lib/clickhouse
  plausible:
    image: plausible/analytics
    command: sh -c "sleep 10 && /entrypoint.sh db migrate && /entrypoint.sh run"
    environment:
      - ENVIRONMENT=production
      - PORT=8080
      - SECRET_KEY_BASE=<secret>
      - SIGNING_SALT=<secret>
      - HOST=analytics.domain.tld
      - SCHEME=https
      - DATABASE_URL=postgres://postgres:postgres@db:5432/plausible
      - DATABASE_TLS_ENABLED=false
      - APP_VERSION=test
      - MAILER_ADAPTER=Bamboo.SMTPAdapter
      - MAILER_EMAIL=plausible@domain.tld
      - SMTP_HOST_ADDR=mail.domain.tld
      - SMTP_HOST_PORT=465
      - SMTP_USER_NAME=plausible@domain.tld
      - SMTP_USER_PWD=<secret>
      - SMTP_HOST_SSL_ENABLED=true
      - SMTP_MX_LOOKUPS_ENABLED=false
      - CLICKHOUSE_DATABASE_HOST=events-db
      - CLICKHOUSE_DATABASE_NAME=plausible_events_db
    depends_on:
      - events-db
      - db
    restart: unless-stopped
    networks:
      - default
      - web
    labels:
      - "traefik.docker.network=web"
      - "traefik.enable=true"
      - "traefik.http.routers.plausible.rule=Host(`plausible.domain.tld`)"
      - "traefik.http.routers.plausible.entrypoints=https"
      - "traefik.http.routers.plausible.tls.certResolver=basic"
      - "traefik.http.services.plausible.loadbalancer.server.port=8080"

networks:
  web:
    external: true
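
As an aside on the compose file above: the fixed `sleep 10` before migrating can still race a slow ClickHouse startup. A more robust alternative (a sketch, assuming `wget` is available in the plausible/analytics image and ClickHouse is listening on its default HTTP port 8123) is to poll the `/ping` endpoint until it answers:

```yaml
services:
  plausible:
    image: plausible/analytics
    # Poll ClickHouse's HTTP ping endpoint until it responds, instead of
    # hoping a fixed 10-second sleep is long enough.
    command: sh -c "until wget -q -O /dev/null http://events-db:8123/ping; do sleep 1; done && /entrypoint.sh db migrate && /entrypoint.sh run"
```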

I’ve made a breaking change to the configuration parameters for the ClickHouse connection: https://github.com/plausible/analytics/blob/master/CHANGELOG.md

Try replacing your CLICKHOUSE_DATABASE_HOST and CLICKHOUSE_DATABASE_NAME with something like:
CLICKHOUSE_DATABASE_URL=http://default@events-db:8123/plausible_events_db
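
For anyone decoding this format: it is a standard URL of the shape `scheme://user[:password]@host:port/database`, so a stock parser pulls it apart cleanly (a sketch; the host, user, and database names are just the values from this thread):

```python
from urllib.parse import urlsplit

# The format suggested above: scheme://user[:password]@host:port/database
url = urlsplit("http://default@events-db:8123/plausible_events_db")

print(url.scheme)            # "http" — ClickHouse's HTTP interface, not clickhouse://
print(url.username)          # "default" — ClickHouse's default user, no password set
print(url.hostname)          # "events-db" — the compose service name
print(url.port)              # 8123 — ClickHouse's default HTTP port
print(url.path.lstrip("/"))  # "plausible_events_db"
```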

Sorry about this. I’m gearing up for a 1.0 release (hopefully this week). Once it’s released you can pin your version and not worry about breaking changes like this.

Is that format correct? It appears to be missing the password.

Is the scheme clickhouse://, like the postgres:// format, or http(s)://?

Also, my config, which had been working for a couple of months with what I believe is a correct ClickHouse URL, is now broken. Is it picking up the password? I’m not sure what else has changed :slight_smile:

Thanks for the answer.

The example you provided didn’t work right away, but I got it working by removing the default user from the connection string:

 CLICKHOUSE_DATABASE_URL=http://events-db:8123/plausible_events_db

Maybe it would be a good idea to tag the docker images with the proper versions?

Now it’s been running migrations like these for a while:

db_1         | 2020-10-03 15:32:15.751 UTC [42] LOG:  execute ecto_98: UPDATE "public"."oban_jobs" AS o0 SET "state" = $1 WHERE (o0."state" IN ('scheduled','retryable')) AND (o0."queue" = $2) AND (o0."scheduled_at" <= $3)
db_1         | 2020-10-03 15:32:15.751 UTC [42] DETAIL:  parameters: $1 = 'available', $2 = 'rotate_salts', $3 = '2020-10-03 15:32:15.751314'
db_1         | 2020-10-03 15:32:15.754 UTC [42] LOG:  execute ecto_insert_oban_beats: INSERT INTO "public"."oban_beats" ("limit","node","nonce","paused","queue","running","started_at") VALUES ($1,$2,$3,$4,$5,$6,$7)
db_1         | 2020-10-03 15:32:15.754 UTC [42] DETAIL:  parameters: $1 = '1', $2 = 'plausible@b1989b89ec15', $3 = 'gr0mnvpp', $4 = 'f', $5 = 'rotate_salts', $6 = '{}', $7 = '2020-10-03 15:26:26.176088'
db_1         | 2020-10-03 15:32:15.756 UTC [42] LOG:  execute ecto_739: UPDATE "public"."oban_jobs" AS o0 SET "state" = $1, "attempted_at" = $2, "attempted_by" = $3, "attempt" = o0."attempt" + $4 FROM (SELECT so0."id" AS "id" FROM "public"."oban_jobs" AS so0 WHERE (so0."state" = 'available') AND (so0."queue" = $5) ORDER BY so0."priority", so0."scheduled_at", so0."id" LIMIT $6 FOR UPDATE SKIP LOCKED) AS s1 WHERE (o0."id" = s1."id") RETURNING o0."id", o0."state", o0."queue", o0."worker", o0."args", o0."errors", o0."tags", o0."attempt", o0."attempted_by", o0."max_attempts", o0."priority", o0."attempted_at", o0."completed_at", o0."discarded_at", o0."inserted_at", o0."scheduled_at"
db_1         | 2020-10-03 15:32:15.756 UTC [42] DETAIL:  parameters: $1 = 'executing', $2 = '2020-10-03 15:32:15.756354', $3 = '{plausible@b1989b89ec15,rotate_salts,gr0mnvpp}', $4 = '1', $5 = 'rotate_salts', $6 = '1'
db_1         | 2020-10-03 15:32:16.757 UTC [42] LOG:  execute ecto_98: UPDATE "public"."oban_jobs" AS o0 SET "state" = $1 WHERE (o0."state" IN ('scheduled','retryable')) AND (o0."queue" = $2) AND (o0."scheduled_at" <= $3)
db_1         | 2020-10-03 15:32:16.757 UTC [42] DETAIL:  parameters: $1 = 'available', $2 = 'rotate_salts', $3 = '2020-10-03 15:32:16.757299'
db_1         | 2020-10-03 15:32:16.759 UTC [42] LOG:  execute ecto_insert_oban_beats: INSERT INTO "public"."oban_beats" ("limit","node","nonce","paused","queue","running","started_at") VALUES ($1,$2,$3,$4,$5,$6,$7)
db_1         | 2020-10-03 15:32:16.759 UTC [42] DETAIL:  parameters: $1 = '1', $2 = 'plausible@b1989b89ec15', $3 = 'gr0mnvpp', $4 = 'f', $5 = 'rotate_salts', $6 = '{}', $7 = '2020-10-03 15:26:26.176088'
db_1         | 2020-10-03 15:32:16.762 UTC [42] LOG:  execute ecto_739: UPDATE "public"."oban_jobs" AS o0 SET "state" = $1, "attempted_at" = $2, "attempted_by" = $3, "attempt" = o0."attempt" + $4 FROM (SELECT so0."id" AS "id" FROM "public"."oban_jobs" AS so0 WHERE (so0."state" = 'available') AND (so0."queue" = $5) ORDER BY so0."priority", so0."scheduled_at", so0."id" LIMIT $6 FOR UPDATE SKIP LOCKED) AS s1 WHERE (o0."id" = s1."id") RETURNING o0."id", o0."state", o0."queue", o0."worker", o0."args", o0."errors", o0."tags", o0."attempt", o0."attempted_by", o0."max_attempts", o0."priority", o0."attempted_at", o0."completed_at", o0."discarded_at", o0."inserted_at", o0."scheduled_at"
db_1         | 2020-10-03 15:32:16.762 UTC [42] DETAIL:  parameters: $1 = 'executing', $2 = '2020-10-03 15:32:16.761525', $3 = '{plausible@b1989b89ec15,rotate_salts,gr0mnvpp}', $4 = '1', $5 = 'rotate_salts', $6 = '1'

It doesn’t seem to advance: it runs the same queries every second, with only the timestamp changing. I’m not sure if that’s what it should be doing?

Is that format correct? It appears to be missing the password.

It works for me, although just omitting the username and password works OK as well.

Is the scheme clickhouse://, like the postgres:// format, or http(s)://?

HTTP is correct here. ClickHouse has an HTTP interface, which is used by clickhouse_ecto.
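
To illustrate how that interface works (a sketch; the host and database name are the ones from this thread): queries are sent over plain HTTP, with the SQL passed as a query-string parameter.

```python
from urllib.parse import urlencode

# ClickHouse's HTTP interface accepts SQL as a query-string parameter;
# a URL like CLICKHOUSE_DATABASE_URL ultimately points at this endpoint.
base = "http://events-db:8123/"
params = urlencode({"database": "plausible_events_db", "query": "SELECT 1"})
request_url = base + "?" + params
print(request_url)
# → http://events-db:8123/?database=plausible_events_db&query=SELECT+1
```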

Also, my config, which had been working for a couple of months with what I believe is a correct ClickHouse URL, is now broken. Is it picking up the password? I’m not sure what else has changed :slight_smile:

Hmm… have you configured clickhouse with a custom username and password?

Yes, I will tag the 1.0 release, which you can pin, today or tomorrow.

Are you getting any error logs in your plausible container as well?

Looks like it’s trying to run the daily salt rotation job but the query isn’t working…

No, only the logs in the database container.

I use user plausible_events_user and my own password yes.

I updated to the latest Docker release today, and after removing the separately defined variables, Plausible is working fine with no errors in the log.

Today it’s not connecting to ClickHouse again… it’s taking 10 minutes to delete its db for a minimal starter site.

According to the author of the Oban library, these queries are normal and expected: https://twitter.com/ukutaht/status/1313093379421593604

But then maybe it shouldn’t be executed as part of the migration, but rather in a background job or a separate container?

So, after looking at it again, I figured out there was another issue in my proxy setup, unrelated to Plausible. I fixed that and am now able to access Plausible again :open_hands:

Thanks again for plausible itself and the quick help - apologies for the noise.

Great! Thanks for reporting!