PostgreSQL and MySQL Docker Containers

DB, docker, mysql, postgresql

To quickly get an instance of PostgreSQL and MySQL up and running, use the following docker-compose setup:

Create a subdirectory to place the docker-compose.yml and optionally the data files for the DBs:

Windows

cd %USERPROFILE%
mkdir dbs\data\mysql
mkdir dbs\data\psql
cd dbs

Others

cd ~
mkdir -p dbs/data/mysql
mkdir -p dbs/data/psql
cd dbs

Add this docker-compose.yml to start with:

version: '3.1'
services:
  mysql:
    image: mysql
    command: --default-authentication-plugin=mysql_native_password
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: password
    ports:
      - "3306:3306"
    expose:
      - "3306"
    volumes:
      - ./data/mysql:/var/lib/mysql
  postgresql:
    image: postgres
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
    ports:
      - "5432:5432"
    expose:
      - "5432"
    volumes:
      - ./data/psql:/var/lib/postgresql/data

To bring them up:

docker-compose up

By default:

  • The PostgreSQL instance uses postgres / password as the admin.
  • The MySQL instance uses root / password for the admin.
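
As a quick sanity check, a short Python script can confirm that both instances are reachable from the host. This is a minimal sketch assuming the psycopg2-binary and PyMySQL packages are installed:

import psycopg2  # pip install psycopg2-binary (assumed)
import pymysql   # pip install PyMySQL (assumed)

# PostgreSQL: postgres / password on localhost:5432
pg = psycopg2.connect(host="localhost", port=5432, dbname="postgres",
                      user="postgres", password="password")
print("PostgreSQL server version:", pg.server_version)
pg.close()

# MySQL: root / password on localhost:3306
my = pymysql.connect(host="localhost", port=3306, user="root", password="password")
print("MySQL server version:", my.get_server_info())
my.close()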

Django Dev Env w/ Docker Compose

django, docker, postgresql, programming

A few years back, while working with Django on Windows (my colleagues were on Mac OS), I ran into a problem where a datetime routine (I forget which one) behaved differently between our machines. Even after we synced on the versions of Python and Django, the discrepancy remained. It turned out to be a difference between Python on Windows and Python on Mac OS. We ended up working around it by not using that routine.

Thinking back now, I guess the problem could’ve been avoided if we had used Docker or Vagrant or something similar so that we were at least all on the same environment. It’s the kind of setup a “real” work environment would’ve had. But since we were working on that project on our own as a hobby, we didn’t think too much about it.

ALSO: Docker Desktop (or even Linux support on Windows Home) was not available at the time, so most likely I would’ve had to wrestle with Docker Toolbox and VirtualBox, which still had problems with host volumes.

UPDATE: this post was updated in 2022-05 based on new learnings.

Setting Up Environment in Docker

If I were to do it now, this is how I would do it:

  • Create a subdirectory for DB data. We were using PostgreSQL, so I would create something like C:\dbdata\ and use a host volume to mount it to the container’s /var/lib/postgresql/data.
  • Use the postgres and python:3 base images from Docker Hub.

Step-by-step, here’s how I would set it up:

Project scaffold

NOTE: the following is using “myproject” as the name of the Django project. Replace it with the name of your Django project as appropriate.

cd dev/projects
mkdir dj
cd dj

Create two starter versions of Dockerfile and docker-compose.yml:

Dockerfile

FROM python:3.7-buster
ENV PYTHONUNBUFFERED 1

WORKDIR /code
#COPY Pipfile Pipfile.lock /code/
#
RUN pip install pipenv
#RUN pipenv install

docker-compose.yml

version: '3'
services:
  app:
    build: .
#    command: >
#      sh -c "pipenv run python manage.py migrate &&
#             pipenv run python manage.py runserver 0.0.0.0:8000"
    ports:
      - "8000:8000"
    expose:
      - "8000"
    volumes:
      - ./:/code
    tty: true
    stdin_open: true

Then build and start up the containers:

docker-compose build
docker-compose run --rm app /bin/bash
pipenv install
pipenv install django 
pipenv install <other stuff as needed>

pipenv run django-admin startproject myproject .
pipenv run django-admin startapp myapp

Now uncomment the lines previously commented in Dockerfile and docker-compose.yml.

PostgreSQL Setup

Modify myproject/settings.py to use PostgreSQL:

...
DATABASES = {
    #'default': {
    #    'ENGINE': 'django.db.backends.sqlite3',
    #    'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    #}
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'postgres',
        'USER': 'postgres',
        'PASSWORD': 'postgres',
        'HOST': 'db',  # MUST match the service name for the DB
        'PORT': 5432,
    }
}
...

All pipenv-related operations should be done inside the container.

docker-compose run --rm app /bin/bash
pipenv install psycopg2-binary

Modify docker-compose.yml to bring up the DB and app containers:

version: '3'
services:
  # service name must match the HOST in myproject/settings.py's DATABASES
  db:
    image: postgres
    environment:
      # Must match the values in myproject/settings.py's DATABASES
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      # Put the DB data for myproject under myproject_db 
      # so that I can add more projects later
      - PGDATA=/var/lib/postgresql/data/myproject_db
    ports:
      - "5432:5432"
    expose:
      - "5432"
    volumes:
      # host volume where DB data are actually stored
      - c:/dbdata:/var/lib/postgresql/data
  app:
    build: .
    command: >
      sh -c "pipenv run python manage.py migrate &&
             pipenv run python manage.py runserver 0.0.0.0:8000"
    ports:
      - "8000:8000"
    expose:
      - "8000"
    volumes:
      - ./:/code
    depends_on:
      - db

The above:

  • sets up two “services” (containers): a “db” service for the DB in addition to the “app” service for the app.
  • sets up a host mount (for the “db” service) of c:\dbdata to the container’s /var/lib/postgresql/data, where PostgreSQL stores data for the DBs. This allows the data to persist beyond the container’s lifetime.
  • sets the PGDATA environment variable, which tells PostgreSQL to use /var/lib/postgresql/data/myproject_db as its data subdirectory; because of the mount, this ends up as c:\dbdata\myproject_db on my Windows host. This allows c:\dbdata to be used as a parent directory for multiple project DBs.

Bring Up The Environment

Just run:

docker-compose up --build app

The above will:

  • Build the images and start the containers for the db and app services.
  • Initialize a new empty PostgreSQL database.
  • Run the Django migrations to prime the database for Django.
  • Run the app and have it listen on port 8000.

NOTE: there may be a race condition on the first run where the DB is still being built/initialized when the app service starts.

This error happens in that case:

web_1 | psycopg2.OperationalError: could not connect to server: Connection refused
web_1 | Is the server running on host "db" (172.19.0.2) and accepting
web_1 | TCP/IP connections on port 5432?

Just wait until the “db_1” service is finished, hit CTRL-C, and run the

docker-compose up --build app

command again. It should now work fine.

Optionally, start up the “db” service first in the background, then start up the “app” service:

docker-compose up -d db
docker-compose up app
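
Another way to avoid the race is to have the app wait until PostgreSQL accepts connections before running the migration. A minimal sketch of such a helper (a hypothetical wait_for_db.py, using the psycopg2-binary already in the Pipfile) that could run ahead of manage.py migrate in the app service’s command:

import time

import psycopg2  # already installed via: pipenv install psycopg2-binary

# Keep retrying until the "db" service accepts connections.
# The values match DATABASES in myproject/settings.py.
while True:
    try:
        psycopg2.connect(host="db", port=5432, dbname="postgres",
                         user="postgres", password="postgres").close()
        break
    except psycopg2.OperationalError:
        print("Waiting for the db service...")
        time.sleep(1)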

Docker Now Available for Windows Home

docker, Windows

For the longest time, Docker just didn’t like Windows Home. Legend has it that, with VirtualBox running a Linux VM, you can then install Docker Toolbox on top of that. I’ve tried that route. It works until I try to do volume mounts to the host file system. Somehow, somewhere in the Docker > VirtualBox > Windows Home chain, something’s not right.

With the recent changes to Windows to add Linux support, Docker can now run on Windows Home. Unless there are other reasons (e.g. Remote Desktop), there is no need to upgrade to Windows Professional ($99 USD).

Install Linux Support

The process to install Docker on Windows Home is kinda long and spans multiple pages. Here is the summary:

  • Get the “Windows 10 May 2020 Update” or later by downloading and running the “Windows 10 Update Assistant”: https://www.microsoft.com/en-us/software-download/windows10 . NOTE: This will take a while.
  • Run PowerShell as Administrator (Windows-S; type “PowerShell”; click “Run as Administrator”)
  • Run in PowerShell:
    dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
  • Restart Windows
  • Run PowerShell as Administrator (Windows-S; type “PowerShell”; click “Run as Administrator”)
  • Run in PowerShell:
    wsl --set-default-version 2
  • Open Microsoft Store and search for “Windows Subsystem for Linux” (WSL) (https://aka.ms/wslstore) and install a Linux distribution (e.g. “Ubuntu”)
  • Run the Linux distribution. You will be prompted to create a user account and set its password. See https://docs.microsoft.com/en-us/windows/wsl/install-win10#troubleshooting-installation for troubleshooting if necessary.

Install Docker for Windows

  • Go to https://hub.docker.com/editions/community/docker-ce-desktop-windows/ and download “Docker Desktop for Windows” (Don’t worry about the wording about Windows Professional). This downloads the “Docker Desktop Installer.exe” file that you then run.
  • Typically this will install into C:\Program Files\Docker\Docker. The executable is “Docker Desktop.exe”.

References

https://hub.docker.com/editions/community/docker-ce-desktop-windows/

https://docs.microsoft.com/en-us/windows/wsl/install-win10

Some Cloud Storage Services

Java, programming

The log4j-s3-search project now supports Azure Blob Storage (“Azure”) and Google Cloud Storage (“GCS”) in addition to AWS S3 (“S3”).

While working on adding the two options, I learned a bit about the three storage services. These services’ similarity to one another may not be a coincidence. After all, they are competing offerings from competing service providers, so months/years of competitive analyses may just yield a bunch of similar things.

HTTP and Language SDKs

All three services have a basic HTTP interface.

However, all three services also have language bindings, or client SDKs, for popular programming languages (e.g. Java, Python, C, etc.). In my experience, working with these SDKs is definitely easier than talking to the services directly over HTTP. This is especially true for AWS S3, considering the logic needed to sign a request.

Storage Model

The models are similar for all three:

  • A global namespace is used for a container of blobs. This can be referred to as either a container (Azure) or a bucket (S3 and GCS).
  • Within each container are key-value pairs where the key is a string of characters that may follow a path-like format, mimicking a file system’s folder hierarchy (e.g. “/documents/2020/April/abc.doc”).
    The consoles for these services may interpret keys of that format and present a hierarchical, tree-like interface to further the illusion of a hierarchy. However, keep in mind that underneath, in the implementation, a key is just a string of characters.
  • The value is a binary stream (a “blob”) of arbitrary data. Typically the services allow attaching metadata to the key-value entry. One common metadata property is “Content-Type”, which is used much like the HTTP header of the same name: to hint to users of the blob what the content is (e.g. “text/plain”, “application/json”, “application/gzip”, etc.). See the short sketch after this list.
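
To make the model concrete, here is a minimal sketch using boto3, the AWS SDK for Python (the walkthrough below uses Java); the bucket name, key, and content are placeholders:

import boto3  # pip install boto3 (assumed)

s3 = boto3.client("s3")

# The key merely looks like a path; "Content-Type" is attached as metadata.
s3.put_object(
    Bucket="mybucket",                   # the globally-named container
    Key="documents/2020/April/abc.doc",  # path-like, but still just a string
    Body=b"example bytes",               # the blob value
    ContentType="application/msword",    # hint for consumers of the blob
)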

Not-So-Quick Walkthrough

The following steps are what I went through in order to upload a file into the services.

In order to use these services, of course, an account with the service is required. All three services have a “free” period for first-time users.

Set Up Authentication

S3

Sign into https://console.aws.amazon.com/

Create an access key. Despite the name, creating one actually yields two values: an access key and a corresponding secret key. The process is a bit clumsy. Be sure to write down and/or download and save the secret key because it is only available during this time. The access key is listed in the console. If the secret key is lost, a new access key needs to be created.

Create a subdirectory .aws and a file credentials under your user’s home directory ($HOME in Linux/Mac, %USERPROFILE% in Windows):

.aws/
    credentials

The contents of the file should be something like (substitute in your actual access and secret keys, of course):

[default]
aws_access_key_id = ABCDEFGABCDEFGABCDEFG
aws_secret_access_key = EFGABCDEFG/ABCDEFGABCDEFGAABCDEFGABCDEFG

That should be sufficient for development purposes.
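
As a quick check that the credentials file is picked up, boto3 (the AWS SDK for Python, assumed installed) reads it automatically; the upload examples later in this post use Java, so this is only a convenience:

import boto3  # pip install boto3 (assumed)

# boto3 reads ~/.aws/credentials ([default] profile) automatically.
s3 = boto3.client("s3", region_name="us-east-1")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])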

Azure

Sign into https://portal.azure.com/

Create a Storage account. One of the Settings for the Storage account is Access keys. A pair of keys should have been generated; either of them will work fine. Just copy down the Connection string of the key to use.

The connection string will be used to authenticate when using the SDK.

Optional: one common pattern I see is that an environment variable AZURE_STORAGE_CONNECTION_STRING is created whose value is the connection string. Then the code will simply look up the environment variable for the value. This will avoid having to hard-code the connection string into the source code.
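
A similar quick check in Python (using the azure-storage-blob package, assumed installed; the upload example below uses the Java SDK) confirms the connection string works:

import os
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob (assumed)

# Build a client from the connection string stored in the environment variable.
client = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"])
print([c.name for c in client.list_containers()])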

GCS

Sign into https://console.cloud.google.com/

Create a project. Then create a Service account within the project.

In the project’s IAM > Permissions page, add the appropriate “Storage *” roles to the service account.

Add “Storage Admin” to include everything. After a while, the “Over granted permissions” column will have information on the actual permissions needed based on your code’s usage, and you can adjust the roles then.

Then create a key for the Service account. The recommended type is JSON. This will download a JSON file that will be needed.

Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the full path to where the JSON file is stored.
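
As with the other two, a quick check in Python (using the google-cloud-storage package, assumed installed; the upload code below is Java) confirms the JSON key is picked up:

from google.cloud import storage  # pip install google-cloud-storage (assumed)

# The client reads GOOGLE_APPLICATION_CREDENTIALS automatically.
client = storage.Client()
print([b.name for b in client.list_buckets()])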

Write Code to Upload File

The examples below are in Java.

S3

String bucketName = "mybucket";
String key = "myfile";
File file = new File(...);  // file to upload

AmazonS3Client client = (AmazonS3Client) AmazonS3ClientBuilder
    .standard()
    .build();
if (!client.doesBucketExist(bucketName)) {
    client.createBucket(bucketName);
}
PutObjectRequest por = new PutObjectRequest(
    bucketName, key, file);
PutObjectResult result = client.putObject(por);

Azure

This uses the v8 (legacy) API, which is what I ended up using. To do this with the newer v12 API, see https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-java#upload-blobs-to-a-container

String containerName = "mycontainer";
String key = "myfile";
File file = new File(...);  // file to upload

String connectionString = System.getenv(
    "AZURE_STORAGE_CONNECTION_STRING");
CloudStorageAccount account = CloudStorageAccount.parse(
    connectionString);
CloudBlobClient blobClient = account.createCloudBlobClient();
CloudBlobContainer container = blobClient.getContainerReference(
    containerName);
boolean created = container.createIfNotExists(
    BlobContainerPublicAccessType.CONTAINER, 
    new BlobRequestOptions(), new OperationContext());

CloudBlockBlob blob = container.getBlockBlobReference(key);
blob.uploadFromFile(file.getAbsolutePath());

GCS

While the other two services have convenient methods to upload a file, the GCS Java SDK does not; it only has a method that uploads a byte[], which is risky if your data can be large.

Internet to the rescue, I guess: one solution (from the Stack Overflow question referenced in the code below) is to implement our own buffering uploader:

private void uploadToStorage(
    Storage storage, File uploadFrom, BlobInfo blobInfo) 
    throws IOException {
    // Based on: https://stackoverflow.com/questions/53628684/how-to-upload-a-large-file-into-gcp-cloud-storage

    if (uploadFrom.length() < 1_000_000) {
        storage.create(
            blobInfo, Files.readAllBytes(uploadFrom.toPath()));
    } else {
        try (WriteChannel writer = storage.writer(blobInfo)) {
            byte[] buffer = new byte[10_240];
            try (InputStream input = Files.newInputStream(uploadFrom.toPath())) {
                int limit;
                while ((limit = input.read(buffer)) >= 0) {
                    writer.write(
                        ByteBuffer.wrap(buffer, 0, limit));
                }
            }
        }
    }
}

With that defined, the upload code is:

String bucketName = "mybucket";
String key = "myfile";
File file = new File(...);  // file to upload

Storage storage = StorageOptions.getDefaultInstance()
    .getService();
Bucket bucket = storage.get(bucketName);
if (null == bucket) {
    bucket = storage.create(BucketInfo.of(bucketName));
}

BlobId blobId = BlobId.of(bucketName, key);
BlobInfo blobInfo = BlobInfo.newBuilder(blobId).build();
uploadToStorage(storage, file, blobInfo);

Setting up Git repo on Dreamhost

programming, Uncategorized

Create an empty repo

Create an empty repo off of /home/username, where username is your user name.
The following will create an empty repo named “myrepo” in the subdirectory /home/username/myrepo.git:

cd ~
git init --bare myrepo.git

Adding content

Go to where the source code is and initialize a git repository. Then add the files and configure as necessary:

git init
git add ...
git commit ...

Configure a remote repository that maps to the repo created earlier so that content can be pushed to it:

git remote add dreamhost ssh://username@server.dreamhost.com/home/username/myrepo.git

The above sets up a remote repo called “dreamhost” that tracks the repo created above. The URL component ssh://username@server.dreamhost.com indicates how to access the server where the repo is. The user name and server name can be found by following the Dreamhost docs:

https://help.dreamhost.com/hc/en-us/articles/115000675027-FTP-overview-and-credentials#Finding_your_FTP_server_hostname

Finally, push the change up:

git push -u dreamhost master

Pulling content

Most likely on a different machine, use git clone to pull content and start tracking changes:

git clone ssh://username@server.dreamhost.com/home/username/myrepo.git
Cloning into 'myrepo'...
username@server.dreamhost.com's password: xxxxxxxxx
remote: Counting objects: xx, done.
remote: Compressing objects: 100% (10/10), done.
remote: Total xx (delta 1), reused 0 (delta 0)
Receiving objects: 100% (xx/xx), 3.28 KiB | 3.28 MiB/s, done.
Resolving deltas: 100% (1/1), done.

As before, the URL component ssh://username@server.dreamhost.com indicates how to access the server where the repo is. The path /home/username/myrepo.git is the path to the remote repo created in the first step above.

Now you can use git add, git commit, and git push to add content:

git add ...
...
git commit
...
git push origin master

Or, for a specific branch (mybranch in this example):

git checkout -b mybranch
git add ...
...
git commit
...
git push origin mybranch

Django app dev in Docker

django, docker, programming

Start with Docker’s doc

The guide at https://docs.docker.com/compose/django/ is, for the most part, still valid for setting up a development environment for working with Django in Docker.

Update to use Pipfile

The document still uses a requirements.txt as the example for handling app dependencies. Python projects now have at least two newer alternatives:

  • setup.py
  • Pipfile (pipenv)

To use pipenv, one update is to modify the Dockerfile like so:

FROM python:3
ENV PYTHONUNBUFFERED 1

RUN mkdir /code
WORKDIR /code

COPY . /code/
RUN pip install pipenv && pipenv install
RUN pipenv run python manage.py migrate

That last command (the migrate) is optional; it can instead be deferred to the docker-compose.yml if that works better.

Speaking of docker-compose.yml:

version: '3'

services:
  web:
    build: .
    command: pipenv run python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/code
    ports:
      - "8000:8000"

(I removed the “db” service to keep this minimalistic.)

Fast Guide to Launching an EC2 Instance w/ SSH Access

AWS, ec2, ssh, Windows

Concepts

Minimal number of concepts to understand:

  • Key pair — a pair of public and private cryptographic keys that will be used to establish a secure shell/terminal to the launched EC2 instance.
  • Security group — a group of access rules that determine what network traffic can go into (inbound rules) and out of (outbound rules) the EC2 instance.
  • IAM role — a collection of rules that determine what AWS services the EC2 instance will have access to (and what kind of access). E.g. read-only access to S3.
  • AMI — an image that prescribes an OS and some software to run when an EC2 instance comes up.
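
To see how these pieces fit together, here is a minimal sketch using boto3 (the AWS SDK for Python) that creates the key pair and security group and launches an instance; the console walkthrough below does the same thing step by step. The AMI ID and IP address are placeholders, and attaching the IAM role is omitted:

import boto3  # pip install boto3 (assumed); credentials/region already configured

ec2 = boto3.client("ec2")

# Key pair "KP": the private key material is only returned at creation time.
kp = ec2.create_key_pair(KeyName="KP")
with open("KP.pem", "w") as f:
    f.write(kp["KeyMaterial"])

# Security group "SG" allowing inbound SSH (port 22) from a single IP.
sg = ec2.create_security_group(GroupName="SG", Description="SSH access")
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
        "IpRanges": [{"CidrIp": "203.0.113.10/32"}],  # placeholder: your own IP
    }],
)

# Launch a t2.nano instance from an AMI (placeholder ID) with the key pair and group.
resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",  # placeholder: pick an Amazon Linux AMI for your region
    InstanceType="t2.nano",
    MinCount=1, MaxCount=1,
    KeyName="KP",
    SecurityGroupIds=[sg["GroupId"]],
)
print(resp["Instances"][0]["InstanceId"])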

Shared Key Pair

The only thing shared between EC2 and the SSH client that matters in this example is the key pair. The instructions here describe how to create a new key pair.

Creating a Key Pair

  • Log into the AWS console. The remaining steps to launch an instance will be done in the AWS Console.
  • Access Services > EC2 > Key Pairs from AWS Console.
  • Click “Create Key Pair”
  • Give it a name “KP”
  • Once it’s created, a “.pem” file will be downloaded. Remember the name and where the file is downloaded. It will be needed later.

Create a Security Group

  • Access Services > EC2 > Security Groups
  • Click “Create Security Group” to create a security group. Name it “SG.”
  • In the “Inbound” rules, add an entry for Type SSH, Protocol TCP, Port Range 22. For the Source, select “My IP” to let the tool automatically select your IP address.
  • Add other rules to open up more ports as needed.

Create an IAM Role

  • Access Services > Security, Identity, & Compliance > IAM > Roles
  • Click “Create Role” and select “EC2” (as opposed to Lambda)
  • Click “Next: Permissions”
  • Add permissions as needed (e.g. add “AmazonS3ReadOnlyAccess” if read-only access to S3 is needed).
  • Give the role a name and description.

Launch Instance

  • Access Services > EC2 > EC2 Dashboard
  • Click “Launch Instance”
  • Select an appropriate AMI (e.g. any of the Amazon Linux ones) to use for the EC2 instance. For the instance type, start with “t2.nano” to experiment with since it’s cheapest. Once the instance is up and running, larger instance types can be used as needed.
  • Click “Next: Configure Instance Details.”
  • For IAM role, select the role created above. Everything else can stay as-is.
  • Click “Next: Add Storage.”
  • Edit as desired, but the default is fine.
  • Click “Next: Add Tags.”
  • Add tags as needed. These are optional.
  • Click “Next: Configure Security Group.”
  • Choose “Select an existing security group” and select the security group “SG” created above.
  • Click “Review and Launch.”
  • Click “Launch” after everything looks right.
  • A modal comes up to allow selection of a key pair to use to access the instance. Select “KP” as created above.
  • Continue the launch.
  • Click on the “i-xxxxxxxxx” link to see the status of the instance.
  • Wait until Instance State is “running” and Status Checks is “2/2.”
  • Note the “Public DNS (IPv4)” value. It is the host name to SSH into.

Connecting to The EC2 Instance

Windows with Bitvise SSH Client

  • Download and start Bitvise SSH Client.
  • Click “New profile”
  • Go to “Login” tab
  • Click the link “Client key manager”
  • Click “Import”
  • Change file filter to (*.*)
  • Locate the .pem file downloaded above and import it. Accept default settings.
  • In the “Server” tab, enter the Public DNS host name from above. Use port 22.
  • In the “Authentication” section, enter “ec2-user” as the Username.
  • Use “publickey” as the Initial Method.
  • For “Client key,” select the profile created earlier when importing the .pem file.
  • Click “Log in” and confirm any dialogs.

Mac OS

Change the permission of the downloaded .pem file to allow only the owner access:

chmod 400 ~/Downloads/mykey.pem

Use ssh with the .pem file:

ssh -i ~/Downloads/mykey.pem ec2-user@xx-xx-xx-xx.yyyy.amazonaws.com