Some Cloud Storage Services

Java, programming

The log4j-s3-search project now supports Azure Blob Storage (“Azure”) and Google Cloud Storage (“GCS”) in addition to AWS S3 (“S3”).

While working on adding the two options, I learned a bit about the three storage services. These services’ similarity to one another may not be a coincidence. After all, they are competing offerings from competing service providers, so months/years of competitive analyses may just yield a bunch of similar things.

HTTP and Language SDKs

All three services expose a basic HTTP/REST interface.

However, all three services also have language bindings, or client SDKs, for popular programming languages (e.g. Java, Python, C, etc.). In my experience, working with these SDKs is definitely easier than rolling your own requests over HTTP. This is especially true for AWS S3, considering the logic needed to sign a request.

Storage Model

The models are similar for all three:

  • A global namespace is used for containers of blobs. Such a container is referred to as either a container (Azure) or a bucket (S3 and GCS).
  • Within each container are key-value pairs where the key is a string of characters that may resemble a path-like format to mimic that used by the folder hierarchy in file systems (e.g. “/documents/2020/April/abc.doc“).
    The consoles for these services may interpret keys of that format and present a hierarchical tree-like interface to further the illusion of the hierarchy. However, keep in mind that underneath in the implementation, a key is just a string of characters.
  • The value is a binary stream (a “blob”) of arbitrary data. Typically the services allow attaching metadata to the key-value entry. One of the common metadata properties is “Content-Type,” which is used like the HTTP header of the same name: to hint to consumers of the blob what the content is (e.g. “text/plain,” “application/json,” “application/gzip,” etc.). See the sketch after this list for an example.
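
As an illustration, here is a minimal sketch of attaching a Content-Type when uploading to S3 with the Java SDK (the bucket, key, and file names are placeholders; Azure and GCS have analogous options):

import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class ContentTypeExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();

        // Hint to downstream consumers that this blob is JSON
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentType("application/json");

        PutObjectRequest request = new PutObjectRequest(
            "mybucket", "documents/2020/April/abc.json", new File("abc.json"));
        request.setMetadata(metadata);
        s3.putObject(request);
    }
}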

Not-So-Quick Walkthrough

The following steps are what I went through in order to upload a file into the services.

In order to use these services, of course, an account with the service is required. All three services have a “free” period for first-time users.

Set Up Authentication

S3

Sign into https://console.aws.amazon.com/

Create an access key. Despite the name, creating one actually yields two values: an access key and a corresponding secret key. The process is a bit clumsy. Be sure to write down and/or download and save the secret key because it is only shown at creation time. The access key is listed in the console. If the secret key is lost, a new access key needs to be created.

Create a subdirectory .aws and a file credentials under your user’s home directory ($HOME in Linux/Mac, %USERPROFILE% in Windows):

.aws/
    credentials

The contents of the file should be something like (substitute in your actual access and secret keys, of course):

[default]
aws_access_key_id = ABCDEFGABCDEFGABCDEFG
aws_secret_access_key = EFGABCDEFG/ABCDEFGABCDEFGAABCDEFGABCDEFG

That should be sufficient for development purposes.

Azure

Sign into https://portal.azure.com/

Create a Storage account. One of the Settings for the Storage account is Access keys. A pair of keys should have been generated; either one will work fine. Just copy down the Connection string of one of the keys to use.

The connection string will be used to authenticate when using the SDK.

Optional: one common pattern I see is that an environment variable AZURE_STORAGE_CONNECTION_STRING is created whose value is the connection string. Then the code will simply look up the environment variable for the value. This will avoid having to hard-code the connection string into the source code.
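
For reference, a connection string looks roughly like this (the account name and key below are placeholders):

DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=<base64-encoded key>;EndpointSuffix=core.windows.net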

GCS

Sign into https://console.cloud.google.com/

Create a project. Then create a Service account within the project.

In the project’s IAM > Permissions page, add the appropriate “Storage *” roles to the service account.

Add “Storage Admin” to include everything. After a while, the “Over granted permissions” column will show the actual permissions needed based on your code’s usage, and you can adjust accordingly.

Then create a key for the Service account. The recommended type is JSON. This will download a JSON file that will be needed.

Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the full path to where the JSON file is stored.
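
For example, on Linux/Mac (the path below is a placeholder for wherever the JSON file was saved):

export GOOGLE_APPLICATION_CREDENTIALS=/home/username/keys/my-project-credentials.json

On Windows, set the variable via the System Properties dialog or setx.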

Write Code to Upload File

The examples below are in Java.

S3

String bucketName = "mybucket";
String key = "myfile";
File file = new File(...);  // file to upload

AmazonS3 s3 = AmazonS3ClientBuilder
    .standard()
    .build();
if (!s3.doesBucketExist(bucketName)) {
    s3.createBucket(bucketName);
}
PutObjectRequest por = new PutObjectRequest(
    bucketName, key, file);
PutObjectResult result = s3.putObject(por);

Azure

I ended up using the v8 (legacy) API here. To do this with the newer v12 API, see https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-java#upload-blobs-to-a-container

String containerName = "mycontainer";
String key = "myfile";
File file = new File(...);  // file to upload

String connectionString = System.getenv(
    "AZURE_STORAGE_CONNECTION_STRING");
CloudStorageAccount account = CloudStorageAccount.parse(
    connectionString);
CloudBlobClient blobClient = account.createCloudBlobClient();
CloudBlobContainer container = blobClient.getContainerReference(
    containerName);
boolean created = container.createIfNotExists(
    // NOTE: CONTAINER access makes the container's blobs publicly readable
    BlobContainerPublicAccessType.CONTAINER,
    new BlobRequestOptions(), new OperationContext());

CloudBlockBlob blob = container.getBlockBlobReference(key);
blob.uploadFromFile(file.getAbsolutePath());

GCS

While the other two services have convenient methods to upload a file, the GCS Java SDK does not; it only has a variant that uploads a byte[], which is dangerous if your data can be large.

Internet to the rescue, I guess: based on a Stack Overflow answer (linked in the comment below), we can implement our own buffering uploader:

private void uploadToStorage(
    Storage storage, File uploadFrom, BlobInfo blobInfo) 
    throws IOException {
    // Based on: https://stackoverflow.com/questions/53628684/how-to-upload-a-large-file-into-gcp-cloud-storage

    if (uploadFrom.length() < 1_000_000) {
        storage.create(
            blobInfo, Files.readAllBytes(uploadFrom.toPath()));
    } else {
        try (WriteChannel writer = storage.writer(blobInfo)) {
            byte[] buffer = new byte[10_240];
            try (InputStream input = Files.newInputStream(uploadFrom.toPath())) {
                int limit;
                while ((limit = input.read(buffer)) >= 0) {
                    writer.write(
                        ByteBuffer.wrap(buffer, 0, limit));
                }
            }
        }
    }
}

With that defined, the upload code is:

String bucketName = "mybucket";
String key = "myfile";
File file = new File(...);  // file to upload

// Credentials come from the GOOGLE_APPLICATION_CREDENTIALS
// environment variable set up earlier
Storage storage = StorageOptions.getDefaultInstance()
    .getService();
Bucket bucket = storage.get(bucketName);
if (null == bucket) {
    bucket = storage.create(BucketInfo.of(bucketName));
}

BlobId blobId = BlobId.of(bucketName, key);
BlobInfo blobInfo = BlobInfo.newBuilder(blobId).build();
uploadToStorage(storage, file, blobInfo);

Setting up Git repo on Dreamhost

programming, Uncategorized

Create an empty repo

Create an empty repo off of /home/username where username is your user name.
The following will create an empty repo named “myrepo” in the subdirectory /home/username/myrepo.git

cd ~
git init --bare myrepo.git

Adding content

Go to where the source code is and initialize a git repository. Then add the files and configure as necessary:

git init
git add ...
git commit ...

Configure a remote repository to map to the repo created earlier in order to push contents to:

git remote add dreamhost ssh://username@server.dreamhost.com/home/username/myrepo.git

The above sets up a remote repo called “dreamhost” that tracks the repo created above. The URL component ssh://username@server.dreamhost.com indicates how to access the server where the repo is. The user and server names can be found by following the docs from Dreamhost:

https://help.dreamhost.com/hc/en-us/articles/115000675027-FTP-overview-and-credentials#Finding_your_FTP_server_hostname

Finally, push the change up:

git push -u dreamhost master

Pulling content

Most likely on a different machine, use git clone to pull content and start tracking changes:

git clone ssh://username@server.dreamhost.com/home/username/myrepo.git
Cloning into 'myrepo'...
username@server.dreamhost.com's password: xxxxxxxxx
remote: Counting objects: xx, done.
remote: Compressing objects: 100% (10/10), done.
remote: Total xx (delta 1), reused 0 (delta 0)
Receiving objects: 100% (xx/xx), 3.28 KiB | 3.28 MiB/s, done.
Resolving deltas: 100% (1/1), done.

The URL component ssh://username@server.dreamhost.com indicates how to access the server where the repo is, as before. The path /home/username/myrepo.git is the path to the remote repo created in the first step above.

Now you can use git add, git commit, and git push to add content:

git add ...
...
git commit
...
git push origin master

Or, for a specific branch (mybranch in this example):

git checkout -b mybranch
git add ...
...
git commit
...
git push origin mybranch

Django app dev in Docker

django, docker, programming

Start with Docker’s doc

The instructions at https://docs.docker.com/compose/django/ are, for the most part, still valid for setting up a development environment for working with Django in Docker.

Update to use Pipfile

The document still uses the example of a requirements.txt to handle app dependencies. There are now at least two newer alternatives:

  • setup.py
  • Pipfile (pipenv; see the sample below)
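
For reference, a minimal Pipfile for a Django app might look like this (a sketch; the Python version and package versions are placeholders):

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[packages]
django = "*"

[dev-packages]

[requires]
python_version = "3.8"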

One update to use pipenv is to modify the Dockerfile like so:

FROM python:3
ENV PYTHONUNBUFFERED 1

RUN mkdir /code
WORKDIR /code

COPY . /code/
RUN pip install pipenv && pipenv install
RUN pipenv run python manage.py migrate

That last command is optional; it can be deferred to the docker-compose.yml if that works better (see the note after the docker-compose.yml below).

Speaking of docker-compose.yml:

version: '3'

services:
  web:
    build: .
    command: pipenv run python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/code
    ports:
      - "8000:8000"

(I removed the “db” service to keep this minimalistic.)
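
If you prefer deferring the migration to container start-up (and dropping the migrate RUN step from the Dockerfile), one option (a sketch, not the only way) is to chain it into the web service’s command:

    command: sh -c "pipenv run python manage.py migrate && pipenv run python manage.py runserver 0.0.0.0:8000"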

Fast Guide to Launching an EC2 Instance w/ SSH Access

AWS, ec2, ssh, Windows

Concepts

Minimal number of concepts to understand:

  • Key pair — a pair of public and private cryptographic keys that will be used to establish a secure shell/terminal to the launched EC2 instance.
  • Security group — a group of access rules that determine what network traffic can go into (inbound rules) and out of (outbound rules) the EC2 instance.
  • IAM role — a collection of rules that determine what AWS services the EC2 instance will have access to (and what kind of access). E.g. read-only access to S3.
  • AMI — an image that prescribes an OS and some software to run when an EC2 instance comes up.

Shared Key Pair

The only thing shared between EC2 and the SSH program that matters in this example is the key pair. The instructions here describe how to create a new key pair.

Creating a Key Pair

  • Log into the AWS console. The remaining steps to launch an instance will be done in the AWS Console.
  • Access Services > EC2 > Key Pairs from AWS Console.
  • Click “Create Key Pair”
  • Give it a name “KP”
  • Once it’s created, a “.pem” file will be downloaded. Remember the name and where the file is downloaded. It will be needed later.

Create a Security Group

  • Access Services > EC2 > Security Groups
  • Click “Create Security Group” to create a security group. Name it “SG.”
  • In the “Inbound” rules, add an entry for Type SSH, Protocol TCP, Port Range 22. For the Source, select “My IP” to let the tool automatically select your IP address.
  • Add other rules to open up more ports as needed.

Create an IAM Role

  • Access Services > Security, Identity, & Compliance > IAM > Roles
  • Click “Create Role” and select “EC2” (as opposed to Lambda)
  • Click “Next: Permissions”
  • Add permissions as needed (e.g. add “AmazonS3ReadOnlyAccess” if read-only access to S3 is needed).
  • Give the role a name and description.

Launch Instance

  • Access Services > EC2 > EC2 Dashboard
  • Click “Launch Instance”
  • Select an appropriate AMI (e.g. any of the Amazon Linux ones) to use for the EC2 instance. For the instance type, start with “t2.nano” to experiment with since it’s cheapest. Once the instance is up and running, larger instance types can be used as needed.
  • Click “Next: Configure Instance Details.”
  • For IAM role, select the role created above. Everything else can stay as-is.
  • Click “Next: Add Storage.”
  • Edit as desired, but the default is fine.
  • Click “Next: Add Tags.”
  • Add tags as needed. These are optional.
  • Click “Next: Configure Security Group.”
  • Choose “Select an existing security group” and select the security group “SG” created above.
  • Click “Review and Launch.”
  • Click “Launch” after everything looks right.
  • A modal comes up to allow selection of a key pair to use to access the instance. Select “KP” as created above.
  • Continue the launch.
  • Click on the “i-xxxxxxxxx” link to see the status of the instance.
  • Wait until Instance State is “running” and Status Checks is “2/2.”
  • Note the “Public DNS (IPv4)” value. It is the host name to SSH into.

Connecting to The EC2 Instance

Windows with Bitvise SSH Client

  • Download and start Bitvise SSH Client.
  • Click “New profile”
  • Go to “Login” tab
  • Click the link “Client key manager”
  • Click “Import”
  • Change file filter to (*.*)
  • Locate the .pem file downloaded above and import it. Accept default settings.
  • In the “Server” tab, enter the Public DNS host name from above. Use port 22.
  • In the “Authentication” section, enter “ec2-user” as the Username.
  • Use “publickey” as the Initial Method.
  • For “Client key,” select the profile created earlier when importing the .pem file.
  • Click “Log in” and confirm any dialogs.

Mac OS

Change the permission of the downloaded .pem file to allow only the owner access:

chmod 400 ~/Downloads/mykey.pem

Use ssh with the .pem file:

ssh -i ~/Downloads/mykey.pem ec2-user@xx-xx-xx-xx.yyyy.amazonaws.com
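
Optionally, an entry in ~/.ssh/config avoids typing the key path every time (the host alias and paths below are placeholders):

Host myec2
    HostName xx-xx-xx-xx.yyyy.amazonaws.com
    User ec2-user
    IdentityFile ~/Downloads/mykey.pem

After that, connecting is simply ssh myec2.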

Django’s (default) Permission Codenames

django, programming

Permission String

The documentation for Django authentication (e.g. https://docs.djangoproject.com/en/2.2/topics/auth/default/#permissions-and-authorization) talks about allowing code to restrict access to models through a vague notion of a “permission.”

Read further and you kinda get the idea that a “permission” in most practical applications is a string, whether you’re using explicit checks like user.has_perm(permission), decorators like @permission_required(permission), or class properties for the PermissionRequiredMixin like this (taken right from the page):

from django.contrib.auth.mixins import PermissionRequiredMixin

class MyView(PermissionRequiredMixin, View):
    permission_required = 'polls.can_vote'

The problem with using strings is that the burden of getting things correct lies heavily on the documentation. A basic question, then, is: given a model created in the app, what permission string should you use to work with the Authentication system to guard the model’s access?

Extremely Underrated Section

The short section on “Default permissions” (https://docs.djangoproject.com/en/2.2/topics/auth/default/#default-permissions) initially looks innocuous enough. However, it turns out that the section is the key to using permissions on models (custom permissions notwithstanding).

If you’re in a hurry to get permissions working on some models, you’d be forgiven for searching the page and jumping directly to the section on the “permission_required” decorator (and the section on PermissionRequiredMixin immediately after), where they show examples of using permissions of the form:

<model>.can_<action>

For example, 'polls.can_vote' is used throughout these sections to indicate the ability to “vote” in the “polls” app.

You may infer from that permissions like:

polls.can_add
polls.can_change
polls.can_delete
polls.can_view

because, oh, you skimmed a section earlier on the helper functions has_view_permission(), has_add_permission(), etc.

Well. Turns out these permission values will not work because the default permissions on models do not in fact follow this convention.

UPDATE: the Django 3.0 documentation, fortunately, fixed this by giving actual working values (e.g. 'polls.add_choice').

Convention on Default Permission Codenames

That brings us back to the section “Default permissions,” that small section of about 20 lines on a page of thousands.

Codename

Firstly, a permission (as can be seen in the table auth_permission) has an attribute “codename,” and that is what is used to identify the permission being tested.

Testing Against Codenames

All methods of working with permissions operate on these codename values:

  • User.has_perm(codename) and its friends from ModelAdmin: has_view_permission, has_add_permission, has_change_permission, and has_delete_permission
  • the django.contrib.auth.decorators.permission_required decorator
  • the PermissionRequiredMixin mixin class

Convention of Default Permission Codenames

Finally, the convention of the codename (as documented in that Default permissions section) is:

<app_label>.<action>_<model_name>

  • The app_label is the name of the AppConfig for your app, most likely defined in apps.py of your app.
  • The action is one of ‘add’, ‘change’, ‘delete’, and (with Django version 2.1 or higher) ‘view.’
  • The model_name is the lowercase form of your model name in the DB.

So if I had a Polls app with a model Posting,

polls/apps.py:

from django.apps import AppConfig
class PollsConfig(AppConfig):
    name = 'polls'

possible permission codenames are:

  • polls.view_posting
  • polls.add_posting
  • polls.change_posting
  • polls.delete_posting

When in Doubt, Check the DB

If you’re not sure, you can always check the auth_permission table in your DB:

select name, codename from auth_permission;

The name is what the Admin pages use to show the permission to assign/unassign. The codename is, of course, the permission string you would use in code.
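
For example, with the Posting model above, guarding a view might look like this (a sketch; the view itself is hypothetical):

from django.contrib.auth.decorators import permission_required
from django.http import HttpResponse

@permission_required('polls.add_posting')
def create_posting(request):
    # Only users granted polls.add_posting get here; by default,
    # others are redirected to the login page.
    return HttpResponse("created")

The same codename works with explicit checks like request.user.has_perm('polls.add_posting').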

More MVN Repository Notes

maven

More findings from working with the MVN repository.

When mvn release:perform is run, there are a few steps that I didn’t pay attention to before. Of course, the building, signing, and uploading of JARs/POMs to the staging repository is expected from the various updates to the POMs seen in the previous post.

Here are the steps:

  • Creation of a staging repository
  • Building, signing, and uploading artifacts into the staging repository
  • Closing the staging repository
  • Releasing the staging repository

Looking at the logs of the release more closely, this can be seen:

...
    [INFO]  * Upload of locally staged artifacts finished.
    [INFO]  * Closing staging repository with ID "comtherealvan-1016".
         
    Waiting for operation to complete...

    ........                                                                                                              
    [INFO] Remote staged 1 repositories, finished with success.                    
    [INFO] Remote staging repositories are being released...                                                                                     
    Waiting for operation to complete...                                           
    .......                                                                                                               
    [INFO] Remote staging repositories released.             
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS                                     
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 01:39 min
...

Parent POMs

Yep. More notes about parent POMs that have <packaging>pom</packaging> and global props that submodules inherit:

Closing and releasing have to be done manually

A staging repository will be created. However, I have to log into https://oss.sonatype.org/ and look up the staging repository (I just search for my group ID “therealvan.com” to find my artifacts).

Then I have to select my staging repository and click “Close” and then, after it successfully closes, click “Release” to release the repository. There is also a “Drop” to remove the staging repository when it’s no longer needed (e.g. clean up after a successful release or aborting a release).

More information is expected

For parent POMs, I ran into errors when trying to close the staging repository. They have to do with missing:

  • Developer information
  • Licensing information
  • Project description
  • Project URL

These do not seem to be required (at least for now) when the closing and releasing are done as part of mvn release:perform. However, when closing the staging repository manually, these come up as errors.

Fortunately, they are easy to fix (a sample of the POM sections follows this list):

  • Developer information can be provided using the <developers> section in the POM.
  • Licensing information can be provided using the <licenses> section in the POM.
  • Project description and URL are just <description> and <url> tags for the top-level <project>.
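
Here is a minimal sketch of those POM sections (all names, URLs, and values below are placeholders to adapt to your project):

<project>
  ...
  <description>My project description</description>
  <url>https://github.com/myuser/myproject</url>

  <licenses>
    <license>
      <name>MIT License</name>
      <url>https://opensource.org/licenses/MIT</url>
    </license>
  </licenses>

  <developers>
    <developer>
      <name>My Name</name>
      <email>me@example.com</email>
    </developer>
  </developers>
  ...
</project>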


Publishing a Java project into MVN Repository

Java, maven, programming

It turns out that this is quite an involved process with a lot of configuration and coordination w/ Sonatype. It took several days, upwards of a week or so, for me to get this working. Plan accordingly.

These notes are based on various articles out there, but updated to reflect what I learned going through the process.

Qualifications:

Here are some specifics for my project.

  • Project is hosted in GitHub as a public repo.
  • Project is a Maven project with a POM file at the root. It’s a multi-module project, but that shouldn’t change anything other than the fact that I publish the modules individually. There may be a way to publish all the modules simultaneously, but I haven’t explored that option.

Prerequisite:

  • Create an account with Sonatype. Start at https://issues.sonatype.org/secure/Dashboard.jspa and click “Sign up.”
  • Create an issue for the new project:
    • Project: “Community Support – Open Source Project Repository Hosting”
    • Type: “New Project”
    • Group Id: a domain I own (or something based on the project hosting like io.github.<user name> or com.github.<user name>).
    • Project URL: GitHub URL to the project.
    • SCM URL: path to the repo (https://github.com/…/<project>.git)
    • Username(s): the Sonatype user name.

Configure project with Maven src and Javadoc plugins:

See https://github.com/bluedenim/log4j-s3-search/blob/master/appender-core/pom.xml#L106-L134

Create and publish an RSA key

  • Create an RSA key (e.g. using GnuPG: “gpg --full-gen-key” with 2048 bits)
  • Publish the key (e.g. “gpg --keyserver pool.sks-keyservers.net --send-key <my RSA key ID>”)

Configure Maven to talk to Sonatype servers:

Create/edit the file settings.xml under $M2_HOME/conf (or your per-user ~/.m2/settings.xml):

Find the servers section and add:

<server>
  <id>ossrh</id>
  <username>my_sonatype_username</username>
  <password>my_sonatype_password</password>
</server>

Find the profiles section and add:

<profile>
  <id>ossrh</id>
  <activation>
    <activeByDefault>true</activeByDefault>
  </activation>
  <properties>
    <gpg.keyname>my RSA key ID</gpg.keyname>
    <gpg.passphrase>my RSA key's passphrase</gpg.passphrase>
  </properties>
</profile>

Use the same GPG key generated above.

Configure project for Maven DEPLOY Plugin:

Add the Maven Deploy Plugin. See https://github.com/bluedenim/log4j-s3-search/blob/master/appender-core/pom.xml#L64-L76

Add a distributionManagement section to the project POM. See https://github.com/bluedenim/log4j-s3-search/blob/master/appender-core/pom.xml#L137-L146

Add an scm section to the project POM. See https://github.com/bluedenim/log4j-s3-search/blob/master/appender-core/pom.xml#L147-L152

Configure Project for Nexus-Staging-Maven Plugin:

See https://github.com/bluedenim/log4j-s3-search/blob/master/appender-core/pom.xml#L95-L105

If using a different server ID than “ossrh,” keep it in sync with the entries defined in the distributionManagement section and also the configuration in Maven’s conf/settings.xml.

Configure project for Maven Release Plugin:

Add the Maven Release Plugin. See https://github.com/bluedenim/log4j-s3-search/blob/master/appender-core/pom.xml#L77-L94

Configure Project to Sign artifacts when releasing:

See https://github.com/bluedenim/log4j-s3-search/blob/master/appender-core/pom.xml#L154-L180

Preparation of Release:

Finally, to prepare the project for release:

  • Build the project once with mvn install
  • Fix all the issues that come up with unit tests and Javadoc.
  • Commit and push all the changes to GitHub.
  • Modify the project’s version to be a “-SNAPSHOT” release for the release I want to make. For example, if I want to release a 1.0.0 version, use the version “1.0.0-SNAPSHOT” for the project. However, none of the project’s dependencies can be SNAPSHOT versions.
  • Commit the change. No need to push to GitHub.

Run mvn release:prepare

The process will ask some questions. The release will be “1.0.0” in this example. The next release is probably “1.0.1-SNAPSHOT” as suggested, but this can always be modified later as needed.

Release to Sonatype:

Run mvn release:perform

This takes a while, and it will actually push artifacts to Sonatype’s servers. Things can go wrong here that range from intermittent network errors to errors with the setup of the repo.

Artifacts can be verified by logging into https://oss.sonatype.org/ with the Sonatype account created earlier and searching for the released artifacts.

Issues may need to be filed with Sonatype if the repo is set up incompletely. Even when things work correctly, it may take some time for things to propagate through the various servers before the artifacts show up in mvnrepository.com.

Once this works the first time, subsequent releases are more stable.


Addendum:

Parent POMs

To release parent POMs without also triggering a release of the modules under the parent, add:

 -N -Darguments=-N

as documented here: http://maven.apache.org/maven-release/maven-release-plugin/faq.html#nonrecursive
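
For example (a sketch, assuming the usual prepare/perform flow described above):

mvn -N -Darguments=-N release:prepare release:perform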