Category Archives: AWS

Delete old files from huge directory with perl

If you need to delete old files from a directory but ls/find/rm/etc hang because of the number of files (especially if they’re on a slow NFS share), then you can use perl!

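chdir "DIRECTORY" or die "Can't chdir: $!";
opendir D, "." or die "Can't opendir: $!";
$count = 0;
while ($n = readdir D) {
    next if $n eq "." or $n eq "..";
    $days = -M $n;    # age in days since last modification
    if ($days > 3) {
        #print "deleting $n $days\n";
        if (unlink $n) { $count++ } else { warn "Can't unlink $n: $!" }
    }
}
closedir D;
print "deleted $count files\n";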

Replace DIRECTORY with your directory name, and 3 with the max age of the files to keep (here it will keep files 3 days old or less).
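If you would rather not use perl, the same idea can be sketched in Python. This is only a rough equivalent, not the script above: it assumes Python 3.6+ and relies on os.scandir, which hands back entries as it reads them rather than building the whole listing first. The function name delete_old_files and the dry_run flag are made up for this example.

```python
import os
import time


def delete_old_files(directory, max_age_days=3, dry_run=False):
    """Delete regular files in `directory` older than `max_age_days`; return the count."""
    cutoff = time.time() - max_age_days * 86400
    deleted = 0
    # os.scandir yields entries lazily instead of loading the full listing
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                if not entry.is_file(follow_symlinks=False):
                    continue  # skip subdirectories, symlinks, etc.
                if entry.stat(follow_symlinks=False).st_mtime >= cutoff:
                    continue  # new enough to keep
                if not dry_run:
                    os.unlink(entry.path)
                deleted += 1
            except OSError as exc:
                print(f"Can't unlink {entry.name}: {exc}")
    return deleted
```

Call it as delete_old_files("DIRECTORY", max_age_days=3). With dry_run=True it only counts the files it would delete.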

If you’re running this on AWS EFS then temporarily switch the filesystem throughput mode from bursting to provisioned, otherwise it will start quickly but become very slow once you run out of burst credits.

Connection aborted error using python elasticsearch with large files on AWS ES

AWS ES has an upload limit of 10MB. If you are using the bulk helpers or reindex and a bulk request goes above this limit you will get the error ConnectionError: ('Connection aborted.', error(32, 'Broken pipe')).

To solve it, use the max_chunk_bytes argument, which can be used with reindex like so:

from elasticsearch.helpers import reindex
reindex(es, source_index, target_index, chunk_size=100,
        bulk_kwargs={'max_chunk_bytes': 10048576})  # stay under the 10MB AWS ES upload limit

Ideally set chunk_size to the average number of documents that fit in 10MB. If some larger documents would push a chunk over max_chunk_bytes, the elasticsearch library sends the chunk early so the request stays under the limit.
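One rough way to pick that chunk size is to measure a sample of your documents. The sketch below is an illustration only: estimate_chunk_size and AWS_ES_UPLOAD_LIMIT are names invented here, and it ignores the small action/metadata line the bulk API adds per document, so treat the result as an upper bound.

```python
import json

AWS_ES_UPLOAD_LIMIT = 10 * 1024 * 1024  # the 10MB limit mentioned above


def estimate_chunk_size(sample_docs, limit_bytes=AWS_ES_UPLOAD_LIMIT):
    """Average number of sample-sized documents that fit in one bulk request."""
    if not sample_docs:
        raise ValueError("need at least one sample document")
    # Size of each document once serialised to JSON and encoded as UTF-8
    sizes = [len(json.dumps(doc).encode("utf-8")) for doc in sample_docs]
    average = sum(sizes) / len(sizes)
    return max(1, int(limit_bytes // average))
```

Pass the result as chunk_size and keep max_chunk_bytes set as well, so the odd oversized document is still handled.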

Importing/restoring elastic search snapshot to AWS Elastic Search Service

Took me a long time to find out how to do this.

A few people have re-posted a lot of this AWS article but missed out some crucial details:

The general idea is:

  1. Create an S3 bucket and put the snapshot files into it (don’t use a subdirectory, the .dat files should be in the bucket root). No need to change permissions on the bucket or anything.
  2. Create an IAM role and policy as per the documentation in the AWS docs link above. When creating the role using the web management console you need to choose the EC2 role type and manually modify the trust relationship after creating it.
  3. Run a python script (can find this in the docs link above) using the boto library to register the bucket as a snapshot repository in ES.  You need to sign the request regardless of the ES access policy you are using.  HOWEVER set `is_secure` to `True`.  Without this I was getting `<html></html>` returned instead of any error messages.
  4. Use curl to do the restore (no need to sign restore/backup requests if your access policy is open / IP-based).  Again check the doc for the exact curl command, but as above use https instead of http to get real error messages.
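To make the steps above a bit more concrete, here is a sketch of the pieces involved, using only the standard library. It is not the script from the AWS docs and it does not sign or send anything. As far as I understand the docs, the modified trust relationship should name es.amazonaws.com as the service, and the repository is registered by sending a signed request with an "s3" repository body to the _snapshot endpoint. The function names and all arguments (endpoint, repo name, bucket, region, role ARN) are placeholders for this example.

```python
import json


def trust_policy():
    """Trust relationship that lets the Elasticsearch service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "es.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def repository_request(endpoint, repo_name, bucket, region, role_arn):
    """Return (url, JSON body) for registering the bucket as a snapshot repository.

    The request itself still has to be signed, e.g. by the script in the AWS docs.
    """
    url = f"https://{endpoint}/_snapshot/{repo_name}"  # https, not http
    body = {
        "type": "s3",
        "settings": {"bucket": bucket, "region": region, "role_arn": role_arn},
    }
    return url, json.dumps(body)


def restore_url(endpoint, repo_name, snapshot_name):
    """URL to POST to (e.g. with curl) to restore one snapshot."""
    return f"https://{endpoint}/_snapshot/{repo_name}/{snapshot_name}/_restore"
```

Printing json.dumps(trust_policy(), indent=2) gives something you can paste into the role's trust relationship editor.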

Mount docker socket inside AWS container

  1. To your container, add a new volume
  2. Name: ‘docker_sock’, source path: ‘/var/run/docker.sock’
  3. In Storage and Logging section, add new mount point
  4. Select ‘docker_sock’, container path: ‘/var/run/docker.sock’

And that’s it. No need to give privileged access, and if you run docker commands directly from inside the container there’s no need to change IAM policy.
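If you define the task in JSON rather than through the console, the same volume and mount point look roughly like the fragment built below. This is a sketch, not a complete task definition (family, memory and so on are left out), and the function name, container name and image are placeholders.

```python
import json


def docker_socket_task_fragment(container_name, image):
    """Build the volume and mount point entries for sharing the docker socket."""
    sock = "/var/run/docker.sock"
    return {
        # Task-level volume pointing at the socket on the host
        "volumes": [{"name": "docker_sock", "host": {"sourcePath": sock}}],
        "containerDefinitions": [{
            "name": container_name,
            "image": image,
            # Mount that volume at the same path inside the container
            "mountPoints": [
                {"sourceVolume": "docker_sock", "containerPath": sock}
            ],
        }],
    }
```

json.dumps(docker_socket_task_fragment("my-container", "my-image"), indent=2) prints the fragment ready to merge into a task definition file.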

Associating EC2 instances with an ECS cluster

The EC2 instance is associated with a Container Service cluster using the /etc/ecs/ecs.config file on the instance, in the format ECS_CLUSTER=your_cluster_name.

The EC2 instance must also have the ECS agent installed. If you create the instance using the ECS AMI this will be pre-installed (search for AMI called amazon-ecs-optimized).

This configuration can be put in the User Data field:

#!/bin/bash
echo ECS_CLUSTER=your_cluster_name >>/etc/ecs/ecs.config

To find the setting on an instance that already exists: Actions -> Instance Settings -> View/Change User Data
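On the instance itself you can also just look at the file. A small sketch for pulling the cluster name out of the file contents follows. The helper name read_ecs_clusters is made up here, and it returns every ECS_CLUSTER value it finds, because appending with >> more than once leaves several lines in the file.

```python
def read_ecs_clusters(config_text):
    """Return all ECS_CLUSTER values found in the contents of ecs.config."""
    clusters = []
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and anything that is not KEY=value
        key, value = line.split("=", 1)
        if key.strip() == "ECS_CLUSTER":
            clusters.append(value.strip())
    return clusters
```

For example, read_ecs_clusters(open("/etc/ecs/ecs.config").read()) on the instance. More than one entry in the result means the file has duplicate lines worth cleaning up.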

Exact instructions for setting up the EC2 instance properly can be found here: