Connection aborted error using Python elasticsearch with large files on AWS ES

AWS ES has an upload limit of 10MB per request. If you are using the bulk helpers or reindex and a bulk request grows beyond this limit (for example, because some documents are unusually large), you will get the error ConnectionError: ('Connection aborted.', error(32, 'Broken pipe')).

To solve it, use the max_chunk_bytes argument of the bulk helpers, which caps the size in bytes of each bulk request. With reindex, pass it through bulk_kwargs like so:

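    import elasticsearch.helpers

    # es is an existing Elasticsearch client; source_index and target_index are index names
    elasticsearch.helpers.reindex(
        es, source_index, target_index, chunk_size=100,
        bulk_kwargs={'max_chunk_bytes': 10 * 1024 * 1024},  # 10MB AWS ES upload limit
    )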
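To get a feel for how a document-count cap and a byte cap interact, here is a minimal sketch in plain Python. It is a simplified illustration and not the elasticsearch library's actual implementation; the function name chunk_by_count_and_bytes is made up for this example.

```python
import json


def chunk_by_count_and_bytes(docs, chunk_size, max_chunk_bytes):
    """Yield lists of docs, closing a chunk before either cap would be exceeded.

    Simplified illustration only; not the elasticsearch library's own code.
    """
    chunk, chunk_bytes = [], 0
    for doc in docs:
        # Serialized size, plus one byte for the newline that ends each bulk line.
        doc_bytes = len(json.dumps(doc).encode("utf-8")) + 1
        too_many = len(chunk) >= chunk_size
        too_big = chunk_bytes + doc_bytes > max_chunk_bytes
        if chunk and (too_many or too_big):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(doc)
        chunk_bytes += doc_bytes
    if chunk:
        yield chunk
```

Note that in this sketch a single document larger than max_chunk_bytes still goes out in a chunk of its own, so a byte cap cannot rescue a document that is itself over the upload limit.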

Ideally, set chunk_size to roughly the number of average-sized documents that fit in 10MB. If a few larger documents then push a chunk over 10MB, the elasticsearch library will split it into smaller requests for you.
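One rough way to pick that number is to measure a sample of your documents. The sketch below assumes the documents are JSON-serializable dicts and ignores the small per-document overhead of the bulk action lines; estimate_chunk_size and AWS_ES_LIMIT_BYTES are names invented for this example.

```python
import json

AWS_ES_LIMIT_BYTES = 10 * 1024 * 1024  # the 10MB upload limit discussed above


def estimate_chunk_size(sample_docs, limit_bytes=AWS_ES_LIMIT_BYTES):
    """Return how many average-sized documents fit in limit_bytes (at least 1)."""
    if not sample_docs:
        raise ValueError("need at least one sample document")
    # Size of each document once serialized to JSON and encoded as UTF-8.
    sizes = [len(json.dumps(doc).encode("utf-8")) for doc in sample_docs]
    average = sum(sizes) / len(sizes)
    return max(1, int(limit_bytes // average))
```

The result can then be passed as chunk_size, with max_chunk_bytes kept as the safety net for the chunks that happen to contain larger documents.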
