You want to remove a leading 'container' directory from an archive when
extracting it. This can come up when you're using the Puppet archive resource
from the puppetlabs-archive module.
archive { '/tmp/somearchive.zip':
  extract      => true,
  extract_path => '/srv/http/omeka_s',
  source       => 'https://github.com/omeka/omeka-s/releases/download/v1.3.0/omeka-s-1.3.0.zip',
}
Here we want to extract directly into /srv/http/omeka_s so that index.php is
created inside that directory. The problem is that the archive contains an
embedded directory, omeka-s. Note that it doesn't quite match our desired
directory name, so we can't do the trick of extracting to /srv/http and relying
on the archive structure to create the desired directories. [That's probably
wise, as that approach is somewhat fragile.]
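Before stripping anything, it's worth confirming that the archive really does wrap everything in a single directory. Here's a rough sketch of that check. The function name single_container and the sample member names are made up for illustration; the only real API it relies on is that zipfile.ZipFile(...).namelist() returns member names with forward slashes.

```python
def single_container(names):
    """Return the one top-level directory that holds every member, else None.

    `names` is a list of zip member names, e.g. from ZipFile.namelist().
    """
    # First path component of every member (zip names use forward slashes).
    tops = {name.split("/", 1)[0] for name in names if name}
    if len(tops) != 1:
        return None
    top = tops.pop()
    # A lone top-level *file* is not a container directory.
    if all(name.startswith(top + "/") for name in names):
        return top
    return None


if __name__ == "__main__":
    # Hypothetical member names, for illustration only.
    print(single_container(["container/", "container/index.php"]))
    print(single_container(["index.php", "README"]))
```

With a real archive you would pass zipfile.ZipFile(path).namelist() instead of a hand-written list.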
If it were a tar file, we could use tar --strip-components=1, but as it's a zip
file we can't do this. Search the net and you'll find many people asking this
question. It's 2019 and we still don't have a good solution for extracting
archives without caring about the format. Well, there are two candidates...
dtrx ("do the right extraction") works well, except that it's designed for the
opposite of this situation: avoiding tarbombs. Here we have a non-tarbomb when
we actually want a tarbomb. Still, it's worth a mention as it's packaged for
many distributions.
7zip doesn't work because it does the equivalent of -j (junking all directory
paths, as in unzip) when using the -r option.
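For reference, the path rewrite we're after (what tar --strip-components does for tar files) is simple enough to sketch as a pure function. This is only my approximation of the behaviour: the name strip_components is made up, and I'm assuming that members with too few components are simply skipped, which is what I understand GNU tar to do.

```python
def strip_components(member_name, n):
    """Drop the first n path components from an archive member name.

    Returns the remaining path, or None if nothing is left (the member
    would be skipped). Member names are assumed to use forward slashes.
    """
    # Ignore empty components (trailing slash) and "." components.
    parts = [p for p in member_name.split("/") if p not in ("", ".")]
    remaining = parts[n:]
    return "/".join(remaining) if remaining else None


if __name__ == "__main__":
    print(strip_components("omeka-s/index.php", 1))
    print(strip_components("omeka-s/", 1))
```

The second call returns None because the container directory itself has nothing left once its only component is stripped.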
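So here's my hacky Python workaround. I was really hoping not to have to do this, and to be able to recommend a solution that was already packaged in Debian, but I couldn't find one, so this will have to do for now. It takes the number of leading components to strip and the path to the zip file, and extracts into the current directory.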
#! /usr/bin/env python3
import os
import shutil
import sys
import zipfile

strip_n = int(sys.argv[1])
zipfile_path = sys.argv[2]

with zipfile.ZipFile(zipfile_path) as the_zip:
    namelist = the_zip.namelist()
    for member in namelist:
        # Directory entries end in '/' and hold no data; the parent
        # directories of each file are created below instead.
        if member.endswith('/'):
            continue
        fixed_path = os.path.normpath(member)
        components = fixed_path.split(os.sep)
        # A file needs at least one component left after stripping.
        if len(components) <= strip_n:
            raise Exception('unexpected number of components in filename')
        stripped = components[strip_n:]
        # Paths are relative, so files land in the current directory.
        target_path = os.path.join(*stripped)
        upperdirs = os.path.dirname(target_path)
        if upperdirs and not os.path.exists(upperdirs):
            os.makedirs(upperdirs)
        with the_zip.open(member) as source, open(target_path, "wb") as target:
            shutil.copyfileobj(source, target)
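Since the script writes relative paths, it has to be run from inside the target directory. If that's inconvenient, the same idea can be wrapped in a function that takes a destination. The following is a sketch, not a drop-in replacement: the name extract_stripped and its parameters are my own invention, it skips members with too few components instead of raising, and it adds a check (not in the script above) that refuses members whose paths would escape the destination via "..".

```python
import os
import shutil
import zipfile


def extract_stripped(zip_path, dest_dir, strip_n=1):
    """Extract zip_path into dest_dir, dropping strip_n leading components.

    Returns the list of files written.
    """
    dest_root = os.path.realpath(dest_dir)
    written = []
    with zipfile.ZipFile(zip_path) as the_zip:
        for info in the_zip.infolist():
            if info.is_dir():
                continue  # directories are created on demand below
            parts = [p for p in info.filename.split("/") if p not in ("", ".")]
            remaining = parts[strip_n:]
            if not remaining:
                continue  # nothing left after stripping; skip this member
            target = os.path.realpath(os.path.join(dest_root, *remaining))
            # Refuse members that would land outside dest_dir (e.g. via "..").
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: " + info.filename)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with the_zip.open(info) as source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            written.append(target)
    return written
```

For the example at the top of this post, the call would be something like extract_stripped('/tmp/somearchive.zip', '/srv/http/omeka_s', 1). Note that, like the script, this doesn't restore file permissions or timestamps.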