Metadedup: Deduplicating Metadata in Encrypted Deduplication via Indirection
Encrypted deduplication combines encryption and deduplication in a seamless way to provide confidentiality guarantees for the physical data in deduplication storage, yet it incurs substantial metadata storage overhead due to the additional storage of keys. We present a new encrypted deduplication storage system called Metadedup, which suppresses metadata storage by also applying deduplication to metadata. Its idea builds on indirection, which adds another level of metadata chunks that record metadata information. We find that metadata chunks are highly redundant in real-world workloads and hence can be effectively deduplicated. In addition, metadata chunks can be protected under the same encrypted deduplication framework, thereby providing confidentiality guarantees for metadata as well.
- Jingwei Li, Patrick P. C. Lee, Yanjing Ren, and Xiaosong Zhang. Metadedup: Deduplicating Metadata in Encrypted Deduplication via Indirection. Proceedings of the 35th International Conference on Massive Storage Systems and Technology (MSST 2019), Santa Clara, U.S.A, May 2019.
Metadedup is built on Ubuntu 14.04.5 LTS with GNU gcc version 6.5.0. It requires the following libraries:
- OpenSSL (version: 1.0.2g)
- Boost C++ library
To install OpenSSL, Boost C++ library, GF-complete and snappy (that is necessary for Leveldb), type the following command:
$ sudo apt-get install libboost-all-dev libsnappy-dev libssl-dev libgf-complete-dev
Leveldb is packed in
server/lib/, and compile it using the following command:
$ make -C server/lib/leveldb/
Metadedup distributes data across n (4 by default) servers for storage, such that the original data can be retrieved provided that any k (3 by default) out of the n servers are available.
Metadedup server processes data and metadata, separately. Compile the server program via the following command:
$ make -C server/
Start a Metadedup server by the following command. Here
meta port and
data port indicate the ports that are listened for data and metadata processing, respectively.
$ server/SERVER [meta port] [data port]
Then, follow the above instructions, and start another n-1 servers.
Edit the configuration file
client/config-u to set server information for upload. Note that the configuration should be consistent with the IPs and ports of the n running servers.
An example of
client/config-u is shown as follows:
0.0.0.0:11030 0.0.0.0:11031 0.0.0.0:11032 0.0.0.0:11033 0.0.0.0:11034 0.0.0.0:11035 0.0.0.0:11036 0.0.0.0:11037
- Line 1-4 specify the IP addresses of 4 running servers, as well as corresponding meta ports.
- Line 5-8 specify the IP addresses of 4 running servers, as well as corresponding data ports.
- In this example, Line 1 and Line 5, Line 2 and Line 6, Line 3 and Line 7, and Line 4 and Line 8 correspond to the same server yet different ports for data and metadata processing, respectively.
In addition, edit the configuration file
client/config-d based on the same instruction above, in order to set server information for download. Note that Metadedup only needs to download data from any k out of n servers for retrieval. Thus,
client/config-d includes the configuration for k necessary servers: the first k lines indicate the IP addresses and meta ports, and the last k lines indicate the IP addresses and data ports.
Then, compile and generate an executable program for client.
$ make -C client/
Note that any changes of
client/config-d do not require to re-compile client program.
Changing (n, k)
You can modify the variables
m_ in lines 47-50 in
client/utils/conf.hh, in order to change the parameter setting (n, k) of Metadedup.
// default (n, k) configuration of Metadedup n_ = 4; m_ = 1; k_ = n_ - m_; r_ = k_ - 1;
n_is the total number of servers.
m_is the fault tolerance degree.
k_is the smallest number of servers required for successful retrieval.
r_is the confidentiality degree.
Note that the change of (n, k) parameter setting requires to re-compile server and client programs.
We provide a shell script to automatically config Metadedup in default setting (i.e., n = 4 and k = 3):
The shell script
auto/auto_config.sh makes the following actions: (i) compile client; (ii) compile server; and (iii) replicate the compiled server folder to another three different folders (
server4/), in order to config multiple servers. After that, each server program can be run by following the SERVER command.
We also provide a shell script to clean the configuration.
Use the executable program
client/CLIENT in the following way:
usage: CLIENT [filename] [userID] [action] [secutiyType] - [filename]: full path of the file; - [userID]: user ID of current client; - [action]: [-u] upload; [-d] download; - [securityType]: [HIGH] AES-256 & SHA-256; [LOW] AES-128 & SHA-1
As an example, to upload a file
test from user 0 using high security mechanism (e.g., AES-256 & SHA-256), type the following command:
$ client/CLIENT test 0 -u HIGH
To download further the file
test, follow the command:
$ client/CLIENT test 0 -d HIGH
New Versions of OpenSSL
We develop Metadedup in Ubuntu 14.04 that can only be compatiable with an old version (e.g., 1.0.2) of OpenSSL. However, such old version is not supported by recent Ubuntu systems. We provide a patch (that is tested on Ubuntu 18.04) to ensure that Metadedup can work with a new version of OpenSSL.
Before patching, check the version of OpenSSL installed in your system:
$ openssl version
You will get OpenSSL version like follows:
OpenSSL 1.1.0g 2 Nov 2017
If the version number is lower than 1.1.0 (e.g., version 1.0.2/1.0.1), you do not need to put the patch.
If the version number is higher than 1.1.0, uncomment line 20 of
#define OPENSSL_VERSION_1_1 1
You also need to uncomment line 20 of
GF-Complete Installation from Source
Some Linux systems do not support to install GF-Complete from package managers. We provide instructions to install GF-Complete from source code.
Download the source code of GF-Complete from here, go into GF-Complete folder, and follow the commands for installation:
$ ./configure $ make $ sudo make install
We assume that the upload and download channels are secure (e.g., encrypted and authenticated), and do not implement SSL/TLS mechanisms upon our protection (e.g., CAONT-RS) of data.