key: cord-0517153-c10xptit
authors: Pagnotta, Giulio; Hitaj, Dorjan; Hitaj, Briland; Perez-Cruz, Fernando; Mancini, Luigi V.
title: TATTOOED: A Robust Deep Neural Network Watermarking Scheme based on Spread-Spectrum Channel Coding
date: 2022-02-12
journal: nan
DOI: nan
sha: b4e456fbe7dd8d93d1a51d8b46ec1c6e947efb66
doc_id: 517153
cord_uid: c10xptit

The proliferation of deep learning applications in several areas has led to the rapid adoption of such solutions by an ever-growing number of institutions and companies. The deep neural network (DNN) models built by these entities are often trained on proprietary data and require powerful computational resources, with the resulting models being incorporated in the company's work pipeline or provided as a service. Being trained on proprietary information, these models provide a competitive edge for the owner company. At the same time, these models can be attractive to competitors (or malicious entities), which can employ state-of-the-art security attacks to obtain and use these models for their benefit. As these attacks are hard to prevent, it becomes imperative to have mechanisms that enable an affected entity to verify the ownership of its DNN with high confidence. This paper presents TATTOOED, a robust and efficient DNN watermarking technique based on spread-spectrum channel coding. TATTOOED has a negligible effect on the performance of the DNN model and is robust against several state-of-the-art mechanisms used to remove watermarks from DNNs. Our results show that TATTOOED is robust to such removal techniques even in extreme scenarios. For example, even if removal techniques such as fine-tuning and parameter pruning change as much as 99% of the model parameters, the TATTOOED watermark is still present in full in the DNN model and ensures ownership verification.

Software 2.0 is written in much more abstract, human unfriendly language, such as the weights of a neural network.

The presence of vast quantities of data originating from multitudes of sources in conjunction with powerful computational resources has fueled the last decade of machine learning (ML) applications. In particular, deep learning (DL) has demonstrated exceptional results in a growing number of domains, constantly pushing the boundaries of previously known state-of-the-art solutions. The abundance of significant volumes of data enables DNNs, the core ML technique at the heart of many state-of-the-art DL solutions, to autonomously determine and learn relevant features directly from the input, ... [39], [20], natural language processing [14], [2], speech recognition [18], data (image, text, audio) generation [25], [3], and cyber-security [9], [13], [12]. Moreover, a growing body of work has achieved impressive insights obtained from the use of DNNs to mitigate the impact of the currently ongoing COVID-19 pandemic [30], further emphasizing the benefits resulting from the adoption of DL.

However, despite their undeniable advantages, training good DL models comes with several challenges:

1) While frameworks such as TensorFlow or PyTorch make it easy to construct DL pipelines for a target problem, identifying the right DNN architecture for the task (i.e., the correct number of layers or hyper-parameters to use) can be challenging even for ML experts.

2) The absence of required computational resources can make it difficult for an entity to benefit from DL.

3) Large quantities of training data can incorporate proprietary characteristics, thus requiring additional layers of protection when the resulting models are made available to the public, including machine learning as a service (MLaaS).

It is therefore understandable that companies, institutions, and even individuals devising and training proprietary DNN models want to protect this new form of intellectual property (IP) from the prying eyes of competition and malicious adversaries.

There has been a growing body of work aiming at devising mechanisms that would enable an entity to verify, with high assurance, the legitimate ownership of a suspected DNN model. In particular, DNN watermarking, a concept first introduced by Uchida et al. [43], has garnered significant interest in the research community, resulting in a surge of watermarking techniques [45], [40], [11], [6], [1], [7], [24], [32], [50], [34], [41]. However, as commonly happens in security and privacy research, this has led to the development of novel methods demonstrating that it is possible to remove [8], [22], [46], [34], [44], overwrite [46], and forge [49], [15] the watermarks placed in a DNN. Thus, the development of a robust DNN watermarking strategy remains an open problem.

In this paper, we push forward the state-of-the-art in DNN watermarking, introducing TATTOOED, a watermarking framework that makes use of spread-spectrum channel coding to provide a novel DNN watermarking strategy that is robust to the existing state-of-the-art watermark removal techniques [48], [22], [38], [8]. TATTOOED can watermark a DNN model without impacting the model's performance on the intended task while requiring as few as one iteration to complete the watermarking process. Relying on spread-spectrum channel-coding techniques to watermark a DNN allows TATTOOED to be DNN architecture- and task-independent, allowing it to be seamlessly incorporated to watermark different DNN models without modification.

Figure 1: ... [k] is used in the LDPC encoder (and in the decoder during the verification procedure) and sp[k] is used to generate the spreading codes (both in the CDMA encoder and decoder). The LDPC encoder takes the watermark M, encodes it, and then passes it to the CDMA encoder, which spreads the LDPC-encoded watermark over a wider bandwidth. Afterwards, the encoded watermark is passed to the marking module, where a DNN (W) is watermarked. To verify the watermark presence (1b), a model watermarked using TATTOOED is given as input, first to the CDMA decoder, and then to the LDPC decoder to recover the hidden watermark. Afterwards, the verifying module computes the watermark accuracy using the extracted watermark M' and the embedded watermark M, and returns 1 if the accuracy is equal to or above 90% and 0 otherwise.

There is little work that couples spread-spectrum channel coding with machine learning, and particularly with deep learning applications. Prior work uses such techniques to build a covert communication channel on top of federated learning schemes [23], a topic that is different from the one discussed in this paper. To the best of our knowledge, this is the first work that makes use of spread-spectrum channel coding for watermarking DNN models.

Our contributions can be summarized as follows:

• We introduce TATTOOED, a novel deep neural network watermarking technique based on CDMA spread-spectrum channel coding. Figure 1 provides an overview of the proposed scheme, details of which are further elaborated in Section IV.
• We demonstrate that TATTOOED is domain-independent and conduct an extensive empirical evaluation under varying conditions: a) diverse watermark sizes; b) different DNN architectures; c) several benchmark datasets; d) different classification tasks; and e) multiple domains, including image, text, and audio.
• We test TATTOOED against state-of-the-art watermark removal techniques such as fine-tuning and model pruning and demonstrate that the TATTOOED watermark cannot be removed.
• We show that TATTOOED is significantly more efficient to mark and verify a DNN model compared to other state-of-the-art watermarking approaches.
• We provide the source code to reproduce the evaluation of TATTOOED at this link 1 .

This paper is organized as follows: Section II provides the necessary background information on the topics treated in the subsequent sections. Section III introduces the threat model. Section IV describes TATTOOED, the novel deep neural network watermarking technique proposed in this paper. Section V and Section VI provide details about the experimental setup and evaluation of TATTOOED. Section VII covers the related work, and Section VIII concludes the paper.

Similar to previous work on DNN watermarking, we evaluate TATTOOED on different classification tasks. Classification is a branch of supervised ML [17], a learning task where (x, y) tuples of input data are used to train models that learn a mapping of the input value x to the output value y, commonly referred to in the literature as the class or the target label. Once the training is complete, the model is used to predict the labels of new, previously unseen instances originating from the same domain. This learning can be expressed using the following equation:

where ŷ = f(x; θ) represents the learning machine. The learned function f provides an estimate of the label y for an input x. The learning is guided by the loss function l(ŷ, y) that measures the error for misclassifying y's, providing useful information on how the parameters should be tuned in order for the learned machine to perform better on the task at hand. Ω(θ) is a regularizer [17], independent of the training data, that prevents the model from overfitting. Examples of supervised learning algorithms include Support Vector Machines (SVMs) [37], Random Forests [5], and even Deep Neural Networks (DNNs) [17].
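The objective described in this paragraph is commonly written as regularized empirical risk minimization. The form below is a sketch in the notation of this section, not necessarily the exact equation of the original; the sample index i, the sample count n, and the symbol θ* for the learned parameters are assumptions introduced here for illustration.

```latex
\theta^{*} \;=\; \arg\min_{\theta} \; \sum_{i=1}^{n} l\bigl(f(x_i; \theta),\, y_i\bigr) \;+\; \Omega(\theta)
```

The first term penalizes disagreement between the prediction ŷ_i = f(x_i; θ) and the label y_i, while Ω(θ) depends only on the parameters.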

Digital watermarking was introduced to covertly embed a secret message, the watermark, inside digital data (e.g., an image) to provide proof of ownership of the watermarked data. This concept can be directly applied to DNNs. To prove the ownership of a model W , the owner needs a way to securely embed a secret message inside the model parameters w ⊆ W , and a way to reliably extract the message. A DNN watermarking scheme consists of a message space M, a key space K, and the following Mark and Verify algorithms:

Mark is a polynomial-time deterministic algorithm that, given a secret key k ∈ K, a model W, and a watermark message m ∈ M, outputs a model W_wtm that contains the watermark m.

Verify is a polynomial-time deterministic algorithm that, given a secret key k ∈ K and a model W, outputs a bit b ∈ {0, 1}, where 1 means that the model contains the watermark m, and 0 otherwise.

Watermarking Requirements. Prior work in the domain has outlined several conditions for a DNN watermarking scheme to succeed [4] , [1] , [6] . In the coming paragraphs, we expand on each of these requirements.

1) Functionality-Preserving: this requirement guarantees that the Mark algorithm does not impair the learning task of the model, nor alter its performance in a significant way. Given a model W trained on dataset D and a watermarked model W_wtm ← Mark(W, k), we define the functionality-preserving requirement as:

2) Robustness: this requirement guarantees that, for any functionality-preserving transformation mapping W_wtm to W'_wtm (e.g., fine-tuning), the Verify algorithm is still able to correctly detect the presence of the watermark. We define the robustness requirement as:

3) Reliability: this requirement guarantees that the legitimate ownership of a model can be verified with high probability, i.e., the Verify algorithm outputs 1 if and only if the model contains our watermark. Given the key k, a model W, and a watermarked model W_wtm ← Mark(W, k), the probability that the algorithm Verify outputs 1 is:

4) Integrity: this requirement guarantees that the Verify algorithm outputs 0 for each model W' that does not contain our watermark. Given the key k, a model W, a watermarked model W_wtm ← Mark(W, k), and a model W', the probability that the algorithm Verify raises a false positive error is:

5) Capacity: this requirement guarantees that the Mark algorithm can embed into the model a potentially large amount of information (e.g., the signature of the legitimate owner).

6) Secrecy: this requirement guarantees that the Mark algorithm can embed the watermark in a way that is not detectable by any polynomial-time adversary A.

Given two models, W 0 , W 1 , one of them watermarked and the other not, there is no polynomial-time adversary A such that:

7) Efficiency: this requirement guarantees that the Mark and Verify algorithms can efficiently embed the watermark and verify its presence in a model without incurring a large computational overhead.

8) Unforgeability: the watermark should be unforgeable, such that no malicious entity can overwrite it and claim ownership of the model.

9) Authentication: this requirement guarantees a strong link between the owner of the model and the used watermark. This could be achieved cryptographically [1] and is orthogonal to the Mark and Verify algorithms presented in TATTOOED.

10) Generality: this requirement guarantees that the Mark and Verify algorithms can be used on every kind of model architecture and any learning task.

Watermark accuracy. To verify the presence of the watermark in a model, the legitimate owner extracts the watermark content (bits) from the model and calculates the watermark accuracy by comparing the extracted bits with the actual bits of the original watermark. The watermark accuracy is the ratio between the number of correctly extracted watermark bits and the total number of bits in the original watermark. Given the nature of typical watermarks (i.e., a sequence of bits that is clearly related to the legitimate owner), even a lower watermark accuracy (i.e., above 58% [8]) can be considered reasonable proof.

Nevertheless, to highlight the power and robustness of the TATTOOED watermark, in our evaluation we consider that the watermark is in the model only if the watermark accuracy is above or equal to 90% (i.e., the Verify algorithm returns 1 if WTM_acc ≥ 0.9).
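The watermark-accuracy check described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names `watermark_accuracy`, `verify_decision`, and `text_to_bits` are hypothetical, and the 8-bit ASCII encoding of the text watermark is an assumption.

```python
import numpy as np


def text_to_bits(text):
    """Encode ASCII text as a flat array of bits (8 bits per character, assumed)."""
    data = np.frombuffer(text.encode("ascii"), dtype=np.uint8)
    return np.unpackbits(data)


def watermark_accuracy(extracted_bits, original_bits):
    """Fraction of extracted bits that match the original watermark bits."""
    extracted = np.asarray(extracted_bits)
    original = np.asarray(original_bits)
    if extracted.shape != original.shape:
        raise ValueError("bit sequences must have the same length")
    return float(np.mean(extracted == original))


def verify_decision(extracted_bits, original_bits, threshold=0.9):
    """Return 1 if the watermark accuracy reaches the threshold, else 0."""
    return int(watermark_accuracy(extracted_bits, original_bits) >= threshold)


original = text_to_bits("TATTOOED watermark!")

# Simulate an imperfect extraction by flipping some of the bits.
lightly_damaged = original.copy()
lightly_damaged[:10] ^= 1
heavily_damaged = original.copy()
heavily_damaged[:20] ^= 1

print(len(original))
print(verify_decision(lightly_damaged, original))
print(verify_decision(heavily_damaged, original))
```

Under the 90% threshold, the 152-bit text watermark tolerates at most 15 wrong bits, so the first damaged copy is accepted and the second is rejected.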

In digital communications, spread-spectrum techniques [42] are methods by which an electrical, electromagnetic, or acoustic signal with a particular bandwidth is deliberately spread in the frequency domain. These techniques enable the spreading of a narrowband information signal over a wider bandwidth. On receiving the signal, the receiver knowing the spreading mechanism can recover the original bandwidth signal.

Two main techniques are used to spread the bandwidth of a signal: frequency hopping and direct sequence. In frequency hopping, the narrowband signal is transmitted for a few seconds in a given band, which is constantly changed following a pseudo-random sequence of frequency bands that has been agreed upon with the receiver. The receiver, in coordination, tunes its filter to the agreed-on frequency band to recover the message. Direct sequence, the spreading technique used in our neural network watermarking technique, works by directly coding the data at a higher frequency using pseudo-randomly generated codes that the receiver knows.

In the 1990s, Direct Sequence Spread Spectrum was proposed as a multiple-access technique (i.e., Code Division Multiple Access or CDMA) in the IS-95 standard for mobile communications in the US, and it was adopted worldwide as the 3G standard. In CDMA, mobile users transmit to the base station at the same time using the same frequency but with different spreading codes. The codes should be quasi-orthogonal. The base station correlates the received signal with the spreading code of each user to detect that user's transmitted bits.

When the transmitters use pseudo-random codes to encode their data, the spread-spectrum appears random and has noiselike properties. By default, this introduces some level of privacy because the receiver cannot demodulate the transmitted signal without the knowledge of the pseudo-random sequence or the method to generate the sequence that was initially used to encode the data. Another crucial property of CDMA is its resistance to jamming. Typically, a jamming signal has a finite amount of power available to jam the signal by either spreading the energy over the entire bandwidth of the signal or only on a part of it. The wide bandwidths that can be achieved by CDMA require the jamming signal to have a large amount of power, which is typically not feasible in practice.
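The direct-sequence CDMA mechanism described in this section can be illustrated with a small simulation. This is a sketch under assumed values, not taken from the paper: the number of users, the code length, the noise level, and all variable names are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, code_len = 4, 1024  # assumed sizes for illustration

# Each user gets a random +/-1 spreading code; long random codes are quasi-orthogonal.
codes = rng.choice([-1, 1], size=(n_users, code_len))

# One bit per user, encoded as +/-1.
bits = np.array([1, -1, -1, 1])

# All users transmit at the same time on the same band: the channel carries
# the sum of the spread signals plus additive noise.
channel = bits @ codes + rng.normal(0.0, 2.0, size=code_len)

# The receiver correlates the received signal with each user's code.
correlations = codes @ channel
decoded = np.sign(correlations).astype(int)

# A receiver that does not know the code sees only a noise-like signal:
# correlating with an unrelated code gives a value near zero.
wrong_code = rng.choice([-1, 1], size=code_len)
blind_correlation = (wrong_code @ channel) / code_len

print(decoded, bits)
print(round(float(blind_correlation), 3))
```

The correlation with the correct code grows linearly with the code length, while the contribution of other users and of the noise grows only with its square root, which is why longer codes make detection more reliable.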

Adversary's Capabilities. The adversary corresponds to an entity that has gained illegitimate access to a proprietary ML model. The adversary is aware of the possibility of the model being watermarked and employs existing state-of-the-art watermark-removal techniques such as fine-tuning and parameter pruning in an attempt to remove the watermark from the model. The adversary wins if they can remove the watermark without incurring costs similar to what the legitimate owner spent to construct the model (e.g., the adversary could simply train a comparable model if they had to spend the same resources to remove the watermark).

Watermark verification. There exist two ways to verify the presence of a watermark in a deep neural network: white-box and black-box verification. In white-box watermark verification, the verifier (i.e., the legitimate owner) needs physical access to the model, while in black-box watermark verification the verifier does not need such access. In the black-box scenario, the watermark verification is performed remotely by querying the model and observing the outputs returned by it.

We position ourselves in a white-box watermark verification scenario similar to prior works [33] , [43] , [40] , [11] , [6] , [47] in the field, where the verifier has white-box access to the DNN model to be verified.

This section introduces TATTOOED, our neural network watermarking technique based on a direct application of CDMA to watermark the model.

In this scenario, a legitimate owner wants to insert a P-bit watermark b = [b_0, ..., b_{P−1}] into the model parameters. The bits are encoded as ±1, and the code for each bit is represented by c_i, which is a vector of +1 and −1 of length R, where R is the number of parameters selected to embed the watermark in model W. C is an R by P matrix that collects all the codes. We assume that the codes have been randomly generated with equal probabilities for ±1. The legitimate owner stores a set of weights W_T prior to watermarking the model being trained and then watermarks the model as follows:

where γ is a parameter that affects the strength of the watermark signal. The γ parameter has to be selected in an optimal way such that it does not impact the performance of the trained ML model on the legitimate task. To select γ we performed a grid search among γ values in the range

. Now, the legitimate owner can recover the watermark that was hidden in parameters of the model using the spreading code. For example, for bit i of the watermark, the legitimate owner can recover it as follows:

After obtaining y_i from Equation 9, the legitimate owner can recover the i-th bit as b_i = sign(y_i).
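The embedding and extraction steps described in this section can be written out from the definitions of b, c_i, C, W_T, and γ given above. The following is a reconstruction under stated assumptions, not the paper's exact equations: W is taken to be the vector of the R selected parameters, and the extraction is assumed to correlate c_i with W directly rather than with the difference W − W_T.

```latex
\begin{align}
W   &= W_T + \gamma\, C\, b \;=\; W_T + \gamma \sum_{j=0}^{P-1} b_j\, c_j, \\
y_i &= c_i^{\top} W
     \;=\; \gamma R\, b_i
     \;+\; \gamma \sum_{j \neq i} b_j\, c_i^{\top} c_j
     \;+\; c_i^{\top} W_T .
\end{align}
```

Because every entry of c_i is ±1, c_i^T c_i = R, so the first term carries the sign of b_i with magnitude γR. Under this assumed form, the remaining two terms (interference from the other bits and the projection of W_T) have zero mean for random codes and a standard deviation that grows only with the square root of R, so b_i = sign(y_i) is correct whenever the first term dominates.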

For moderate-size payloads, the spreading codes of the other bits and the values of W_T can result in some bits being decoded erroneously for moderate values of γ. For large values of γ, the watermark might be easily extracted, but the performance of the network can be affected. To be able to extract the watermark and keep the value of γ in an appropriate range, in addition to CDMA, we employ the Low-Density Parity-Check (LDPC) [16] error correction technique. We choose LDPC codes due to their good error-correction properties and the ability to perform the error correction in linear time [36]. In TATTOOED, we use a standard rate-1/2 code for all the available P bits with three ones per column. For LDPC codes to work, they need an estimate of the noise level. To obtain this value, alongside the actual watermark, we embed in the model a preamble of 200 random bits in the first 200 bits of P so that we can use these values to estimate the noise variance.

The process of watermarking and verifying the presence of the watermark in a ML model is displayed by the pseudocodes of Mark, Algorithm 1 and Verify, Algorithm 2.

Mark (Algorithm 1) takes as input an ML model to be watermarked, the γ value that will affect the power of the watermark signal, the actual watermark content (message), and a secret key used to ensure the security properties of the embedded watermark. The secret key consists of 512 bits and is used to generate two separate secret seeds: one seed is used to generate the spreading codes, and the other seed is used for creating the LDPC matrices. Once that is done, the watermark content (information that clearly correlates the model with the legitimate owner) is encoded, and the preamble that we will use to estimate the noise is concatenated to that bit sequence. Then, for each bit of this composite sequence (preamble + LDPC-encoded watermark), a spreading code is generated whose length equals the number of model weight parameters considered. The spreading code is multiplied by the bit value (translated to ±1) and by γ. The resulting vectors are added to the vector of the selected parameters of W.
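The Mark procedure described above can be sketched as follows. This is a simplified illustration and not Algorithm 1 itself: the LDPC encoding step is omitted (the `payload_bits` argument stands in for the already-assembled preamble plus LDPC-encoded watermark), and the seed derivation via SHA-512 in the hypothetical helper `derive_seeds` is an assumption, since the text only states that two seeds are generated from the secret key.

```python
import hashlib

import numpy as np


def derive_seeds(key: bytes):
    """Derive two independent integer seeds from the secret key (assumed scheme)."""
    digest = hashlib.sha512(key).digest()
    spreading_seed = int.from_bytes(digest[:8], "big")
    ldpc_seed = int.from_bytes(digest[8:16], "big")
    return spreading_seed, ldpc_seed


def mark(weights, payload_bits, key, gamma=9e-2):
    """Add one gamma-scaled +/-1 spreading code per payload bit to the weights."""
    spreading_seed, _ldpc_seed = derive_seeds(key)  # LDPC seed unused in this sketch
    rng = np.random.default_rng(spreading_seed)
    symbols = 2.0 * np.asarray(payload_bits, dtype=np.float64) - 1.0  # {0,1} -> {-1,+1}
    marked = np.array(weights, dtype=np.float64)  # copy; the input is left untouched
    for symbol in symbols:
        code = rng.choice([-1.0, 1.0], size=marked.size)
        marked += gamma * symbol * code
    return marked


# Toy usage with a flat vector standing in for the selected model parameters.
demo_rng = np.random.default_rng(1)
weights = demo_rng.normal(0.0, 0.05, size=50_000)
payload = demo_rng.integers(0, 2, size=64)
marked = mark(weights, payload, key=b"owner secret key")
print(float(np.abs(marked - weights).max()))
```

Generating the codes one at a time from a seeded generator avoids storing the full R by P code matrix, and anyone holding the same key can regenerate the same codes in the same order.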

Verify (Algorithm 2) takes as input the model and the secret key and returns 1 if the watermark is present and 0 otherwise. As in the Mark algorithm, the two seeds are obtained from the secret key, and with them the spreading codes and the LDPC matrices are generated. Each bit of the composite sequence (preamble + watermark) is retrieved by multiplying the transpose of the respective spreading code with the subset of the model parameters W. The first 200 values of y, which correspond to the preamble, are correlated with the actual preamble that was used in Mark. From them, the CDMA gain and signal-to-noise ratio (SNR) are calculated. The gain and SNR are used by the LDPC decoder to retrieve the watermark content. Afterwards, using the retrieved watermark, we compute the watermark accuracy and, if it is above or equal to 90%, a positive response is returned (see Section II-B).
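The verification flow described above, including the preamble-based gain and SNR estimate, can be sketched as follows. This is an illustration under several assumptions and not Algorithm 2 itself: the LDPC decoder is replaced by a hard sign decision, the gain is estimated as the mean of the sign-aligned preamble correlations and the noise as their variance, and the function names, the seed value, and the toy sizes are hypothetical.

```python
import numpy as np


def spreading_codes(seed, n_bits, n_params):
    """Regenerate the +/-1 spreading codes from a seed (one row per bit)."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(n_bits, n_params))


def estimate_gain_and_snr(y_preamble, preamble_symbols):
    """Estimate CDMA gain, noise variance, and SNR (dB) from the known preamble."""
    aligned = y_preamble * preamble_symbols  # removes the sign of each preamble bit
    gain = aligned.mean()
    noise_var = aligned.var()
    snr_db = 10.0 * np.log10(gain**2 / noise_var)
    return gain, noise_var, snr_db


def verify(weights, seed, preamble_bits, watermark_bits, threshold=0.9):
    """Return (decision, watermark accuracy, SNR in dB) for a flat weight vector."""
    preamble_bits = np.asarray(preamble_bits)
    watermark_bits = np.asarray(watermark_bits)
    n_pre = preamble_bits.size
    codes = spreading_codes(seed, n_pre + watermark_bits.size, weights.size)
    y = codes @ weights  # y_i = c_i^T W for every bit of the composite sequence
    _gain, _noise_var, snr_db = estimate_gain_and_snr(y[:n_pre], 2.0 * preamble_bits - 1.0)
    extracted = (y[n_pre:] > 0).astype(int)  # hard decision instead of LDPC decoding
    accuracy = float(np.mean(extracted == watermark_bits))
    return int(accuracy >= threshold), accuracy, float(snr_db)


# Toy setup: embed a 200-bit preamble plus a 152-bit watermark, then verify.
rng = np.random.default_rng(0)
n_params, gamma, code_seed = 20_000, 9e-2, 1234
preamble = rng.integers(0, 2, size=200)
watermark = rng.integers(0, 2, size=152)
clean = rng.normal(0.0, 0.05, size=n_params)
symbols = 2.0 * np.concatenate([preamble, watermark]) - 1.0
marked = clean + gamma * (symbols @ spreading_codes(code_seed, symbols.size, n_params))

decision_marked, acc_marked, snr_marked = verify(marked, code_seed, preamble, watermark)
decision_clean, acc_clean, snr_clean = verify(clean, code_seed, preamble, watermark)
print(decision_marked, acc_marked, decision_clean, acc_clean)
```

In a fuller implementation, the estimated gain and noise variance would be used to form soft inputs for the LDPC decoder instead of the hard decision used in this sketch.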

To conduct our evaluation we chose the following benchmark datasets:

• For the image classification domain we used the MNIST [28] and CIFAR-10 [27] datasets. The MNIST handwritten digits dataset consists of 60,000 training and 10,000 testing grayscale images of dimensions 28×28 pixels, equally divided in 10 classes. The CIFAR-10 [27]

Algorithm 2: Verify
Input: Model: W, Str: key
Output: bool: b
Data: Int: wtm_length, Str: message, Model: W_T
seed, ldpc_seed ← seed_gen(key)
ldpc ← init_ldpc(ldpc_seed)

We evaluate TATTOOED watermark performance on different DNN architectures. 

We used two different watermarks of different sizes for watermarking a model using TATTOOED, namely a short text watermark and a large image watermark.

• text-watermark corresponds to the "TATTOOED watermark!" text sequence corresponding to 152 bits.
• image-watermark corresponds to an image of 1KB.

For simplicity, we refer to them as the text and the image watermark, respectively.

In this section, we provide a step-by-step, thorough evaluation of TATTOOED and show that it satisfies all the requirements described earlier in Section II-B.

A good watermark should not deteriorate the model's performance on its original task. To this end, we compare baseline performance to the TATTOOED model's performance on different model architectures and datasets (Section V). Table I displays the comparison between the performance of the baseline and TATTOOED watermarked models on the respective test sets. For CIFAR10, MNIST, and ESC-50, the performance corresponds to the classification accuracy (in %), while for Wikitext-2, the performance corresponds to the perplexity. All models were trained for 60 epochs. The watermark used is the text-watermark corresponding to 152 bits. We used a γ value of 9 × 10^−2 to embed the watermark, as it allows the watermark accuracy to be 100% while also allowing the TATTOOED model to have comparable performance with the baseline in all the considered tasks and model architectures. The TATTOOED model's performance shown in Table I demonstrates that the use of the CDMA spread-spectrum technique to perform watermarking does not impact the performance of the trained ML model on the intended task. This is due to the way CDMA works. The added signal for each bit of the watermark is spread over the model parameters using a low γ value, and the strength of a CDMA-based signal (like the TATTOOED watermark) increases in proportion to the length of the spreading code. In this way, using a longer code, combined with a small γ value, does not incur significant changes to the model's weight parameters or impact the model's performance on the intended task. From this evaluation, we can see that the TATTOOED watermark does not significantly affect model performance, and thus it satisfies the Functionality-Preserving requirement.

Robustness is the ability of the watermark to survive removal attacks so that legitimate owners can still perform a reliable copyright claim on their models.

The state-of-the-art watermark removal techniques for ML models are based on two main techniques, fine-tuning and model parameter pruning [8], [44], [29]. Fine-tuning is a way of applying transfer learning. Specifically, fine-tuning is a process that takes a model that has already been trained for one given task and then tunes or tweaks the model to make it perform a second similar task. This approach can also be used on the same task, using other unseen data to improve the performance of the model. Parameter pruning is a technique that removes neurons from an already trained neural network while preserving the performance of the model on the intended task. For completeness, in this work, we also consider Parameter Shuffling as a method to disrupt the watermark. Parameter Shuffling consists of reordering the DNN weight parameters while also maintaining adequate performance on the intended task.

Fine-tune all layers (FTAL). Fine-tuning all layers is a common fine-tuning technique used to boost the performance of a pre-trained model on the task using new data. To evaluate the robustness of TATTOOED against the FTAL removal technique, we performed two experiments: one with MNIST and one with CIFAR10. First, we trained the multi-layer perceptron model on 40,000 MNIST instances for 100 epochs, watermarked it using the text-watermark with γ = 9 × 10^−2, and then performed FTAL using the remaining 10,000 MNIST instances for another 100 epochs. Then, we trained the VGG-16 model on 40,000 CIFAR10 instances for 100 epochs, watermarked it using the image-watermark with γ = 9 × 10^−2, and then performed FTAL using the remaining 10,000 CIFAR10 instances for another 100 epochs.

Results. After each of the fine-tuning epochs, we used the Verify algorithm to check the presence of the watermark in the model. In every epoch, the watermark presence was successfully verified with a watermark accuracy of 100%, showing that the FTAL technique is ineffective in removing the TATTOOED watermark from the models.

Re-train all layers (RTAL). To evaluate the robustness of TATTOOED watermarks against RTAL (i.e., fine tuning the model on a dataset that is the same size as the one used in training), we split MNIST and CIFAR10 into two parts, one for the initial training and one for retraining. Each of these splits contained 30,000 images. We trained the multilayer perceptron model on the first half of the MNIST and CIFAR10 dataset and used TATTOOED to watermark it. After that, we used the second half of the dataset to train the model for over 1,000 epochs.

Results. Figure 2a shows the validation accuracy of the model trained on the MNIST task. We watermarked the model at epoch 100 (yellow vertical line) with the text watermark using γ = 9 × 10^−2. During the retraining phase, the validation accuracy of the model increases because it is being trained on new data. After every epoch, we checked if the watermark was present in the model, and confirmed that the watermark was unaffected even after 1,000 epochs. The changes that occurred to the model's weight parameters during retraining (RTAL) were not sufficient to eliminate the TATTOOED watermark. Figure 2b displays the validation accuracy on the CIFAR10 task. Similar to MNIST, we trained the model on the first half of the CIFAR10 dataset for 100 epochs and then watermarked it with the image watermark using γ = 9 × 10^−2. Afterwards, we retrained the model using the second half of the dataset for 1,000 epochs. The retraining (RTAL) phase caused a significant change in the model parameters (indicated by the large increase in validation accuracy); however, this change was not sufficient to remove the image-watermark embedded with TATTOOED. These experiments highlight the ability of the TATTOOED watermark to resist significant changes caused by retraining the model using a dataset of the same size as the one used in the initial training. Although these changes can be quite drastic, the watermark is unaffected.

In each of the fine-tuning epochs, we used the Verify algorithm to check the presence of the watermark in the model. In every epoch, the watermark presence was successfully verified with a watermark accuracy of 100%, showing that the RTAL technique is ineffective in removing the TATTOOED watermark from the models.

Fine Tuning using REFIT [8]. To evaluate the robustness of the TATTOOED watermark, we tested it against REFIT [8], a state-of-the-art watermark removal technique. REFIT is agnostic to the technique used to watermark a DNN model, and is able to remove a portion of the watermark sufficient to make a wide range of prior neural network watermarking schemes unreliable, such as [45], [40], [11], [6], [1], [7], [24], [32]. REFIT proposes four separate FTAL-based watermark removal techniques. The first is based on traditional fine-tuning, the second relies on unlabeled data augmentation (AU), the third relies on elastic weight consolidation (EWC), and the fourth simultaneously employs AU and EWC to remove the watermark. After we trained our model and watermarked it using TATTOOED, we employed the REFIT fine-tuning techniques for the same number of epochs that we used in training to test the robustness of the TATTOOED watermark against this attack. Typically, fewer fine-tuning steps are used; otherwise, the adversary attempting to remove the watermark would have to spend the same amount of computation power as that needed to train the network from scratch. We used the original REFIT implementation in our experiments. 2

Results. Figure 3 displays the performance of each of the REFIT watermark removal techniques against TATTOOED. We trained the VGG-16 on the CIFAR10 dataset for 100 epochs and used TATTOOED to watermark the model with the text-watermark with γ = 9 × 10^−2. The parameters used to configure REFIT are the ones suggested in their official repository [8] and are displayed in the caption of Figure 3.

Besides using the Verify algorithm (Algorithm 2) to check the presence of the watermark in each epoch, we also used the signal-to-noise ratio (SNR) to measure the amount of disruption that the REFIT removal techniques caused to the TATTOOED watermark signal epoch by epoch. SNR is the ratio of signal power to noise power, expressed in decibels. The SNR calculation is displayed in Algorithm 2. Figure 3a shows the impact on the TATTOOED watermark signal from REFIT's base fine-tuning technique. The base fine-tuning slightly decreased the strength of the watermark signal, but it did not impact the watermark verification procedure. Figure 3b shows the impact of the REFIT with AU removal technique. In this case, the AU-based fine-tuning is able to decrease the strength of the TATTOOED watermark signal more than plain fine-tuning, but not enough to destroy the TATTOOED watermark. Figure 3c shows the impact of the EWC-based fine-tuning watermark removal technique. This third approach slightly decreased the strength of the signal, but not enough to prevent watermark verification.

The fourth technique to remove the watermark is the combination of AU and EWC fine-tuning. Figure 3d demonstrates that AU+EWC cannot disrupt the signal of the TATTOOED watermark, even if it spends the same number of epochs as the training required.

In each of the above-mentioned experiments, in every epoch, the watermark presence was successfully verified with a watermark accuracy of 100%, showing that all four of the REFIT [8] watermark removal techniques are ineffective in removing the TATTOOED watermark from the model.
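The SNR used above to track the watermark signal can be illustrated with a minimal sketch. It follows only the generic definition given in the text (ratio of signal power to noise power, in decibels); how the signal and noise vectors are obtained from the model is not specified here, so the function name `snr_db` and its two inputs are assumptions made for illustration.

```python
import numpy as np

def snr_db(signal, noise):
    """Return 10*log10(P_signal / P_noise), with power = mean squared value."""
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_signal = np.mean(signal ** 2)   # average signal power
    p_noise = np.mean(noise ** 2)     # average noise power
    return 10.0 * np.log10(p_signal / p_noise)

# A signal carrying 100 times the power of the noise sits 20 dB above it.
example = snr_db([10, -10, 10, -10], [1, -1, 1, -1])
print(f"{example:.1f} dB")
```

A drop in this value from one epoch to the next would indicate that fine-tuning is adding noise relative to the embedded signal, even while verification still succeeds.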

To highlight that TATTOOED pushes forward the state-of-the-art in DNN watermarking, in Table II we compare the impact that REFIT has on prior art and on TATTOOED. Following REFIT [8], in Table II we display the performance of REFIT on the representative works [51], [1], [34], [47] in the different watermark embedding schemes. For [51], [1], [34] we report the robustness results from REFIT [8]. As we can see, REFIT is able to disrupt the watermark verification of those techniques. Wang et al. [47] report that REFIT is able to reduce the watermark accuracy, thus hindering the verification process, but that the REFIT parameters needed to fully remove the watermark from the model are difficult to find. In contrast, the TATTOOED watermark is completely unaffected by REFIT and the watermark accuracy is always 100%, an improvement over prior art in terms of robustness.

Parameter Pruning Typically, parameter pruning is performed by zeroing weight parameters of the model. To evaluate the robustness of the TATTOOED watermark against parameter pruning, we prune the parameters of the watermarked model, varying the amount of pruning from 25% to 99.99%. Table III displays the effects that different amounts of pruning have on the TATTOOED watermark accuracy and on the test accuracy of the VGG16 model trained on the CIFAR10 dataset. Table III shows that after pruning up to 99% of the model parameters, the TATTOOED watermark accuracy is still 100%. However, the model performance on the test set deteriorates to 10%, which is no better than guessing among the 10 classes of CIFAR10; thus the model is unable to perform its intended task. The inability of parameter pruning to remove the TATTOOED watermark, even when 99% of the parameters are pruned, is due to the foundations of the TATTOOED watermarking technique. TATTOOED employs CDMA spread-spectrum channel coding to embed the watermark in a randomly selected portion of the model's parameters. As mentioned in Section IV, to extract a bit b_i of the watermark, the spreading code is multiplied by the vector of the parameters that were selected for watermarking. Even if 99% of them are zeroed by random pruning, the remaining non-zero weights contribute enough to correctly decode b_i and thus allow the detection of the watermark in the model. This ability derives from the inherent properties of CDMA, where the adversary cannot prevent the communication without completely destroying the channel (in our case, zeroing all the parameters of the model). Another way to perform parameter pruning is to physically remove the pruned neurons from the model, resulting in a new model architecture. This technique is known as model compression [48]. To thwart a removal attack based on model compression, the legitimate owner can compress the model before using TATTOOED for watermarking, which provides another level of robustness to the watermark: once compressed, the size of an ML model (in terms of parameters) cannot be reduced by a large amount without impairing performance.
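The pruning argument above can be checked on synthetic data with a minimal sketch. It assumes a CDMA-style embedding of the form "weights plus gain times the sum of bit-weighted ±1 spreading codes" and sign-of-correlation decoding, which is one reading of the description in the text and not the authors' implementation. The weight distribution, the 8-bit payload, the gain value, and all variable names are illustrative assumptions, and the whole weight vector is used as the carrier.

```python
import numpy as np

rng = np.random.default_rng(0)
n_params, n_bits, gamma = 1_000_000, 8, 0.005   # illustrative sizes and gain

weights = rng.normal(0.0, 0.05, size=n_params)  # stand-in for trained weights
bits = rng.integers(0, 2, size=n_bits) * 2 - 1  # payload bits mapped to +/-1
# One secret +/-1 spreading code per payload bit.
codes = rng.integers(0, 2, size=(n_bits, n_params), dtype=np.int8) * 2 - 1

# Embed: add the spread payload to the carrier weights.
marked = weights + gamma * (bits @ codes)

def decode(w):
    # Despread: correlate the weights with each code and keep the sign.
    return np.sign(codes @ w)

results = {}
for fraction in (0.25, 0.5, 0.9, 0.99):
    keep = rng.random(n_params) >= fraction     # random pruning mask
    decoded = decode(marked * keep)             # pruned weights are zeroed
    results[fraction] = float(np.mean(decoded == bits))

for fraction, accuracy in results.items():
    print(f"pruned {fraction:.0%}: watermark accuracy {accuracy:.0%}")
```

In this setting the correlation for each bit grows linearly with the number of surviving weights, while the interference grows only with its square root, which is why a small surviving fraction of a long code can still decode correctly.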

Parameter Shuffling An adversary might attempt to reorder the parameters in each network layer to prevent the watermark verification. First, the reordering must be done carefully so that it does not impact the model performance. Second, we can find the permutation and restore the original order of the network parameters as follows. Let $W = \{W_1, W_2, \ldots, W_k\}$ be our original watermarked neural network, and let $\hat{W} = \{\hat{W}_1, \hat{W}_2, \ldots, \hat{W}_k\}$ represent the resulting network after the adversary shuffles the parameters.

To undo the shuffling we need to compute the cosine similarity between the corresponding layers of the two networks (i.e., submatrices).

For simplicity, let's consider the recovery of $W_1$ from its shuffled version $\hat{W}_1$. To compute the cosine similarity we first need to compute the vector norms for $W_1$ and $\hat{W}_1$:

$$\mathrm{Norm}_{W_1} = \sqrt{\mathrm{Diag}\left(W_1^{T} W_1\right)}, \qquad \mathrm{Norm}_{\hat{W}_1} = \sqrt{\mathrm{Diag}\left(\hat{W}_1^{T} \hat{W}_1\right)}$$

where $T$ represents the transpose of the matrices and $\mathrm{Diag}$ means that we keep the diagonal values of the resulting matrix after the matrix multiplication. Assuming that $\mathrm{Norm}_{W_1}$ and $\mathrm{Norm}_{\hat{W}_1}$ are column matrices, we can compute the cosine matrix as:

$$\text{cosine matrix} = \left(W_1^{T} \hat{W}_1\right) \oslash \left(\mathrm{Norm}_{W_1} \, \mathrm{Norm}_{\hat{W}_1}^{T}\right) \tag{12}$$

where $\oslash$ represents the element-wise division. The cosine matrix (12) is a square matrix, in which the rows indicate the original network order of the neurons in the first hidden layer, and the columns indicate the order of the neurons in the new network. If the network has not been shuffled, the diagonal elements of this matrix will be one. Otherwise, the position of the one in each row indicates where that neuron was moved by the shuffling. For example, if the entry (1, 3) in the cosine matrix (12) is one, the first neuron in the original network has been moved to the third position in the shuffled network. There is exactly one entry equal to one in each row and each column of that matrix. Knowing the shuffled elements, we can undo the shuffling in this layer and perform the same process for every other layer in the network, recursively. After unshuffling every layer, we can recover the watermark using the Verify algorithm (Algorithm 2). In all the above-mentioned experiments, the watermark accuracy is 100%. From this evaluation, we can see that FTAL, RTAL, REFIT [8], Parameter Pruning and Parameter Shuffling are unable to remove the TATTOOED watermark, thus satisfying the Robustness requirement.
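The unshuffling procedure for a single layer can be sketched as follows. The sketch assumes that each column of the weight matrix holds one neuron and that the adversary only permuted the columns without otherwise changing the weights; the function names and the random test layer are illustrative assumptions.

```python
import numpy as np

def column_norms(w):
    # sqrt(Diag(W^T W)): Euclidean norm of every column (one column per neuron).
    return np.sqrt(np.diag(w.T @ w))

def cosine_matrix(w_orig, w_shuf):
    # Entry (i, j) compares original neuron i with the neuron found
    # at position j of the shuffled layer.
    outer = np.outer(column_norms(w_orig), column_norms(w_shuf))
    return (w_orig.T @ w_shuf) / outer          # element-wise division

def unshuffle(w_orig, w_shuf):
    # For every original neuron, find the column where it ended up.
    position = np.argmax(cosine_matrix(w_orig, w_shuf), axis=1)
    return w_shuf[:, position], position

rng = np.random.default_rng(1)
w1 = rng.normal(size=(64, 32))                  # 32 neurons with 64 inputs each
perm = rng.permutation(32)                      # adversary's secret shuffle
w1_shuffled = w1[:, perm]

w1_recovered, position = unshuffle(w1, w1_shuffled)
print(np.allclose(w1_recovered, w1))
```

Taking the row-wise maximum instead of testing for an exact value of one is a design choice in this sketch, so that the matching would still be well defined if the weights had been slightly perturbed.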

A reliable watermarking technique should exhibit a minimal false negative rate, allowing legitimate owners to identify their intellectual property with high probability. TATTOOED employs CDMA to watermark the model. CDMA directly codes the data using pseudo-randomly generated codes known only to the legitimate owner. Typically, the spreading codes are tens to hundreds of bits long, so the signal is only visible when the spreading code is known, and the gain of using CDMA is proportional to the length of the code. Because DNNs are large, with parameters numbering in the thousands to millions, the gain of using CDMA is higher when we employ it for watermarking a DNN, since we can use spreading codes with lengths on the order of hundreds of thousands, allowing the legitimate owner to reliably identify the presence of the watermark in the model with no false negatives. Moreover, to empirically support this claim we trained 100 different VGG-16 models on the CIFAR10 dataset. Afterward, we randomly chose 50 models and watermarked them using TATTOOED with $\gamma = 9 \times 10^{-2}$, the image-watermark, and the same key to generate the CDMA spreading codes. The remaining 50 models were left untouched.

We run the Verify algorithm on each of these 100 models and report the results in Figure 4 . From our evaluation, we see that the false negative rate is 0 (as predicted by CDMA theory). Hereby, based on the theoretical foundations on CDMA and the empirical evaluation, TATTOOED is able to satisfy the Reliability requirement.

The integrity of a watermark means that the watermarking technique should exhibit minimal false alarm rates, to avoid erroneously accusing an honest party that has a model similar to the watermarked one. TATTOOED, due to its foundations on CDMA, satisfies this condition. The legitimate owner can extract the watermark only from its own model. The correlation between the model weight parameters where the watermark is embedded and the spreading codes used to extract the individual bits of the watermark yields the watermark only on the watermarked model (see Equation 9). Moreover, to empirically support this claim we trained 100 different VGG-16 models on the CIFAR10 dataset. Afterward, we randomly chose 50 models and watermarked them using TATTOOED with $\gamma = 9 \times 10^{-2}$, the image-watermark, and the same key to generate the CDMA spreading codes. The remaining 50 models were left untouched. We ran the Verify algorithm on each of these 100 models and report the results in Figure 4. From our evaluation, we see that the false positive rate is 0 (as also predicted by CDMA theory). Thus, based on the theoretical foundations of CDMA and the empirical evaluation, TATTOOED satisfies the Integrity requirement.

A watermark must be able to undeniably reveal the identity of the legitimate owner. TATTOOED, by employing CDMA, allows for the inclusion of large amounts of information in the network without impacting the model's performance on the legitimate task (see Section VI-A). We empirically showed that TATTOOED allows embedding watermarks on the order of thousands of bits in networks with tens of thousands of parameters. A typical Advanced Encryption Standard (AES) secret key consists of 256 bits, and that alone is sufficient to guarantee that the model belongs to the user who owns the secret key. For another entity to claim that they are the legitimate owner of that model, they would need to provide the 256-bit key, and guessing a bit sequence of that size succeeds with probability $1/2^{256}$. In our experiments we employed two watermarks of different sizes. One of them consists of 152 bits and the other is 1 kilobyte. Being able to watermark the models with bit sequences ranging from hundreds to thousands of bits is more than enough to undeniably reveal the identity of the legitimate owner and thus satisfy the Capacity requirement.
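As a worked figure for the guessing argument, assume the adversary makes a single guess and that every bit of the sequence is uniformly random and independent (an assumption of this illustration). The success probabilities for a 256-bit key and for the 152-bit watermark are then approximately:

$$\Pr[\text{guess 256 bits}] = 2^{-256} \approx 8.6 \times 10^{-78}, \qquad \Pr[\text{guess 152 bits}] = 2^{-152} \approx 1.8 \times 10^{-46}$$

Each additional bit halves the probability, so even the shorter of the two watermarks is far beyond what repeated guessing could reach.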

Secrecy means the watermark should be undetectable by unauthorized parties. CDMA, the building block of the TATTOOED watermark, ensures secrecy. Since only the legitimate owner has the spreading codes, the owner is the only one who can detect the presence of the watermark in the model. The spreading codes for each bit of the watermark are a random sequence of ±1 with length W that the legitimate owner has generated, and any entity without the spreading codes can see the model parameters but gets no information from them. Figure 5 shows the distribution of the weight parameters of the same model with and without the watermark. The model architecture is the multi-layer perceptron trained on the MNIST dataset. The TATTOOED model is watermarked with the text-watermark using $\gamma = 9 \times 10^{-2}$. The distribution of the weight parameters is nearly identical between the watermarked and non-watermarked models, providing no information that reveals the presence of the watermark in the model. Based on the theoretical foundations of TATTOOED on CDMA and our empirical evaluation, TATTOOED satisfies the Secrecy requirement.

TATTOOED watermarking consists of two individual computations to embed the watermark: 1) multiplying the spreading code with the watermark content to create the watermark signal and 2) adding this signal to the model parameters selected to carry the watermark (Eq. 8). On today's computers, matrix multiplication and addition are extremely fast, especially when exploiting the parallelism offered by graphics processing units (GPUs). This allows the TATTOOED watermark to be embedded and verified in a short amount of time, typically in a few seconds, even for watermarks and spreading codes with lengths on the order of thousands.
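The two embedding computations, and the corresponding verification, can be sketched and timed as below. This is a minimal illustration under stated assumptions and not the authors' Mark and Verify algorithms: the spreading codes are regenerated from an integer key used as a random seed, the whole weight vector is used as the carrier, and the function names, payload length, gain, and weight distribution are all hypothetical.

```python
import time
import numpy as np

def spreading_codes(key, n_bits, n_params):
    # One +/-1 code per payload bit, regenerated from the owner's secret key.
    rng = np.random.default_rng(key)
    return rng.integers(0, 2, size=(n_bits, n_params), dtype=np.int8) * 2 - 1

def mark(weights, bits, key, gamma):
    codes = spreading_codes(key, len(bits), weights.size)
    signal = gamma * (np.asarray(bits) @ codes)   # step 1: spread the payload
    return weights + signal                       # step 2: add it to the weights

def verify(weights, n_bits, key):
    codes = spreading_codes(key, n_bits, weights.size)
    return np.sign(codes @ weights).astype(int)   # despread and take the sign

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=200_000)     # stand-in for ~200K parameters
payload = rng.integers(0, 2, size=32) * 2 - 1     # 32 payload bits as +/-1

t0 = time.perf_counter()
marked = mark(weights, payload, key=1234, gamma=0.005)
t1 = time.perf_counter()
recovered = verify(marked, len(payload), key=1234)
t2 = time.perf_counter()

print(f"mark: {t1 - t0:.3f}s  verify: {t2 - t1:.3f}s")
print("payload recovered:", np.array_equal(recovered, payload))
```

Both operations reduce to one matrix-vector product and one addition, so no training epochs are involved; measured times will depend on the hardware and on the payload and carrier sizes.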

Prior watermarking techniques require modifications to the training regime [43] , [1] and multiple epochs for embedding.

To evaluate TATTOOED performance in the Mark and Verify procedures, we measured the time (in seconds) that TATTOOED requires to embed the watermark in, and verify it from, a neural network containing about 200K parameters, while embedding the watermark using different portions of the network starting from 12.5%. The results in Table IV show that TATTOOED needs less than 12 seconds to watermark a model with about 200K parameters and less than 2 seconds to verify whether the watermark is present in the model. In contrast, watermarking methods such as [43], [1], due to the model modifications they require to watermark a model, need multiple training epochs (e.g., an epoch on MNIST requires around 10 seconds on a high-end desktop GPU). Based on our empirical evaluation and the comparison with prior neural network watermarking techniques, TATTOOED satisfies the Efficiency property.

Unforgeability means that another entity cannot claim ownership of a legitimate owner's model. As previously explained (Section IV), TATTOOED employs randomly-generated spreading codes in the orders of thousands. Relying on CDMA theory, the only way to detect (and thus overwrite) the watermark is to know the original spreading codes. For each watermark bit, another entity has to guess the correct spreading code used which is impossible due to the gargantuan dimension of the search space. Another entity can embed another watermark in the model using TATTOOED or other techniques; however, the CDMA properties prevent the removal of the original watermark, protecting the legitimate owner's copyright claim to the model. Under copyright law, an author of an original work automatically owns the copyright to that work, preventing anyone else from using or replicating it. To circumvent any dispute to ownership created by an adversary inserting a fraudulent watermark into a model, the legitimate owner can register the copyright when the watermark is inserted using TATTOOED, providing an upper hand in the legal system in the event that the need arises. To this end, the reliance of TATTOOED on CDMA to watermark a DNN model allows it to satisfy the Unforgeability requirement for digital watermarking.

The watermark should provide a strong, easily verifiable link, such as information directly related to the legitimate owner of the model. TATTOOED allows arbitrary watermark content to be embedded in the network. Content from text messages (such as company name, fiscal code, etc.) to images (such as the legitimate owner's signature, etc.) can be included in the TATTOOED watermark. Due to TATTOOED reliance on CDMA to embed the watermark, this content can be extracted from the ML model only by the legitimate owner who possesses the spreading code (Section II-C). The ability of TATTOOED to allow arbitrary watermarks that strongly link the legitimate owner's identity with the ML model satisfies the Authentication requirement for digital watermarking.
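Arbitrary content has to be turned into a bit sequence before it can be embedded. The sketch below shows one straightforward way to do this for text, using UTF-8 bytes with the most significant bit first. The encoding choice, the function names, and the owner string are assumptions made for illustration; the paper does not specify how its text-watermark is serialized.

```python
def text_to_bits(text):
    # Encode as UTF-8 and emit each byte most-significant bit first.
    data = text.encode("utf-8")
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]

def bits_to_text(bits):
    # Regroup the bits into bytes and decode them back into a string.
    data = bytes(
        sum(bit << (7 - k) for k, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )
    return data.decode("utf-8", errors="replace")

owner_string = "ACME Corp, 2022"          # hypothetical owner identifier
bits = text_to_bits(owner_string)

print(len(bits), "bits")
print(bits_to_text(bits))
```

An image watermark could be handled the same way by serializing its pixel bytes, at the cost of a longer bit sequence.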

The generality property ensures that the application of a watermarking technique is not restricted to a particular neural network architecture or dataset. We verified the generality property of the TATTOOED watermark by employing it on two image classification tasks (MNIST and CIFAR10), one audio classification task (ESC-50), and one language modelling task (Wikitext-2), with various model architectures (multi-layer perceptron, CNN and RNN). In our empirical evaluation across multiple datasets and DNN architectures, TATTOOED was successfully applied without any model- or dataset-specific requirements, satisfying the watermark generality property.

In the early 2000s, spread spectrum techniques were applied to watermark digital images [21] , [10] . Hernández et al. [21] , present a spread-spectrum-like discrete cosine transform domain watermarking technique for copyright protection of digital images. Cox et al. [10] propose to insert a watermark into the spectral components of the data using techniques analogous to spread spectrum communications, hiding a narrow band signal in a wideband channel (i.e., multimedia data). The work by Uchida et al. [43] , [33] is the first attempt to watermark a DNN. The embedding is made possible by using an extra regularization term in the cost function that introduces a statistical bias on the weight parameters of the learned ML model. This statistical bias is then used to infer information about the ownership of the ML model. Wang et al. [45] insert an independent neural network that allows the use of selective weights to watermark another ML model. This independent model is kept secret and serves as the watermark verification tool. The watermark is embedded by training the two neural networks simultaneously. In this procedure, the independent neural network uses selected target parameter weights of the model to be watermarked to embed the watermark information.

To extract the watermark, the input layer of the independent neural network is connected to the selected weights of the marked neural network, and the watermark verification is done by observing the outputs of the independent neural network. Song et al. [40] present both a white-box and a black-box watermarking technique with the goal of embedding information related to the private training dataset into the trained model's weight parameters. In the white-box case, they do this by either encoding sensitive information about the private training dataset in the least significant bits and the signs of the model's weight parameters, or, similarly to Uchida et al. [43], [33], by adding an extra regularization term to the loss function. DeepSigns [11] works by embedding the watermark in the probability density function of the data abstraction obtained by a DNN layer. The watermark verification is done by querying the model with a subset of the training data and computing the statistical mean of the activation features obtained by passing these inputs through the model. The acquired mean values are used to extract the watermark content. DeepMarks [6] improves upon DeepSigns [11] by constructing the watermarks using anti-collusion codebooks. The watermark, similar to [6], is embedded in the probability density function of the weights by incorporating a watermark-specific regularization loss. Along the same line of thought as Uchida et al. [43], [33], Wang et al. [47] propose a GAN-based adversarial training technique that encourages the weight distribution of a watermarked model to be similar to the weight distribution of a non-watermarked model.

Black-box watermarking techniques account for the impossibility of physically accessing the ML model to verify the presence of the watermark. As such, they rely on querying the ML model and correlating the information from the predictions the ML model gives to infer whether the watermark is present. The watermark construction of Adi et al. [1] is based on the concept of backdooring neural networks [19]. The watermark consists of a unique set of input instances to which a random label, from the model's output space, is assigned. Watermarking the deep neural network only requires training the network on the watermark instances in addition to the original training set. Similarly, Chen et al. [7] encode the watermark using a set of image and label pairs that are designed using targeted adversarial attacks, each of which represents a bit of the watermark. Jia et al. [24] leverage the soft nearest neighbor loss to entangle representations extracted from training data and watermarks, forcing the model to classify data from the task distribution while simultaneously being able to predict the pre-chosen watermark instances. Le Merrer et al. [32] embed the watermark by adversarially training the ML model on a set of true and false adversarial examples such that they are correctly classified. The watermark verification consists of observing the predictions that the model gives for such adversarial examples. Yang et al. [50] embed the watermark into the same neural connections that are responsible for representing the main classification task by adding an ingrainer loss to the model's normal loss function. Namba et al. [34] watermark a model by selecting a trigger dataset from the training data and assigning wrong labels as watermark keys, and then use exponential weighting to enforce the watermark predictions during training. Szyller et al. [41] aim at defending against model extraction attacks by changing a small fraction (e.g., ≤ 0.5%) of the query answers given by the prediction API, to introduce a watermark into the extracted model.

Several works have been proposed to assess watermark resilience to malicious adversaries trying to remove [22], [46], [34], [44], overwrite [46], forge [49], [15], and detect [46], [38] watermarks. Wang et al. [44] propose a generalized technique to detect and mitigate backdoors in DNNs that can be applied to mitigate backdoor-based watermarking schemes. Chen et al. [8] propose REFIT, a framework to remove watermarks through fine-tuning. They show that making a model forget part of the task it was initially trained on can effectively remove watermarks. They do that by devising a learning rate schedule, fine-tuning with additional data, and using elastic weight consolidation [26]. We demonstrate the robustness of TATTOOED against the removal techniques proposed by Chen et al. [8] in Section VI-B. Wang et al. [46] demonstrate that many watermarking schemes that embed watermarks into model parameters change the model weight distribution, thus making the watermarks detectable. Additionally, they propose an attack to disrupt watermarks embedded with the method of Uchida et al. [43], detecting the watermark by analyzing the distribution of the model parameters. We show in Section VI-F that TATTOOED does not alter the distribution of the model parameters, thus making the attack proposed by Wang et al. [46] ineffective. Shafieinejad et al. [38] present a technique to remove watermarks in the white-box and black-box settings by exploiting the fact that backdoor-based watermarks use additional data points as triggers for watermarking. These points are outliers, and applying methods that avoid over-fitting is useful for removing these kinds of watermarks. This technique is ineffective against TATTOOED since the TATTOOED watermark does not introduce a backdoor in the model. Xu et al. [49] propose an attack that can forge a backdoor-based watermark by finding the trigger inputs and then claiming ownership of the model. TATTOOED does not use any trigger inputs, thus making this type of attack ineffective. Moreover, due to the foundations of TATTOOED on CDMA, only the owner of the spreading codes (i.e., of the key used to generate them) can retrieve the watermark and claim legitimate ownership.

In this paper we introduced TATTOOED, a novel white-box neural network watermarking technique based on CDMA spread-spectrum channel coding. Our extensive evaluation showed that the TATTOOED watermark incurs no penalty on the model performance, and that it can be applied to any model architecture and task without modifications.

We demonstrated that the TATTOOED watermark remains unaffected by state-of-the-art watermark removal techniques based on fine-tuning and model parameter pruning, thus improving the state-of-the-art for white-box watermarking of neural networks. Moreover, TATTOOED is significantly more efficient at embedding the watermark in a DNN and verifying its presence than previous watermarking techniques. The combination of robustness, practicality, and efficiency makes TATTOOED a strategy of choice for DNN IP protection.

[1] Turning your weakness into a strength: Watermarking deep neural networks by backdooring

[2] Ask the GRU: Multi-task learning for deep text recommendations

[3] Invertible residual networks

[4] A survey on model watermarking neural networks

[5] Random forests

[6] DeepMarks: A secure fingerprinting framework for digital rights management of deep learning models

[7] BlackMarks: Blackbox multibit watermarking for deep neural networks

[8] REFIT: A unified watermark removal framework for deep learning systems with limited data

[9] ShieldFS: A self-healing, ransomware-aware filesystem

[10] Secure spread spectrum watermarking for images, audio and video

[11] DeepSigns: An end-to-end watermarking framework for ownership protection of deep neural networks

[12] EnCoD: Distinguishing compressed and encrypted file fragments

[13] The naked sun: Malicious cooperation between benign-looking processes

[14] BERT: Pre-training of deep bidirectional transformers for language understanding

[15] Rethinking deep neural network ownership verification: Embedding passports to defeat ambiguity attacks

[16] Low-density parity-check codes

[17] Deep Learning

[18] Speech recognition with deep recurrent neural networks

[19] BadNets: Identifying vulnerabilities in the machine learning model supply chain

[20] Deep residual learning for image recognition

[21] DCT-domain watermarking techniques for still images: Detector performance analysis and a new structure

[22] Evasion attacks against watermarking techniques found in MLaaS systems

[23] FedComm: Federated learning as a medium for covert communication

[24] Entangled watermarks as a defense against model extraction

[25] A style-based generator architecture for generative adversarial networks

[26] Overcoming catastrophic forgetting in neural networks

[27] Learning multiple layers of features from tiny images

[28] MNIST handwritten digit database

[29] Neural attention distillation: Erasing backdoor triggers from deep neural networks

[30] How Valencia crushed COVID with AI

[31] Pointer sentinel mixture models

[32] Adversarial frontier stitching for remote neural network watermarking

[33] Digital watermarking for deep neural networks

[34] Robust watermarking of neural network with exponential weighting

[35] ESC: Dataset for Environmental Sound Classification

[36] Modern Coding Theory

[37] Learning with kernels: Support vector machines, regularization, optimization, and beyond

[38] On the robustness of backdoor-based watermarking in deep neural networks

[39] Very deep convolutional networks for large-scale image recognition

[40] Machine learning models that remember too much

[41] DAWN: Dynamic Adversarial Watermarking of Neural Networks

[42] Principles of Spread-Spectrum Communication Systems

[43] Embedding watermarks into deep neural networks

[44] Neural cleanse: Identifying and mitigating backdoor attacks in neural networks

[45] Watermarking in deep neural networks via error back-propagation

[46] Attacks on digital watermarks for deep neural networks

[47] RIGA: Covert and robust white-box watermarking of deep neural networks

[48] Quantized convolutional neural networks for mobile devices

[49] A novel method for identifying the deep neural network model with the serial number

[50] Effectiveness of distillation attack and countermeasure on neural network watermarking

[51] Protecting intellectual property of deep neural networks with watermarking