
Showing posts from July, 2019

Day 30: fast.ai Lesson 5 - Extrapolation and RF from scratch I

I watched the first half of Lesson 5 of Introduction to Machine Learning for Coders, on Extrapolation and RF from scratch.

Day 29: fast.ai Lesson 4 - Feature importance, tree interpreter II

I finished the rest of Introduction to Machine Learning for Coders Lesson 4 - Feature importance, tree interpreter, continuing from yesterday.

Day 28: fast.ai Lesson 4 - Feature importance, tree interpreter

This lesson is 1:40 long, so I decided to divide it into two parts: half today and half tomorrow. Introduction to Machine Learning for Coders Lesson 4 - Feature importance, tree interpreter.

Day 27: Introducing Differential Privacy revisited

In the middle of my busy days, preparing for a new job while finishing the current one, I wanted to revisit the basics of this course. The first concept is privacy, so I rewatched Lesson 3, Introducing Differential Privacy. I plan to attend the #sg_study_jahm virtual meeting.

Day 26: Differential privacy for deep learning revisited

Yesterday, I took a break day, since I was busy preparing to start my new job in another town. Also, I have finished all the lessons in this course, so it is time to revise. Today, I revisited Lesson 6 on differential privacy for deep learning.

Day 25: Build an encrypted database and encrypted deep learning

I watched Lesson 9, Encrypted Deep Learning, concepts 4 to 9, and finished the course. I will revise some of the material again starting tomorrow.

Day 24: Lesson 9 Encrypted deep learning

Watched Lesson 9, Encrypted Deep Learning, concepts 1-3.

Day 23: Lesson 8 Securing federated learning

I am still catching up with my study of this course after a week of job interviews and orientations. Still, I wanted to spare some time for it, even if only to watch Lesson 8, Securing Federated Learning.

Day 22: fast.ai Lesson 3 - Performance, validation and model interpretation

Having just come back to my home town after a 7-hour train journey, my only activity was watching Lesson 3 of fast.ai Introduction to Machine Learning - Performance, validation and model interpretation.

Day 21: fast.ai Lesson 2 Random forest deep dive

Since a job interview took almost the whole day, my only activity was watching Lesson 2 of fast.ai Introduction to Machine Learning - Random forest deep dive.

Day 20: fast.ai Lesson 1 - Introduction to random forest

I am working through Lesson 1 of fast.ai Introduction to Machine Learning for Coders: http://course18.fast.ai/lessonsml1/lesson1.html My to-do notes:
1. Installing Jupyter. I want to do it on my own computer later on, and create some instructions if possible.
2. Data Science vs Software Engineering. I see Data Science as more like using Excel to do the work, while Software Engineering is creating the 'Excel'-like application that does the work. Data Science leans toward prototyping, while Software Engineering produces a finished product.
3. Introduction to random forest; a minimal sketch is below. Another article to read: https://towardsdatascience.com/random-forest-3a55c3aca46d
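To remind myself of the lesson's basic workflow, here is a minimal random forest sketch in scikit-learn; the toy data and hyperparameters are placeholders of mine, not the course's dataset.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data: 100 samples, 4 features, noisy linear target (placeholder data)
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = X @ np.array([3.0, -2.0, 0.5, 1.0]) + 0.1 * rng.randn(100)

# A small forest; n_estimators and max_depth are arbitrary choices here
model = RandomForestRegressor(n_estimators=20, max_depth=5, random_state=0)
model.fit(X, y)
print(model.predict(X[:3]))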

Day 19: Federated learning

Federated learning is another technique to preserve privacy. Whereas local and global differential privacy add noise to the data or to the query result, in federated learning we send the model to each data owner, train it there, and only get the trained model back in return; the data never leaves the owner. The rest of my activity was watching the Lesson 7 Federated Learning videos. A minimal sketch of the idea is below.
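To make the idea concrete, here is a minimal federated averaging sketch in plain NumPy (my own simplification, not the PySyft code from the lesson): each owner fits a local model on its private data, and only the learned weights travel back to the server to be averaged.

import numpy as np

rng = np.random.RandomState(42)
true_w = np.array([2.0, -1.0])

def local_train(X, y):
    # Each owner fits a least-squares model on its own data; only the
    # learned weights leave the owner, never the raw X or y
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Three data owners, each holding a private dataset
owners = []
for _ in range(3):
    X = rng.rand(50, 2)
    owners.append((X, X @ true_w + 0.05 * rng.randn(50)))

# Federated averaging: the server only averages the locally trained weights
global_w = np.mean([local_train(X, y) for X, y in owners], axis=0)
print(global_w)  # close to true_w, yet no raw data ever left an owner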

Day 18: Differential privacy for deep learning and PATE analysis revisited

I think I cannot just move on without understanding this. Hence, today, I revisited and rewatched all the videos in Lesson 6. I also read this article about PATE analysis: https://towardsdatascience.com/understanding-differential-privacy-85ce191e198a A small sketch of the PATE aggregation step is below.
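To fix the idea for myself: in PATE, several "teacher" models trained on disjoint private datasets each vote on a label, Laplace noise is added to the vote counts, and the noisy argmax becomes the label the "student" model learns from. A from-scratch sketch of that aggregation step (my own simplification, not the course's syft-based analysis code):

import numpy as np

def noisy_max_label(teacher_preds, num_classes, epsilon, rng):
    # teacher_preds: one predicted class per teacher for a single example
    counts = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    # Laplace noise on the vote counts; scale 1/epsilon, since one teacher
    # changing its vote shifts a count by at most 1
    counts += rng.laplace(loc=0.0, scale=1.0 / epsilon, size=num_classes)
    return int(np.argmax(counts))

rng = np.random.RandomState(0)
teacher_preds = np.array([2, 2, 2, 1, 2, 0, 2, 2, 1, 2])  # 10 teachers vote
print(noisy_max_label(teacher_preds, num_classes=3, epsilon=0.5, rng=rng))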

Day 17: PATE analysis

I watched Lesson 6, concepts 4, 5, 6 and 8, and worked on the notebook on PATE analysis. Still wondering what PATE analysis means... I need to read about it again tomorrow. I also watched Lesson 1 of the fast.ai course.

Day 16: Differential privacy for deep learning

Today, I learned from a discussion on Slack that sensitivity depends on the query. It is the maximum distance between the query on the full database (size n) and the query on any of its parallel databases (the n databases of size n-1). For example, when the query is a sum over a database consisting only of 0s and 1s, the sensitivity is 1, since removing one entry changes the sum by at most 1. For a mean query, the sensitivity is 1/len(db). @Karan Khishinani sent me the definitions of sensitivity and epsilon. A quick numeric check of the two examples is sketched below. My activity continued with watching Lesson 6, concepts 1, 2 and 3, plus concept 7, the interview on Differential Privacy at Apple. Next I will watch the webinar; while waiting, I am browsing the fast.ai website.
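As a quick check of those two examples, here is a small sketch in the PyTorch style of the course notebooks (the function names are mine) that measures sensitivity empirically:

import torch

def sensitivity(query, db):
    # Max |query(db) - query(parallel_db)| over all parallel databases,
    # i.e. all copies of db with one entry removed
    full = query(db)
    return max(abs(query(torch.cat([db[:i], db[i+1:]])) - full)
               for i in range(len(db)))

db = (torch.rand(100) > 0.5).float()  # a database of 0s and 1s
print(sensitivity(torch.sum, db))     # 1.0 for the sum query
print(sensitivity(torch.mean, db))    # bounded by about 1/len(db) for the mean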

Day 15: Global differential privacy

I revisited the formal definition of differential privacy. Epsilon and delta measure the threshold for leakage: epsilon is the maximum distance between a query on the full database and the same query on a parallel database, but this constraint sometimes fails to hold, with probability delta. The amount of noise that needs to be added depends on:
1. The type of noise. There are two types, Gaussian noise and Laplacian noise; Laplacian noise is used in this course, and its delta value is always 0.
2. The sensitivity of the query.
3. The desired epsilon.
4. The desired delta.
When to use local and when to use global differential privacy? When the data is so sensitive that people are unwilling to give it to a trusted curator, local differential privacy is the likely choice. When people are not reluctant to give their data to the curator, it is better to delay adding the noise until near the end of the function (global differential privacy) to get a much higher accuracy. A sketch of the Laplace mechanism is below.
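Concretely, the Laplace mechanism adds Laplace noise with scale beta = sensitivity / epsilon to the query result. A minimal sketch (my own reconstruction, in the style of the course notebooks):

import torch

def laplacian_mechanism(db, query, sensitivity, epsilon):
    # Global differential privacy: run the query on the raw data,
    # then add Laplace noise with scale sensitivity / epsilon
    beta = sensitivity / epsilon
    noise = torch.distributions.Laplace(0.0, beta).sample()
    return query(db) + noise

db = (torch.rand(100) > 0.5).float()
print(laplacian_mechanism(db, torch.sum, sensitivity=1.0, epsilon=0.5))
print(laplacian_mechanism(db, torch.mean, sensitivity=1.0 / len(db), epsilon=0.5))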

Day 14: Formal definition of differential privacy

I watched the rest of Lesson 5. The definition comes from "The Algorithmic Foundations of Differential Privacy" by Cynthia Dwork and Aaron Roth, https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf It says that FOR ALL parallel databases, the output distribution of a query on database x stays within a factor of e^epsilon of the same query on database y, but that occasionally this constraint won't hold, with probability delta. Hence, this definition is called "epsilon delta" differential privacy. I still do not quite understand the paragraph above, so let us find out what epsilon and delta are. Epsilon zero: if a query satisfies this inequality with epsilon set to 0, that means the query on all parallel databases outputs exactly the same value as on the full database. As you may remember, when we calculated the "threshold" function, the sensitivity was often 0. In that case, the epsilon also happene...
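For reference, here is that definition written out (from the Dwork and Roth book linked above): a randomized mechanism $\mathcal{M}$ is $(\varepsilon, \delta)$-differentially private if, for all $S \subseteq \mathrm{Range}(\mathcal{M})$ and for all databases $x, y$ differing in at most one entry,

\[
\Pr[\mathcal{M}(x) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(y) \in S] + \delta .
\]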

Day 13: Varying the amount of noise

I watched the video and ran the notebook. The expectation is that with a smaller dataset, adding more noise makes the difference between the true result and the noised result (sk_result) bigger, while with a bigger dataset the same amount of noise makes a much smaller difference. I experimented with some values. The results show that when the true mean (without noise) is above 0.5, increasing the probability that the first coin lands heads increases the with-noise percentage, while when the true mean is below 0.5, it decreases the with-noise percentage. This makes sense: in both cases, the more often we answer honestly, the more the with-noise result moves towards the true mean of the dataset. A sketch of this experiment is below.
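A minimal reconstruction of that experiment (my own, not the course notebook): add coin-flip noise to a database, vary the first-coin probability p, and compare the true mean with both the raw noised mean and the de-skewed estimate.

import torch

def coin_flip_noise(db, p):
    # First flip: with probability p answer honestly;
    # otherwise answer according to a fair second flip
    first = (torch.rand(len(db)) < p).float()
    second = (torch.rand(len(db)) < 0.5).float()
    return db * first + (1 - first) * second

db = (torch.rand(10000) > 0.7).float()  # true mean around 0.3
for p in [0.2, 0.5, 0.8]:
    noised = coin_flip_noise(db, p).mean()   # = p * true + (1 - p) * 0.5
    deskewed = (noised - (1 - p) * 0.5) / p  # recover an estimate of the true mean
    print(p, db.mean().item(), noised.item(), deskewed.item())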

Day 12: Local differential privacy

As suggested by @Aniket Thomas, I moved on to Lesson 5 today. Yesterday I watched the introduction video, which explained that global differential privacy adds noise to the output of a query over the whole database, while local differential privacy adds noise to each individual data point. The database owner is called the trusted curator. Today, I moved on to the next video. The jaywalking example with the coin-flip "noise" is explained very clearly, and I am better at understanding the sensitivity of data now: a dataset without noise can be not very sensitive or very sensitive, depending on the query. We can attack the dataset using a differencing attack to violate its privacy (see the sketch after this entry). One way to preserve the privacy of the dataset is to add noise to each data point, for example with the flip-the-coin-twice technique: if the first flip is heads, answer honestly; if it is tails, answer according to the second flip (heads is true and tails is false). Hence, half of the time they will an...
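The differencing attack mentioned above is easy to demonstrate. A tiny sketch (my own): query the whole database, query everyone except one person, and subtract.

import torch

db = (torch.rand(100) > 0.5).float()

# Differencing attack: the difference between a sum over everyone and a
# sum over everyone-but-one exposes that one person's value exactly
everyone = torch.sum(db)
everyone_but_last = torch.sum(db[:-1])
print(everyone - everyone_but_last)  # reveals db[-1]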

Day 11: Introduction to performing a differencing attack

Yesterday, I was confused about what sensitivity means. Does it depend on the query or on the dataset? I raised this in a discussion on Slack, and as suggested by @Aniket Thomas, I will go on to the next lesson to find out more about this. Next, I watched the last topic of Lesson 4, and I also watched the intro of Lesson 5 on global and local differential privacy, in addition to starting a discussion about ethics in AI.

Day 10: Differential privacy - Evaluating the differential privacy

Today I revisited Lesson 3, since I was curious about anonymity and why it is not enough. The lesson gives a clear example of why: even though we have taken care of the privacy of our dataset, other people may accidentally release sensitive information that leads to exposure of the information in our anonymized dataset. Next, I tried the private-ai workbook, lesson: Towards Evaluating The Differential Privacy of a Function. The sensitivity value is the maximum difference between each parallel database's query result and the query result for the entire database. I still do not quite understand sensitivity and sensitivity with a threshold; I would like to ask about it on Slack. My current understanding of the threshold case is sketched below.
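Here is my current understanding of the threshold query, as a sketch in the style of the notebook (the function names are mine): the empirical sensitivity is usually 0, and only becomes 1 when the database's sum sits right at the threshold boundary.

import torch

def get_parallel_dbs(db):
    # Every copy of the database with one entry removed
    return [torch.cat([db[:i], db[i+1:]]) for i in range(len(db))]

def threshold_query(db, threshold=5):
    # 1 if the sum of the database exceeds the threshold, else 0
    return (db.sum() > threshold).float()

db = (torch.rand(10) > 0.5).float()
full = threshold_query(db)
print(max(abs(threshold_query(pdb) - full) for pdb in get_parallel_dbs(db)))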

Day 9: Differential privacy introduction

I paused my exploration of the transfer learning materials; I will look at them again once I have finished the course. Next is Lesson 3, Introducing Differential Privacy. Data scientists face a dilemma: on the one hand, they need as much information as possible, while on the other hand, they do not get free access to it, out of respect for the privacy of the community. Differential privacy in this lesson is about making sure that the learning process in our neural network learns only from the data it is supposed to learn from, without accidentally learning from data it is not supposed to. Cynthia Dwork's definition describes differential privacy as a promise made by a data holder to a data subject: "You will not be affected, adversely or otherwise, by allowing your data to be used in any study or analysis, no matter what other studies, data sets, or information sources are available." A parallel database is simply a database with one entry removed. That's what I learned for today.

Day 8: Transfer learning - Impact of the number of hidden layers 4

Yesterday, I found out that an additional hidden layer does not have any impact on the time to process each batch. Today, I want to find out whether it has any impact on the test loss and accuracy. The results for one hidden layer:

epochs  steps  processing_time/batch  training_loss  test_loss  accuracy
1/3      5      9.998                 2.583          1.164      0.504
1/3     10     10.004                 0.817          0.242      0.911
1/3     15     10.019                 0.247          0.103      0.970
1/3     20     10.009                 0.177          0.132      0.953
1/3     25      9.999                 0.197          0.075      0.975
1/3     30      9.986                 0.221          0.066      0.978
1/3     35     10.028                 0.214          0.062      0.978
1/3     40     10.017                 0.164          0.058      0.977
1/3     45     10.005                 0.112          0.069      0.972
1/3     50      9.959                 0.228          0.055      0.978
2/3      5      9.866                 0.197          0.060      0.978
2/3     10     10.019                 0.178          0.084      0.967
2/3     15      9.690                 0.252          0.064      0.977
2/3     20     10.035                 0.295          0.054      0.983
2/3     25      9.698                 0.148          0.061      0.981
2/3     30     10.009                 0.140          0.113      0.958
2/3     35      9.697                 0.215          0.196      0.927
2/3     40     10.030                 0.275          0.053      0.978
2/3     45      9.724                 0.213          0.042      0.984
2/3     50     10.020                 0.148          0.053      0.983
3/3      5     10.193                 0.133          0.045      0.984
3/3     10     10.004                 0.164          0.041      0.984
3/3     15      9...

Day 7: Transfer Learning - Impact of the number of hidden layers 3

Still looking at how to measure the impact of increasing the number of hidden layers on training. I compared two different pretrained networks, densenet121 and resnet101, each with one and with 10 hidden layers. Unexpectedly, all the values are similar: one hidden layer and 10 hidden layers make no difference in terms of training time.

# resnet101 - one hidden layer
epochs  step  processing_time/batch  training_loss
1/5      1     0.499                 0.688
1/5      2     0.460                 3.413
1/5      3     0.438                 3.209
1/5      4     0.427                 3.816
1/5      5     0.428                 0.382
1/5      6     0.429                 1.813
1/5      7     0.428                 1.739
1/5      8     0.430                 0.169
1/5      9     0.430                 0.635
1/5     10     0.432                 0.469
2/5      1     0.428                 0.515
2/5      2     0.422                 0.413
2/5      3     0.437                 0.337
2/5      4     0.430                 0.354
2/5      5     0.436                 0.266
2/5      6     0.425                 0.763
2/5      7     0.431                 0.299
2/5      8     0.437                 0.446
2/5      9     0.422                 0.278
2/5     10     0.424                 0.143
3/5      1     0.434                 0.403
3/5      2     0.435                 0.085
3/5      3     0.428                 0.211
3/5      4     0.434                 0.261
3/5      5     0.425                 0.108
3/5      6     0.433                 0.239
3/5      7     0.432                 0.201
3/5      8     0.423                 0.230
3/5      9     0.433                 0.143
3/5     10     0.443                 0.2...
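For reference, here is a minimal sketch of the kind of setup I am comparing (my own reconstruction; the hidden width and class count are placeholders): freeze a pretrained resnet101 and swap in a classifier head with a configurable number of hidden layers.

import torch
from torch import nn
from torchvision import models

def make_head(n_hidden, in_features=2048, hidden=256, n_classes=2):
    # Build a classifier head with n_hidden hidden layers
    layers, width = [], in_features
    for _ in range(n_hidden):
        layers += [nn.Linear(width, hidden), nn.ReLU()]
        width = hidden
    layers += [nn.Linear(width, n_classes), nn.LogSoftmax(dim=1)]
    return nn.Sequential(*layers)

model = models.resnet101(pretrained=True)
for param in model.parameters():
    param.requires_grad = False   # freeze the pretrained feature extractor
model.fc = make_head(n_hidden=1)  # swap 1 for 10 to compare training time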

Day 6: Transfer learning - impact of the number of hidden layers 2

I modified my code for the training and testing processes like below,

import time

epochs = 1
steps = 0
running_loss = 0
print_every = 10

# Start timer
start = time.time()

for epoch in range(epochs):
    for inputs, labels in trainloader:
        if steps > 200:
            break

        steps += 1

        # Move input and label tensors to the default device
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()

        logps = model.forward(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()
  ...