SecondLook x Ocean Protocol (Proposal Round 4)

SecondLook
Apr 1, 2021

Share & sell data with zero privacy risks using AI synthetic data generation.

Problem

Introduction

The world is producing more and more valuable data.

“90% of the world’s data was generated in the last two years with 2.5 quintillion bytes data being created each day.” — Forbes

But a growing share of the data held by companies is restricted from being bought, sold or shared because of tightening privacy regulations.

“By 2023, 65% of the world’s population will have its personal data covered under modern privacy regulations, up from 10% in 2020.” — Gartner

Privacy laws are increasingly being implemented and strengthened around the world:

Europe: General Data Protection Regulation (GDPR) → 25 May 2018

Singapore: Enhanced Personal Data Protection Act (PDPA) → 01 Feb 2021

USA: California Consumer Privacy Act (CCPA) → enforced from 01 July 2020

India: Personal Data Protection Bill → Proposed 11 Dec 2019

Ocean Protocol’s team has rightly identified that the most valuable data is private data: using it can improve research and business outcomes, but concerns over privacy and control make it hard to access.

Could concerns over privacy and control be a reason why data accumulates and stays trapped in companies, unanalysed and unsold?

“Nearly 97 percent of data sits unused by organizations” — Gartner

Problem 1: Even when data is being shared and sold, it has to be anonymised beforehand, which results in a loss of information and reduces its utility.

Imperva explains in detail the current methods of anonymisation:

Data masking: hides data by replacing it with altered values

Generalization: deliberately removes some of the data to make it less identifiable

Pseudonymisation: replaces private identifiers with fake identifiers or pseudonyms

Data perturbation: modifies the original dataset by rounding numbers and adding random noise

The more anonymisation techniques are applied, the greater the information loss, which reduces the data’s utility for the data buyer.
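To make the information loss concrete, here is a minimal Python sketch of the four techniques listed above, applied to a single made-up record. The function names, the record and the parameters (salt, noise range, rounding step) are all illustrative assumptions, not Imperva's or SecondLook's implementation.

```python
import hashlib
import random


def mask(value: str, visible: int = 4) -> str:
    """Data masking: hide all but the last `visible` characters."""
    keep = value[-visible:] if visible > 0 else ""
    return "*" * (len(value) - len(keep)) + keep


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a coarser range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def pseudonymize(identifier: str, salt: str) -> str:
    """Pseudonymisation: replace an identifier with a salted hash token."""
    digest = hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()
    return "user_" + digest[:12]


def perturb(amount: float, rng: random.Random,
            noise: float = 50.0, step: int = 100) -> int:
    """Data perturbation: add random noise, then round to the nearest `step`."""
    noisy = amount + rng.uniform(-noise, noise)
    return int(round(noisy / step) * step)


# Made-up example record (not real data).
record = {"name": "Alice Tan", "card": "4111111111111111",
          "age": 37, "salary": 5230.0}

rng = random.Random(0)
anonymised = {
    "name": pseudonymize(record["name"], salt="demo-salt"),
    "card": mask(record["card"]),
    "age": generalize_age(record["age"]),
    "salary": perturb(record["salary"], rng),
}
print(anonymised)
```

Every field in the output is less precise than the input: the exact age, salary and card number are gone, which is the loss of utility that the data buyer pays for.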

What if there is a method that resolves the trade-off between data privacy and utility?

“Compute-to-data resolves the tradeoff between the benefits of using private data, and the risks of exposing it. It lets the data stay on-premise, yet allows 3rd parties to run specific compute jobs on it to get useful compute results like averaging or building an AI model.” -Ocean Protocol

Problem 2: However, compute-to-data still requires trust in the algorithm to ensure data does not get exposed.

Ocean Protocol Technical Whitepaper: Section 3.7.7 (Compute-to-Data Trusting Algorithms)

Solution

SecondLook is a data-as-a-service platform that allows users to generate realistic and privacy-safe synthetic data from sensitive personal data.

The synthetic data would have a similar structure and similar statistical properties to the original data, but without the privacy compliance or data exposure risks, because it cannot be attributed back to any individual record in the original data.

We use the same class of AI models behind deepfakes, which enabled the realistic generation of human faces that you may have seen in the media. Specifically, we focus on Generative Adversarial Networks (GANs), which have shown impressive improvements over previous generative methods.

The synthetic data generation model can be used with or without Compute-to-Data.

  • With Compute-to-Data: This adds an additional layer of privacy, since the synthetic data provided is realistic but cannot be traced back to the original private data.
Figure: Synthetic Generation Model added to Compute-to-Data
  • Without Compute-to-Data: Instead of the same data being hosted at an “encrypted data storage URL” and distributed to every data buyer, synthetic data generation allows the data seller to generate a separate synthetic dataset for each data buyer, which is unique and traceable.
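The proposal does not say how a per-buyer dataset would be made unique and traceable. One possible approach, sketched below purely as an assumption, is to derive a random seed from each buyer's ID with a seller-only secret key, so that the seller can regenerate any buyer's dataset later and match it against a leaked copy. A simple Gaussian sampler stands in for the GAN here, and `buyer_seed`, `generate_for_buyer`, `trace_leak`, the key and the salary values are all hypothetical.

```python
import hashlib
import hmac
import random
import statistics


def buyer_seed(secret_key: bytes, buyer_id: str) -> int:
    """Derive a reproducible seed from the buyer ID and a seller-only key."""
    digest = hmac.new(secret_key, buyer_id.encode("utf-8"),
                      hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def generate_for_buyer(real_values, secret_key, buyer_id, n):
    """Stand-in generator: sample from a Gaussian fitted to the real column.

    A trained generative model would replace this step; only the
    per-buyer seeding idea is being illustrated.
    """
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    rng = random.Random(buyer_seed(secret_key, buyer_id))
    return [round(rng.gauss(mu, sigma), 2) for _ in range(n)]


def trace_leak(leaked, real_values, secret_key, buyer_ids):
    """Regenerate each buyer's dataset and return the buyer whose copy matches."""
    for buyer_id in buyer_ids:
        candidate = generate_for_buyer(real_values, secret_key,
                                       buyer_id, len(leaked))
        if candidate == leaked:
            return buyer_id
    return None


# Made-up values for illustration only.
SECRET = b"seller-only-secret"
real_salaries = [3000.0, 5200.0, 4100.0, 7000.0, 3800.0]

copy_a = generate_for_buyer(real_salaries, SECRET, "buyer-A", 5)
copy_b = generate_for_buyer(real_salaries, SECRET, "buyer-B", 5)
print(copy_a != copy_b)
print(trace_leak(copy_b, real_salaries, SECRET, ["buyer-A", "buyer-B"]))
```

This exact-match tracing only works if the leaked copy is unmodified; a production scheme would need something more robust, which is outside what the proposal describes.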

What is the final product?

The final product is a web application that would be available at https://www.secondlook.ai/. We have bought the domain and are working towards our first MVP.

The easy-to-use web application would allow users to:

  1. Input: Upload and preview sensitive personal data
  2. Process: Automatically identify the data structure and select the most appropriate generative model
  3. Output: Generate synthetic data
  4. Metrics: Evaluate the utility & privacy of generated data
  5. Report: One-stop compliance report
Figure: Web application user flow preview (Anonymisation vs Synthesis)
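Step 4 above (Metrics) could look roughly like the following sketch. The proposal does not specify which metrics SecondLook uses, so these are assumptions: utility is approximated by the gap in per-column mean and standard deviation, and privacy by the distance from each synthetic row to its closest real row, where a distance of zero means a real record was copied verbatim. The rows are made-up (age, salary) pairs.

```python
import math
import statistics


def column_stats(rows):
    """Mean and standard deviation for each column of a numeric table."""
    columns = list(zip(*rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def utility_gaps(real, synthetic):
    """Absolute gap in (mean, stdev) per column; smaller means closer."""
    return [
        (abs(r_mean - s_mean), abs(r_std - s_std))
        for (r_mean, r_std), (s_mean, s_std)
        in zip(column_stats(real), column_stats(synthetic))
    ]


def distance_to_closest_record(real, synthetic):
    """For each synthetic row, Euclidean distance to the nearest real row."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]


# Made-up (age, salary) rows for illustration only.
real = [(25, 3000), (40, 5200), (31, 4100), (55, 7000)]
synthetic = [(27, 3300), (38, 5000), (52, 6800), (31, 4100)]

gaps = utility_gaps(real, synthetic)
dcr = distance_to_closest_record(real, synthetic)
exact_copies = sum(1 for d in dcr if d == 0)

print("utility gaps per column:", gaps)
print("distance to closest record:", dcr)
print("synthetic rows that copy a real record:", exact_copies)
```

The last synthetic row is deliberately identical to a real one, so it is flagged as an exact copy. In practice the columns would be normalised first, since salary dominates the raw Euclidean distance here.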

How does this project drive value to the Ocean ecosystem? This is best expressed as Expected ROI, details here.

We drive value to the Ocean ecosystem by removing one of the biggest frictions in sharing and selling private data: the privacy risk. Data owners need to be assured that the data they publish is privacy-compliant and risk-free.

Our synthetic data generation model would allow data owners to convert their sensitive data into realistic, privacy-preserving synthetic data that has high utility and no privacy risk. This is beneficial for both data owners and data buyers.

If Gartner’s estimate that “nearly 97 percent of data sits unused by organizations” is accurate, only about 3 percent of data is being utilised, shared or sold now.

We think our solution would encourage more companies to share their untapped private data and capture a fraction of the unused 97% of data. This would increase the Total Value Locked (TVL) from the total OCEAN staked in data token pools. The demand for staking OCEAN drives demand for OCEAN and therefore the value of $OCEAN.

Conservatively, we expect this chain effect to increase the value of $OCEAN by 1%. At a total market cap of ~$600m, that would create USD 6m of value.

Bang = USD 6m

Buck = Grant size = 10K OCEAN = USD 15k

Expected ROI = Bang / Buck × estimated chance of success = USD 6,000k / USD 15k × 0.75 = 400 × 0.75 = 300

This gives an expected ROI of 300, which is well above 1.
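The arithmetic above can be reproduced with a few lines of Python. The variable names are ours, and the OCEAN price of USD 1.50 is not stated directly; it is the rate implied by "10K OCEAN = USD 15k".

```python
# Inputs taken from the figures in the text.
market_cap_usd = 600_000_000   # ~$600m total market cap
expected_uplift = 0.01         # 1% increase in the value of $OCEAN
grant_ocean = 10_000           # grant size in OCEAN
ocean_price_usd = 1.5          # implied by 10K OCEAN = USD 15k (assumption)
chance_of_success = 0.75       # estimated chance of success

bang = market_cap_usd * expected_uplift        # value created, in USD
buck = grant_ocean * ocean_price_usd           # grant cost, in USD
expected_roi = bang / buck * chance_of_success

print(f"Bang = USD {bang:,.0f}")
print(f"Buck = USD {buck:,.0f}")
print(f"Expected ROI = {expected_roi:.0f}")
```

The result scales linearly with each input, so halving the assumed uplift to 0.5% would halve the expected ROI as well.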

Project Deliverables — Roadmap

Any prior work completed thus far?

  • 30 completed interviews with privacy officers, data protection officers and C-suite executives from mid-sized to large enterprises to understand the legal challenges and requirements for companies to share and sell data
  • 7 interested parties from banks, consultancy firms and co-working spaces who want to try our demo
  • Preliminary evaluation and testing of Generative Adversarial Network (GAN) variants

Project Roadmap — Key Milestones:

Q2, 2021 — Immediate priorities

  • Build a synthetic data generative model MVP and provide API gateway access for developers
  • Develop automatic evaluation pipelines for the utility and privacy of generated data
  • Develop an easy-to-use web application interface so that non-developers can access the model, for greater adoption

Q3, 2021 — Possible developments (ideas taken from Ocean Protocol’s Technical Whitepaper)

  • Integration as an Ocean Protocol Compute-to-Data flow variant
Ocean Protocol Technical Whitepaper: Section 3.7.3 (Compute-to-Data Flow Variants)
  • Ocean Market fork focused on synthetically generated private data
Ocean Protocol Technical Whitepaper: Section 8.7 (Data Marketplaces Forks)

Team’s future plans and intentions:

  • This grant would be the first investment in our team and would be used to cover the cost of:
  1. Hiring to develop frontend and backend infrastructure
  2. GPU and server compute to test and train our AI models
  • We hope to use the Ocean Protocol’s data marketplace as our first case study to show the value proposition of synthetic data generation in encouraging companies to share their sensitive data by reducing the privacy risk involved.
  • Our broader vision is to remove the friction in sharing and accessing data across any medium, starting with privacy compliance.
  • This grant and case study are important in helping us raise our scheduled pre-seed round of ~56k USD with Entrepreneur First in May 2021, which would allow this project to turn into a proper self-sustaining start-up.

Project Details

Proposed technology stack:

  • AI models: Variations of Generative Adversarial Networks (GANs)
  • Data & server infrastructure: AWS, Docker, Kubernetes
  • Frontend & backend: ReactJS, NodeJS
