

DNA-based data storage platforms traditionally encode information only in the nucleotide sequence of the molecule. Here we report on a two-dimensional molecular data storage system that records information in both the sequence and the backbone structure of DNA and performs nontrivial joint data encoding, decoding and processing. Our 2DDNA method efficiently stores images in synthetic DNA and embeds pertinent metadata as nicks in the DNA backbone. To avoid costly worst-case redundancy for correcting sequencing/rewriting errors and to mitigate issues associated with mismatched decoding parameters, we develop machine learning techniques for automatic discoloration detection and image inpainting. The 2DDNA platform is experimentally tested by reconstructing a library of images with undetectable or small visual degradation after readout processing, and by erasing and rewriting copyright metadata encoded in nicks. Our results demonstrate that DNA can serve both as a write-once and rewritable memory for heterogeneous data and that data can be erased in a permanent, privacy-preserving manner. Moreover, the storage system can be made robust to degrading channel qualities while avoiding global error-correction redundancy.

DNA-based data storage systems are viable alternatives to classical magnetic, optical, and flash archival recorders 1. Macromolecular data storage platforms are nonvolatile, readout-compatible, extremely durable and they offer unprecedented data densities unmatched by other modern storage systems 2, 3, 4, 5, 6, 7, 8, 9, 10. Traditional DNA-based data recording architectures store user information in the sequence content of synthetic DNA oligos within large pools that lack an inherent ordering, and user information is retrieved via next-generation or nanopore sequencing 6. Despite recent progress, several issues continue to hinder the practical implementation of molecular information storage models, including the high cost of synthetic DNA, lack of straightforward rewriting mechanisms, large write-read latencies, and missing oligo errors incurred by solid-phase synthesis.

Image data is typically compressed before being recorded, and even a single mismatch can cause catastrophic error propagation during decompression and lead to unrecognizable reproductions 6, 11, 12. Moreover, the rate of synthesis and sequencing errors may vary by an order of magnitude from one platform to another, while PCR reactions and topological data rewriting may cause additional gradual increases in sequencing errors. Therefore, to ensure accurate reconstruction, one needs to account for the worst-case scenario and perform extensive write-read-rewrite experiments to estimate the error rates before adding redundancy 13, 14, 15. In addition, the estimated error rates have to be accurate enough for efficient error correction due to the mismatched decoding parameter problem 16, 17. The mismatched-decoder problem is an issue mostly overlooked in prior works, and it asserts that powerful error-correction schemes such as low-density parity-check (LDPC) codes 18 require good estimates of the channel error probability to operate properly. This is clearly hard to achieve for traditional DNA-based data storage systems due to the highly stochastic nature of the PCR, sequencing and rewriting processes.

Here, we develop and experimentally test a hybrid DNA-based data storage system termed 2DDNA, to address the issue of rewriting and avoid the use of worst-case error-correcting redundancy needed to combat random and missing oligo errors that may accumulate over time and due to content changes. 2DDNA uses two different information dimensions and combines desirable features of both synthetic and nick-based recorders 19. This is achieved by superimposing metadata (such as ownership information, dates, clinical status descriptions) stored via nicks onto images encoded in the sequence. Sequence content carries large amounts of information, but rewriting is difficult; information stored in nicks 19 is usually of smaller volume but highly amenable to efficient, permanent and privacy-preserving erasing and rewriting.
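
As a toy illustration of the two information dimensions described above (sequence content for the payload, nicks for the metadata), the following Python sketch models a single oligo in software. It is not the encoding used in 2DDNA: the 2-bits-per-base mapping, the `Oligo` class, the `nick_sites` positions and all function names are hypothetical and chosen only for illustration. Erasure is modeled simply as clearing the set of nicked sites, and no enzymatic or error behavior is simulated.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BASES = "ACGT"  # toy mapping, 2 bits per base (an assumption, not the paper's codec)


def bytes_to_bases(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases; expects a length divisible by 4."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


@dataclass
class Oligo:
    sequence: str                  # dimension 1: payload in the sequence content
    nick_sites: tuple[int, ...]    # hypothetical positions where a nick may be placed
    nicks: set[int] = field(default_factory=set)  # dimension 2: sites currently nicked

    def write_metadata(self, bits: str) -> None:
        """Nick site k if and only if bit k is '1'; the sequence is untouched."""
        if len(bits) > len(self.nick_sites):
            raise ValueError("more metadata bits than nick sites")
        self.nicks = {s for s, b in zip(self.nick_sites, bits) if b == "1"}

    def read_metadata(self) -> str:
        """Return one bit per nick site, '1' where a nick is present."""
        return "".join("1" if s in self.nicks else "0" for s in self.nick_sites)

    def erase_metadata(self) -> None:
        """Remove every nick, leaving the sequence payload intact."""
        self.nicks.clear()


# Usage: write a payload once, then write, erase and rewrite metadata.
oligo = Oligo(sequence=bytes_to_bases(b"img"), nick_sites=(2, 5, 8, 11))
oligo.write_metadata("1010")
oligo.erase_metadata()
oligo.write_metadata("0110")
```

The point of the sketch is the separation of concerns: the metadata operations never modify `sequence`, which mirrors the idea that the payload is written once while the nick layer can be erased and rewritten.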
