CAPTION
Height (cm)
Weight (kg)
Age (Years)
STUB
BODY OF THE TABLE
Sources: 1. Kailasha Foundation – Fun & Learn Portal LMS Directory
Footnotes: The entire upper part of the table is called the BOX HEAD.
3. Diagrammatic Mode of Presentation:
A. Non-Frequency Diagrams: These present data that are NOT frequency data. (a) Bar Diagrams (b) Line Diagrams (Historigram) (c) Pie Diagram or Pie Chart
B. Frequency Diagrams: These present frequency data, most often grouped into class intervals. The three most common frequency diagrams are: (a) Histogram (b) Frequency Polygon (c) Ogives: (i) Less-than type Ogives (ii) More-than type Ogives
Bar Diagrams:
Line Diagram:
Multiple Bar Diagram:
Frequency Polygon:
Frequency Curve: A smooth curve joining all the vertices of a frequency polygon. Frequency curves are broadly divided into four shapes:
(i) Bell Shaped (Most Common Shape) (ii) U-Shaped (iii) J – Shaped: Simple J – shaped & Inverted J – Shaped (iv) Mixed Curve (Second Most Common Shape)
Scientific Reports volume 14, Article number: 19755 (2024)
The mpox epidemic in the UK began in May 2022, with rates of new cases unexpectedly and rapidly declining during August 2022. Interpreting trends in infection requires disentangling the underlying growth rate of cases from the delay between symptom onset and presenting to healthcare. We developed a Bayesian nowcasting method which incorporates time-varying delays (EpiLine) to quantify the changes in the delay from symptom onset to healthcare presentation and the underlying mpox growth rate over the period May–August 2022 in the UK. We show that the mean delay between symptom onset and healthcare presentation for mpox in the UK decreased from 22 days in early May 2022 to 10 days by early June and 8 days in August 2022. When we account for these dynamic delays, the time-varying growth rate declined gradually and continuously in the UK during the May–August 2022 period. Not accounting for varying time delays would have incorrectly characterised the growth rate as a sharp increase followed by a rapid decline. Our results highlight the importance of correctly quantifying the delay between symptom onset and healthcare presentation when characterising the epidemic growth of mpox in the UK. The gradual reduction in the rate of epidemic spread, which pre-dated the vaccine roll-out, is consistent with gradual risk reduction or acquired immunity amongst the highest risk individuals. Our study highlights the need for public health agencies to record the delays from symptom onset to healthcare presentation early in an outbreak.
Mpox (monkeypox) is a zoonotic infection caused by a virus that belongs to the genus Orthopoxvirus. Since the first reported human case of mpox in the Democratic Republic of Congo in 1970, sporadic human cases have been identified both inside and outside of Africa 1. On May 7, 2022, mpox was detected in a traveller returning to the United Kingdom from Nigeria, with the rate of reported cases in the UK then increasing until mid-July before declining (Fig. 1a) 2.
Figure 1. Estimated reporting delay and epidemic growth rate \(r\left( t \right)\) using UKHSA data. (a) Number of daily reported cases and daily symptom onsets (onset dates were reported by only 40–80% of cases), along with the fitted posterior density of symptom onset for all reported cases. For the vast majority of cases reported in May the symptom onset date is known, so the posterior distribution and the reported data agree well up to mid-May. For cases reported from June onwards, the proportion with a known symptom onset date declined, so the reported symptom onsets fall below the posterior distribution. (b) Distribution of the reported delays (the time between onset of symptoms and reporting to healthcare providers) for those developing symptoms at the start of May and June 2022. (c) Epidemic growth rate \(r\left( t \right)\) of mpox in the UK when accounting for dynamic reporting delays, showing a gradual decline over the reporting period. The right axis converts \(r\left( t \right)\) to a doubling or halving time. (d) Estimates (median, 25th and 75th percentiles) of the reported delay indexed by the date of symptom onset. In all graphs the solid lines are the median estimates and the shaded areas are the 5%–95% confidence intervals.
Mpox virus is an infectious pathogen which can be transmitted through close physical contact and fomites. Since the onset of the mpox epidemic in the UK in early May 2022, and over the epidemic period May–September 2022, the UK Health Security Agency (UKHSA) has been monitoring and responding to the outbreak in the UK 8 . As part of the response, our group had access to mpox data and tracked the epidemic potential giving informed advice to aid public health policy.
During the 2022 outbreak, mpox cases were predominantly amongst gay and bisexual men who have sex with men (GBMSM), with clusters of cases linked with venues where individuals were exposed to the virus through close, often sexual, contact 3. Pre-symptomatic transmission was found to be a significant component of this outbreak, with approximately 53% of transmission occurring prior to symptom onset 4. Initial symptoms for mpox can be influenza-like (e.g. fever or sore throat), with reported incubation periods of 3–20 days 5 and a mean incubation period of 7–9 days 5, 6. This is followed by smallpox-like rashes, which sequentially progress to macules, papules and vesicles before crusting over after 2–4 weeks 7. Within the UK mpox epidemic, healthcare professionals obtained details of symptoms and their time of onset for a fraction of the confirmed cases. The proportion of confirmed cases with a known symptom onset date varied over time, from 77/97 confirmed cases (~80%) in the first two weeks of the outbreak in May 2022 to 163/495 confirmed cases (~40%) during August 2022 8. Initially, the delay between the onset of symptoms and presenting to healthcare providers was up to a month (based on data within the UKHSA mpox response group). As awareness of the disease increases, due to public health information or other interventions, the delays in presenting to healthcare providers may decrease, with people reporting infection earlier.
When models are fit to epidemics, one would ideally have data on the number of people infected each day. However, the data typically available is the number of symptomatic cases who present to healthcare providers each day (i.e. reported cases). Therefore, models are normally fit to the reported cases, with the delay between infection and reporting being modelled as a static distribution. This delay distribution consists of the incubation period (infection to symptoms) and the delay from symptom onset to accessing healthcare, and is typically estimated from detailed studies on a subset of early cases in the epidemic. Whilst the incubation period is not expected to change over time, the delay between symptom onset and accessing healthcare is likely to decrease as public awareness of the epidemic increases. If this delay from symptom onset to healthcare presentation decreases rapidly, it would lead to a surge in reported cases due to the effect of people at different stages of disease presenting to healthcare providers at the same time. Hence it is important to quantify how the time between symptom onset and healthcare presentation changes in the course of an epidemic.
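The surge mechanism described above can be illustrated with a toy calculation. The sketch below is not part of the study: it assumes a hypothetical flat epidemic (10 new symptomatic people every day) and a deterministic delay that shortens from 20 to 8 days, whereas the paper models the delay as a full distribution. The function name `reported_cases` and all numbers are illustrative assumptions.

```python
def reported_cases(onsets, delays):
    """Daily reports when everyone with onset on day d presents delays[d] days later.

    A deterministic delay per onset day is a simplifying assumption;
    the study models the delay as a time-varying distribution.
    """
    horizon = len(onsets) + max(delays)
    reported = [0] * horizon
    for day, (n, delay) in enumerate(zip(onsets, delays)):
        reported[day + delay] += n
    return reported


# Hypothetical flat epidemic: 10 people develop symptoms every day for 30 days.
onsets = [10] * 30
# Hypothetical delay: starts at 20 days, shortens by 1 day every 2 days, floors at 8 days.
delays = [max(8, 20 - day // 2) for day in range(30)]

reported = reported_cases(onsets, delays)
for day, count in enumerate(reported):
    if count:
        print(day, count)
```

While the delay is shrinking, two onset cohorts present on each day, so reports run at 20 per day (days 21 to 32) even though onsets never leave 10 per day; once the delay stops changing, reports fall back to 10 per day. A model with a static delay would read this as growth followed by decline.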
Modelling of the 2022 mpox epidemic has generally used either Bayesian models such as 20 or a more mechanistic SEIR model 11 that allows processes such as vaccination to be modelled more readily. The former used a Bayesian doubly interval-censored model adjusted for right truncation to calculate the time from infection to hospital admission, the time from infection to a first positive test, and the length of hospital stay, whilst keeping the distribution of infection to symptom onset uniform. The latter assumed a constant time from symptom onset to presenting to healthcare and focused on evaluating the impact of vaccination on the trajectory of the mpox epidemic.
The two aims of our study were to quantify time-varying delays between symptom onset and healthcare presentation, and also the time-varying epidemic growth rate of the mpox epidemic in the UK in 2022. This is non-trivial since both together contribute to observed data, and so require statistical deconvolution (disentangling). The two datasets we used were the total number of reported cases, and the self-reported symptoms onset time which was provided by a subset of cases. At the time of analysis, we did not find required statistical tools within existing ‘nowcasting’ software, and so to perform this analysis, we developed a Bayesian model (“EpiLine”) which simultaneously estimated the time-varying delay between symptom onset and healthcare presentation and the epidemic growth rate over the period May–August 2022 in the UK.
In this study, we illustrate the development and application of a custom nowcasting Bayesian method which incorporates time-varying delays (EpiLine) to simulate the growth rate of symptomatic cases, from both actual mpox data in the UK as well as simulated data, aiming to quantify the time between mpox symptom onset to healthcare presentation and the growth rate over the mpox epidemic in the UK in 2022.
To jointly quantify the changes in the distribution of the delay \(\tau\) from symptom onset to healthcare presentation at time \(t\), denoted \(f\left( {\tau ,t - \tau } \right)\), and their effect on estimating the epidemic growth rate, denoted \(r\left( t \right)\), we developed a Bayesian model incorporating both a dynamic growth rate and dynamic delays, called EpiLine (https://github.com/BDI-pathogens/EpiLine). The model was implemented in R (https://www.r-project.org), version 4.1.3.
The model was built to understand the interaction between the symptom-healthcare presentation time distribution and the underlying dynamics of the infection rate, so we use a very simple model for the number of people developing symptoms each day. The model contains a generative model, which calculates the expected number of reported cases on a particular day, and an observation model for the observed data (the same approach as Epidemia uses for estimating daily deaths 19). We model the daily growth rate \(r\left( t \right)\) with a simple Gaussian process, and the daily number of people developing symptoms \(S\left( t \right)\) is given by
where \(\sigma_{rGP}^{2}\) is the daily variance of the Gaussian process. Gaussian processes were used because of their flexibility to model time-varying processes and to provide confidence intervals even when the underlying mechanisms for time variation are unknown. Note that making \(r\left( t \right)\) a Gaussian process, instead of \(S\left( t \right)\), implies the prior is that the change in \(S\left( t \right)\) is the same as on the previous day. Next we define \(f\left( {\tau ,t} \right)\) as the probability of someone presenting to healthcare with an infection on day \(t + \tau\), if they developed symptoms on day \(t\). Note that \(\tau\) can be negative if a case is found prior to symptoms developing (e.g. if contact-traced and tested positive). On day \(t\) the expected number of cases presenting to healthcare is \(\mu \left( t \right)\) and given by
where \(\tau_{pre}\) is the maximum number of days before reporting that a case develops symptoms and \(\tau_{post}\) is the maximum number of days after reporting that a case develops symptoms. The number of observed cases presenting to healthcare \(C\left( t \right)\) is modelled as a negative binomial variable
where \(\phi_{OD}\) is the over-dispersion parameter. The symptom-healthcare presentation time distribution must support both positive and negative values. In addition, empirically it is observed that this distribution can be highly skewed with heavy tails, therefore we model it using the Johnson SU distribution, which contains four parameters \(\xi ,\lambda ,\gamma ,\delta\) (so it can fit mean, variance, skew and kurtosis). To account for the changes in the distribution over time, we model these four parameters using simple Gaussian processes
At the end of the reporting period, there may not be many cases presenting to healthcare for each symptoms date (since the data is right-censored), therefore there is the option to make the distribution static after a particular time \(t_{static}\) (i.e. when \(t > t_{static}\), \(\xi \left( t \right) = \xi \left( {t_{static} } \right)\)). These parameters are estimated using line list data of individual cases where the symptoms date and healthcare presentation date are known. Note that cases where only the healthcare presentation date is known are included in the daily report totals \(C\left( t \right)\), but not in the symptom-healthcare presentation line list. Because the symptoms date is missing for many cases, cases cannot be modelled directly by symptoms date.
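The display equations referred to in the model description ("is given by", "modelled as a negative binomial variable", "using simple Gaussian processes") are not reproduced in this text. The block below is a reconstruction from the surrounding definitions only, not the authors' typeset equations. It assumes that the "simple Gaussian process" is a first-order random walk with daily variance, that the negative binomial is parameterised by its mean and an over-dispersion parameter, and that the variance symbol \(\sigma_{\xi GP}^{2}\) follows the naming of \(\sigma_{rGP}^{2}\); the exact time indexing of \(r\) in the update for \(S\) is also an assumption.

```latex
\begin{align}
r(t) &\sim \mathcal{N}\!\left( r(t-1),\; \sigma_{rGP}^{2} \right),
  \qquad S(t) = e^{r(t)}\, S(t-1), \\
\mu(t) &= \sum_{\tau = -\tau_{post}}^{\tau_{pre}} f(\tau,\, t-\tau)\; S(t-\tau), \\
C(t) &\sim \mathrm{NegBin}\!\left( \mu(t),\; \phi_{OD} \right), \\
\xi(t) &\sim \mathcal{N}\!\left( \xi(t-1),\; \sigma_{\xi GP}^{2} \right)
  \quad \text{and likewise for } \lambda(t),\ \gamma(t),\ \delta(t).
\end{align}
```

The sum runs over \(\tau\) from \(-\tau_{post}\) to \(\tau_{pre}\) because \(\tau\) is the presentation day minus the symptoms day, so a case that develops symptoms after reporting has negative \(\tau\).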
From the line-list, let \(N_{S} \left( t \right)\) be the number of people who reported symptoms onset as of day \(t\) , and let \(n_{SR} \left( {t,\tau } \right)\) be the number of people who reported symptoms onset as of day \(t\) and presented to healthcare authorities on day \(t + \tau\) . For day \(t\) , we model \(\left\{ { n_{SR} \left( {t, - \tau_{post} } \right), \ldots , n_{SR} \left( {t,\tau_{pre} } \right) } \right\}\) using a multinomial distribution with parameters \(\left\{ { x\left( {t, - \tau_{post} } \right), \ldots ,x\left( {t,\tau_{pre} } \right) } \right\}\) . The multinomial parameters are set as the expected number given the total number of cases which reported symptoms onset on that date and the symptoms-healthcare presentation delay distribution for that day.
where the factor \(F\left( {t,\tau } \right)\) is adjusting for the fact that the line-list is date-censored given that it only includes dates from the reporting period. The distributions for \(\left\{ { n_{SR} \left( {t, - \tau_{post} } \right),..., n_{SR} \left( {t,\tau_{pre} } \right) } \right\}\) on different days are assumed to be independent and also independent of the total number of cases observed on each day.
To initiate the model, range prior distributions are put on the initial values of all the parameters as well as the variances of the Gaussian processes. The posterior distribution of all parameters was sampled by Markov Chain Monte Carlo (MCMC) using the software Stan via the R package rstan 10. This allowed the model to be fit simultaneously to both the daily cases presenting to healthcare and the line-list of cases for which the onset of symptoms was known, thus providing an estimate of \(r\left( t \right)\) corrected for changes in the symptom-healthcare presentation delays.
Mpox data was collected by the UK Health Security Agency (UKHSA) health protection teams from targeted testing of infected individuals (with specimens processed by UKHSA affiliated laboratories and NHS laboratories), and questionnaires (collected by UKHSA health protection teams). The definition of a case included both confirmed cases and highly probable individuals with a positive polymerase chain reaction (PCR) test. All mpox cases were combined in a linelist which was used for analysis. Data were extracted as of August 31, 2022, at which time 2746 people had been identified with mpox in the UK. We identified the dates of symptom onset and the date of being reported to HPZone (where UKHSA teams store the data collected during an incident) by matching pseudo identifier numbers to the line list. We used the time when the case was reported to HPZone to be a proxy for the time when the case was presented to healthcare providers.
We applied EpiLine to the data from May 07, 2022 to August 31, 2022 sampling the posterior distribution of the model parameters i.e. both the growth rate and the distribution from symptom onset to healthcare presentation. To explore the importance of allowing a dynamic distribution from symptom onset to healthcare presentation, we alternatively used a static distribution with the parameters from the dynamic fit on May 07, 2022. This date was chosen as an estimate of the distribution in the early phase of the epidemic. This static distribution had a mean of 15 days, which was then used to re-estimate the growth rate over the whole period. Finally, we investigated the effect of date-censoring on the growth rate projections by re-sampling the model parameters using different date cut-off times.
To check the robustness of EpiLine, we applied the model to simulated data. A simulated line-list was generated by drawing the delay for each symptomatic individual from the symptom-presentation distribution for that day and then down-sampling (by 20%) to mimic the incompleteness of this data in the real data sets. Additionally, to mimic the real data, we seeded the epidemic at the start of April 2022 and allowed it to grow with a constant growth rate until the end of May 2022 (Figure S1a ). From the start of June 2022 the growth rate declined linearly, turning negative in mid-July 2022 (Figure S1c , red line). The distribution from symptom onset to healthcare presentation was modelled to be similar to that estimated from the actual data. Similar to the analysis on the real data, we re-estimated \(r\left( t \right)\) using the same static distribution from symptom onset to healthcare presentation. Finally, we repeated the date-censoring analysis.
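A minimal sketch of this kind of simulation is given below, using only the Python standard library. It is not the EpiLine simulator. All parameter values, the function names, the rounding of expected onsets to whole people (rather than drawing counts at random), the linear shrinkage of the Johnson SU location parameter, and the reading of "down-sampling (by 20%)" as dropping 20% of line-list records are assumptions made for illustration.

```python
import math
import random


def johnson_su_sample(rng, xi, lam, gamma, delta):
    """Draw from a Johnson SU distribution: xi + lam * sinh((Z - gamma) / delta), Z ~ N(0, 1)."""
    z = rng.gauss(0.0, 1.0)
    return xi + lam * math.sinh((z - gamma) / delta)


def simulate(n_days=120, s0=2.0, r0=0.07, decline_start=60, slope=0.002,
             drop_fraction=0.2, seed=1):
    """Simulate daily onsets, daily presentations and an incomplete line-list.

    All defaults are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    onsets = []        # people developing symptoms on each day
    reported = {}      # presentation day -> number of cases presenting
    line_list = []     # (onset day, presentation day) for the retained subset
    s = s0
    for t in range(n_days):
        # Constant growth rate, then a linear decline that eventually turns negative.
        r = r0 if t < decline_start else r0 - slope * (t - decline_start)
        s *= math.exp(r)
        n = round(s)   # assumption: deterministic rounding, no count noise
        onsets.append(n)
        # Assumption: the delay distribution shifts earlier over time, then stays fixed.
        xi = max(5.0, 15.0 - 0.1 * t)
        for _ in range(n):
            delay = round(johnson_su_sample(rng, xi, 3.0, -1.0, 2.0))
            day = t + delay
            reported[day] = reported.get(day, 0) + 1
            # Keep the onset date for only part of the cases to mimic incomplete data.
            if rng.random() >= drop_fraction:
                line_list.append((t, day))
    return onsets, reported, line_list


onsets, reported, line_list = simulate()
print(sum(onsets), len(line_list))
```

Every simulated case contributes to the daily totals, while only the retained subset appears in the line-list, mirroring the split between \(C\left( t \right)\) and the symptom-healthcare presentation line list in the real data.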
Applying EpiLine to the mpox data, we estimated that the time from symptom onset to healthcare presentation declined from about 21.9 days (CrI 16.7–30.8 days) for people who developed symptoms in early May 2022, to around 9.6 days (CrI 8.6–10.8 days) for people who developed symptoms in early June and 7.8 days (CrI 7.0–8.9 days) for people who developed symptoms in August 2022 (Fig. 1 b,d). Hence the time from symptom onset to healthcare presentation declined substantially over the course of the UK epidemic.
Allowing for a dynamic time from symptom onset to healthcare presentation, we estimated the epidemic growth rate had a doubling time of 10.3 days (CrI 6.6–23.9 days) in early May 2022. Subsequently the estimated growth rate gradually decreased in May and June, turning negative in early July (Fig. 1c and Fig. 2a, blue curve). We note that a positive growth rate \(r\left( t \right)\) corresponds to an effective reproduction number \(R_{e} \left( t \right)\) greater than 1, and a negative \(r\left( t \right)\) corresponds to an \(R_{e} \left( t \right)\) less than 1. Note that since \(r\left( t \right)\) is the growth rate of new symptomatic cases, it will lag the growth rate of new infections by the incubation period (7–9 days 5, 6), suggesting that new infections would have started to decline in late June 2022.
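The conversion between a daily exponential growth rate and a doubling (or halving) time is the standard relation \(T = \ln 2 / \left| r \right|\). The helper names below are illustrative and are not taken from EpiLine.

```python
import math


def doubling_time(r):
    """Days for cases to double (r > 0) or halve (r < 0) at a constant daily growth rate r."""
    if r == 0:
        return math.inf  # no growth: cases neither double nor halve
    return math.log(2) / abs(r)


def growth_rate_from_doubling_time(days):
    """Daily exponential growth rate implied by a doubling time in days."""
    return math.log(2) / days


# The reported early-May doubling time of 10.3 days, expressed as a growth rate.
r_early_may = growth_rate_from_doubling_time(10.3)
print(round(r_early_may, 4))
```

Under this relation the 10.3-day doubling time corresponds to a growth rate of about 0.067 per day, and the wide credible interval on the doubling time (6.6–23.9 days) reflects the reciprocal mapping: small changes in a low growth rate produce large changes in doubling time.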
Figure 2. Epidemic growth rate \(r\left( t \right)\) using static reporting delays and censored data. (a) Epidemic growth rate \(r\left( t \right)\) estimated using a static symptom-report delay as of May 7th 2022 (green) and using dynamic delays (blue). Note the static delay model estimates a higher peak \(r\left( t \right)\) (doubling time of 6 days vs 10 days) and a larger decline of \(r\left( t \right)\) in May 2022 (to a doubling time of 22 days vs 14 days), compared to the model adjusting for dynamic delays. (b) Estimated epidemic growth rate \(r\left( t \right)\) using date-censored mpox line-list data. In both graphs the solid lines are the median estimates and the shaded areas are the 5–95% confidence intervals.
A naive approach might have used a static distribution from symptoms onset to healthcare presentation for the sake of making inference simpler. To compare this approach to the dynamic distribution, we estimated that this incorrect approach would have led to the inference that the growth rate had a doubling time of 6 days in the first two weeks of May 2022 compared with 10 days with dynamic delays (Fig. 2 a; green curve). Subsequently, the estimated growth rate declined more rapidly throughout May 2022, leading to a slowing of the doubling time to 22 days at the end of May compared to the estimate of 14 days with dynamic delays.
As noted previously, this is likely due to the effect of more people presenting to healthcare providers in late May even if they had first developed symptoms much earlier. As measured by a change in \(r\left( t \right)\), the decrease throughout May 2022 was 0.073 (CrI 0.002–0.186) with static delays versus 0.015 (CrI −0.048 to 0.076) with dynamic delays. While the difference is not statistically significant, this is primarily due to the poor precision of the static model during May, which results in wide confidence intervals for the drop in \(r\left( t \right)\). The poor fit of this model to the data reflects the deficiency of a static model during a time in which the delays are dynamic.
Our final analysis showed that the estimates of \(r\left( t \right)\) were consistent at times at least 10 days prior to each censor date, but flattened in the 10 days immediately prior to the censor date. This is because newly symptomatic cases in the final 10 days are unlikely to have presented to healthcare by the censor date, therefore the estimate of \(r\left( t \right)\) here will be dominated by its prior (i.e. a Gaussian process without drift). Flattening the estimates of \(r\left( t \right)\) is desirable when interpreting right-censored data and corresponds to projecting the impact of no change in policy.
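The effect of right-censoring can be made concrete by asking what fraction of people who developed symptoms a given number of days before the censor date would already have presented. The sketch below evaluates the Johnson SU cumulative distribution function with the standard library; the parameter values are hypothetical and are not the fitted EpiLine estimates.

```python
import math


def johnson_su_cdf(x, xi, lam, gamma, delta):
    """CDF of the Johnson SU distribution: Phi(gamma + delta * asinh((x - xi) / lam))."""
    z = gamma + delta * math.asinh((x - xi) / lam)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_presented(days_before_censor, xi=7.0, lam=3.0, gamma=-1.0, delta=2.0):
    """Share of cases with onset `days_before_censor` days ago that have presented by now.

    The default parameters are illustrative assumptions, not fitted values.
    """
    return johnson_su_cdf(days_before_censor, xi, lam, gamma, delta)


for days in (0, 5, 10, 20, 30):
    print(days, round(fraction_presented(days), 3))
```

With delays of this order, onset cohorts from the last few days before the censor date are almost entirely unobserved, which is why the data carry little information about \(r\left( t \right)\) there and the estimate falls back on the prior.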
Because the inferential method for joint estimation of \(f\left( {\tau ,t} \right)\) and \(r\left( t \right)\) is deceptively complex, requiring assumptions on the Gaussian process that maintain identifiability during the deconvolution process, we verified the power and accuracy of the method on simulated epidemic curves. With simulated data for a similar epidemic, the inference method (EpiLine) was able to capture the dynamic time-varying delays and slowly declining epidemic growth rate (Figure S1). In contrast, when re-estimating the growth rate using the misspecified model with the static symptom-presentation distribution, we found, as with the actual data (Fig. 2a), the incorrect inference of first overestimating the growth rate increase and then estimating a rapid decline (Figure S2a). Finally, the re-estimated value of \(r\left( t \right)\) from simulated data was consistent up to about 10 days before the censor date, with \(r\left( t \right)\) flattening in the final few days and reverting to its prior distribution (compare Fig. 2b and Figure S2b). The posterior confidence intervals were narrower in the simulated data because the simulated epidemic size was about 7 times larger than the actual epidemic.
In this paper we developed a Bayesian model (EpiLine) that captures dynamic changes in the time from symptom onset to healthcare presentation and applied it to the UK mpox epidemic in 2022. Our results show that the time from symptom onset to healthcare presentation was dynamic and declined from an average of 22 days in early May 2022 to 10 days by early June and 8 days in August 2022, documenting one of the factors that will have contributed to controlling the epidemic. When we accounted for these dynamic delays in healthcare presentation, we found that the time-varying growth rate declined gradually over the epidemic. However, using a misspecified model with static delays in healthcare presentation (i.e. using a median value during the initial phase) incorrectly overestimated the initial growth rate and then implied a rapid decline. For example, a modelling study which was fit to weekly reported UK mpox cases required a substantial rapid change in sexual behaviour in the GBMSM and wider community in May and June 2022 to fit the apparent surge and then flattening of reported cases 11. However, as noted by the authors, they did not model delays in healthcare presentation, such as those we discuss here, and which we suggest were the cause of the apparent sharp drop in the growth rate around the end of May 2022. Their results may have been different if they had accounted for this, especially when considering the large parameter space sampled during the calibration process.
Our results demonstrate the importance of accounting for dynamic changes in the time from symptom onset to healthcare presentation. Most previous studies have modelled the delays between symptom onset (or infection) and healthcare presentation to be static e.g. 15 , 16 . However, a couple of nowcasting models have accounted for dynamic delays in the context of the STEC O104:H4 outbreak in Germany 17 and measles in the Netherlands 18 . Consistent with our findings, both studies reported that accounting for dynamic delays improved nowcasts. Our work adds to this literature by highlighting the qualitative effect dynamic delays have on the epidemic curve in the context of a small but rapidly growing epidemic i.e. the 2022 mpox epidemic in England.
Following the UK Joint Committee on Vaccination and Immunisation (JCVI)’s recommendation, vaccination against mpox in the UK was offered to GBMSM at highest risk from June 21, 2022 12 with the vaccination campaign speeding up from July 22, 2022 13 . However, our analysis suggests that the number of new symptomatic cases was already falling by early July 2022, thus vaccinations were not the cause of this initial decline in growth rate. We hypothesise that if the declines are not caused by vaccination, then the most likely causes are a combination of immunity and behaviour change. Hence, our results of a gradual drop in growth rate can be interpreted to be consistent with the core GBMSM transmission group possibly gaining immunity due to high prevalence, alongside more rapid diagnosis and more gradual reductions in high risk behaviour due to public health campaigns.
This study had some limitations. Firstly, the focus of this study was the impact of changes in the delays from symptom onset to healthcare presentation on estimates of growth rate. However, a second aspect of reporting to consider is changes in the overall case ascertainment rate. In the context of this analysis it would be the proportion of people who develop symptoms on a particular day who ever present to healthcare services. Unfortunately, without independent survey data it is extremely difficult to estimate the case ascertainment rate or changes in it and thus we assume that it is constant over time in our analysis. In periods where the case ascertainment rate increases/decreases, estimates of growth rate based on reported cases will be overestimated/underestimated. In the first phase of this mpox epidemic in the UK, it is possible that the case ascertainment rate increased with public awareness, thus initial estimates of growth rate were too high even after correcting for the shortening in the symptom onset to healthcare presentation delay. The subsequent fall in growth rate (calculated from reported cases) could be partly explained by a reduction in case ascertainment rate as public concern about mpox reduced due to no fatalities, however, as yet there is no evidence to support or refute this hypothesis.
Furthermore, EpiLine is designed to estimate the epidemic growth rate in the presence of unknown and dynamic delays in healthcare presentation. Whilst health authorities typically collate statistics on the total number of confirmed positive cases by day, detailed follow-ups such as the date when symptoms first developed or estimates of the date of transmission are only collected for a subset of individuals. For this study we had statistics on the date of symptoms onset for 40%–80% of cases but not estimates of the transmission dates. Additionally, we did not have data to estimate the generation time distribution, which itself could be time-varying, so modelling new infections directly was not possible. We therefore chose to model the number of symptomatic cases directly, assuming that the delays between infection and symptoms onset are approximately constant. We think that this is a sensible choice here, although a possible alternative would be to take this delay to be uniformly distributed between 5 and 21 days, the suggested time it takes for mpox symptoms to occur.
While our study was focused on modelling the dynamic delays and growth rate of mpox in one setting, our findings have policy implications for general outbreaks across settings. Specifically, we show the importance of modelling reduced delays to presenting to healthcare in order to correctly interpret the status of the epidemic. Shorter delays can prevent onward transmission and allow prompt use of antivirals post infection. Hence, our study highlights the importance and need for public health agencies to focus on reducing time delays early in an outbreak and when tailoring the optimal policy response.
In summary, we developed EpiLine which simultaneously models dynamic delays between symptom onset and healthcare presentation, and the epidemic growth rate. Applying it to the 2022 UK mpox outbreak, we demonstrated that in the initial phases, the delays changed rapidly, and also that it was essential to account for the dynamic delay to correctly estimate the epidemic growth rate.
The data used in this study is not publicly available. UKHSA operates a robust governance process for applying to access protected data that considers: the benefits and risks of how the data will be used; compliance with policy, regulatory and ethical obligations; data minimisation; how the confidentiality, integrity, and availability will be maintained; retention, archival, and disposal requirements; best practice for protecting data, including the application of ‘privacy by design and by default’, emerging privacy conserving technologies and contractual controls. Access to protected data is always strictly controlled using legally binding data sharing contracts. UKHSA welcomes data applications from organisations looking to use protected data for public health purposes. To request an application pack or discuss a request for UKHSA data you would like to submit, contact [email protected].
Bunge, E. M. et al. The changing epidemiology of human monkeypox—A potential threat? A systematic review. PLoS Negl. Trop. Dis. 16 (2), e0010141. https://doi.org/10.1371/journal.pntd.0010141 (2022).
UK Government Document, 2022a. Mpox cases confirmed in England. https://www.gov.uk/government/news/monkeypox-cases-confirmed-in-england-latest-updates
Iñigo Martínez, J. et al. Monkeypox outbreak predominantly affecting men who have sex with men, Madrid, Spain, 26 April to 16 June 2022. Euro Surveill. 27 (27), 2200471. https://doi.org/10.2807/1560-7917.ES.2022.27.27.2200471 (2022).
Ward, T., Christie, R., Paton, R. S., Cumming, F. & Overton, C. E. Transmission dynamics of monkeypox in the United Kingdom: Contact tracing study. BMJ 379 , e073153. https://doi.org/10.1136/bmj-2022-073153 (2022).
Thornhill, J. P. et al. Monkeypox virus infection in humans across 16 countries—April–June 2022. New Engl. J. Med. 387 (8), 679–691. https://doi.org/10.1056/NEJMoa2207323 (2022).
Miura, F. et al. Estimated incubation period for monkeypox cases confirmed in the Netherlands, May 2022. Euro Surveill. 27 (24), 2200448. https://doi.org/10.2807/1560-7917.ES.2022.27.24.2200448 (2022).
Brown, K. & Leggat, P. A. Human monkeypox: Current state of knowledge and implications for the future. Trop. Med. Infect. Dis. 1 , 8. https://doi.org/10.3390/tropicalmed1010008 (2016).
Vivancos, R. et al. Community transmission of monkeypox in the United Kingdom, April to May 2022. Euro Surveill. 27 (22), 2200422. https://doi.org/10.2807/1560-7917.ES.2022.27.22.2200422 (2022).
Epiline, 2022. https://github.com/BDI-pathogens/EpiLine .
Carpenter, B. et al. Stan: A probabilistic programming language. J. Stat. Softw. 76 (1), 1–32. https://doi.org/10.18637/jss.v076.i01 (2017).
Brand, S. P. C. et al. The role of vaccination and public awareness in forecasts of Mpox incidence in the United Kingdom. Nat. Commun. 14 , 4100. https://doi.org/10.1038/s41467-023-38816-8 (2023).
Article ADS PubMed PubMed Central Google Scholar
UK Government Document, 2022b. Mpox outbreak: vaccination strategy. https://www.gov.uk/guidance/monkeypox-outbreak-vaccination-strategy
UK Government Document, 2022c. Accelerated mpox vaccination rollout in London as UKHSA secure more vaccines. https://www.england.nhs.uk/2022/07/accelerated-monkeypox-vaccination-rollout-in-london-as-ukhsa-secure-more-vaccines/
UKHSA technical report: Investigation into mpox outbreak in England: technical briefing 3. https://www.gov.uk/government/publications/monkeypox-outbreak-technical-briefings/investigation-into-monkeypox-outbreak-in-england-technical-briefing-3#part-4-transmission-dynamics . Accessed November 10, 2022.
van Leeuwen, E., Panovska-Griffiths, J., Elgohari, S., Charlett, A. & Watson, C. The interplay between susceptibility and vaccine effectiveness control the timing and size of an emerging seasonal influenza wave in England. Epidemics 44 , 100709. https://doi.org/10.1016/j.epidem.2023.100709 (2023).
Abbott, S. et al. Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts. Wellcome Open Res. 5 , 112. https://doi.org/10.12688/wellcomeopenres.16006.2 (2020).
Article Google Scholar
Höhle, M. & van der Heiden, M. Bayesian nowcasting during the STEC O104:H4 outbreak in Germany, 2011. Biometrics. 70 , 993–1002. https://doi.org/10.1111/biom.12194 (2014).
Article MathSciNet PubMed Google Scholar
van de Kassteele, J., Eilers, P. H. C. & Wallinga, J. Nowcasting the number of new symptomatic cases during infectious disease outbreaks using constrained P-spline smoothing. Epidemiology 30 (5), 737–745. https://doi.org/10.1097/EDE.0000000000001050 (2019).
Flaxman, S. et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature 584 , 257–261. https://doi.org/10.1038/s41586-020-2405-7 (2020).
Article ADS PubMed Google Scholar
Ward, T. et al. Understanding the infection severity and epidemiological characteristics of mpox in the UK. Nat. Commun. 15 , 2199. https://doi.org/10.1038/s41467-024-45110-8 (2024).
Download references
RH and CF were funded by a Li Ka Shing Foundation grant to CF. JPG’s work is in part supported by funding from the UK Health Security Agency (UKHSA) and the UK Department of Health and Social Care. TW, AC and NW are employees of the UKHSA. The funders had no role in the study design, data analysis, data interpretation, or writing of this report. We thank Steven Riley, Josie Park and Fergus Cumming at UKHSA for useful discussions and comments on drafts of this manuscript.
These authors contributed equally: Robert Hinch and Jasmina Panovska-Griffiths.
The Big Data Institute and the Pandemic Sciences Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
Robert Hinch, Jasmina Panovska-Griffiths & Christophe Fraser
The Queen’s College, University of Oxford, Oxford, UK
Jasmina Panovska-Griffiths
UK Health Security Agency, London, UK
Jasmina Panovska-Griffiths, Thomas Ward, Andre Charlett & Nicholas Watkins
RH, JPG and CF conceived the study. RH and JPG developed and undertook the modelling with input from CF and in conversations within UK Health Security Agency (UKHSA). RH and JPG wrote the manuscript, with input from CF, TW, AC and NW. All authors approved the final version. RH and JPG are the manuscript’s guarantors.
Correspondence to Jasmina Panovska-Griffiths.
Competing interests.
The authors declare no competing interests.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .
Cite this article.
Hinch, R., Panovska-Griffiths, J., Ward, T. et al. Quantification of the time-varying epidemic growth rate and of the delays between symptom onset and presenting to healthcare for the mpox epidemic in the UK in 2022. Sci Rep 14 , 19755 (2024). https://doi.org/10.1038/s41598-024-68154-8
Received: 04 November 2023
Accepted: 19 July 2024
Published: 26 August 2024
DOI: https://doi.org/10.1038/s41598-024-68154-8
There are four methods to fix folders that have disappeared from external storage, including SD cards, USB drives, and external SSDs/HDDs. Follow this guide to recover deleted or lost data.
By Irene / Updated on August 26, 2024
Why does the folder disappear from external hard drive?
External hard drives make it convenient to store and transfer data. However, sometimes the disk properties show space as used but you cannot see the files or folders on the external hard drive, or the files even seem to be completely lost. If your files and folders are not showing up on an external hard drive in Windows 7, 8, 10, or 11, it could be due to one of the following reasons:
The power supply of the USB port is insufficient.
The file system of the external hard drive is corrupted.
The external hard drive is infected with a virus.
The mirror image in memory is corrupted.
If you find that your external hard drive has lost files, or that a partition is lost or corrupted, stop writing new data to the drive and run data recovery with a reliable tool such as AOMEI FastRecovery as soon as possible. This software can recover data from many internal and external storage devices, such as HDDs, SSDs, USB drives, and SD cards. It supports NTFS, FAT32, exFAT, and ReFS on Windows 11/10/8/7 and Windows Server.
Step 1. Install and launch AOMEI FastRecovery. Choose the partition or disk where your data was lost and click Scan.
Step 2. The recovery tool starts scanning. It runs a "Quick Scan" first to find deleted data fast, and then a "Deep Scan" to search for other lost data.
Step 3. Once the scan is complete, all deleted files, Recycle Bin contents, and other missing files are displayed. Select the files you would like to recover and click "Recover".
Step 4. Select a folder path to save your recovered files.
Step 5. Wait for the recovery process to finish.
Several solutions can bring back files that have disappeared from an external hard drive. Choose the one that works for you, starting with the easiest. Sometimes the cable is the cause: if it cannot supply stable power or a stable connection to the computer, the external hard drive may appear to have no files because of a delayed refresh. Reattach the external hard drive or use another port.
If a folder disappeared from the external hard drive for no apparent reason, you may have hidden it by accident. Showing hidden files fixes the problem. Follow these steps:
Step 1. Check whether the files have the hidden or system attribute: run "control folders" (or go to Appearance and Personalization under Control Panel).
Step 2. The Folder Options window opens. Click View. On the View tab, check "Show hidden files, folders, and drives" and uncheck "Hide protected operating system files (Recommended)". Then check whether your files and folders are there.
After that, open the external hard drive and see whether the files are showing.
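The same check can be done from a script. The sketch below is only an illustration, with hypothetical function names: it lists the entries directly under a folder that carry the Windows hidden or system attribute. It only reads attributes and changes nothing on the drive.

```python
import os

# Windows file-attribute bits as reported in st_file_attributes.
FILE_ATTRIBUTE_HIDDEN = 0x2
FILE_ATTRIBUTE_SYSTEM = 0x4


def describe_attributes(attrs: int) -> list[str]:
    """Return which of the hidden/system flags are set in an attribute mask."""
    flags = []
    if attrs & FILE_ATTRIBUTE_HIDDEN:
        flags.append("hidden")
    if attrs & FILE_ATTRIBUTE_SYSTEM:
        flags.append("system")
    return flags


def find_hidden_entries(root: str) -> list[tuple[str, list[str]]]:
    """List entries directly under `root` that carry hidden or system flags."""
    found = []
    with os.scandir(root) as entries:
        for entry in entries:
            info = entry.stat(follow_symlinks=False)
            # st_file_attributes exists only on Windows; elsewhere fall back to 0.
            attrs = getattr(info, "st_file_attributes", 0)
            flags = describe_attributes(attrs)
            if flags:
                found.append((entry.path, flags))
    return found
```

On Windows, calling `find_hidden_entries("E:\\")` would report hidden items at the top level of drive E:, where the drive letter is an assumption to replace with that of your external drive.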
If you have enabled the Windows Backup function and the missing file (for example, an Excel file) was saved in a folder that this function backs up, you can also use the "Restore previous versions" option to recover missing files from the external hard drive.
Step 1. Go to the folder that originally contained the deleted file.
Step 2. Right-click the folder and select Restore previous versions from the menu to see its previous versions.
Step 3. Select the version that includes the deleted file and click Restore to retrieve it.
If a virus has altered the "CheckedValue" registry entry, you can repair it as follows:
Step 1. Run "regedit" and navigate to this key in Registry Editor: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Advanced\Folder\Hidden\SHOWALL.
Step 2. Check whether the value data of "CheckedValue" is "1". If not, delete "CheckedValue", create a new DWORD value in the blank area, and rename it "CheckedValue". Then set its value data to "1".
Step 3. Go back and see whether the files on the external hard drive are showing up.
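Before editing the registry by hand, it can help to read the current value first. The following is a read-only sketch using Python's standard `winreg` module; the helper names are hypothetical, and it does not modify the registry.

```python
EXPECTED_VALUE = 1
REG_DWORD = 4  # numeric value of winreg.REG_DWORD

SHOWALL_KEY = (
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer"
    r"\Advanced\Folder\Hidden\SHOWALL"
)


def needs_repair(value, value_type) -> bool:
    """True if CheckedValue is missing, not a DWORD, or not equal to 1."""
    return value_type != REG_DWORD or value != EXPECTED_VALUE


def read_checked_value():
    """Read CheckedValue from the registry (Windows only)."""
    import winreg  # imported lazily so this file still loads on other systems

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SHOWALL_KEY) as key:
        try:
            return winreg.QueryValueEx(key, "CheckedValue")  # (value, type)
        except FileNotFoundError:
            return None, None
```

On Windows, `needs_repair(*read_checked_value())` returning `True` suggests that Step 2 above applies.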
Method 4. Running CHKDSK to fix errors on external hard drive
If a folder disappeared from the external hard drive because of a corrupted file system, you can run the Check Disk tool. In File Explorer, right-click the external hard drive and navigate to "Properties" > "Tools" > "Check". You can also run Check Disk from Command Prompt.
After the check completes, the files and folders on the external hard drive may reappear.
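For the Command Prompt route, the command has the form `chkdsk E: /f`, where `/f` asks Check Disk to fix the errors it finds. A minimal sketch that builds and runs that command from Python is shown below; the function names and the drive letter are assumptions, and it needs an elevated prompt on Windows.

```python
import subprocess


def build_chkdsk_command(drive_letter: str, fix: bool = True) -> list[str]:
    """Build the chkdsk argument list for a single drive letter such as 'E'."""
    letter = drive_letter.strip().rstrip(":\\").upper()
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    command = ["chkdsk", f"{letter}:"]
    if fix:
        command.append("/f")  # /f asks chkdsk to fix errors it finds
    return command


def run_chkdsk(drive_letter: str) -> int:
    """Run chkdsk (Windows, elevated prompt) and return its exit code."""
    return subprocess.run(build_chkdsk_command(drive_letter)).returncode
```

Validating the drive letter before building the command avoids passing arbitrary text to the tool.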
This guide shows simple ways to recover folders that have disappeared from external storage, including HDDs, SSDs, USB drives, and SD cards. You can use AOMEI FastRecovery to scan the drive and restore your lost files, or try the built-in Windows tools for recovery.
This page provides guidance about methods and approaches to achieve de-identification in accordance with the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule. The guidance explains and answers questions regarding the two methods that can be used to satisfy the Privacy Rule’s de-identification standard: Expert Determination and Safe Harbor 1 . This guidance is intended to assist covered entities to understand what is de-identification, the general process by which de-identified information is created, and the options available for performing de-identification.
In developing this guidance, the Office for Civil Rights (OCR) solicited input from stakeholders with practical, technical and policy experience in de-identification. OCR convened stakeholders at a workshop consisting of multiple panel sessions held March 8-9, 2010, in Washington, DC. Each panel addressed a specific topic related to the Privacy Rule’s de-identification methodologies and policies. The workshop was open to the public and each panel was followed by a question and answer period.
1.1 Protected Health Information
1.2 Covered Entities, Business Associates, and PHI
1.3 De-identification and its Rationale
1.4 The De-identification Standard
1.5 Preparation for De-identification
2.1 Have expert determinations been applied outside of the health field?
2.2 Who is an “expert?”
2.3 What is an acceptable level of identification risk for an expert determination?
2.4 How long is an expert determination valid for a given data set?
2.5 Can an expert derive multiple solutions from the same data set for a recipient?
2.6 How do experts assess the risk of identification of information?
2.7 What are the approaches by which an expert assesses the risk that health information can be identified?
2.8 What are the approaches by which an expert mitigates the risk of identification of an individual in health information?
2.9 Can an Expert determine a code derived from PHI is de-identified?
2.10 Must a covered entity use a data use agreement when sharing de-identified data to satisfy the Expert Determination Method?
3.1 When can ZIP codes be included in de-identified information?
3.2 May parts or derivatives of any of the listed identifiers be disclosed consistent with the Safe Harbor Method?
3.3 What are examples of dates that are not permitted according to the Safe Harbor Method?
3.4 Can dates associated with test measures for a patient be reported in accordance with Safe Harbor?
3.5 What constitutes “any other unique identifying number, characteristic, or code” with respect to the Safe Harbor method of the Privacy Rule?
3.6 What is “actual knowledge” that the remaining information could be used either alone or in combination with other information to identify an individual who is a subject of the information?
3.7 If a covered entity knows of specific studies about methods to re-identify health information or use de-identified health information alone or in combination with other information to identify an individual, does this necessarily mean a covered entity has actual knowledge under the Safe Harbor method?
3.8 Must a covered entity suppress all personal names, such as physician names, from health information for it to be designated as de-identified?
3.9 Must a covered entity use a data use agreement when sharing de-identified data to satisfy the Safe Harbor Method?
3.10 Must a covered entity remove protected health information from free text fields to satisfy the Safe Harbor Method?
Protected health information.
The HIPAA Privacy Rule protects most “individually identifiable health information” held or transmitted by a covered entity or its business associate, in any form or medium, whether electronic, on paper, or oral. The Privacy Rule calls this information protected health information (PHI) 2 . Protected health information is information, including demographic information, which relates to:
For example, a medical record, laboratory report, or hospital bill would be PHI because each document would contain a patient’s name and/or other identifying information associated with the health data content.
By contrast, a health plan report that only noted the average age of health plan members was 45 years would not be PHI because that information, although developed by aggregating information from individual plan member records, does not identify any individual plan members and there is no reasonable basis to believe that it could be used to identify an individual.
The relationship with health information is fundamental. Identifying information alone, such as personal names, residential addresses, or phone numbers, would not necessarily be designated as PHI. For instance, if such information was reported as part of a publicly accessible data source, such as a phone book, then this information would not be PHI because it is not related to health data (see above). If such information was listed with health condition, health care provision or payment data, such as an indication that the individual was treated at a certain clinic, then this information would be PHI.
In general, the protections of the Privacy Rule apply to information held by covered entities and their business associates. HIPAA defines a covered entity as 1) a health care provider that conducts certain standard administrative and financial transactions in electronic form; 2) a health care clearinghouse; or 3) a health plan. 3 A business associate is a person or entity (other than a member of the covered entity’s workforce) that performs certain functions or activities on behalf of, or provides certain services to, a covered entity that involve the use or disclosure of protected health information. A covered entity may use a business associate to de-identify PHI on its behalf only to the extent such activity is authorized by their business associate agreement.
See the OCR website https://www.hhs.gov/ocr/privacy/ for detailed information about the Privacy Rule and how it protects the privacy of health information.
The increasing adoption of health information technologies in the United States accelerates their potential to facilitate beneficial studies that combine large, complex data sets from multiple sources. The process of de-identification, by which identifiers are removed from the health information, mitigates privacy risks to individuals and thereby supports the secondary use of data for comparative effectiveness studies, policy assessment, life sciences research, and other endeavors.
The Privacy Rule was designed to protect individually identifiable health information through permitting only certain uses and disclosures of PHI provided by the Rule, or as authorized by the individual subject of the information. However, in recognition of the potential utility of health information even when it is not individually identifiable, §164.502(d) of the Privacy Rule permits a covered entity or its business associate to create information that is not individually identifiable by following the de-identification standard and implementation specifications in §164.514(a)-(b). These provisions allow the entity to use and disclose information that neither identifies nor provides a reasonable basis to identify an individual. 4 As discussed below, the Privacy Rule provides two de-identification methods: 1) a formal determination by a qualified expert; or 2) the removal of specified individual identifiers as well as absence of actual knowledge by the covered entity that the remaining information could be used alone or in combination with other information to identify the individual.
Both methods, even when properly applied, yield de-identified data that retains some risk of identification. Although the risk is very small, it is not zero, and there is a possibility that de-identified data could be linked back to the identity of the patient to which it corresponds.
Regardless of the method by which de-identification is achieved, the Privacy Rule does not restrict the use or disclosure of de-identified health information, as it is no longer considered protected health information.
Section 164.514(a) of the HIPAA Privacy Rule provides the standard for de-identification of protected health information. Under this standard, health information is not individually identifiable if it does not identify an individual and if the covered entity has no reasonable basis to believe it can be used to identify an individual.
§ 164.514 Other requirements relating to uses and disclosures of protected health information. (a) Standard: de-identification of protected health information. Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information.
Sections 164.514(b) and (c) of the Privacy Rule contain the implementation specifications that a covered entity must follow to meet the de-identification standard. As summarized in Figure 1, the Privacy Rule provides two methods by which health information can be designated as de-identified.
Figure 1. Two methods to achieve de-identification in accordance with the HIPAA Privacy Rule.
The first is the “Expert Determination” method:
(b) Implementation specifications: requirements for de-identification of protected health information. A covered entity may determine that health information is not individually identifiable health information only if: (1) A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable: (i) Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and (ii) Documents the methods and results of the analysis that justify such determination; or
The second is the “Safe Harbor” method:
(2)(i) The following identifiers of the individual or of relatives, employers, or household members of the individual, are removed:
(A) Names
(B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census: (1) The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and (2) The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000
(C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
(D) Telephone numbers
(E) Fax numbers
(F) Email addresses
(G) Social security numbers
(H) Medical record numbers
(I) Health plan beneficiary numbers
(J) Account numbers
(K) Certificate/license numbers
(L) Vehicle identifiers and serial numbers, including license plate numbers
(M) Device identifiers and serial numbers
(N) Web Universal Resource Locators (URLs)
(O) Internet Protocol (IP) addresses
(P) Biometric identifiers, including finger and voice prints
(Q) Full-face photographs and any comparable images
(R) Any other unique identifying number, characteristic, or code, except as permitted by paragraph (c) of this section [Paragraph (c) is presented below in the section “Re-identification”]; and
(ii) The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.
Satisfying either method would demonstrate that a covered entity has met the standard in §164.514(a) above. De-identified health information created following these methods is no longer protected by the Privacy Rule because it does not fall within the definition of PHI. Of course, de-identification leads to information loss which may limit the usefulness of the resulting health information in certain circumstances. As described in the forthcoming sections, covered entities may wish to select de-identification strategies that minimize such loss.
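As an illustration of how the ZIP code, date, and age provisions of the Safe Harbor list could be applied to individual fields, here is a minimal sketch. It is not an official implementation and does not cover the other identifiers. The function names are hypothetical, and the population lookup is an assumed input that would have to come from current Census Bureau data.

```python
from datetime import date

POPULATION_THRESHOLD = 20_000  # from the ZIP code provision in (B)
MAX_REPORTABLE_AGE = 89        # ages over 89 are aggregated, per (C)


def generalize_zip(zip_code: str, zip3_population: dict[str, int]) -> str:
    """Keep the first three ZIP digits only if that area has > 20,000 people."""
    zip3 = zip_code[:3]
    # Unknown areas are treated as small, which errs on the side of suppression.
    if zip3_population.get(zip3, 0) > POPULATION_THRESHOLD:
        return zip3
    return "000"


def generalize_date(value: date) -> int:
    """Drop every element of a date except the year.

    Does not handle dates that reveal an age over 89 (for example a birth
    year), which the provision in (C) also restricts.
    """
    return value.year


def generalize_age(age_years: int) -> str:
    """Report ages up to 89 as-is and aggregate older ages into '90+'."""
    return str(age_years) if age_years <= MAX_REPORTABLE_AGE else "90+"


# Hypothetical population figures, for illustration only.
populations = {"123": 250_000, "987": 12_000}
print(generalize_zip("12345", populations), generalize_zip("98765", populations))
```

Treating unknown three-digit areas as small is a conservative design choice of this sketch, not something the Rule spells out.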
Re-identification
The implementation specifications further provide direction with respect to re-identification, specifically the assignment of a unique code to the set of de-identified health information to permit re-identification by the covered entity.
If a covered entity or business associate successfully undertook an effort to identify the subject of de-identified information it maintained, the health information now related to a specific individual would again be protected by the Privacy Rule, as it would meet the definition of PHI. Disclosure of a code or other means of record identification designed to enable coded or otherwise de-identified information to be re-identified is also considered a disclosure of PHI.
(c) Implementation specifications: re-identification. A covered entity may assign a code or other means of record identification to allow information de-identified under this section to be re-identified by the covered entity, provided that: (1) Derivation. The code or other means of record identification is not derived from or related to information about the individual and is not otherwise capable of being translated so as to identify the individual; and (2) Security. The covered entity does not use or disclose the code or other means of record identification for any other purpose, and does not disclose the mechanism for re-identification.
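One way to picture the "Derivation" condition is a code drawn at random rather than computed from anything about the individual, in contrast to, say, a hash of a medical record number, which would be derived from information about the individual. The sketch below is a hypothetical illustration (class and method names are invented here) and does not by itself establish compliance. The lookup table plays the role of the re-identification mechanism that the covered entity keeps to itself.

```python
import secrets


class ReidentificationCodebook:
    """Maps random codes to record identifiers; kept only by the covered entity."""

    def __init__(self) -> None:
        self._code_to_record: dict[str, str] = {}
        self._record_to_code: dict[str, str] = {}

    def code_for(self, record_id: str) -> str:
        """Return the existing code for a record, or assign a fresh random one."""
        if record_id not in self._record_to_code:
            code = secrets.token_hex(16)  # random, not derived from the record
            while code in self._code_to_record:  # collisions are very unlikely
                code = secrets.token_hex(16)
            self._record_to_code[record_id] = code
            self._code_to_record[code] = record_id
        return self._record_to_code[record_id]

    def reidentify(self, code: str) -> str:
        """Look up the record behind a code; raises KeyError if unknown."""
        return self._code_to_record[code]
```

Only the codes would travel with the de-identified data; the codebook itself would stay inside the covered entity.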
The importance of documentation for which values in health data correspond to PHI, as well as the systems that manage PHI, for the de-identification process cannot be overstated. Esoteric notation, such as acronyms whose meanings are known to only a select few employees of a covered entity, and incomplete descriptions may lead those overseeing a de-identification procedure to unnecessarily redact information or to fail to redact when necessary. When sufficient documentation is provided, it is straightforward to redact the appropriate fields. See section 3.10 for a more complete discussion.
In the following two sections, we address questions regarding the Expert Determination method (Section 2) and the Safe Harbor method (Section 3).
In §164.514(b), the Expert Determination method for de-identification is defined as follows:
(1) A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable: (i) Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and (ii) Documents the methods and results of the analysis that justify such determination
Have expert determinations been applied outside of the health field?
Yes. The notion of expert certification is not unique to the health care field. Professional scientists and statisticians in various fields routinely determine and accordingly mitigate risk prior to sharing data. The field of statistical disclosure limitation, for instance, has been developed within government statistical agencies, such as the Bureau of the Census, and applied to protect numerous types of data. 5
Who is an “expert?”
There is no specific professional degree or certification program for designating who is an expert at rendering health information de-identified. Relevant expertise may be gained through various routes of education and experience. Experts may be found in the statistical, mathematical, or other scientific domains. From an enforcement perspective, OCR would review the relevant professional experience and academic or other training of the expert used by the covered entity, as well as actual experience of the expert using health information de-identification methodologies.
What is an acceptable level of identification risk for an expert determination?
There is no explicit numerical level of identification risk that is deemed to universally meet the “very small” level indicated by the method. The ability of a recipient of information to identify an individual (i.e., subject of the information) is dependent on many factors, which an expert will need to take into account while assessing the risk from a data set. This is because the risk of identification that has been determined for one particular data set in the context of a specific environment may not be appropriate for the same data set in a different environment or a different data set in the same environment. As a result, an expert will define an acceptable “very small” risk based on the ability of an anticipated recipient to identify an individual. This issue is addressed in further depth in Section 2.6.
How long is an expert determination valid for a given data set?
The Privacy Rule does not explicitly require that an expiration date be attached to the determination that a data set, or the method that generated such a data set, is de-identified information. However, experts have recognized that technology, social conditions, and the availability of information change over time. Consequently, certain de-identification practitioners use the approach of time-limited certifications. In this sense, the expert will assess the expected change of computational capability, as well as access to various data sources, and then determine an appropriate timeframe within which the health information will be considered reasonably protected from identification of an individual.
Information that had previously been de-identified may still be adequately de-identified when the certification limit has been reached. When the certification timeframe reaches its conclusion, it does not imply that the data which has already been disseminated is no longer sufficiently protected in accordance with the de-identification standard. Covered entities will need to have an expert examine whether future releases of the data to the same recipient (e.g., monthly reporting) should be subject to additional or different de-identification processes consistent with current conditions to reach the very low risk requirement.
Can an expert derive multiple solutions from the same data set for a recipient?
Yes. Experts may design multiple solutions, each of which is tailored to the covered entity’s expectations regarding information reasonably available to the anticipated recipient of the data set. In such cases, the expert must take care to ensure that the data sets cannot be combined to compromise the protections set in place through the mitigation strategy. (Of course, the expert must also reduce the risk that the data sets could be combined with prior versions of the de-identified dataset or with other publicly available datasets to identify an individual.) For instance, an expert may derive one data set that contains detailed geocodes and generalized age values (e.g., 5-year age ranges) and another data set that contains generalized geocodes (e.g., only the first two digits) and fine-grained age (e.g., days from birth). The expert may certify a covered entity to share both data sets after determining that the two data sets could not be merged to individually identify a patient. This certification may be based on a technical proof regarding the inability to merge such data sets. Alternatively, the expert also could require additional safeguards through a data use agreement.
How do experts assess the risk of identification of information?
No single universal solution addresses all privacy and identifiability issues. Rather, a combination of technical and policy procedures is often applied to the de-identification task. OCR does not require a particular process for an expert to use to reach a determination that the risk of identification is very small. However, the Rule does require that the methods and results of the analysis that justify the determination be documented and made available to OCR upon request. The following information is meant to provide covered entities with a general understanding of the de-identification process applied by an expert. It does not provide sufficient detail in statistical or scientific methods to serve as a substitute for working with an expert in de-identification.
A general workflow for expert determination is depicted in Figure 2. Stakeholder input suggests that the determination of identification risk can be a process that consists of a series of steps. First, the expert will evaluate the extent to which the health information can (or cannot) be identified by the anticipated recipients. Second, the expert often will provide guidance to the covered entity or business associate on which statistical or scientific methods can be applied to the health information to mitigate the anticipated risk. The expert will then execute such methods as deemed acceptable by the covered entity or business associate data managers, i.e., the officials responsible for the design and operations of the covered entity’s information systems. Finally, the expert will evaluate the identifiability of the resulting health information to confirm that the risk is no more than very small when disclosed to the anticipated recipients. Stakeholder input suggests that a process may require several iterations until the expert and data managers agree upon an acceptable solution. Regardless of the process or methods employed, the information must meet the very small risk specification requirement.
Figure 2. Process for expert determination of de-Identification.
Data managers and administrators working with an expert to consider the risk of identification of a particular set of health information can look to the principles summarized in Table 1 for assistance. 6 These principles build on those defined by the Federal Committee on Statistical Methodology (which was referenced in the original publication of the Privacy Rule). 7 The table describes principles for considering the identification risk of health information. The principles should serve as a starting point for reasoning and are not meant to serve as a definitive list. In the process, experts are advised to consider how data sources that are available to a recipient of health information (e.g., computer systems that contain information about patients) could be utilized for identification of an individual. 8
Table 1. Principles used by experts in the determination of the identifiability of health information.
Principle | Description | Examples |
Replicability | Prioritize health information features into levels of risk according to the chance it will consistently occur in relation to the individual. | Low: Results of a patient’s blood glucose level test will vary. High: Demographics of a patient (e.g., birth date) are relatively stable. |
Data Source Availability | Determine which external data sources contain the patients’ identifiers and the replicable features in the health information, as well as who is permitted access to the data source. | Low: The results of laboratory reports are not often disclosed with identity beyond healthcare environments. High: Patient name and demographics are often in public data sources, such as vital records (birth, death, and marriage registries). |
Distinguishability | Determine the extent to which the subject’s data can be distinguished in the health information. | Low: It has been estimated that the combination of Year of Birth, Gender, and 3-Digit ZIP Code is unique for approximately 0.04% of residents in the United States. This means that very few residents could be identified through this combination of data alone. High: It has been estimated that the combination of a patient’s Date of Birth, Gender, and 5-Digit ZIP Code is unique for over 50% of residents in the United States. This means that over half of U.S. residents could be uniquely described just with these three data elements. |
Assess Risk | The greater the replicability, availability, and distinguishability of the health information, the greater the risk for identification. | Low: Laboratory values may be very distinguishing, but they are rarely independently replicable and are rarely disclosed in multiple data sources to which many people have access. High: Demographics are highly distinguishing, highly replicable, and are available in public data sources. |
When evaluating identification risk, an expert often considers the degree to which a data set can be “linked” to a data source that reveals the identity of the corresponding individuals. Linkage is a process that requires the satisfaction of certain conditions. The first condition is that the de-identified data are unique or “distinguishing.” It should be recognized, however, that the ability to distinguish data is, by itself, insufficient to compromise the corresponding patient’s privacy. This is because of a second condition, which is the need for a naming data source, such as a publicly available voter registration database (see Section 2.6). Without such a data source, there is no way to definitively link the de-identified health information to the corresponding patient. Finally, for the third condition, we need a mechanism to relate the de-identified and identified data sources. Inability to design such a relational mechanism would hamper a third party’s ability to achieve success to no better than random assignment of de-identified data and named individuals. The lack of a readily available naming data source does not imply that data are sufficiently protected from future identification, but it does indicate that it is harder to re-identify an individual, or group of individuals, given the data sources at hand.
Example Scenario Imagine that a covered entity is considering sharing the information in the table to the left in Figure 3. This table is devoid of explicit identifiers, such as personal names and Social Security Numbers. The information in this table is distinguishing, such that each row is unique on the combination of demographics (i.e., Age, ZIP Code, and Gender). Beyond this data, there exists a voter registration data source, which contains personal names, as well as demographics (i.e., Birthdate, ZIP Code, and Gender), which are also distinguishing. Linkage between the records in the tables is possible through the demographics. Notice, however, that the first record in the covered entity’s table is not linked because the patient is not yet old enough to vote.
Figure 3. Linking two data sources to identify diagnoses.
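The linkage described in the scenario above can be sketched in a few lines of Python. This is only an illustration: the records, the placeholder voter names, and the `link` function are hypothetical and do not reproduce the contents of Figure 3, and birth year stands in for both Age and Birthdate so that the two sources share a comparable key.

```python
from collections import defaultdict

# Hypothetical records for illustration; not the values shown in Figure 3.
health_records = [
    {"birth_year": 2010, "zip": "00000", "gender": "Male", "diagnosis": "Diabetes"},
    {"birth_year": 1990, "zip": "00001", "gender": "Female", "diagnosis": "Influenza"},
    {"birth_year": 1975, "zip": "10000", "gender": "Male", "diagnosis": "Broken Arm"},
]
voter_records = [
    {"name": "Voter A", "birth_year": 1990, "zip": "00001", "gender": "Female"},
    {"name": "Voter B", "birth_year": 1975, "zip": "10000", "gender": "Male"},
    {"name": "Voter C", "birth_year": 1960, "zip": "10001", "gender": "Female"},
]

# The demographic features shared by both sources (the relational mechanism).
KEY = ("birth_year", "zip", "gender")


def link(health, voters, key=KEY):
    """Return (name, diagnosis) pairs for health records matching exactly one voter."""
    index = defaultdict(list)
    for voter in voters:
        index[tuple(voter[k] for k in key)].append(voter["name"])
    linked = []
    for record in health:
        names = index.get(tuple(record[k] for k in key), [])
        if len(names) == 1:  # an unambiguous match reveals a name
            linked.append((names[0], record["diagnosis"]))
    return linked


linked = link(health_records, voter_records)
print(linked)
```

In this sketch the first health record stays unlinked because no voter shares its demographics, mirroring the patient who is too young to vote.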
Thus, an important aspect of identification risk assessment is the route by which health information can be linked to naming sources or sensitive knowledge can be inferred. A higher risk “feature” is one that is found in many places and is publicly available. These are features that could be exploited by anyone who receives the information. For instance, patient demographics could be classified as high-risk features. In contrast, lower risk features are those that do not appear in public records or are less readily available. For instance, clinical features, such as blood pressure, or temporal dependencies between events within a hospital (e.g., minutes between dispensation of pharmaceuticals) may uniquely characterize a patient in a hospital population, but the data sources to which such information could be linked to identify a patient are accessible to a much smaller set of people.
Example Scenario An expert is asked to assess the identifiability of a patient’s demographics. First, the expert will determine if the demographics are independently replicable. Features such as birth date and gender are strongly independently replicable (the individual will always have the same birth date), whereas ZIP code of residence is less so because an individual may relocate. Second, the expert will determine which data sources that contain the individual’s identification also contain the demographics in question. In this case, the expert may determine that public records, such as birth, death, and marriage registries, are the most likely data sources to be leveraged for identification. Third, the expert will determine if the specific information to be disclosed is distinguishable. At this point, the expert may determine that certain combinations of values (e.g., Asian males born in January of 1915 and living in a particular 5-digit ZIP code) are unique, whereas others (e.g., white females born in March of 1972 and living in a different 5-digit ZIP code) are never unique. Finally, the expert will determine if the data sources that could be used in the identification process are readily accessible, which may differ by region. For instance, voter registration registries are free in the state of North Carolina, but cost over $15,000 in the state of Wisconsin. Thus, data shared in the former state may be deemed more risky than data shared in the latter. 12
The de-identification standard does not mandate a particular method for assessing risk.
A qualified expert may apply generally accepted statistical or scientific principles to compute the likelihood that a record in a data set is expected to be unique, or linkable to only one person, within the population to which it is being compared. Figure 4 provides a visualization of this concept. 13 This figure illustrates a situation in which the records in a data set are not a proper subset of the population for whom identified information is known. This could occur, for instance, if the data set includes patients over one year-old but the population to which it is compared includes data on people over 18 years old (e.g., registered voters).
The computation of population uniques can be achieved in numerous ways, such as through the approaches outlined in published literature. 14 , 15 For instance, if an expert is attempting to assess if the combination of a patient’s race, age, and geographic region of residence is unique, the expert may use population statistics published by the U.S. Census Bureau to assist in this estimation. In instances when population statistics are unavailable or unknown, the expert may calculate and rely on the statistics derived from the data set. This is because a record can only be linked between the data set and the population to which it is being compared if it is unique in both. Thus, by relying on the statistics derived from the data set, the expert will make a conservative estimate regarding the uniqueness of records.
Example Scenario Imagine a covered entity has a data set in which there is one 25 year old male from a certain geographic region in the United States. In truth, there are five 25 year old males in the geographic region in question (i.e., the population). Unfortunately, there is no readily available data source to inform an expert about the number of 25 year old males in this geographic region.
By inspecting the data set, it is clear to the expert that there is at least one 25 year old male in the population, but the expert does not know if there are more. So, without any additional knowledge, the expert assumes there are no more, such that the record in the data set is unique. Based on this observation, the expert recommends removing this record from the data set. In doing so, the expert has made a conservative decision with respect to the uniqueness of the record.
In the previous example, the expert provided a solution (i.e., removing a record from a dataset) to achieve de-identification, but this is one of many possible solutions that an expert could offer. In practice, an expert may provide the covered entity with multiple alternative strategies, based on scientific or statistical principles, to mitigate risk.
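The conservative reasoning in the example scenario, treating every record that is unique within the data set as if it were unique in the population, can be sketched as follows. The records, feature names, and helper functions are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical data set: one 25-year-old male and two 40-year-old females
# from the same (made-up) region.
records = [
    {"age": 25, "gender": "Male", "region": "Region X"},
    {"age": 40, "gender": "Female", "region": "Region X"},
    {"age": 40, "gender": "Female", "region": "Region X"},
]
QUASI_IDENTIFIERS = ("age", "gender", "region")


def combination(record, features):
    """The tuple of values a recipient could use to single out a record."""
    return tuple(record[f] for f in features)


def sample_uniques(data, features=QUASI_IDENTIFIERS):
    """Records whose combination of features occurs exactly once in the data set."""
    counts = Counter(combination(r, features) for r in data)
    return [r for r in data if counts[combination(r, features)] == 1]


def drop_sample_uniques(data, features=QUASI_IDENTIFIERS):
    """Conservative mitigation: remove every record that is unique in the data set."""
    counts = Counter(combination(r, features) for r in data)
    return [r for r in data if counts[combination(r, features)] > 1]


uniques = sample_uniques(records)
remaining = drop_sample_uniques(records)
print(uniques)
print(len(remaining))
```

Removing records is only one of the strategies an expert might offer; the same counts could instead drive generalization or suppression of individual values.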
Figure 4. Relationship between uniques in the data set and the broader population, as well as the degree to which linkage can be achieved.
The expert may consider different measures of “risk,” depending on the concern of the organization looking to disclose information. The expert will attempt to determine which record in the data set is the most vulnerable to identification. However, in certain instances, the expert may not know which particular record to be disclosed will be most vulnerable for identification purposes. In this case, the expert may attempt to compute risk from several different perspectives.
The Privacy Rule does not require a particular approach to mitigate, or reduce to very small, identification risk. The following provides a survey of potential approaches. An expert may find all or only one appropriate for a particular project, or may use another method entirely.
If an expert determines that the risk of identification is greater than very small, the expert may modify the information to mitigate the identification risk to that level, as required by the de-identification standard. In general, the expert will adjust certain features or values in the data to ensure that unique, identifiable elements no longer, or are not expected to, exist. Some of the methods described below have been reviewed by the Federal Committee on Statistical Methodology 16 , which was referenced in the original preamble guidance to the Privacy Rule de-identification standard and recently revised.
Several broad classes of methods can be applied to protect data. An overarching common goal of such approaches is to balance disclosure risk against data utility. 17 If one approach results in very small identity disclosure risk but also a set of data with little utility, another approach can be considered. However, data utility does not determine when the de-identification standard of the Privacy Rule has been met.
Table 2 illustrates the application of such methods. In this example, we refer to columns as “features” about patients (e.g., Age and Gender) and rows as “records” of patients (e.g., the first and second rows correspond to records on two different patients).
Table 2. An example of protected health information.
Age (Years) | Gender | ZIP Code | Diagnosis |
15 | Male | 00000 | Diabetes |
21 | Female | 00001 | Influenza |
36 | Male | 10000 | Broken Arm |
91 | Female | 10001 | Acid Reflux |
A first class of identification risk mitigation methods corresponds to suppression techniques. These methods remove or eliminate certain features about the data prior to dissemination. Suppression of an entire feature may be performed if a substantial quantity of records is considered as too risky (e.g., removal of the ZIP Code feature). Suppression may also be performed on individual records, deleting records entirely if they are deemed too risky to share. This can occur when a record is clearly very distinguishing (e.g., the only individual within a county that makes over $500,000 per year). Alternatively, suppression of specific values within a record may be performed, such as when a particular value is deemed too risky (e.g., “President of the local university”, or ages or ZIP codes that may be unique). Table 3 illustrates this last type of suppression by showing how specific values of features in Table 2 might be suppressed (i.e., black shaded cells).
Table 3. A version of Table 2 with suppressed patient values.
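Age (Years) | Gender | ZIP Code | Diagnosis |
(suppressed) | Male | 00000 | Diabetes |
21 | Female | 00001 | Influenza |
36 | Male | (suppressed) | Broken Arm |
(suppressed) | Female | (suppressed) | Acid Reflux |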
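The three levels of suppression described above (whole features, whole records, and individual values) can be sketched as follows, using the rows of Table 2. The function names and the particular cells chosen for suppression are assumptions made for illustration; in practice the expert decides what is too risky to share.

```python
# Rows of Table 2; the key names are illustrative.
TABLE_2 = [
    {"age": 15, "gender": "Male", "zip": "00000", "diagnosis": "Diabetes"},
    {"age": 21, "gender": "Female", "zip": "00001", "diagnosis": "Influenza"},
    {"age": 36, "gender": "Male", "zip": "10000", "diagnosis": "Broken Arm"},
    {"age": 91, "gender": "Female", "zip": "10001", "diagnosis": "Acid Reflux"},
]


def suppress_feature(records, feature):
    """Remove an entire feature (column) from every record."""
    return [{k: v for k, v in r.items() if k != feature} for r in records]


def suppress_records(records, is_too_risky):
    """Delete whole records that the supplied predicate flags as too risky."""
    return [dict(r) for r in records if not is_too_risky(r)]


def suppress_cells(records, cells):
    """Blank out individual values, given (row index, feature) pairs."""
    out = [dict(r) for r in records]
    for row, feature in cells:
        out[row][feature] = None  # None marks a suppressed value
    return out


# Assumed choice of cells, for illustration only.
cell_suppressed = suppress_cells(
    TABLE_2, [(0, "age"), (2, "zip"), (3, "age"), (3, "zip")]
)
no_zip = suppress_feature(TABLE_2, "zip")
under_90 = suppress_records(TABLE_2, lambda r: r["age"] > 89)
```

Each helper returns new records and leaves the input untouched, so the original data can still be compared against the protected version when risk is re-evaluated.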
A second class of methods that can be applied for risk mitigation are based on generalization (sometimes referred to as abbreviation) of the information. These methods transform data into more abstract representations. For instance, a five-digit ZIP Code may be generalized to a four-digit ZIP Code, which in turn may be generalized to a three-digit ZIP Code, and onward so as to disclose data with lesser degrees of granularity. Similarly, the age of a patient may be generalized from one- to five-year age groups. Table 4 illustrates how generalization (i.e., gray shaded cells) might be applied to the information in Table 2.
Table 4. A version of Table 2 with generalized patient values.
Age (Years) | Gender | ZIP Code | Diagnosis |
Under 21 | Male | 0000* | Diabetes |
Between 21 and 34 | Female | 0000* | Influenza |
Between 35 and 44 | Male | 1000* | Broken Arm |
45 and over | Female | 1000* | Acid Reflux |
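The generalization shown in Table 4 can be sketched with two small helpers. The age bands and the four-digit ZIP prefix follow Table 4; the function names are hypothetical, and other bands or prefix lengths may be more appropriate for a given data set.

```python
def generalize_age(age):
    """Map an exact age in years to the age bands used in Table 4."""
    if age < 21:
        return "Under 21"
    if age <= 34:
        return "Between 21 and 34"
    if age <= 44:
        return "Between 35 and 44"
    return "45 and over"


def generalize_zip(zip_code, keep=4):
    """Keep the first `keep` digits of a ZIP code and mask the rest with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


# The Age and ZIP Code values of Table 2.
table_2 = [(15, "00000"), (21, "00001"), (36, "10000"), (91, "10001")]
generalized = [(generalize_age(age), generalize_zip(z)) for age, z in table_2]
print(generalized)
```

Lowering `keep` (for example to 3) discloses the data at a coarser level of geographic granularity.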
A third class of methods that can be applied for risk mitigation corresponds to perturbation . In this case, specific values are replaced with equally specific, but different, values. For instance, a patient’s age may be reported as a random value within a 5-year window of the actual age. Table 5 illustrates how perturbation (i.e., gray shaded cells) might be applied to Table 2. Notice that every age is within +/- 2 years of the original age. Similarly, the final digit in each ZIP Code is within +/- 3 of the original ZIP Code.
Table 5. A version of Table 2 with randomized patient values.
Age (Years) | Gender | ZIP Code | Diagnosis |
16 | Male | 00002 | Diabetes |
20 | Female | 00000 | Influenza |
34 | Male | 10000 | Broken Arm |
93 | Female | 10003 | Acid Reflux |
In practice, perturbation is performed to maintain statistical properties about the original data, such as mean or variance.
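A minimal sketch of perturbation, assuming uniform random noise of at most 2 years for age and at most 3 for the final ZIP Code digit (the bounds noted for Table 5). The function names are hypothetical, and this simple scheme makes no attempt to preserve the mean or variance exactly, which a production method would typically address.

```python
import random

# Rows of Table 2; the key names are illustrative.
TABLE_2 = [
    {"age": 15, "gender": "Male", "zip": "00000", "diagnosis": "Diabetes"},
    {"age": 21, "gender": "Female", "zip": "00001", "diagnosis": "Influenza"},
    {"age": 36, "gender": "Male", "zip": "10000", "diagnosis": "Broken Arm"},
    {"age": 91, "gender": "Female", "zip": "10001", "diagnosis": "Acid Reflux"},
]


def perturb_age(age, rng, max_shift=2):
    """Shift an age by a random whole number of years in [-max_shift, max_shift]."""
    return max(0, age + rng.randint(-max_shift, max_shift))


def perturb_zip(zip_code, rng, max_shift=3):
    """Replace the final digit with one within max_shift of it, kept in 0-9."""
    last = int(zip_code[-1])
    low, high = max(0, last - max_shift), min(9, last + max_shift)
    return zip_code[:-1] + str(rng.randint(low, high))


def perturb(records, seed=None):
    """Return perturbed copies of the records; a seed makes the run repeatable."""
    rng = random.Random(seed)
    return [
        dict(r, age=perturb_age(r["age"], rng), zip=perturb_zip(r["zip"], rng))
        for r in records
    ]


perturbed = perturb(TABLE_2, seed=0)
```

Fixing the seed is convenient for testing, but a fixed or guessable seed would let a recipient reproduce the noise, so it should not be used for an actual release.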
The application of a method from one class does not necessarily preclude the application of a method from another class. For instance, it is common to apply generalization and suppression to the same data set.
Using such methods, the expert will prove that the likelihood an undesirable event (e.g., future identification of an individual) will occur is very small. For instance, one example of a data protection model that has been applied to health information is the k-anonymity principle. 18, 19 In this model, “k” refers to the number of people to which each disclosed record must correspond. In practice, this correspondence is assessed using the features that could be reasonably applied by a recipient to identify a patient. Table 6 illustrates an application of generalization and suppression methods to achieve 2-anonymity with respect to the Age, Gender, and ZIP Code columns in Table 2. The first two rows (i.e., shaded light gray) and last two rows (i.e., shaded dark gray) correspond to patient records with the same combination of generalized and suppressed values for Age, Gender, and ZIP Code. Notice that Gender has been suppressed completely (i.e., black shaded cell).
Table 6, as well as a value of k equal to 2, is meant to serve as a simple example for illustrative purposes only. Various state and federal agencies define policies regarding small cell counts (i.e., the number of people corresponding to the same combination of features) when sharing tabular, or summary, data. 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 However, OCR does not designate a universal value for k that covered entities should apply to protect health information in accordance with the de-identification standard. The value for k should be set at a level that is appropriate to mitigate risk of identification by the anticipated recipient of the data set. 28
Table 6. A version of Table 2 that is 2-anonymized.
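Age (Years) | Gender | ZIP Code | Diagnosis |
Under 30 | (suppressed) | 0000* | Diabetes |
Under 30 | (suppressed) | 0000* | Influenza |
Over 30 | (suppressed) | 1000* | Broken Arm |
Over 30 | (suppressed) | 1000* | Acid Reflux |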
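Whether a table satisfies k-anonymity for a chosen set of features can be checked by counting how many records share each combination of values. The sketch below applies that check to Table 6 and, for contrast, to the unmodified Table 2; the function and key names are hypothetical.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("age", "gender", "zip")

# Table 6: Gender is suppressed (None) for every record.
TABLE_6 = [
    {"age": "Under 30", "gender": None, "zip": "0000*", "diagnosis": "Diabetes"},
    {"age": "Under 30", "gender": None, "zip": "0000*", "diagnosis": "Influenza"},
    {"age": "Over 30", "gender": None, "zip": "1000*", "diagnosis": "Broken Arm"},
    {"age": "Over 30", "gender": None, "zip": "1000*", "diagnosis": "Acid Reflux"},
]

# Table 2: the original, unprotected values.
TABLE_2 = [
    {"age": 15, "gender": "Male", "zip": "00000", "diagnosis": "Diabetes"},
    {"age": 21, "gender": "Female", "zip": "00001", "diagnosis": "Influenza"},
    {"age": 36, "gender": "Male", "zip": "10000", "diagnosis": "Broken Arm"},
    {"age": 91, "gender": "Female", "zip": "10001", "diagnosis": "Acid Reflux"},
]


def group_sizes(records, features=QUASI_IDENTIFIERS):
    """Count the records that share each combination of feature values."""
    return Counter(tuple(r[f] for f in features) for r in records)


def is_k_anonymous(records, k, features=QUASI_IDENTIFIERS):
    """True if every combination of feature values is shared by at least k records."""
    return all(size >= k for size in group_sizes(records, features).values())


print(is_k_anonymous(TABLE_6, 2))  # True
print(is_k_anonymous(TABLE_2, 2))  # False
```

The check counts only within the data set. Choosing the features and the value of k that are appropriate for the anticipated recipient remains the expert's judgment.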
As can be seen, there are many different disclosure risk reduction techniques that can be applied to health information. However, it should be noted that there is no particular method that is universally the best option for every covered entity and health information set. Each method has benefits and drawbacks with respect to expected applications of the health information, which will be distinct for each covered entity and each intended recipient. The determination of which method is most appropriate for the information will be assessed by the expert on a case-by-case basis and will be guided by input of the covered entity.
Finally, as noted in the preamble to the Privacy Rule, the expert may also consider the technique of limiting distribution of records through a data use agreement or restricted access agreement in which the recipient agrees to limits on who can use or receive the data, or agrees not to attempt identification of the subjects. Of course, the specific details of such an agreement are left to the discretion of the expert and covered entity.
There has been confusion about what constitutes a code and how it relates to PHI. For clarification, our guidance is similar to that provided by the National Institute of Standards and Technology (NIST) 29, which states:
“ De-identified information can be re-identified (rendered distinguishable) by using a code, algorithm, or pseudonym that is assigned to individual records. The code, algorithm, or pseudonym should not be derived from other related information* about the individual, and the means of re-identification should only be known by authorized parties and not disclosed to anyone without the authority to re-identify records. A common de-identification technique for obscuring PII [Personally Identifiable Information] is to use a one-way cryptographic function, also known as a hash function, on the PII.
*This is not intended to exclude the application of cryptographic hash functions to the information.”
In line with this guidance from NIST, a covered entity may disclose codes derived from PHI as part of a de-identified data set if an expert determines that the data meets the de-identification requirements at §164.514(b)(1). The re-identification provision in §164.514(c) does not preclude the transformation of PHI into values derived by cryptographic hash functions using the expert determination method, provided the keys associated with such functions are not disclosed, including to the recipients of the de-identified information.
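One common way to derive a code with a keyed cryptographic hash function is HMAC. The sketch below uses Python's standard `hmac`, `hashlib`, and `secrets` modules; the identifier `MRN-0001`, the function name, and the key handling are hypothetical. Whether such a code meets the de-identification requirements in a given setting remains a determination for the expert.

```python
import hashlib
import hmac
import secrets


def make_code(identifier: str, key: bytes) -> str:
    """Derive a record code from an identifier using HMAC-SHA256 with a secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# The key is generated and retained by the covered entity. Per the guidance
# above, it is not disclosed, including to recipients of the data.
key = secrets.token_bytes(32)

code = make_code("MRN-0001", key)  # "MRN-0001" is a made-up identifier
print(code)
```

The same identifier always yields the same code under a given key, so records can still be grouped by code, while a party without the key cannot recompute the code from a guessed identifier.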
No. The Privacy Rule does not limit how a covered entity may disclose information that has been de-identified. However, a covered entity may require the recipient of de-identified information to enter into a data use agreement to access files with known disclosure risk, such as is required for release of a limited data set under the Privacy Rule. This agreement may contain a number of clauses designed to protect the data, such as prohibiting re-identification. 30 Of course, the use of a data use agreement does not substitute for any of the specific requirements of the Expert Determination Method. Further information about data use agreements can be found on the OCR website. 31 Covered entities may make their own assessments whether such additional oversight is appropriate.
In §164.514(b), the Safe Harbor method for de-identification is defined as follows:
(R) Any other unique identifying number, characteristic, or code, except as permitted by paragraph (c) of this section; and
Covered entities may include the first three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census: (1) The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; or (2) the initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000. This means that the initial three digits of ZIP codes may be included in de-identified information except when the ZIP codes contain the initial three digits listed in the Table below. In those cases, the first three digits must be listed as 000.
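The three-digit ZIP code rule can be sketched as a lookup against population counts. The function name and the population figures below are hypothetical placeholders, not Census data; real use would rely on the most current publicly available Census Bureau figures. Treating a prefix with no known population as restricted is an assumption made here to err on the side of caution.

```python
def safe_harbor_zip3(zip_code, zip3_population, threshold=20_000):
    """Return the first three ZIP digits, or "000" for sparsely populated areas.

    zip3_population maps a three-digit prefix to the total population of all
    ZIP codes sharing that prefix.
    """
    prefix = zip_code[:3]
    population = zip3_population.get(prefix)
    # Prefixes with 20,000 or fewer people, or an unknown population
    # (an assumption of this sketch), are reported as "000".
    if population is None or population <= threshold:
        return "000"
    return prefix


# Hypothetical population counts for illustration only.
populations = {"123": 250_000, "456": 15_000, "789": 20_000}

print(safe_harbor_zip3("12345", populations))  # "123"
print(safe_harbor_zip3("45678", populations))  # "000"
```

A prefix is retained only when its population is strictly greater than 20,000, matching the "more than 20,000 people" wording of the rule.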
OCR published a final rule on August 14, 2002, that modified certain standards in the Privacy Rule. The preamble to this final rule identified the initial three digits of ZIP codes, or ZIP code tabulation areas (ZCTAs), that must change to 000 for release (67 FR 53182, 53233-53234 (Aug. 14, 2002)).
Utilizing 2000 Census data, the following three-digit ZCTAs have a population of 20,000 or fewer persons. To produce a de-identified data set utilizing the safe harbor method, all records with three-digit ZIP codes corresponding to these three-digit ZCTAs must have the ZIP code changed to 000. Covered entities should not, however, rely upon this listing or the one found in the August 14, 2002 regulation if more current data has been published.
The 17 restricted ZIP codes are:
The Department notes that these three-digit ZIP codes are based on the five-digit ZIP Code Tabulation Areas created by the Census Bureau for the 2000 Census. This new methodology also is briefly described below, as it will likely be of interest to all users of data tabulated by ZIP code. The Census Bureau will not be producing data files containing U.S. Postal Service ZIP codes either as part of the Census 2000 product series or as a post Census 2000 product. However, due to the public’s interest in having statistics tabulated by ZIP code, the Census Bureau has created a new statistical area called the ZIP Code Tabulation Area (ZCTA) for Census 2000. The ZCTAs were designed to overcome the operational difficulties of creating a well-defined ZIP code area by using Census blocks (and the addresses found in them) as the basis for the ZCTAs. In the past, there has been no correlation between ZIP codes and Census Bureau geography. ZIP codes can cross State, place, county, census tract, block group, and census block boundaries. The geographic designations the Census Bureau uses to tabulate data are relatively stable over time. For instance, census tracts are only defined every ten years. In contrast, ZIP codes can change more frequently. Because of the ill-defined nature of ZIP code boundaries, the Census Bureau has no file (crosswalk) showing the relationship between U.S. Census Bureau geography and U.S. Postal Service ZIP codes.
ZCTAs are generalized area representations of U.S. Postal Service (USPS) ZIP code service areas. Simply put, each one is built by aggregating the Census 2000 blocks, whose addresses use a given ZIP code, into a ZCTA which gets that ZIP code assigned as its ZCTA code. They represent the majority USPS five-digit ZIP code found in a given area. For those areas where it is difficult to determine the prevailing five-digit ZIP code, the higher-level three-digit ZIP code is used for the ZCTA code. For further information, go to: https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html
The Bureau of the Census provides information regarding population density in the United States. Covered entities are expected to rely on the most current publicly available Bureau of Census data regarding ZIP codes. This information can be downloaded from, or queried at, the American Fact Finder website (http://factfinder.census.gov). As of the publication of this guidance, the information can be extracted from the detailed tables of the “Census 2000 Summary File 1 (SF 1) 100-Percent Data” files under the “Decennial Census” section of the website. The information is derived from the Decennial Census and was last updated in 2000. It is expected that the Census Bureau will make data available from the 2010 Decennial Census in the near future. This guidance will be updated when the Census makes new information available.
No. For example, a data set that contained patient initials, or the last four digits of a Social Security number, would not meet the requirement of the Safe Harbor method for de-identification.
Elements of dates that are not permitted for disclosure include the day, month, and any other information that is more specific than the year of an event. For instance, the date “January 1, 2009” could not be reported at this level of detail. However, it could be reported in a de-identified data set as “2009”.
Many records contain dates of service or other events that imply age. Ages that are explicitly stated, or implied, as over 89 years old must be recoded as 90 or above. For example, if the patient’s year of birth is 1910 and the year of healthcare service is reported as 2010, then in the de-identified data set the year of birth should be reported as “on or before 1920.” Otherwise, a recipient of the data set would learn that the age of the patient is approximately 100.
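The date and age rules above can be sketched as small helpers. The function names are hypothetical, and the handling of the birth-year bound is one reading of the example in the text (a 1910 birth year with a 2010 service year becomes "on or before 1920"), not an official algorithm.

```python
from datetime import date


def year_only(event_date: date) -> int:
    """Report only the year of an event, dropping the month and day."""
    return event_date.year


def top_code_age(age: int):
    """Recode ages over 89 into a single '90 or above' category."""
    return "90 or above" if age > 89 else age


def safe_birth_year(birth_year: int, service_year: int) -> str:
    """Report a birth year so that an implied age over 89 is not revealed.

    With only years available, the implied age is approximated as
    service_year - birth_year (an assumption of this sketch).
    """
    if service_year - birth_year > 89:
        return f"on or before {service_year - 90}"
    return str(birth_year)


print(year_only(date(2009, 1, 1)))  # 2009
print(top_code_age(91))             # 90 or above
print(safe_birth_year(1910, 2010))  # on or before 1920
```

Because the year of service is itself reported, the birth-year bound has to be computed relative to it; otherwise the two years together would disclose the patient's approximate age.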
No. Dates associated with test measures, such as those derived from a laboratory report, are directly related to a specific individual and relate to the provision of health care. Such dates are protected health information. As a result, no element of a date (except as described in 3.3. above) may be reported to adhere to Safe Harbor.
This category corresponds to any unique features that are not explicitly enumerated in the Safe Harbor list (A-Q), but could be used to identify a particular individual. Thus, a covered entity must ensure that a data set stripped of the explicitly enumerated identifiers also does not contain any of these unique features. The following are examples of such features:
Identifying Number There are many potential identifying numbers. For example, the preamble to the Privacy Rule at 65 FR 82462, 82712 (Dec. 28, 2000) noted that “Clinical trial record numbers are included in the general category of ‘any other unique identifying number, characteristic, or code.’”
Identifying Code A code corresponds to a value that is derived from a non-secure encoding mechanism. For instance, a code derived from a secure hash function without a secret key (e.g., “salt”) would be considered an identifying element. This is because the resulting value would be susceptible to compromise by the recipient of such data. As another example, an increasing quantity of electronic medical record and electronic prescribing systems assign and embed barcodes into patient records and their medications. These barcodes are often designed to be unique for each patient, or event in a patient’s record, and thus can be easily applied for tracking purposes. See the discussion of re-identification.
Identifying Characteristic A characteristic may be anything that distinguishes an individual and allows for identification. For example, a unique identifying characteristic could be the occupation of a patient, if it was listed in a record as “current President of State University.”
Many questions have been received regarding what constitutes “any other unique identifying number, characteristic or code” in the Safe Harbor approach, §164.514(b)(2)(i)(R), above. Generally, a code or other means of record identification that is derived from PHI would have to be removed from data de-identified following the safe harbor method. To clarify what must be removed under (R), the implementation specifications at §164.514(c) provide an exception with respect to “re-identification” by the covered entity. The objective of the paragraph is to permit covered entities to assign certain types of codes or other record identification to the de-identified information so that it may be re-identified by the covered entity at some later date. Such codes or other means of record identification assigned by the covered entity are not considered direct identifiers that must be removed under (R) if the covered entity follows the directions provided in §164.514(c).
In the context of the Safe Harbor method, actual knowledge means clear and direct knowledge that the remaining information could be used, either alone or in combination with other information, to identify an individual who is a subject of the information. This means that a covered entity has actual knowledge if it concludes that the remaining information could be used to identify the individual. The covered entity, in other words, is aware that the information is not actually de-identified information.
The following examples illustrate when a covered entity would fail to meet the “actual knowledge” provision.
Example 1: Revealing Occupation Imagine a covered entity was aware that the occupation of a patient was listed in a record as “former president of the State University.” This information, in combination with almost any additional data (like age or state of residence), would clearly lead to an identification of the patient. In this example, a covered entity would not satisfy the de-identification standard by simply removing the enumerated identifiers in §164.514(b)(2)(i) because the risk of identification is of a nature and degree that a covered entity must have concluded that the information could identify the patient. Therefore, the data would not have satisfied the de-identification standard’s Safe Harbor method unless the covered entity made a sufficient good faith effort to remove the “occupation” field from the patient record.
Example 2: Clear Familial Relation Imagine a covered entity was aware that the anticipated recipient, a researcher who is an employee of the covered entity, had a family member in the data (e.g., spouse, parent, child, or sibling). In addition, the covered entity was aware that the data would provide sufficient context for the employee to recognize the relative. For instance, the details of a complicated series of procedures, such as a primary surgery followed by a set of follow-up surgeries and examinations, for a person of a certain age and gender, might permit the recipient to comprehend that the data pertains to his or her relative’s case. In this situation, the risk of identification is of a nature and degree that the covered entity must have concluded that the recipient could clearly and directly identify the individual in the data. Therefore, the data would not have satisfied the de-identification standard’s Safe Harbor method.
Example 3: Publicized Clinical Event Rare clinical events may facilitate identification in a clear and direct manner. For instance, imagine the information in a patient record revealed that a patient gave birth to an unusually large number of children at the same time. During the year of this event, it is highly possible that this occurred for only one individual in the hospital (and perhaps the country). As a result, the event was reported in the popular media, and the covered entity was aware of this media exposure. In this case, the risk of identification is of a nature and degree that the covered entity must have concluded that the individual subject of the information could be identified by a recipient of the data. Therefore, the data would not have satisfied the de-identification standard’s Safe Harbor method.
Example 4: Knowledge of a Recipient’s Ability Imagine a covered entity was told that the anticipated recipient of the data has a table or algorithm that can be used to identify the information, or a readily available mechanism to determine a patient’s identity. In this situation, the covered entity has actual knowledge because it was informed outright that the recipient can identify a patient, unless it subsequently received information confirming that the recipient does not in fact have a means to identify a patient. Therefore, the data would not have satisfied the de-identification standard’s Safe Harbor method.
No. Much has been written about the capabilities of researchers with certain analytic and quantitative capacities to combine information in particular ways to identify health information. 32, 33, 34, 35 A covered entity may be aware of studies about methods to identify remaining information or using de-identified information alone or in combination with other information to identify an individual. However, a covered entity’s mere knowledge of these studies and methods, by itself, does not mean it has “actual knowledge” that these methods would be used with the data it is disclosing. OCR does not expect a covered entity to presume such capacities of all potential recipients of de-identified data. This would not be consistent with the intent of the Safe Harbor method, which was to provide covered entities with a simple method to determine if the information is adequately de-identified.
No. Only names of the individuals associated with the corresponding health information (i.e., the subjects of the records) and of their relatives, employers, and household members must be suppressed. There is no explicit requirement to remove the names of providers or workforce members of the covered entity or business associate. At the same time, there is also no requirement to retain such information in a de-identified data set.
Beyond the removal of names related to the patient, the covered entity would need to consider whether additional personal names contained in the data should be suppressed to meet the actual knowledge specification. Additionally, other laws or confidentiality concerns may support the suppression of this information.
No. The Privacy Rule does not limit how a covered entity may disclose information that has been de-identified. However, nothing prevents a covered entity from asking a recipient of de-identified information to enter into a data use agreement, such as is required for release of a limited data set under the Privacy Rule. This agreement may prohibit re-identification. Of course, the use of a data use agreement does not substitute for any of the specific requirements of the Safe Harbor method. Further information about data use agreements can be found on the OCR website. 36 Covered entities may make their own assessments whether such additional oversight is appropriate.
PHI may exist in different types of data in a multitude of forms and formats in a covered entity. This data may reside in highly structured database tables, such as billing records. Yet, it may also be stored in a wide range of documents with less structure and written in natural language, such as discharge summaries, progress notes, and laboratory test interpretations. These documents may vary with respect to the consistency and the format employed by the covered entity.
The de-identification standard makes no distinction between data entered into standardized fields and information entered as free text (i.e., structured and unstructured text) -- an identifier listed in the Safe Harbor standard must be removed regardless of its location in a record if it is recognizable as an identifier.
Whether additional information must be removed falls under the actual knowledge provision: it depends on the extent to which the covered entity has actual knowledge that residual information could be used to individually identify a patient. Clinical narratives in which a physician documents the history and/or lifestyle of a patient are information rich and may provide context that readily allows for patient identification.
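As a rough illustration of treating structured fields and free text alike, the sketch below applies one pattern to every field of a record. It assumes, purely for illustration, that Social Security Numbers appear in the common `NNN-NN-NNNN` form; the field names and values are hypothetical. Pattern matching of this kind only catches identifiers in a recognizable format and is not by itself sufficient for the Safe Harbor method.

```python
import re

# Matches only the NNN-NN-NNNN layout; other layouts would be missed.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssn_like(text):
    """Replace SSN-like substrings wherever they occur in the text."""
    return SSN_PATTERN.sub("[REDACTED]", text)


# Hypothetical record: one structured field and one free-text note.
record = {
    "ssn": "123-45-6789",
    "progress_note": "Patient confirmed SSN 123-45-6789 at intake; follow-up in two weeks.",
}

# The same rule is applied regardless of where the identifier sits in the record.
scrubbed = {field: redact_ssn_like(value) for field, value in record.items()}
```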
Medical records comprise a wide range of structured and unstructured (also known as “free text”) documents. In structured documents, it is relatively clear which fields contain the identifiers that must be removed following the Safe Harbor method. For instance, it is simple to discern when a feature is a name or a Social Security Number, provided that the fields are appropriately labeled. However, many researchers have observed that identifiers in medical information are not always clearly labeled. 37, 38 As such, in some electronic health record systems it may be difficult to discern what a particular term or phrase corresponds to (e.g., is 5/97 a date or a ratio?). It also is important to document when fields are derived from the Safe Harbor listed identifiers. For instance, if a field corresponds to the first initials of names, then this derivation should be noted. De-identification is more efficient and effective when data managers explicitly document when a feature or value pertains to identifiers. Health Level 7 (HL7) and the International Organization for Standardization (ISO) publish best practices in documentation and standards that covered entities may consult in this process.
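One way to picture this kind of documentation is a small data dictionary that records, for each field, whether it derives from a listed identifier. The sketch below is an assumption about how such a dictionary might look, not a format prescribed by HL7, ISO, or the guidance; `FieldSpec`, the field names, and the descriptions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str
    description: str
    # Safe Harbor identifier this field is, or is derived from (None if not applicable).
    derived_from: Optional[str] = None


# Hypothetical data dictionary for three fields.
DATA_DICTIONARY = [
    FieldSpec("patient_name", "Full name", derived_from="name"),
    FieldSpec("name_initials", "First initials of names", derived_from="name"),
    FieldSpec("visit_ratio", "Completed/scheduled visits; a value like 5/97 is a ratio, not a date"),
]

# Fields documented as identifier-derived can be selected for removal mechanically.
fields_to_remove = [spec.name for spec in DATA_DICTIONARY if spec.derived_from is not None]
```

With the derivation noted explicitly, the ambiguous value in `visit_ratio` is not mistaken for a date, and the initials field is flagged even though it does not look like a name.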
Example Scenario 1
The free text field of a patient’s medical record notes that the patient is the Executive Vice President of the state university. The covered entity must remove this information.
Example Scenario 2
The intake notes for a new patient include the stand-alone notation, “Newark, NJ.” It is not clear whether this relates to the patient’s address, the location of the patient’s previous health care provider, the location of the patient’s recent auto collision, or some other point. The phrase may be retained in the data.
Glossary of terms used in Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Note: some of these terms are paraphrased from the regulatory text; please see the HIPAA Rules for actual definitions.
| Term | Definition |
|---|---|
| Business associate | A person or entity that performs certain functions or activities that involve the use or disclosure of protected health information on behalf of, or provides services to, a covered entity. A member of the covered entity’s workforce is not a business associate. A covered health care provider, health plan, or health care clearinghouse can be a business associate of another covered entity. |
| Covered entity | Any entity that is |
| Cryptographic hash function | A hash function that is designed to achieve certain security properties. Further details can be found at http://csrc.nist.gov/groups/ST/hash/ |
| Disclosure | A “disclosure” of Protected Health Information (PHI) is the sharing of that PHI outside of a covered entity. The sharing of PHI outside of the health care component of a covered entity is a disclosure. |
| Hash function | A mathematical function which takes binary data, called the message, and produces a condensed representation, called the message digest. Further details can be found at http://csrc.nist.gov/groups/ST/hash/ |
| Health information | Any information, whether oral or recorded in any form or medium, that: |
| Individually identifiable health information | Information that is a subset of health information, including demographic information collected from an individual, and: (1) Is created or received by a health care provider, health plan, employer, or health care clearinghouse; and (2) Relates to the past, present, or future physical or mental health or condition of an individual; the provision of health care to an individual; or the past, present, or future payment for the provision of health care to the individual; and (i) That identifies the individual; or (ii) With respect to which there is a reasonable basis to believe the information can be used to identify the individual. |
| Protected health information | Individually identifiable health information: (1) Except as provided in paragraph (2) of this definition, that is: (i) Transmitted by electronic media; (ii) Maintained in electronic media; or (iii) Transmitted or maintained in any other form or medium. (2) Protected health information excludes individually identifiable health information in: (i) Education records covered by the Family Educational Rights and Privacy Act, as amended, 20 U.S.C. 1232g; (ii) Records described at 20 U.S.C. 1232g(a)(4)(B)(iv); and (iii) Employment records held by a covered entity in its role as employer. |
| Suppression | Withholding information in selected records from release. |
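The glossary's description of a hash function (binary message in, condensed message digest out) can be seen in a short sketch. SHA-256 is used here only as one widely available example of a cryptographic hash function; the choice of algorithm and the sample messages are assumptions made for illustration. Note that, per the guidance above, a code derived from PHI would generally have to be removed under the Safe Harbor method, so this illustrates the definition rather than a de-identification technique.

```python
import hashlib

# The "message": arbitrary binary data (a made-up, non-PHI example).
message = b"example message"

# The "message digest": a fixed-length condensed representation of the message.
digest = hashlib.sha256(message).hexdigest()

# The same message always yields the same digest; a different message
# yields a different digest.
same_again = hashlib.sha256(b"example message").hexdigest()
different = hashlib.sha256(b"example message!").hexdigest()
```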
Comments & Suggestions
In an effort to make this guidance a useful tool for HIPAA covered entities and business associates, we welcome and appreciate your sending us any feedback or suggestions to improve this guidance. You may submit a comment by sending an e-mail to [email protected] .
Acknowledgements
OCR gratefully acknowledges the significant contributions made by Bradley Malin, PhD, to the development of this guidance, through both organizing the 2010 workshop and synthesizing the concepts and perspectives in the document itself. OCR also thanks the 2010 workshop panelists for generously providing their expertise and recommendations to the Department.
Roeterseiland Campus - Building G, Street: Nieuwe Achtergracht 129-B, Room: S.02
This study investigates the generalizability of Variational Autoencoders (VAE) for Item Response Theory (IRT) parameter estimation from data with random missingness to data with Computerized Adaptive Testing (CAT)-like missingness. Traditional IRT models face computational challenges in high-dimensional spaces, particularly with the intractable integrals required for marginal maximum likelihood (MML) estimation (Bock & Aitkin, 1981). Recent advancements suggest that VAEs, which leverage a lower bound on the marginal log-likelihood, can produce results comparable to MML methods while potentially being more computationally efficient (Curi et al., 2019; Liu et al., 2022; Veldkamp, under review). This research compares the performance of VAE and MIRT across different missingness types, latent ability dimensions, and missing data proportions, focusing on the comparison between random missing data and CAT missing data. Results indicate that while VAE generally performs comparably to MIRT in accuracy, some discrepancies emerge under CAT missingness in lower-dimensional settings. Surprisingly, the anticipated computational efficiency of VAE over MIRT was not observed, with both methods showing similar computation times. This unexpected result suggests that MML methods might be sufficient when the percentage of missing data is high, and that VAE methods might not bring much benefit. This finding should, however, be interpreted carefully because the hardware available for MML and VAE computations was not held constant.
Among various types of data presentation, tabular is the most fundamental method, with data presented in rows and columns. Excel or Google Sheets would qualify for the job. Nothing fancy. This is an example of a tabular presentation of data on Google Sheets.
The selection of the most suitable data presentation method hinges on the specific dataset and the presentation's objectives. For instance, when comparing sales figures of different products, a bar chart shines in its simplicity and clarity. On the other hand, if your aim is to display how a product's sales have changed over time, a line graph ...
A data presentation is a slide deck that aims to disclose quantitative information to an audience through the use of visual formats and narrative techniques derived from data analysis, making complex data understandable and actionable. ... Plan ahead whether you want to use a thank-you slide, a video presentation, or which method is apt and ...
So, the presentation of data in ascending or descending order is a bit time-consuming. Hence, we can go for the method called ungrouped frequency distribution table or simply frequency distribution table. In this method, we can arrange the data in tabular form in terms of frequency. For example, 3 students scored 50 marks.
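An ungrouped frequency distribution table of this kind can be built by counting how often each value occurs. The sketch below uses Python's `collections.Counter`; the list of marks is made up for illustration and is chosen so that, as in the example above, 3 students scored 50 marks.

```python
from collections import Counter

# Hypothetical marks for seven students.
marks = [50, 65, 50, 70, 65, 50, 80]

# Count how many students obtained each mark.
frequency = Counter(marks)

# Print the table in ascending order of marks.
print("Marks | Frequency")
for mark in sorted(frequency):
    print(f"{mark:>5} | {frequency[mark]}")
```

The frequencies add up to the number of students, which is a quick check that no observation was dropped.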
TheJoelTruth. While a good presentation has data, data alone doesn't guarantee a good presentation. It's all about how that data is presented. The quickest way to confuse your audience is by ...
1. Bar graph. Ideal for comparing data across categories or showing trends over time. Bar graphs, also known as bar charts are workhorses of data presentation. They're like the Swiss Army knives of visualization methods because they can be used to compare data in different categories or display data changes over time.
As you can see below, the table data above transforms from a complex table to a clear and concise visual. It's the identical range of data! The magic happens in the display of it. Charts are the key to success in the presentation of data and information. The table data above, transformed into a stunning, easy-to-read visual.
Always consider your audience's knowledge level and what information they need when you present your data. To present the data effectively: 1. Provide context to help the audience understand the numbers. 2. Compare data groups using visual aids. 3. Step back and view the data from the audience's perspective.
Effective data presentation skills are critical for being a world-class financial analyst. It is the analyst's job to effectively communicate the output to the target audience, such as the management team or a company's external investors. This requires focusing on the main points, facts, insights, and recommendations that will prompt the ...
In many ways, data presentation is like storytelling, only you do it with a series of graphs and charts. One of the most common mistakes presenters make is being so submerged in the data that they fail to view it from an outsider's point of view. Always keep this in mind: What makes sense to you may not make sense to your audience.
2) Clean and Organize Your Data. Now that you know what point you want to make with your data, it's time to make sure your numbers are ready to be visualized. Every good data visualization ...
This page titled 1.3: Presentation of Data is shared under a CC BY-NC-SA 3.0 license and was authored, remixed, and/or curated by Anonymous via source content that was edited to the style and standards of the LibreTexts platform. In this book we will use two formats for presenting data sets. Data could be presented as the data list or in set ...
Thankfully, we're here to help. Here are 10 data presentation tips to effectively communicate with executives, senior managers, marketing managers, and other stakeholders. 1. Choose a Communication Style. Every data professional has a different way of presenting data to their audience. Some people like to tell stories with data, illustrating ...
Methods of Data Presentation in Statistics. 1. Pictorial Presentation. It is the simplest form of data presentation, often used in schools or universities to provide a clearer picture to students, who are better able to grasp the concepts through a pictorial presentation of simple data. 2.
This method of displaying data uses diagrams and images. It is the most visual type for presenting data and provides a quick glance at statistical data. There are four basic types of diagrams, including: Pictograms: This diagram uses images to represent data. For example, to show the number of books sold in the first release week, you may draw ...
In this article, the techniques of data and information presentation in textual, tabular, and graphical forms are introduced. Text is the principal method for explaining findings, outlining trends, and providing contextual information. A table is best suited for representing individual information and represents both quantitative and ...
How to create data presentations. If you're ready to create your data presentation, here are some steps you can take: 1. Collect your data. The first step to creating a data presentation is to collect the data you want to use in your share. You might have some guidance about what audience members are looking for in your talk.
What are the Top 5 methods of Data Presentation? 1. Textual Ways of Presenting Data. Out of the five data presentation examples, this is the simplest one. Just write your findings coherently and your job is done. The demerit of this method is that one has to read the whole text to get a clear picture.
Data can be represented in countless ways. The format for the presentation of data will depend on the target audience and the information that needs to be relayed. In the end, data should be presented in a way that interpretation and analysis is made easy. Let us see some ways in which we represent data in economics.
Stem and Leaf Plot. This is a type of plot in which each value is split into a "leaf" (in most cases, it is the last digit) and "stem" (the other remaining digits). For example: the number 42 is split into leaf (2) and stem (4). Box and Whisker Plot. These plots divide the data into four parts to show their summary.
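The stem-and-leaf split described above can be sketched with integer division. This is a minimal illustration that assumes non-negative whole numbers and takes the last digit as the leaf; the function name and the sample values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by stem (all but the last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        # divmod(42, 10) gives stem 4 and leaf 2.
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf([42, 47, 35, 51, 42, 38])
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```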
Presentation of Data refers to the exhibition of data in such a clear and attractive way that it is easily understood and analysed. Data can be presented in different forms, including Textual or Descriptive Presentation, Tabular Presentation, and Diagrammatic Presentation. ... This method of presenting data is not suitable for large sets of ...
Data can be presented in three ways: 1. Textual Mode of presentation is the layman's method of presenting data. Anyone can prepare it, and anyone can understand it. No specific skills are required. 2. Tabular Mode of presentation is the most accurate mode of presentation of data. It requires a lot of skill to prepare, and some skill (s) to ...
Testing the reliability of the method with simulated data. Because the inferential method for joint estimation of \(f\left( {\tau ,t} \right)\) and \(r\left( t \right)\) is deceptively complex ...
Step 2. Check whether the data value of "CheckedValue" is "1". If NOT, delete "CheckedValue", create a new "DWORD" in the blank and rename it as "CheckedValue". After that, modify the value data as "1". Step 3. Then, you can go back to see whether the Seagate external hard drive files are showing up. Method 4.
To produce a de-identified data set utilizing the safe harbor method, all records with three-digit ZIP codes corresponding to these three-digit ZCTAs must have the ZIP code changed to 000. Covered entities should not, however, rely upon this listing or the one found in the August 14, 2002 regulation if more current data has been published.
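The ZIP code rule can be sketched as a small function. This is an illustration under stated assumptions: it keeps only the initial three digits of a ZIP code and substitutes 000 when those digits appear in a set of restricted three-digit ZCTAs. The set is passed in by the caller because, as noted above, the current published listing should be used; the prefix `999` below is a made-up placeholder, not an entry from any real listing, and `safe_harbor_zip` is a hypothetical name.

```python
def safe_harbor_zip(zip_code, restricted_prefixes):
    """Return the three-digit ZIP to release, or "000" for restricted areas."""
    prefix = zip_code[:3]
    return "000" if prefix in restricted_prefixes else prefix


# Placeholder set for illustration only; substitute the current published listing.
restricted = {"999"}

released_restricted = safe_harbor_zip("99901", restricted)
released_ordinary = safe_harbor_zip("12345", restricted)
```

Keeping the restricted set as a parameter means the function does not need to change when a newer listing is published.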