Cyber attacks on corporations that compromise the privacy of sensitive data have become more frequent in recent years. In response, eSimplicity employs mechanisms including two-factor authentication, encryption, and hashing of sensitive information to protect data stores from possible breaches.
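To make the hashing idea concrete, here is a minimal sketch of salted hashing with Python's standard library. It is an illustration only, not a description of eSimplicity's actual implementation: the function names `hash_secret` and `verify_secret` and the iteration count are assumptions chosen for the example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for this sketch; tune it for your own hardware and policy.
ITERATIONS = 100_000


def hash_secret(secret: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per secret
    digest = hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_secret(secret: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_secret(secret, salt)
    return hmac.compare_digest(digest, expected)
```

Only the salt and digest would be stored; the original value is never written to the data store, so a breach of the store does not directly expose it.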
Operationalize your insights
At eSimplicity, we do not want our job to end with finding insights. We want to help executives and other business leaders put findings into practice. Think about the actionable reports that can be created, or the changes that can be made to existing apps and processes. Your findings could affect the creation or improvement of services, apps, or products. We know that at the end of the day, it is not about how many insights get uncovered. What businesses care about is how much impact we have on the people and society we serve.
Model creation and evaluation
eSimplicity starts with univariate descriptives and graphs to help us find any errors that we missed during cleaning. Next, we run bivariate descriptives, again including graphs. We want to understand how each potential predictor relates, on its own, to the outcome and to every other predictor. We also think of predictors in sets. In many models, the predictors fall into theoretically distinct sets. By building the models within those sets first, we can see how related variables work together and then what happens once we combine the sets. Every model we run tells a story, so we stop and listen to it. When we pause to do this, we make better decisions about which model to run next.
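The univariate and bivariate steps above can be sketched as follows. This is a minimal illustration using only the standard library and leaving out the graphs; the column names and values in `data` are hypothetical, and the helper names `univariate` and `pearson` are assumptions for this example.

```python
import math
import statistics
from itertools import combinations

# Hypothetical dataset: two predictors ("age", "visits") and one outcome ("cost").
data = {
    "age": [34, 51, 47, 29, 62, 45],
    "visits": [2, 5, 4, 1, 7, 4],
    "cost": [120.0, 410.0, 350.0, 90.0, 560.0, 330.0],
}


def univariate(values):
    """Summary statistics for one variable; odd min/max values often reveal cleaning errors."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Univariate pass: one summary per column.
summaries = {name: univariate(col) for name, col in data.items()}

# Bivariate pass: every pair, so each predictor is compared with the outcome
# and with every other predictor.
correlations = {
    (a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)
}

for name, summary in summaries.items():
    print(name, summary)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Pearson correlation is only one possible bivariate descriptive; categorical predictors would call for cross-tabulations or group means instead.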
Taking a data science model from the sandbox to a production environment is fraught with pitfalls. The nature of data science projects requires many tests at each step of the project. For this reason, a common practice for data science projects is to use notebooks. By the end of the project, there is very likely to be excess code, spanning multiple notebooks, that will not be used in production. When engineers finish one step, they may take its output and immediately continue to the next step, leaving everything in their notebooks unorganized. When the code runs in the final step, it is a mistake to assume that the code is production-ready and the project is complete. This misunderstanding can lead to loss of time and money. eSimplicity’s processes help you test, scale, and release your models with ease.
The data pipeline is the most critical part of any organization’s data architecture. eSimplicity’s pipeline process ensures consistency in your data, with minimum fuss. When there are changes in the upstream datasets, our pipeline handles the updates gracefully. Onboarding a new dataset brings additional complexity to a data pipeline. We take all datasets through our standard onboarding and cleansing process to ensure they flow reliably.
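One way to picture handling upstream changes gracefully is a small onboarding step that tolerates new fields but refuses rows that lack required ones. The sketch below is an assumption-laden illustration, not eSimplicity's actual process: the function `onboard` and its parameters are hypothetical.

```python
def onboard(rows, required, optional=frozenset()):
    """Validate and cleanse incoming rows.

    Returns (clean_rows, unexpected_fields). Fields added upstream that we do
    not know about are reported rather than treated as errors; a missing
    required field stops the load with a clear message.
    """
    required = set(required)
    known = required | set(optional)
    clean_rows = []
    unexpected_fields = set()
    for index, row in enumerate(rows):
        missing = required - set(row)
        if missing:
            raise ValueError(
                f"row {index} is missing required fields: {sorted(missing)}"
            )
        unexpected_fields |= set(row) - known
        # Keep only known fields and trim stray whitespace from text values.
        clean_rows.append(
            {
                key: value.strip() if isinstance(value, str) else value
                for key, value in row.items()
                if key in known
            }
        )
    return clean_rows, unexpected_fields
```

For example, `onboard([{"id": " 1 ", "name": "A", "extra": 5}], required={"id", "name"})` returns the cleaned row without `extra` and reports `extra` as an unexpected field, so the change can be reviewed without interrupting the flow.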
Healthcare data normalization
eSimplicity has significant experience building normalized healthcare datasets, including provider, payer, beneficiary, and patient data, as well as clinical and outcome measure data. Through normalization, we minimize duplicate data, minimize or avoid data modification issues, and simplify queries. As we go through the various stages of normalization, we discuss how each normal form addresses these issues, but to start, we look at some data that has not been normalized and discuss potential pitfalls.
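As a small illustration of data that has not been normalized, the sketch below starts from claim rows that repeat provider details on every row and splits them into a provider table and a claim table. All records, field names, and the `normalize` helper are hypothetical and exist only for this example.

```python
# Hypothetical denormalized rows: the provider name repeats on every claim,
# so renaming a provider would mean updating many rows consistently.
flat_claims = [
    {"claim_id": "C1", "provider_id": "P1", "provider_name": "Northside Clinic", "amount": 120.0},
    {"claim_id": "C2", "provider_id": "P1", "provider_name": "Northside Clinic", "amount": 80.0},
    {"claim_id": "C3", "provider_id": "P2", "provider_name": "Lakeview Hospital", "amount": 430.0},
]


def normalize(rows):
    """Split flat rows into a provider lookup and claims that reference it by id."""
    providers = {}
    claims = []
    for row in rows:
        provider_id = row["provider_id"]
        name = row["provider_name"]
        # setdefault stores the first name seen; a different name later on
        # signals the kind of update anomaly that normalization avoids.
        if providers.setdefault(provider_id, name) != name:
            raise ValueError(f"conflicting names for provider {provider_id}")
        claims.append(
            {
                "claim_id": row["claim_id"],
                "provider_id": provider_id,
                "amount": row["amount"],
            }
        )
    return providers, claims


providers, claims = normalize(flat_claims)
print(providers)
print(claims)
```

After the split, each provider name is stored once, and a correction to it touches a single entry instead of every claim row.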
Data matching and entity resolution
eSimplicity’s identity matching process uses deterministic and probabilistic matching. We identify, match and merge records that correspond to the same entities from several databases or even within one database.
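The two matching styles can be sketched side by side. This is a simplified illustration, not eSimplicity's actual matching process: the field names (`member_id`, `name`, `dob`, `zip`), the weights, and the threshold are assumptions, and the weighted string-similarity score stands in for a full probabilistic model such as Fellegi-Sunter.

```python
from difflib import SequenceMatcher

# Assumed field weights for the similarity score; they sum to 1.0.
WEIGHTS = {"name": 0.5, "dob": 0.3, "zip": 0.2}


def normalize_text(value):
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return " ".join(str(value).lower().split())


def deterministic_match(a, b, key="member_id"):
    """Exact match on a shared identifier; an empty or missing id never matches."""
    return bool(a.get(key)) and a.get(key) == b.get(key)


def similarity_score(a, b):
    """Weighted average of per-field string similarity, between 0.0 and 1.0."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        left = normalize_text(a.get(field, ""))
        right = normalize_text(b.get(field, ""))
        if left and right:
            score += weight * SequenceMatcher(None, left, right).ratio()
    return score


def is_match(a, b, threshold=0.85):
    """Try the deterministic rule first, then fall back to the similarity score."""
    return deterministic_match(a, b) or similarity_score(a, b) >= threshold
```

Records judged to match would then be merged into a single entity; in practice, scores near the threshold are usually routed to manual review rather than merged automatically.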