1.1 Introduction

In this vignette we show different functions to get characteristics (e.g. age, sex, prior history…) of subjects in OMOP CDM tables and cohort tables. These functions are useful for exploratory analysis and as building blocks for more complex analyses.

The PatientProfiles package is designed to work with data in the OMOP CDM format, so our first step is to create a reference to the data using the DBI and CDMConnector packages. The connection to a Postgres database would look like:

library(DBI)
library(CDMConnector)

# The input arguments provided are for illustrative purposes only and do not provide access to any database.

con <- DBI::dbConnect(RPostgres::Postgres(),
  dbname = "omop_cdm",
  host = "10.80.192.00",
  user = "user_name",
  password = "user_password"
)

cdm <- CDMConnector::cdm_from_con(con,
  cdm_schema = "main",
  write_schema = "main",
  cohort_tables = "cohort_example"
)

For this example we will work with simulated data generated by the mockPatientProfiles() function provided in this package, which mimics a database in the OMOP CDM format:

library(PatientProfiles)
library(duckdb)
library(dplyr)

cdm <- mockPatientProfiles(
  patient_size = 1000,
  drug_exposure_size = 1000
)

1.2 Example: get characteristics in tables

addAge(): adds a new column to the input table containing each patient’s age at a certain date, specified in indexDate. The function can impute a default month and/or day of birth for patients where these are missing, or impose the defaults on all subjects. It can also classify patients into age groups through the argument ageGroup.

Suppose we want to calculate the age at condition start date for records in the condition_occurrence table. We also want to group patients into 20-year age bands and to flag whether they are 60 years old or more.

cdm$condition_occurrence %>%
  glimpse()
## Rows: ??
## Columns: 6
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ condition_occurrence_id   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ person_id                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ condition_concept_id      <int> 4, 3, 5, 2, 3, 4, 4, 3, 5, 4, 1, 1, 4, 4, 3,…
## $ condition_start_date      <date> 2005-06-30, 2005-05-28, 2008-06-30, 2011-01…
## $ condition_end_date        <date> 2007-07-25, 2007-09-16, 2010-10-06, 2011-10…
## $ condition_type_concept_id <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…

cdm$condition_occurrence_mod <- cdm$condition_occurrence %>%
  addAge(
    ageDefaultMonth = 1,
    ageDefaultDay = 6,
    indexDate = "condition_start_date",
    ageGroup = list(
      "age_band_20" =
        list(
          "0 to 19" = c(0, 19),
          "20 to 39" = c(20, 39),
          "40 to 59" = c(40, 59),
          "60 to 79" = c(60, 79),
          "80 to 99" = c(80, 99),
          ">= 100" = c(100, 150)
        ),
      "age_threshold_60" =
        list(
          "less60" = c(0, 59),
          "more60" = c(60, 150)
        )
    )
  ) %>%
  dplyr::compute(name = "condition_occurrence_mod")

cdm$condition_occurrence_mod %>%
  glimpse()
## Rows: ??
## Columns: 9
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ condition_occurrence_id   <int> 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 17, 18, …
## $ person_id                 <int> 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 17, 18, …
## $ condition_concept_id      <int> 4, 3, 2, 3, 4, 4, 3, 5, 1, 1, 4, 4, 1, 5, 3,…
## $ condition_start_date      <date> 2005-06-30, 2005-05-28, 2011-01-25, 2005-04…
## $ condition_end_date        <date> 2007-07-25, 2007-09-16, 2011-10-03, 2006-10…
## $ condition_type_concept_id <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ age                       <dbl> 7, 59, 62, 69, 38, 23, 83, 43, 47, 54, 63, 4…
## $ age_band_20               <chr> "0 to 19", "40 to 59", "60 to 79", "60 to 79…
## $ age_threshold_60          <chr> "less60", "less60", "more60", "more60", "les…
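
To make the arguments above more concrete, the following base R sketch computes age in completed years at an index date, imputing a default month and day of birth when they are missing, and then labels the result with named age ranges. The helpers age_at() and age_group() and the example birth dates are hypothetical; they illustrate the general idea under the assumption that age is counted in completed years and are not the package's implementation.

```r
# Hypothetical helper (not part of PatientProfiles): age in completed years
# at an index date, imputing a default month/day of birth when missing.
age_at <- function(year, month, day, index_date,
                   default_month = 1, default_day = 1) {
  month <- ifelse(is.na(month), default_month, month)
  day <- ifelse(is.na(day), default_day, day)
  index <- as.POSIXlt(index_date)
  index_year <- index$year + 1900
  index_month <- index$mon + 1
  index_day <- index$mday
  # Subtract one year if the birthday has not yet occurred in the index year
  had_birthday <- index_month > month |
    (index_month == month & index_day >= day)
  index_year - year - ifelse(had_birthday, 0, 1)
}

# Hypothetical helper: label ages using named c(lower, upper) ranges,
# mirroring the structure of one element of the ageGroup list.
age_group <- function(age, groups) {
  out <- rep(NA_character_, length(age))
  for (label in names(groups)) {
    bounds <- groups[[label]]
    out[age >= bounds[1] & age <= bounds[2]] <- label
  }
  out
}

# Made-up birth data: the first person has no recorded month or day of birth
ages <- age_at(
  year = c(1998, 1946, 1950),
  month = c(NA, 3, 12),
  day = c(NA, 15, 31),
  index_date = as.Date(c("2005-06-30", "2005-05-28", "2011-01-25")),
  default_month = 1,
  default_day = 6
)
groups <- age_group(ages, list("less60" = c(0, 59), "more60" = c(60, 150)))
data.frame(age = ages, age_threshold_60 = groups)
```

Ages outside every range are left as NA in this sketch, which is why the ranges used in the vignette cover the full span from 0 to 150.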

addSex(): appends a column to the input table indicating the sex for each patient as “Female” or “Male”.

First, we add the sex of the patients to the table. With this information we can count the occurrences of condition_concept_id = 5 in males aged 60 years or older, stratifying the number of events by the 20-year age bands.

cdm$condition_occurrence_mod <- cdm$condition_occurrence_mod %>%
  addSex()

cdm$condition_occurrence_mod %>%
  glimpse()
## Rows: ??
## Columns: 10
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ condition_occurrence_id   <int> 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 17, 18, …
## $ person_id                 <int> 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 17, 18, …
## $ condition_concept_id      <int> 4, 3, 2, 3, 4, 4, 3, 5, 1, 1, 4, 4, 1, 5, 3,…
## $ condition_start_date      <date> 2005-06-30, 2005-05-28, 2011-01-25, 2005-04…
## $ condition_end_date        <date> 2007-07-25, 2007-09-16, 2011-10-03, 2006-10…
## $ condition_type_concept_id <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ age                       <dbl> 7, 59, 62, 69, 38, 23, 83, 43, 47, 54, 63, 4…
## $ age_band_20               <chr> "0 to 19", "40 to 59", "60 to 79", "60 to 79…
## $ age_threshold_60          <chr> "less60", "less60", "more60", "more60", "les…
## $ sex                       <chr> "Female", "Female", "Female", "Male", "Male"…

numConditions <- cdm$condition_occurrence_mod %>%
  # Keep records of condition 5 in males aged 60 years or older
  filter(
    sex == "Male",
    age_threshold_60 == "more60",
    condition_concept_id == 5
  ) %>%
  group_by(age_band_20) %>%
  # n() counts the records in each age band
  summarise(n = n())

numConditions
## # Source:   SQL [2 x 2]
## # Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
##   age_band_20     n
##   <chr>       <dbl>
## 1 80 to 99       15
## 2 60 to 79       28
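
The same filter-then-count logic can be checked on a small local data frame without a database. The sketch below uses base R only; the records data frame is made up for illustration and stands in for the enriched condition table.

```r
# Hypothetical records standing in for the enriched condition table
records <- data.frame(
  condition_concept_id = c(5, 5, 5, 3, 5),
  sex = c("Male", "Male", "Female", "Male", "Male"),
  age = c(65, 82, 70, 61, 45),
  age_band_20 = c("60 to 79", "80 to 99", "60 to 79", "60 to 79", "40 to 59")
)

# Keep records of condition 5 in males aged 60 years or older
keep <- records$sex == "Male" &
  records$age >= 60 &
  records$condition_concept_id == 5

# Count the remaining records in each 20-year age band
counts <- table(records$age_band_20[keep])
counts
```

Only two of the five made-up records satisfy all three conditions, one in each of the two oldest bands present.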

1.3 Example: get characteristics in cohort tables

PatientProfiles functions can be used on both OMOP CDM tables and cohort tables. In this example we will see some of the package functionalities applied to a cohort table:

addInObservation(): adds a new binary column to the input table, indicating whether the subjects are being observed at a specific time.

addPriorObservation(): appends a column to the input table containing the number of days each patient has been in observation up to a specified date.

addFutureObservation(): appends a column to the input table containing the number of days each patient remains in observation after a specified date.

We can use the first function to keep the patients who are in observation at “cohort_start_date” and then get their prior and future observation days. Notice that we do not set the argument “indexDate”, since it defaults to “cohort_start_date”.

cdm$cohort1 %>%
  glimpse()
## Rows: ??
## Columns: 4
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ cohort_definition_id <dbl> 1, 1, 1, 2
## $ subject_id           <dbl> 1, 1, 2, 3
## $ cohort_start_date    <date> 2020-01-01, 2020-06-01, 2020-01-02, 2020-01-01
## $ cohort_end_date      <date> 2020-04-01, 2020-08-01, 2020-02-02, 2020-03-01

cdm$cohort1 <- cdm$cohort1 %>%
  addInObservation() %>%
  filter(
    in_observation == 1
  ) %>%
  addPriorObservation() %>%
  addFutureObservation()

cdm$cohort1 %>%
  glimpse()
## Rows: ??
## Columns: 7
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ cohort_definition_id <dbl> 1, 1, 2, 1
## $ subject_id           <dbl> 1, 2, 3, 1
## $ cohort_start_date    <date> 2020-06-01, 2020-01-02, 2020-01-01, 2020-01-01
## $ cohort_end_date      <date> 2020-08-01, 2020-02-02, 2020-03-01, 2020-04-01
## $ in_observation       <dbl> 1, 1, 1, 1
## $ prior_observation    <dbl> 4209, 4486, 5267, 4057
## $ future_observation   <dbl> 9196, 6296, 1121, 9348

If the database allows for multiple observation periods, note that the results of the previous functions are based on the observation period that contains “indexDate”. If a patient is not under observation at the specified date, addPriorObservation() and addFutureObservation() will return NA.
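
The behaviour described above can be illustrated with plain date arithmetic. The base R sketch below uses a made-up observation_period data frame and a hypothetical helper observation_at(); it is a simplified illustration of the logic, assuming non-overlapping periods, and not the package's implementation.

```r
# Hypothetical observation periods: person 1 has two periods, person 2 has one
observation_period <- data.frame(
  person_id = c(1, 1, 2),
  start = as.Date(c("2000-01-01", "2015-01-01", "2010-01-01")),
  end = as.Date(c("2010-12-31", "2025-12-31", "2019-12-31"))
)

# Hypothetical helper (not part of PatientProfiles): for one person and one
# index date, find the period containing the date and count the days of
# observation before and after it.
observation_at <- function(person, index_date, periods) {
  p <- periods[periods$person_id == person &
    periods$start <= index_date &
    periods$end >= index_date, ]
  if (nrow(p) == 0) {
    # Not under observation at the index date: no prior or future days
    return(data.frame(
      in_observation = 0,
      prior_observation = NA_real_,
      future_observation = NA_real_
    ))
  }
  data.frame(
    in_observation = 1,
    prior_observation = as.numeric(index_date - p$start[1]),
    future_observation = as.numeric(p$end[1] - index_date)
  )
}

index <- as.Date("2020-01-01")
obs1 <- observation_at(1, index, observation_period)
obs2 <- observation_at(2, index, observation_period)
rbind(obs1, obs2)
```

For person 1 the index date falls in the second period, so the days are counted from 2015-01-01 rather than from the start of the first period. Person 2's only period ended before the index date, so both counts are NA.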

1.4 Example: get all characteristics at once

addDemographics(): adds all the features presented in this vignette (except for addInObservation()) at once, in both OMOP CDM tables and cohort tables.

If we want to get the age, sex and prior observation of individuals on the day they enter a cohort, we can use addDemographics() as follows:

cdm$cohort2 %>%
  glimpse()
## Rows: ??
## Columns: 4
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ cohort_definition_id <dbl> 1, 1, 2, 3, 1
## $ subject_id           <dbl> 1, 3, 1, 2, 1
## $ cohort_start_date    <date> 2019-12-30, 2020-01-01, 2020-05-25, 2020-01-01, 2…
## $ cohort_end_date      <date> 2019-12-30, 2020-01-01, 2020-05-25, 2020-01-01, 2…

cdm$cohort2 <- cdm$cohort2 %>%
  addDemographics(
    age = TRUE,
    ageName = "age",
    ageGroup = NULL,
    sex = TRUE,
    sexName = "sex",
    priorObservation = TRUE,
    priorObservationName = "prior_observation",
    futureObservation = FALSE
  )

cdm$cohort2 %>%
  glimpse()
## Rows: ??
## Columns: 7
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ cohort_definition_id <dbl> 1, 3, 2, 1, 1
## $ subject_id           <dbl> 1, 2, 1, 1, 3
## $ cohort_start_date    <date> 2020-05-25, 2020-01-01, 2020-05-25, 2019-12-30, 2…
## $ cohort_end_date      <date> 2020-05-25, 2020-01-01, 2020-05-25, 2019-12-30, 2…
## $ age                  <dbl> 22, 73, 22, 22, 21
## $ sex                  <chr> "Female", "Female", "Female", "Female", "Male"
## $ prior_observation    <dbl> 4202, 4485, 4202, 4055, 5267