Cancer incidence among people living with HIV in Zimbabwe: A record linkage study

Abstract
Background: People living with HIV (PLWH) are at increased risk of developing cancer. Cancer diagnoses are often incompletely captured at antiretroviral therapy (ART) clinics.
Aim: To estimate the incidence of, and explore risk factors for, cancer in a cohort of PLWH in Harare using probabilistic record linkage (PRL).
Methods: We conducted a retrospective cohort study that included PLWH aged ≥16 years starting ART between 2004 and 2017. We used PRL to match records from the Zimbabwe National Cancer Registry (ZNCR) with electronic medical records from an ART clinic in Harare to investigate the incidence of cancer among PLWH initiating ART. We matched records based on demographic data, followed by manual clerical review. We followed PLWH until first cancer diagnosis, death, loss to follow‐up, or 31 December 2017, whichever came first.
Results: We included 3442 PLWH (64.9% female) with 19 346 person‐years (PY) of follow‐up. Median CD4 count at ART initiation was 169 cells/mm3 (interquartile range [IQR]: 82–275), and median age was 36.6 years (IQR: 30.6–43.4). There were 66 incident cancer cases, giving an overall incidence rate of 341/100 000 PY (95% confidence interval [CI]: 268–434). Twenty‐two of these cases were recorded in the ZNCR only. The most common cancers were cervical cancer (n = 16; 123/100 000 PY; 95% CI: 75–201), Kaposi sarcoma, and lymphoma (n = 12 each; 62/100 000 PY; 95% CI: 35–109). Cancer incidence increased with age and decreased with higher CD4 cell counts at ART initiation.
Conclusion: PRL was key to correcting for cancer under‐ascertainment in this cohort. The most common cancers were infection‐related types, reinforcing the role of early HIV treatment, human papillomavirus vaccination, and cervical cancer screening for cancer prevention in this setting.
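The abstract reports rates per 100 000 PY with 95% CIs but does not say how the intervals were obtained. The sketch below is a minimal illustration, assuming the event count is Poisson and the interval is built on the log scale (our assumption, not stated in the text). It also encodes the "whichever came first" exit rule. The helper names and the 365.25-day year are illustrative choices.

```python
import math
from datetime import date

# Administrative end of follow-up stated in the abstract.
END_OF_STUDY = date(2017, 12, 31)


def exit_date(cancer=None, death=None, ltfu=None):
    """Earliest of first cancer diagnosis, death, loss to follow-up,
    or the administrative end date (None means 'not observed')."""
    candidates = [d for d in (cancer, death, ltfu) if d is not None]
    return min(candidates + [END_OF_STUDY])


def person_years(start, end):
    """Follow-up time in years, assuming a 365.25-day year."""
    return (end - start).days / 365.25


def rate_per_100k(events, py, z=1.96):
    """Incidence rate per 100 000 PY with an approximate 95% CI.

    Assumes a Poisson event count, so SE(log rate) = 1 / sqrt(events).
    """
    rate = events / py * 100_000
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


if __name__ == "__main__":
    # Counts and person-time from the abstract.
    for label, n in [("all cancers", 66), ("Kaposi sarcoma", 12)]:
        rate, lo, hi = rate_per_100k(n, 19_346)
        print(f"{label}: {rate:.0f} (95% CI {lo:.0f}-{hi:.0f})")
```

Running the script prints 341 (95% CI 268-434) for all cancers and 62 (95% CI 35-109) for Kaposi sarcoma, which agrees with the abstract. That agreement is consistent with, but does not confirm, a log-scale approximation. The cervical cancer rate (16 cases, 123/100 000 PY) cannot be reproduced from 19 346 PY because it uses a different person-time denominator that is not given in this section.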

test-and-treat approach to ART initiation. [3][4][5] PLWH who initiate ART at CD4 counts below 350 cells/mm3 or with an AIDS-defining condition are known to be at increased risk of developing cancer. [6] Furthermore, HIV infection diminishes the ability to clear human papillomavirus (HPV) infection, which increases the risk of HPV-related cancers, such as cervical cancer, among PLWH. [7] The impact of HIV on cancer risk in sub-Saharan Africa is still poorly understood. [8] Reporting of cancer incidence among PLWH is often hampered by the absence of HIV serostatus information in national cancer databases, and cancer diagnoses are often incompletely captured at ART clinics. [9] Record linkage methods have been widely used in high-income countries but are not commonly used in low- and middle-income countries because of the low uptake of electronic health record systems by health institutions. [9][10][11] Probabilistic record linkage (PRL), a method of linking records from different systems in the absence of a common unique identifier, has been used to correct under-ascertainment of cancer incidence in sub-Saharan Africa. [9][10][12] In the current study, we estimated the incidence of cancer in a cohort of PLWH in Harare using PRL of records from the Zimbabwe National Cancer Registry (ZNCR) and an ART clinic, and explored risk factors for incident cancer in this population.

| METHODS
We conducted a retrospective cohort study using datasets from the ZNCR and Newlands Clinic.

| Study setting
The ZNCR is a population-based registry for Harare. It is operated by

| Deduplication and record linkage
We used PRL to identify duplicate entries of the same patient within each of the two datasets. This involved linking each dataset to itself by comparing each record with other records in the same dataset using first names, middle names, last names, national identification numbers, and year, month, and day of birth. No other variables common to both databases were available for PRL. We compared names both as complete strings and as n-grams. We assessed the variables used for record linkage for completeness.
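The text states that names were compared both as complete strings and as n-grams but does not give the n-gram length or the similarity measure; those settings belong to the KNIME workflow described in Appendix S1. The following minimal sketch assumes character bigrams and the Dice coefficient (our choices, not the authors'). It also shows exhaustive within-dataset pairing for deduplication. The example names are hypothetical.

```python
from itertools import combinations


def normalize(name):
    """Lower-case and collapse internal whitespace."""
    return " ".join(name.lower().split())


def char_ngrams(text, n=2):
    """Set of character n-grams of an already normalized string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def name_similarity(a, b, n=2):
    """1.0 for an exact (normalized) match, otherwise the Dice coefficient
    of the two n-gram sets; None if either name is missing."""
    if not (a and a.strip() and b and b.strip()):
        return None
    a_norm, b_norm = normalize(a), normalize(b)
    if a_norm == b_norm:  # complete-string comparison
        return 1.0
    ga, gb = char_ngrams(a_norm, n), char_ngrams(b_norm, n)
    if not ga or not gb:  # a name shorter than n characters
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def self_pairs(n_records):
    """Every unordered pair of distinct record indices within one dataset."""
    return combinations(range(n_records), 2)
```

For example, `name_similarity("Tendai", "Tendayi")` shares 4 of 5 and 6 bigrams, giving 8/11 (about 0.73). Comparing all pairs grows quadratically with dataset size, so many linkage workflows restrict comparisons to blocks (for example, records sharing a year of birth); the text does not say whether blocking was used here.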
Nevertheless, we included all available records in the deduplication and record linkage workflows, regardless of their completeness. We used probability scores computed from the matching variables to classify record pairs as matches, probable matches, or mismatches. We considered pairs with a score below 12 as mismatches, those with a score above 25 as definite matches, and those scoring from 12 to 25 as probable matches.
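The text does not specify how the probability scores were computed. One standard way to obtain such scores is the Fellegi-Sunter framework, in which each matching variable contributes a log-likelihood-ratio weight and the weights are summed. The sketch below assumes that framework and uses invented m- and u-probabilities; only the cut-offs of 12 and 25 come from the text.

```python
import math

# Hypothetical m- and u-probabilities per matching variable:
# m = P(agree | same person), u = P(agree | different people).
# The study's actual parameters are in Appendix S1 and are NOT reproduced here.
M_U = {
    "first_name": (0.95, 0.01),
    "middle_name": (0.90, 0.02),
    "last_name": (0.95, 0.005),
    "national_id": (0.98, 0.0001),
    "year_of_birth": (0.90, 0.02),
    "month_of_birth": (0.90, 1 / 12),
    "day_of_birth": (0.90, 1 / 30),
}

MISMATCH_BELOW = 12  # cut-offs stated in the text
MATCH_ABOVE = 25


def pair_score(agreement):
    """Sum of log2 agreement/disagreement weights (Fellegi-Sunter style).

    `agreement` maps a variable to True (agree), False (disagree),
    or None (missing in either record, contributes 0).
    """
    score = 0.0
    for var, agrees in agreement.items():
        if agrees is None:
            continue
        m, u = M_U[var]
        score += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return score


def classify(score):
    """Map a pair score onto the three classes used in the study."""
    if score < MISMATCH_BELOW:
        return "mismatch"
    if score > MATCH_ABOVE:
        return "match"
    return "probable match"
```

With these illustrative parameters, agreement on first and last names alone scores about 14.1 (a probable match), whereas adding disagreement on national identification number pulls the pair below 12. Partial agreement on names could be handled by thresholding an n-gram similarity or by interpolating weights; both are assumptions here.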
We then linked the deduplicated datasets from the ZNCR and Newlands Clinic to each other using the same variables. Record pairs classified as matches or probable matches underwent another round of clerical review. We used KNIME software for deduplication and PRL.
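Linking deduplicated datasets presupposes that duplicate pairs found within each dataset have been merged into one entity per patient. The text does not describe how this merging was done. A common approach, sketched here as an assumption, is to take the transitive closure of accepted duplicate pairs with a union-find structure, so that if record A duplicates B and B duplicates C, all three form one patient.

```python
def cluster_duplicates(n_records, duplicate_pairs):
    """Group record indices into patients from accepted duplicate pairs."""
    parent = list(range(n_records))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in duplicate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for i in range(n_records):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

For example, `cluster_duplicates(5, [(0, 1), (1, 3)])` returns `[[0, 1, 3], [2], [4]]`: five records collapse to three patients. Transitive closure can chain together records that were never compared directly, which is one reason for reviewing linked pairs clerically.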
Details of the record linkage parameters are shown in Appendix S1.
Records were anonymized after PRL.
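The text states only that records were anonymized after PRL. A minimal pseudonymization sketch follows. It assumes that linked records are held as one dictionary per patient and that the field names listed below (hypothetical) are the direct identifiers to remove. It is not a description of the authors' procedure.

```python
import secrets

# Hypothetical direct identifiers; every other field (e.g., year_of_birth)
# is kept unchanged for analysis.
DIRECT_IDENTIFIERS = ("first_name", "middle_name", "last_name", "national_id")


def pseudonymize(records):
    """Replace direct identifiers with a random study ID.

    Returns the anonymized records and a separate key table mapping each
    study ID back to the removed identifiers (to be stored apart or destroyed).
    """
    anonymized, key_table = [], {}
    for rec in records:
        study_id = secrets.token_hex(8)
        while study_id in key_table:  # guard against a very unlikely collision
            study_id = secrets.token_hex(8)
        key_table[study_id] = {f: rec.get(f) for f in DIRECT_IDENTIFIERS}
        kept = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        anonymized.append({"study_id": study_id, **kept})
    return anonymized, key_table
```

Random IDs, unlike hashes of names or identification numbers, cannot be recomputed from the identifiers, so the anonymized dataset can only be re-linked through the separately held key table.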

| Statistical analysis and definitions
We used frequencies to describe patient characteristics and the spec-

other infection-related cancers were the most common cancer types in this cohort of PLWH on ART. Lower CD4 cell counts at the time of ART initiation and older age were associated with a higher risk of developing cancer. Intensified efforts toward HPV vaccination and the WHO-recommended "test-and-treat" approach for early ART initiation may help lower cancer incidence among PLWH.

ACKNOWLEDGMENTS
The authors acknowledge the ZNCR staff, Newlands Clinic patients and staff, as well as Joerg Reiher, who built the additional regular expression, string transformation, and K-Link KNIME nodes used in this project.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

SUPPORTING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher's website.