Is it Just a Numbers Game? Predicting School Access for Out-of-School Children in Pakistan

By Dr Jessica Albrent, Dr Farah Nadeem and Dr Faisal Bari

With the world’s largest population of out-of-school children, Pakistan is facing an education emergency that will have far-reaching consequences in economic, social, and health development for at least the next two generations. Addressing this issue is challenging because education is chronically underfunded in Pakistan, with only 1.9% of GDP spent on education in 2023. In comparison, India, Afghanistan, and China all spend 4% or more of their GDP on education.

Pakistan faces numerous development challenges that are expected to intensify as the economy continues to struggle, climate change worsens and concerns about a global recession escalate. How, then, do policymakers prioritise limited financial and human resources to maximise school enrolment?

Our team is working on the Equity Equation Project: Predictive Modelling of Policy Impact on School Enrolment, under DARE-RC. This study aims to develop agent-based simulation models (ABM) that predict school enrolment decisions made by families facing marginalisation in response to policies in education and other sectors. As part of initial investigations, the team has completed a literature review on push and pull factors that create barriers to school enrolment. The team has also analysed publicly available data from the Pakistan Social and Living Standards Measurement (PSLM) survey 2019-2020 and the Multiple Indicator Cluster Surveys (MICS) 2017-2018 to identify individual and household level factors that are highly predictive of school enrolment.
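
To give a feel for what an agent-based simulation of enrolment decisions can look like, here is a minimal sketch. It is not the project's model: the Household class, the 10% affordability rule, the income and cost ranges, and the stipend policy are all hypothetical assumptions chosen only to illustrate the idea of simulating many households responding to a policy change.

```python
import random


class Household:
    """A hypothetical household agent deciding whether to enrol one child."""

    def __init__(self, income, school_cost):
        self.income = income            # illustrative monthly income, arbitrary units
        self.school_cost = school_cost  # illustrative monthly cost of schooling

    def enrols(self, stipend):
        # Assumed decision rule (not from the study): enrol when the cost of
        # schooling, net of any stipend, is at most 10% of household income.
        net_cost = max(self.school_cost - stipend, 0.0)
        return net_cost <= 0.10 * self.income


def simulate(households, stipend):
    """Return the share of households that enrol under a given stipend."""
    enrolled = sum(1 for h in households if h.enrols(stipend))
    return enrolled / len(households)


# Build a synthetic population with a fixed seed so runs are repeatable.
rng = random.Random(42)
households = [
    Household(
        income=rng.uniform(10_000, 60_000),
        school_cost=rng.uniform(1_000, 5_000),
    )
    for _ in range(1_000)
]

# Compare enrolment with and without a hypothetical stipend policy.
rate_without = simulate(households, stipend=0.0)
rate_with = simulate(households, stipend=1_500.0)
print(f"Enrolment without stipend: {rate_without:.1%}")
print(f"Enrolment with stipend:    {rate_with:.1%}")
```

In a fuller model, the single affordability rule would be replaced by decision rules informed by survey data and qualitative fieldwork, and agents could differ on many more attributes than income and cost.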

Push and Pull Factors for School Access

Our team conducted a literature review that revealed a number of factors within a child’s life which influence school enrolment. These span children’s immediate family, community, social, and national environments. We examined the literature across various contexts, particularly in the Global South, to identify barriers to education. These include push factors, or school-related challenges to enrolment, such as a shortage of schools, and schools that do not support children with disabilities, have poor teaching, or lack sufficient facilities. Conversely, pull factors are external and contextual barriers to education, such as financial and family needs, climate crises, or sociocultural expectations. We organised these factors into an ecological systems theory (EST) model to understand how barriers intersect across several layers of influence, ranging from the individual level (microsystem) to the time-bound and historical level (chronosystem).

Push and Pull Factors that Create Barriers to School Access Using EST

Credit: Shahbakht Mubarik, Senior Education Research Associate, Equity Equation Project

The literature revealed that two cross-cutting barriers impact school enrolment: gender discrimination and poverty. These affect school enrolment at every ecological system level, from the individual family to large-scale national policies. Gender discrimination can be seen in policies that do not protect or value women in the workforce, in community narratives around women’s education, and in household decisions to restrict girls’ participation in education. Economic poverty has a similar impact. It limits the funds available for school expenses, leads to low levels of investment in basic education, inhibits access to important technological learning tools in remote communities, and forces families to put their children in the labour market to ensure everyone has enough to eat.

These barriers to school enrolment are clear and well explored in the global literature. We wanted to explore further how these factors predict access to education within Pakistan’s context, and which predictive factors rank the highest.

What Really Determines Enrolment?

We conducted predictive analyses with neural networks and random forests using the MICS data for the Lahore division and the PSLM data for the Nankana, Lahore, Kasur, and Sheikhupura districts. Random forest models generated the most accurate predictions on school enrolment, with around 90% accuracy on the MICS and 83-85% accuracy on PSLM, depending on the district. We ranked the importance of each factor in predicting school enrolment, and the interaction of features in predicting access to education by using SHAP decomposition in tandem with the trained models.
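
The idea behind a SHAP-style ranking can be illustrated with a small, self-contained sketch. It computes exact Shapley values for a toy scoring function by averaging each feature's marginal contribution over all feature orderings. Everything here is an assumption made for illustration: the three feature names, the weights in toy_model, and the all-zeros baseline are hypothetical and do not come from the MICS or PSLM analyses. The SHAP library also uses faster estimators and a background dataset rather than a single baseline.

```python
from itertools import permutations

# Hypothetical binary features for one child (names are illustrative only).
FEATURES = ["household_wealth", "parent_educated", "owns_mobile"]


def toy_model(x):
    """Stand-in for a trained classifier: returns an enrolment score."""
    score = 0.30
    score += 0.25 * x["household_wealth"]
    score += 0.20 * x["parent_educated"]
    score += 0.05 * x["owns_mobile"]
    # Assumed interaction: wealth matters more when a parent is educated.
    score += 0.10 * x["household_wealth"] * x["parent_educated"]
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {name: 0.0 for name in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            # Switch this feature from its baseline value to the child's value.
            current[name] = instance[name]
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


baseline = {name: 0 for name in FEATURES}
child = {"household_wealth": 1, "parent_educated": 1, "owns_mobile": 1}

phi = shapley_values(toy_model, child, baseline)
# Rank features by the size of their contribution to this prediction.
ranking = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
for name in ranking:
    print(f"{name}: {phi[name]:+.3f}")
```

The contributions sum to the difference between the child's score and the baseline score, which is the property that lets per-feature values be aggregated into an importance ranking. Note that the interaction term is split evenly between the two features involved, so the ranking alone does not show how they interact.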

Interestingly, our initial predictive analysis revealed that proxies for socioeconomic status (SES), specifically mobile phone ownership, household wealth, and parental education, predict access to education. In the MICS data, the most important features included whether the child enjoys reading, parental education, household wealth, and disability status, among others.

MICS Lahore Division SHAP Feature Importance Plot for School Enrolment for 5-16 Year Olds

PSLM was more granular and enabled us to look for changes across districts within the Lahore division. We saw similar trends suggesting that access to food, housing, heating, water, and WASH facilities in the home predicted school enrolment (though the proxies for these varied across districts). This indicates that SES remained a crucial predictive factor even as the context changed.

Only in Nankana was gender an important predictor of educational access.

PSLM Kasur District SHAP Feature Importance Plot for School Enrolment for 5-16 Year Olds

While we certainly expected to see many proxies for SES as highly ranked features, we anticipated that some of the other diverse barriers and enablers from the literature review would also be included, particularly gender. In the face of so many SES proxies, other features ranked in the top 20 less often, or not at all.

The Big Questions

Our research team is deeply committed to several core values and ideas, one being that evidence-informed decision-making is among our strongest tools for enabling positive change. Given how prominent poverty indicators are in both the literature and the secondary predictive analyses, we considered possible implications:

What if the most critical key to enrolment is centred on alleviating poverty, while providing access to education through better schools, better roads, addressing cultural barriers, etc. is secondary?

In answering this question, we keep in mind that data-use experts emphasise that understanding data limitations is key to making evidence-informed decisions. We revisited the literature and the available secondary data, probing for assumptions and missing information, which helped us identify data gaps that we hope to close in our study. Given the high accuracy of our current models on test data, incorporating additional factors may yield diminishing improvements in predictive performance. We have outlined some of these gaps and their suggested areas of investigation.

Missing Data: First, not all factors found in the literature are captured in the secondary survey data. For example, the literature says that safe, reliable, and affordable transportation is an important enabling factor for school enrolment, but the PSLM and MICS data does not provide information on the relative availability of nearby schools or transportation to a school. Attitudes and experiences related to concerns for gender-based safety are not included in these surveys, nor are the perceptions of local schools.

We hope to uncover the community-level importance of factors found in the literature which lack consolidated data. We also believe this will help us suggest important data points for future collection and analysis to our partners, particularly the Pakistan Institute of Education (PIE).

Intersections and Directionality of the Data: Predictive analysis using neural networks allows us to rank the importance of factors non-parametrically, capturing complex feature interactions and how they influence the outcome. However, there is a limit to this analysis. While the interactions of multiple factors (student’s gender, type of housing, food security, etc.) and their resulting outcomes are captured in the models, we cannot visualise how they interact with each other. For example, how do multiple indicators of socioeconomic status interact with gender to influence outcomes? Our predictive analysis can only tell us that they interact, not how. As a result, our ability to explain complex multi-dimensional interactions is limited.

The study uses primary qualitative data at the family level to unpack and disentangle how these factors interact to influence decision-making for school enrolment.

Granularity of the Data: While the MICS data is representative at the division level, the PSLM data allowed for a more granular look at the districts within the Lahore division. It also revealed that the importance of features differs somewhat from district to district. How would features change if we could move to an even smaller administrative unit such as the tehsil or union council? If we were to look at communities with large religious and linguistic minority populations, how would the results be different? The literature reveals that these demographic differences create an intersectionality of experiences, pressures, and push and pull factors that may influence school enrolment decisions. Therefore, the ability to examine such communities would give insight into how these intersections impact decision-making.

Our team will carefully select sites with historically marginalised communities and conduct qualitative data collection to ensure these intersections are captured. We will also work with government partners to explore the possibility of using more granular data sets for future predictive analysis.

Contextual Nature of the Data: As this short project focuses on creating an ABM for proof of concept, we chose to work on targeted, manageable, and accessible locations that are relatively stable in the Punjab province. As a result, we only looked at the secondary data from the Lahore division, which is not representative of the nation, or even Punjab as a whole. In other areas of Pakistan, issues of safety, gender, and school access may rank more highly as predictive factors, particularly in remote or unstable areas.

Our team plans to scale the project to other divisions and provinces in collaboration with key partners.

Next Steps

The Equity Equation Project is a multi-stage, iterative project, combining quantitative and qualitative data to create simulations of possible policy outcomes. Our next step is a qualitative investigation of how families perceive and interact with the factors identified in the literature and the secondary analysis to make school enrolment decisions. The findings will help us enhance our understanding of the existing data and allow us to identify and close data gaps.

Acknowledgements: We thank Hamna Dogar, Jawad Ali, Anees Amjad, and Shahbakht Mubarik for contributing to this work. Pak Alliance for Maths and Science (PAMS) is the implementation lead for the Equity Equation Project: Predictive Modelling of Policy Impact on School Enrolment.

Authors: Dr Jessica Albrent (Associate Professor, LUMS), Dr Farah Nadeem (Associate Professor, LUMS) and Dr Faisal Bari (Dean, LUMS School of Education)

Copy-Editor: Maryam Beg Mirza (Assistant Consultant, Education at OPM)

Quality Assurance: Dr Sahar Shah (Senior Research Manager, DARE-RC)
