• Research Article
  • Open access
  • Published: 17 February 2023

Learning effectiveness of a flexible learning study programme in a blended learning design: why are some courses more effective than others?

  • Claude Müller   ORCID: orcid.org/0000-0003-1987-6662 1 ,
  • Thoralf Mildenberger   ORCID: orcid.org/0000-0001-7242-1873 1 &
  • Daniel Steingruber 1  

International Journal of Educational Technology in Higher Education, volume 20, Article number: 10 (2023)

Flexible learning addresses students’ needs for more flexibility and autonomy in shaping their learning process, and is often realised through online technologies in a blended learning design. While higher education institutions are increasingly considering replacing classroom time and offering more blended learning, current research is limited regarding its effectiveness and modifying design factors. This study analysed a flexible study programme with 133 courses in a blended learning design in different disciplines over more than 4 years with a mixed-methods approach. In the analysed flexible study programme, classroom instruction time was reduced by 51% and replaced with an online learning environment in a blended learning format ( N students = 278). Student achievement was compared to the conventional study format ( N students = 1068). The estimated summary effect size for the 133 blended learning courses analysed was close to, but not significantly different from, zero ( d  = − 0.0562, p  = 0.3684). Although overall effectiveness was equivalent to the conventional study format, considerable variance in the effect sizes between the courses was observed. Based on the relative effect sizes of the courses and data from detailed analyses and surveys, heterogeneity can be explained by differences in the implementation quality of the educational design factors. Our results indicate that when implementing flexible study programmes in a blended learning design, particular attention should be paid to the following educational design principles: adequate course structure and guidance for students, activating learning tasks, stimulating interaction and social presence of teachers, and timely feedback on learning process and outcomes.

Introduction

Considering the digitalisation of society, there is an increasing need to constantly develop one’s competencies in the sense of continuous lifelong learning (OECD, 2019). In this context, higher education should be adapted to the learners' diverse needs and specific life phases (Barnett, 2014; Martin & Godonoga, 2020) and accessible to broader sections of the population (Dziuban et al., 2018; Orr et al., 2020). The concept of flexible learning addresses these needs and tries to afford learners more flexibility and autonomy in shaping the learning process regarding when, where, and how they learn (Boer & Collis, 2005; Hrastinski, 2019; Lockee & Clark-Stallkamp, 2022; Smith & Hill, 2019; Vanslambrouck et al., 2018; Wade, 1994). From a pedagogical point of view, different dimensions of flexible learning can be distinguished. Li and Wong (2018) analysed previous publications and identified the following dimensions of flexible learning: time, content, entry requirement, delivery, instructional approach, performance assessment, resources and support, and orientation or goal. The frequently mentioned dimension of place (e.g. Chen, 2003) is subsumed under the delivery dimension in this concept. When these dimensions are designed according to learners' needs, students should actually perceive learning as flexible. From a technical perspective, flexible learning has often been attempted through online technologies (Tucker & Morris, 2012). According to Allen et al. (2007), learning environments can be classified according to their proportion of online content delivery as traditional (no online delivery), web-facilitated (an online delivery proportion of between 1 and 29 per cent), blended learning (between 30 and 79 per cent), or online learning (more than 80 per cent). Accordingly, flexible learning is often associated and used in connection with blended or online learning (Anthony et al., 2020).
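
The classification by online delivery proportion can be written as a simple threshold rule. The following is a minimal sketch only: the function name and the handling of the exact boundary values (for example, treating exactly 80 per cent as online learning) are assumptions made for illustration and are not specified in the text.

```python
def classify_delivery(online_share_percent: float) -> str:
    """Classify a course by the share of content delivered online (0-100).

    Thresholds follow the proportions quoted in the text; the treatment of
    exact boundary values is an assumption of this sketch.
    """
    if not 0 <= online_share_percent <= 100:
        raise ValueError("share must be between 0 and 100")
    if online_share_percent < 1:
        return "traditional"
    if online_share_percent < 30:
        return "web-facilitated"
    if online_share_percent < 80:
        return "blended learning"
    return "online learning"


# A course that replaces about half of its classroom time with online work
# falls into the blended learning band.
print(classify_delivery(51))
```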

The COVID-19 pandemic, with its global shift to remote instruction, has accelerated the demand for flexible learning options in higher education (Lockee & Clark-Stallkamp, 2022 ; Pelletier et al., 2022 ). Current student evaluations have shown that the experienced learning flexibility during ‘emergency distance learning’ (Hodges et al., 2020 ) is appreciated (Gherheș et al., 2021 ; Shim & Lee, 2020 ) and students are demanding more flexible learning options in the aftermath of the pandemic as well (Clary et al., 2022 ; Lockee & Clark-Stallkamp, 2022 ). In response, higher education institutions are now considering replacing classroom time and offering more online and blended learning formats (Kim, 2020 ; Pelletier et al., 2021 ; Peters et al., 2020 ; Saichaie, 2020 ).

Despite the apparent popularity of blended learning, academics are often concerned about the effectiveness of blending for student learning (Huang et al., 2021 ), and educational institutions will only be able to offer and expand blended learning formats when they are confident that students will perform as they would in a conventional classroom setting (Owston & York, 2018 ). Meta-analyses (Bernard et al., 2014 ; Means et al., 2013 ; Müller & Mildenberger, 2021 ; Vo et al., 2017 ) point out that blended learning is not systematically more or less effective than conventional classroom learning. At the same time, they have pointed out that the number of controlled studies is still limited and that the studies have examined mostly single courses with a study period of one semester; there is a particular lack of controlled studies at a degree level (i.e., with many courses taught over a longer period). In addition, variance in the learning effectiveness of the courses found in the studies was large, with a shortage of studies on the implementation and design success factors of blended learning based on objective learning achievement rather than student and lecturer evaluation (Bernard et al., 2019 ; Graham, 2019 ; Means et al., 2014 ).

Research questions

This study addressed the above issues of learning effectiveness and modifying factors of blended learning at the study programme and course levels. The focus of the researched study programme was to give students more flexibility in the learning process, especially regarding time and place, by replacing classroom time with an online learning environment in a blended learning design (see details in the research context). Accordingly, the term ‘flexible learning’ is used in this paper to denote the desired characteristic of the study at the programme level, whereas the term ‘blended learning’ describes the educational design of the courses under investigation in the experimental condition.

The two research questions (RQ) were:

RQ1: What is the impact on student achievement (measured as exam results) of blended learning with classroom time reduced by half at the course level and study programme level in a flexible learning study programme compared with the conventional study format?

RQ2: What are the modifying factors for the learning effectiveness of blended learning courses with reduced classroom time in a flexible learning study programme?

Literature review

Student achievement

Several studies have explored the acceptance and effectiveness of blended or online environments with reduced classroom time in recent years. In a study by Asarta and Schmidt ( 2015 ), presence in classroom sessions in a traditional course was compared with an experimental setting where lectures were also made available online. In the two settings, the exams, learning materials, and number of planned classroom sessions were identical, but students could choose whether to attend classroom sessions in the blended learning version. Data analysis showed that students reduced their average attendance to between 49 and 63%. Asarta and Schmidt ( 2015 ) concluded that—in line with the student preferences—the classroom attendance rate in blended learning courses could be reduced by approximately one-half compared with conventional courses. This is one of few studies in which students had control over the blend ratio; usually, the instructor decides and takes responsibility for the proportion of instruction delivered in a blended learning format (Boelens et al., 2017 ).

Owston and York ( 2018 ) investigated the relationship between the proportion of online time spent in blended learning courses and student satisfaction and performance. The clustering was determined by the ratio of time spent on online activities replacing classroom sessions. The results showed that students in courses with high (50%) and medium (between 36 and 50%) online proportions rated their learning environments more positively and performed significantly better than their peers in blended learning courses with low (27–30%) or supplemental online segments. Consequently, Owston and York ( 2018 ) concluded that across a wide variety of subject areas and course levels, student perceptions and performance appeared to be higher when at least one-third to one-half of regular classroom time was replaced with online activities.

Hilliard and Stewart ( 2019 ) came to similar conclusions concerning satisfaction. They examined the student perceptions of the various aspects of the community of inquiry (COI) model, and their findings indicated that students in high blend (50% online) classes perceived higher levels of teaching, social, and cognitive presence than students in medium blend (33% online) classes.

In a recent review, Müller and Mildenberger (2021) examined the impact of replacing classroom time with an online learning environment. Their meta-analysis of blended learning (k = 21 effect sizes) applied strict inclusion criteria concerning research design, learning outcomes measurement, and blended learning implementation. In particular, it was a requirement that the attendance time in the blended learning format was reduced by 30–79% compared to the conventional learning environment, drawing on Allen et al. (2007). In this meta-analysis, the estimated effect size (Hedges’ g) was 0.0621 and not significantly different from zero. The 95% confidence interval [−0.13, 0.25] suggests that overall differences between blended and conventional classroom learning were small, and that, at most, very small negative or moderate positive effects were plausible. This implies that despite a reduction in classroom time of between 30 and 79 per cent, equivalent learning outcomes were found. However, in line with authors of other blended learning reviews (Bernard et al., 2014; Means et al., 2013; Spanjers et al., 2015; Vo et al., 2017), it was pointed out that the number of controlled studies in the field of blended learning was still limited. More primary studies of the highest methodological quality must be conducted in various disciplines to validate the results further and investigate the effectiveness of blended learning in different disciplines and contexts. Additionally, the authors emphasised considerable heterogeneity in the effect sizes between the various studies. McKenna et al. (2020) also stated that simply offering a blended learning course is not enough to ensure success; research on blended learning design should, therefore, differentiate specific study contexts to derive practice guidelines from it.

Modifying design factors

To explain the considerable differences in the effect sizes of the primary studies, various potential moderators were analysed in the meta-analyses. Out of a total of 41 potential moderators investigated [N = 21 in Means et al. (2013); N = 6 in Bernard et al. (2014); N = 6 in Spanjers et al. (2015); N = 2 in Vo et al. (2017); and N = 6 in Müller and Mildenberger (2021)], very few have turned out to be significant. In contrast to other meta-analyses, Vo et al. (2017) found a significantly higher mean effect size in STEM disciplines compared to that of non-STEM. From an educational design perspective, it is interesting to note that the use of quizzes (or regular tests with feedback for students) has a significant and positive influence on the effectiveness and attractiveness of blended learning (Spanjers et al., 2015).

Bernard et al. (2019) retrospectively reviewed the moderator analyses of further meta-analyses from 2000 to 2015. They concluded that student interaction, collaboration, and discussion emerged as a moderating influence in several studies. Additionally, practice, feedback, and incremental quizzes (i.e., formative evaluation) also appeared important in several studies. However, they also pointed out that a large body of literature shows these instructional elements to be equally valuable in all educational settings.

The above explanations have shown that the past moderator analyses in meta-analyses could not explain the heterogeneity of the student achievements with the design factors in blended learning, other than confirming that quizzes could enhance effectiveness. Studies based on surveys of students and lecturers—which assess the subjectively perceived learning success and the design factors—can provide further indications for an effective educational design in a blended learning format.

Owston and York ( 2018 ) and Hilliard and Stewart ( 2019 ) emphasised in their student survey-based studies that regardless of the chosen online or face-to-face ratios, care must be taken when designing a learning environment to integrate interactive and cooperative activities between students as well as between students and instructors. Other studies based on student evaluations (Castaño-Muñoz et al., 2014 ; Cundell & Sheepy, 2018 ; McKenna et al., 2020 ) have also emphasised the importance of student interaction in blended learning. According to Cundell and Sheepy ( 2018 ), passive online activities such as videos and readings are not as effective as well-structured activities in which students collaborate with or learn from other students. Content delivery does not equate to a well-designed learning environment or, as Merrill ( 2018 , p. 2) put it, ‘information alone is not instruction’. Thus, students need adequate stimulation, especially in the online part of blended learning (Lai et al., 2016 ; Manwaring et al., 2017 ; Pilcher, 2017 ). Often mentioned is also a thoughtful balance between face-to-face and distance moments (Vanslambrouck et al., 2018 ). Different instructional strategies were proposed for a blended learning format (McKenna et al., 2020 ), but these have not been scientifically analysed (except for the flipped classroom, e.g., Müller and Mildenberger ( 2021 )).

In Cundell and Sheepy ( 2018 ), peer feedback was also found to be effective for learning; students benefit from analysing the work of others and providing feedback to each other. The importance of feedback in the learning process is well known (Hattie & Timperley, 2007 ) and has also been shown as a critical design factor in other blended learning studies (Garcia et al., 2014 ; Martin et al., 2018 ; Vo et al., 2020 ).

In addition, other studies also highlight the importance of the social presence of instructors (Goeman et al., 2020 ; Law et al., 2019 ; Lowenthal & Snelson, 2017 ) and the creation of an affective learning climate (Caskurlu et al., 2021 ; McKenna et al., 2020 ). These aspects should help reduce social isolation (Gillett-Swan, 2017 ) in the online part of blended learning. Further studies (Caskurlu et al., 2021 ; Ellis et al., 2016 ; Han & Ellis, 2019 ; Heilporn et al., 2021 ) have also identified course structure and guidance as important design factors in blended learning.

These last factors, in particular, depend strongly on the teacher’s commitment and understanding of their role. However, implementing a new blended learning format is challenging and time-consuming for instructors and may also provoke resistance (Bruggeman et al., 2021 ; Huang et al., 2021 ). Accordingly, plausible motives need to be presented as to why these changes are necessary, and incentives are required to engage lecturers (Andrade & Alden-Rivers, 2019 ).

Based on the individual studies, the syntheses and reviews (Boelens et al., 2017 ; McGee & Reis, 2012 ; Nortvig et al., 2018 ) come to similar conclusions regarding the key design factors in blended learning. Findings like these indicate which design factors are perceived by students and lecturers as conducive to learning. However, the limitation here is that these factors were surveyed based on subjectively perceived learning success rather than on objectively assessed learning achievement. One such study by Vo et al. ( 2020 ) investigated how design factors assessed by students were related to final grades. Of the eight design factors studied, only ‘clear goals and expectations’ and ‘collaborative learning’ were significant predictors of student performance as measured by final grades in different courses. However, the level of final grades measured in various courses may not only depend on performance or instructional design but be influenced by other factors such as the bell-curve tendency of grading (Brookhart et al., 2016 ), when the grade often represents a student's relative achievement within the whole group (Sadler, 2009 ). It is, therefore, questionable whether course grades alone can be used as an objective measure to compare the effectiveness of different courses. Accordingly, other factors investigated by Vo et al. ( 2020 ), such as instructor feedback, support and facilitation, and face-to-face/online content presentation, may positively affect the quality of the learning environment and student performance; however, they are not adequately captured by comparing grades across courses.

Although research has shown some general patterns across blended learning modalities, the root causes for the learning outcomes in blended learning environments are still not apparent. Graham (2019) suspected that these causes lie in the pedagogical practices of blended learning, which requires research to examine more closely what happens at the activity level in blended learning.

Methodology

Research context

The Zurich University of Applied Sciences (ZHAW) launched a new flexible learning study programme in a blended learning format (FLEX) in 2015 as part of a comprehensive e-learning strategy (Müller et al., 2018 ). Its Bachelor’s degree programme in Business Administration is a successful, well-established course of study offered both full-time (FT) and part-time (PT). The FLEX format is, therefore, the third study format for this degree programme.

All Bachelor’s programmes have two levels — the ‘Assessment’ level (60 ECTS credits; two semesters for FT students, three semesters for PT and FLEX students) followed by the ‘Main Study’ level (120 ECTS credits; four semesters for FT students, five semesters for PT and FLEX students) with specialisations in Banking & Finance (B&F) and General Management (GM). For the PT and FLEX formats, a part-time job or family commitment of no more than 60%–70% is recommended. The concept for the new blended learning format was developed in 2014 and tested by running a Business Administration FLEX course. After the pilot course was evaluated and found to be effective (Müller et al., 2018 ), a total of 44 courses were transformed for the BSc in Business Administration degree programme (2015–2020). The first cohort of FLEX students graduated in 2019.

The main objective of the new blended learning format FLEX was to offer students the best possible opportunities to combine their work and personal responsibilities with a flexible learning study programme. Regarding the number and distribution of classroom sessions over the 14-week term, compatibility with a distant place of residence was the guiding principle. More specifically, the maximum number of overnight stays away from home that would be acceptable to potential students had to be determined. At the same time, regular physical classroom sessions were also considered essential to enable students to reflect on the online content. As a result of these considerations, face-to-face classes for FLEX were reduced by approximately half (51%) compared with the part-time programme and replaced with a virtual self-study phase. This means that FLEX students attended the campus every three weeks for two days and the interjacent asynchronous self-study phase should allow them to learn flexibly. According to the typology of Allen et al. ( 2007 ) and the inclusion criteria for the meta-analysis of Müller and Mildenberger ( 2021 ), the design can be classified as blended learning. Concerning the dimensions of flexible learning, according to Li and Wong ( 2018 ), the FLEX format offered greater flexibility in terms of time, delivery, instructional approach, resources, and support than the conventional study format; however, the format was the same as a traditional course regarding the dimensions content, entry requirement, orientation or goal, and performance assessment.

After the time structure for the new course of study had been determined, the transition to the blended learning design was carried out at the course level. Considering that the design aspects activation, interaction and formative performance assessment have been found in empirical studies to be important for asynchronous online environments, care was taken to ensure that content was not only delivered (using learning videos, learning texts, etc.), but that students elaborated and reflected on it in the virtual self-study phases. In so-called ‘scripting workshops’ (Müller et al., 2018 ), the content was sequenced, and the educational design was created from scratch (Alammary et al., 2014 ), according to a defined process using a specially developed didactic visualisation language (see also Molina et al., 2009 ). Web-based technologies such as LMS Moodle and other tools were used and the content was delivered in digital form, mainly using learning videos produced in-house. Interaction with the teachers during the three-week self-study phases was possible in asynchronous form using the Moodle tools such as forums and chat, but no scheduled online class sessions via video conferencing tools were provided. Table 1 shows key features of the course designs in terms of the number of activities for the design aspects activation, interaction, and formative performance assessment (feedback) in the self-study phases, per course. Since learning videos are an important element of an asynchronous online learning environment and have proven to be effective for learning in the pilot study (Müller et al., 2018 ) and a recent meta-analysis (Noetel et al., 2021 ), the number of learning videos per course was also assessed. 
The number of pedagogical design factors was collected in the LMS Moodle, and the results show the range of the design characteristics in the FLEX implementation for the levels ‘Assessment’ (semesters 1–3) and ‘Main Study’ (semesters 4–8), and overall (semesters 1–8).

Research design

The research design consisted of the cohorts of the experimental FLEX group (B&F cohorts 2015–2019 and GM cohorts 2017–2019, N students = 278) with students attending all courses in the new FLEX format and the corresponding cohorts of the control group PT ( N students = 1068). The FLEX format was implemented in a blended learning design with a reduced classroom teaching time, whereas the PT-learning format was implemented conventionally via classroom teaching. Students of the FLEX and PT cohorts were allocated to classes of 30–60 students each. The number of students ( N ) who started the corresponding study programme in the first semester changed over time because of voluntary dropouts, failed exams, transfers between specialisations, and repeaters.

The gender ratio was almost the same in the experimental FLEX cohort as in the control PT cohort (proportion of female FLEX students = 35%; proportion of female PT students = 36%); however, the average age was slightly higher for FLEX students (24.7 years) compared to PT students (22.2 years). Concerning personality traits, various tests were used to investigate whether students differed regarding teamwork affinity (Lauche et al., 1999), ICT literacy (Kömmetter, 2010), general mental ability (Heller & Perleth, 2000, cohort 17 only), and the competencies of self-study and study organisation and learning-relevant emotions including motivation (Schmied & Hänze, 2016, cohort 17 only). None of these tests showed a significant difference between the experimental FLEX group and the PT control group. With the entrance qualification of the vocational baccalaureate, students of a university of applied sciences have similar prior knowledge. To check this assumption, prior knowledge was tested in a pre-test on the topic of business administration for cohort 17 (with specialisations B&F and GM). The questions corresponded to the questions on the topic of business administration in past examinations for the vocational baccalaureate. The results of the pre-test showed no significant differences in prior knowledge between students in the FLEX and PT formats in either the B&F [t(94) = 0.619, p = 0.537] or the GM [t(69) = 0.182, p = 0.856] specialisation.

The student eligibility requirements, lecture content, exam questions, and grading scales were identical for all students in the experimental FLEX and the control PT conditions. FLEX students took precisely the same examinations and at the same time as students in the conventional PT programme and the exams were not marked by the class teacher but by an independent pool of lecturers, allowing for a comparison of the exam results with high empirical significance.

Analysis methods for student achievement

To assess the effectiveness of the blended learning FLEX format, the exam results of the FLEX students (N = 2822 exams) were compared with those of PT students (N = 11638) in 133 courses between 2015 and 2019 (nine semesters). The effect size (standardised mean difference, also known as Cohen’s d) was calculated for each course as the deviation of the exam results of the experimental group (FLEX) from those of the control group (PT). A t-test for the difference between the two groups (α = 0.05, two-tailed) was performed for each course. Additionally, a test for equivalence was performed, with equivalence defined as a difference within ±0.5 standard deviations (see also Mueller et al., 2020).
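
The per-course effect size can be illustrated with a short sketch. It assumes the common pooled-standard-deviation form of Cohen's d and a widely used large-sample approximation for its standard error; the text does not state which exact formulas were used, and the grades below are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(flex_scores, pt_scores):
    """Standardised mean difference (FLEX minus PT) using the pooled SD."""
    n1, n2 = len(flex_scores), len(pt_scores)
    pooled_var = (
        (n1 - 1) * variance(flex_scores) + (n2 - 1) * variance(pt_scores)
    ) / (n1 + n2 - 2)
    return (mean(flex_scores) - mean(pt_scores)) / sqrt(pooled_var)


def d_standard_error(d, n1, n2):
    """Approximate standard error of d (large-sample formula, an assumption)."""
    return sqrt((n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2)))


# Hypothetical exam grades for one course (not data from the study).
flex = [4.0, 4.5, 5.0, 5.5]
pt = [3.5, 4.0, 4.5, 5.0]
d = cohens_d(flex, pt)
print(round(d, 3), round(d_standard_error(d, len(flex), len(pt)), 3))
```

A positive value means the FLEX group scored higher on average than the PT group, matching the sign convention described above.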

To analyse the overall learning effectiveness of the FLEX study format, the results from each course were aggregated using regression analysis (roughly similar to a meta-analysis). A linear mixed-effects regression analysis was performed with the calculated effect sizes as the response, and the potential moderator variables study level, specialisation, and discipline as factors (fixed effects). In addition, a random effect for the cohort was included to control for the dependency arising from the same students attending courses. Assessing the size and significance of the random cohort effect was also of interest. Since good estimates of the standard error of the calculated effect sizes can be calculated from the raw grades, a weighted regression was performed where each effect size was weighted by its inverse estimated variance. This corresponds to the usual weighting scheme in fixed-effect meta-analysis. Using the lme4 package for R (Bates et al., 2020), estimation was performed using restricted maximum likelihood.
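
The inverse-variance weighting can be illustrated in isolation. The sketch below computes only the weighted summary effect of a fixed-effect meta-analysis; it is a deliberate simplification that omits the moderator variables and the random cohort effect of the mixed model described above, and it is written in Python rather than R. The input numbers are invented.

```python
from math import sqrt


def pooled_effect(effects, variances):
    """Inverse-variance weighted mean of effect sizes and its standard error."""
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect size")
    weights = [1.0 / v for v in variances]  # precise courses count more
    total = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, effects)) / total
    return estimate, sqrt(1.0 / total)


# Two hypothetical courses: the second has the smaller variance and
# therefore dominates the summary estimate.
estimate, se = pooled_effect([0.2, -0.1], [0.04, 0.01])
print(round(estimate, 3), round(se, 3))
```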

Analysis methods for the modifying factors

An analysis of potential moderating variables that might explain the heterogeneity of the effect sizes was conducted, investigating study level, specialisation of the study programme, disciplines (e.g., quantitative subjects, foreign language, social sciences, or management), and cohorts. As a first step, correlations between various contextual variables (student and lecturer perceptions, educational design characteristics) and the effect sizes of the courses were analysed, and then the critical factors were related to effect size using a multiple linear regression model.

Student perceptions of the new learning design and learning process were analysed through a student course evaluation. At the end of each course, the FLEX group completed a questionnaire consisting of nine items of different instruments—structure, guidance and motivation, coherence (SCEQ), usability (own item), support and learning outcome (HILVE, Rindermann & Amelang, 1994 ), interest/enjoyment (Intrinsic Motivation Inventory, Ryan, 1982 ), and two open-ended questions (‘What do you like about the way the course is designed?’, ‘What do you like less?’). Additionally, student attendance in on-campus classes was determined. The surveys took place after the classes had been completed but before the examination period.

Lecturers also rated the implementation conditions with a specially developed 20-item instrument according to the change dimensions in Knoster et al. ( 2000 ). This survey took place at the end of the semester when a course was first implemented. Only courses whose instructors were involved in both the development and the implementation of the courses were included in the correlation. Because instructors for individual courses changed in some cases during the test period, a smaller number of courses was analysed than the total number of courses (see Table 6 ).

The qualitative analysis aimed to discover which factors (especially educational design characteristics) were crucial for the success of a FLEX course. For this purpose, the courses were divided into groups according to their effect size and student evaluation ratings. For the student evaluation criteria (scale 1–5), the courses were divided into three clusters (terciles) with high, medium, and low student ratings. ‘Good practice’ courses were defined as courses with a positive effect size and a high student rating (first tercile). ‘Bad practice’ courses were defined as courses with negative effect size and low student ratings (third tercile). For the qualitative analysis, from a total of 133 FLEX courses, 27 ‘good practice’ courses with a total of 493 student comments (to the question ‘What do you like about the way the course is designed?’), and 30 ‘bad practice’ courses with a total of 429 student comments (to the open-ended questions ‘What do you like less?’ and ‘Do you have ideas on how the course could be developed further?’) were included. These data were imported into MAXQDA, and each student comment was labelled with the study specialisation, semester, student number, course name, and good/bad-practice course designation (e.g., ‘SBF15_HS15_8BWL_good’).
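
The selection rule for ‘good practice’ and ‘bad practice’ courses can be sketched as follows. The function name, the tuple layout, the tercile cut-point method (the default of `statistics.quantiles`), and the handling of ratings that fall exactly on a cut point are assumptions made for illustration; the course data are invented.

```python
from statistics import quantiles


def label_courses(courses):
    """Label courses given (name, effect_size, mean_student_rating) tuples."""
    ratings = [rating for _, _, rating in courses]
    low_cut, high_cut = quantiles(ratings, n=3)  # two tercile cut points
    labels = {}
    for name, effect_size, rating in courses:
        if effect_size > 0 and rating >= high_cut:
            labels[name] = "good practice"
        elif effect_size < 0 and rating <= low_cut:
            labels[name] = "bad practice"
        else:
            labels[name] = "not selected"
    return labels


# Hypothetical courses: (name, Cohen's d, mean rating on the 1-5 scale).
example = [
    ("A", 0.3, 4.6), ("B", -0.2, 4.4), ("C", 0.1, 3.9),
    ("D", -0.1, 3.7), ("E", -0.4, 3.1), ("F", 0.2, 2.9),
]
print(label_courses(example))
```

Courses with mixed signals, such as a positive effect size combined with a low rating, are left out of both groups under this rule.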

An initial version of a category system was created, which was theory-driven and based on the principles for designing the FLEX courses. The following five categories were defined—educational design (with subcodes: content sequencing, guidance, blend online/classroom-learning), activation (with subcodes: tasks/exercises, cases, solutions), learning resources (with subcodes: textbooks, learning videos), interaction (with subcodes: with peers, with instructor), and performance assessment.

The entire dataset was coded independently by two coders. Because the category system we developed was being applied for the first time, intercoder agreement checks were started after only a few codings in two iterations to identify weaknesses (Kuckartz & Rädiker, 2019 ). An initial review was based on 10 ‘good practice’ and 10 ‘bad practice’ comments randomly selected from the dataset. A second review took place based on another 15 ‘good practice’ and 15 ‘bad practice’ comments, which were deliberately drawn according to the criterion of completing the theory-based coding guide. In both iteration cycles, the coding was checked for mismatches. The segments where non-matches occurred formed the starting point for a systematic discussion between the two coders about the disagreement, which resulted in an adaptation of the category system and the coding guide (Kuckartz & Rädiker, 2019 ). Comments that belonged to two subcategories were assigned to the main category.

Next, the two coders independently coded the entire data set. The intercoder agreement was checked at the segment level with a setting of 90% overlap, which resulted in a kappa value of 0.57. One of the coders analysed the mismatched segments and standardised them with reference to the coding guide. The coded segments were then analysed. Initially, a frequency analysis (descriptive counting of code frequency) was conducted by counting the individual codes using MAXQDA. Then, the most important aspects of the respective categories were summarised and provided with appropriate quotations.
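For readers unfamiliar with chance-corrected agreement, the following sketch computes the classical Cohen's kappa for two coders who each assign one code per segment. It is an assumption that this textbook formula is close to what was computed here: MAXQDA derives its kappa from segment overlap and may use a different chance-correction model, so the sketch illustrates the idea rather than reproducing the reported value of 0.57. The function name and the example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of segments.")
    n = len(codes_a)
    # Observed agreement: share of segments with identical codes.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected agreement under independence, from each coder's code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / n ** 2
    if expected == 1:
        return 1.0  # both coders used one and the same code throughout
    return (observed - expected) / (1 - expected)


# Invented example: four segments coded by two coders.
coder_1 = ["guidance", "guidance", "activation", "interaction"]
coder_2 = ["guidance", "activation", "activation", "interaction"]
kappa = cohens_kappa(coder_1, coder_2)
```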

Student achievement at the course level

The FLEX and PT samples were independent, and the sample size and histograms of the test results did not indicate a violation of the requirements of normal distribution and uniformity of variance. The effect size of the students’ exam results (Cohen’s d ) was calculated by comparing the FLEX courses with the respective PT courses. The direction was indicated by the sign of the effect size (Cohen’s d ); for example, in 61 of the 133 courses examined, the mean values of the FLEX cohort were higher than those of the PT, corresponding to positive values for the effect size (see Table 2 ).
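As an illustration of the effect size used here, the sketch below computes Cohen's d with a pooled standard deviation, oriented so that a positive value means the FLEX cohort scored higher. The article does not state which variant of d was used, so the pooled-variance formula, the function name, and the example scores are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(flex_scores, pt_scores):
    """Standardised mean difference (FLEX minus PT) with pooled standard deviation."""
    n_flex, n_pt = len(flex_scores), len(pt_scores)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_flex - 1) * variance(flex_scores) + (n_pt - 1) * variance(pt_scores)
    ) / (n_flex + n_pt - 2)
    return (mean(flex_scores) - mean(pt_scores)) / sqrt(pooled_variance)


# Invented exam scores, for illustration only.
d_example = cohens_d([4.0, 5.0, 6.0], [3.0, 4.0, 5.0])
```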

The courses were categorised into four subject groups—quantitative subjects (statistics, mathematics, quantitative methods), foreign language (English), social sciences (law, skills, communication, leadership & ethics), and management (e.g., strategy, accounting, marketing). The distribution of the effect sizes according to the study level, course of study (BF or GM), and subject domain is shown in Fig.  1 .

Fig. 1 Standardised mean differences (effect sizes) of analysed courses (N = 133)

The results for the 133 courses in the ‘Assessment’ and the ‘Main Study’ levels showed that there is little difference in the exam scores of students in the FLEX format compared with the PT format (see Table 2). A t-test (α = 0.05, two-tailed) indicated a significant difference in only 24 of the 133 courses; FLEX students showed significantly higher exam scores in 10 courses and PT students in 14 courses. To compare FLEX and PT learning performance, it is important to consider that comparative studies usually aim to demonstrate significant change. More precisely, the goal is to reject the null hypothesis H0 (no difference between groups) and confirm the alternative hypothesis H1 (a difference between groups exists at a particular significance level). The experimental group (in our case, the FLEX cohort) would, therefore, be expected to perform significantly differently from the control group (PT cohort). However, in the research context, this was not a priority. Because classroom time was reduced by more than 50 per cent, the goal was instead to ensure that students in the blended learning format, working with the self-study assignments, achieved exam results equivalent to those of the control group. Where the aim is to show that there are no differences between the results of the two groups, an equivalence test is used. We regard standardised mean differences as equivalent if they are smaller than 0.5 in absolute value; statistical equivalence was found in 36 courses. In the remaining 73 courses, the result was inconclusive (no statement possible about statistical difference or equivalence).
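The three-way outcome per course (significant difference, equivalence, or inconclusive) can be illustrated with a confidence-interval version of the two one-sided tests procedure. The sketch below is an assumption-laden approximation: it uses a large-sample normal approximation for the standard error of d rather than the t-distribution, checks for a significant difference before checking equivalence, and uses hypothetical function and parameter names. The article does not report which equivalence procedure was applied.

```python
from math import sqrt
from statistics import NormalDist


def classify_course(d, n_flex, n_pt, margin=0.5, alpha=0.05):
    """Classify one course from its effect size d and the two group sizes."""
    # Large-sample approximation of the standard error of Cohen's d.
    se = sqrt((n_flex + n_pt) / (n_flex * n_pt) + d ** 2 / (2 * (n_flex + n_pt)))
    z_two_sided = NormalDist().inv_cdf(1 - alpha / 2)
    z_one_sided = NormalDist().inv_cdf(1 - alpha)
    # Difference test: the two-sided interval around d excludes zero.
    if abs(d) > z_two_sided * se:
        return "different"
    # Equivalence test: the (1 - 2 * alpha) interval lies inside (-margin, margin).
    if d - z_one_sided * se > -margin and d + z_one_sided * se < margin:
        return "equivalent"
    return "inconclusive"
```

With small cohorts the interval around d is wide, so many courses end up in the inconclusive group even when the observed difference is small; for example, `classify_course(0.2, 10, 10)` returns `"inconclusive"` under these assumptions.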

Student achievement at the programme level

The estimated coefficients of the linear mixed-effects regression analysis can be found in Appendix, Table 6. The estimated summary effect size d is close to and not significantly different from zero (see also Table 3). The confidence interval [−0.206, 0.094] suggests that overall differences between the blended learning format FLEX and the conventional classroom format PT are small; at most, moderately negative or very small positive effects are plausible. This means that equivalent learning outcomes were found despite a reduction in classroom time for FLEX compared with PT students of over 50 per cent.

Modifying factors

Moderator analysis.

In Table 4 , similar to the moderator analysis in a meta-analysis, the results are presented as group means with corresponding standard errors and 95% confidence intervals. These are not averages of the raw data per group, but calculated from the regression results using the emmeans package for R (Lenth, 2021 ); for each moderator variable, the other factors were held constant at the proportion in the data set. The overall effect was similarly obtained from the regression estimate, not from averaging the original effect sizes. The significance of the effects of potential moderators was assessed using the Likelihood Ratio Test as implemented in lme4 for R (Bates et al., 2020 ), with none of the variables having a significant effect.

The significance of the random cohort effect was tested by comparing the full model to a classical linear model including all variables except the cohort effect, again using the Likelihood Ratio Test; this was not significant either ( LR  = 2.098, df  = 1, p  = 0.1475). Moreover, the estimated standard deviation for the cohort effect is 0.1186, which is only roughly one-third of the estimated residual standard deviation of 0.3502.
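The reported p-value can be checked by hand, because a chi-square distribution with one degree of freedom is the distribution of a squared standard normal variable. The sketch below assumes that the likelihood ratio statistic was referred to a plain chi-square distribution with df = 1 (without any boundary correction for the variance component); the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_p_value_df1(lr_statistic):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    # If Z is standard normal, then P(Z**2 > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(lr_statistic / 2))


# Values as reported in the text.
p_value = chi_square_p_value_df1(2.098)
sd_ratio = 0.1186 / 0.3502  # cohort standard deviation relative to residual standard deviation

print(round(p_value, 4), round(sd_ratio, 3))
```

Under this assumption the computed p-value agrees with the reported 0.1475 to four decimal places, and the ratio of about 0.34 matches the description of the cohort effect as roughly one-third of the residual standard deviation.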

Correlation and regression analysis of contextual variables

Although the implementation context of the courses (conceptualisation of blended learning, measurement of learning outcomes, and implementation period of one semester) was quite similar, the effect sizes showed a considerable variance between the courses (see Fig.  1 ). A correlation analysis was therefore conducted to examine to what extent the student evaluation of the course quality (including attendance rate), the quantitative educational design characteristics, or the survey on the implementation conditions among the lecturers showed a correlation with the effect sizes.

The results of the correlation analysis (Pearson, 2-tailed) indicate the strongest correlation between student course evaluations and effect sizes (see Appendix, Table 7). All items show a significant correlation between student evaluation of course quality and the effect size (e.g., item ‘I like the course’ r = 0.289, p = 0.001). The course quality assessed by the students thus correlates significantly with the learning effectiveness measured as standardised mean differences between blended and conventional courses. This is remarkable because the course evaluation took place after classes had been completed but before the examination period.

There is also a significant correlation with the reported attendance of the classes; courses whose classroom sessions were attended more frequently show a higher effect size. In contrast, the number of different learning resources and activities in the courses—such as the number of tasks, forum posts, formative quizzes, or learning videos—has no significant correlation with the effect size of the courses.

The correlation between the implementation conditions and the effect size of the courses shows a differentiated picture. For example, the dimensions ‘incentives’ and ‘resources’ do not show a significant correlation with the effect size; however, a significant correlation is reported for the ‘competences’, ‘vision’, ‘action plan’, and ‘satisfaction’ (e.g., item ‘I am satisfied with the introduction of FLEX at the ZHAW’ r  = 0.303, p  = 0.013).

A multiple linear regression model was used to evaluate the contribution that the data collected from students and lecturers make to the standardised mean difference. Because of substantial correlations between the evaluation variables (‘student course evaluation’ and ‘implementation survey instructors’), the items covering different aspects were averaged to form one aggregated variable for the student evaluation (i.e., ‘student evaluation’) and six aggregated variables for aspects of the instructor evaluation (‘incentives’, ‘resources’, ‘skills’, ‘vision’, ‘action plan’, and ‘satisfaction with the implementation’). To avoid collinearity issues, a stepwise forward procedure was used. Starting from an intercept-only model, all models adding one of the variables were fitted, but only ‘student evaluation’ (F = 11.2449, df = 1, p = 0.0014) and ‘action plan’ (F = 7.2867, df = 1, p = 0.0090) were significant. Starting from a model containing only an intercept and ‘student evaluation’, adding ‘action plan’ did not significantly improve the fit (F = 2.3329, df = 1, p = 0.1320), but adding ‘student evaluation’ to a model that only included ‘action plan’ did (F = 5.9408, df = 1, p = 0.0178). In a model including both variables, ‘student evaluation’ was significant (t = 2.437, df = 1, p = 0.0178), while ‘action plan’ was not (t = 1.527, df = 1, p = 0.1320). Forward selection therefore yielded an optimal model containing only an intercept and ‘student evaluation’, although its adjusted R-squared value is not high (0.1438). For this reason, the results are not reported here in detail.
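The reported statistics are internally consistent in one respect that is easy to verify: when a single predictor is added to a linear model, the partial F statistic for that addition equals the square of the t statistic of the added coefficient in the larger model. The short sketch below checks this for the two reported pairs; the dictionary layout, the function name, and the rounding tolerance are assumptions made for illustration.

```python
# Reported t statistics (two-variable model) and partial F statistics (nested comparisons).
reported = {
    "student evaluation": {"t": 2.437, "F": 5.9408},
    "action plan": {"t": 1.527, "F": 2.3329},
}


def squared_t_matches_f(t_value, f_value, tolerance=0.01):
    """True if t squared equals F up to rounding of the reported values."""
    return abs(t_value ** 2 - f_value) < tolerance


consistency = {
    name: squared_t_matches_f(stats["t"], stats["F"])
    for name, stats in reported.items()
}
```

Both pairs satisfy the identity within rounding, which is why the t-tests and the corresponding model comparisons share the same p-values (0.0178 and 0.1320).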

Qualitative analysis of educational design quality

The frequency of coded student comments on educational design quality is reported in Table 5 . The student comments contained a vast number of mentions related to educational design in both the ‘good practice’ and the ‘bad practice’ courses (60.4% and 50.0% of all mentions). Within this category, it is also noticeable that many comments referred to the blending of online and classroom components (20.9% and 28.6%). Furthermore, many comments addressed the guidance provided (10.6% and 9.3%). There were a similar number of mentions in the learning videos subcategory (9.9% and 9.3%). Noticeably fewer mentions were related to the textbook/other texts (6.2% and 10.5%), assignments (6.2% and 6.5%), and performance assessment (6.2% and 6.0%). In the case of the ‘bad practice’ courses, the subcategory solutions also stand out (8.9%). There were a very low number of mentions related to interaction with peers (0.4% and 1.2%).

Student comments indicate that an adequate structure and guidance are essential for the quality of the FLEX blended learning courses. The structure is described as the clear distinctness of topics and their logical sequencing as follows: ‘ The exact structuring of the topics’ (SBF15_HS15_8BWL_good) and ‘ better delimitation and structuring of individual topics’ (SGM17_FS18_1FAC_bad). As guidance, the focus concerning exam relevance in the classroom course is mentioned as ‘The content is clearly linked to the exams, and it is clear what is expected’ (SBF17_HS17_9MAR_good). This aspect also includes the desire for mock exams or the availability of exams from previous years. In addition, guidance is described as a review and outlook by the lecturers and the indication of the learning progress in the learning management system.

The subcategory ‘blending’ contains the appropriate combination of the online and classroom phase(s) (and vice-versa). This link can be achieved by taking up and deepening certain content from the online phase in the classroom or by linking to it and continuing with it. A diverging picture emerges concerning the design of the classroom phase. While some students would have liked to repeat the content from the online phase and set a focus, others would have preferred to consolidate and deepen the content from the online phase through exercises and discussions. The following statements well illustrate this divide: ‘I did not like the fact that some students came to the lectures unprepared and asked basic questions. In this way, the other students did not benefit. […] I talked to many students, and many of them had done very little preparation before the lecture and then asked many questions in the lecture. That really doesn’t work, in my opinion’. (SBF17_HS17_19MAT_bad); ‘More complex topics are treated in the classroom phase’. (SGM17_HS17_1MAR_good); ‘Teaching could be more efficient. It cannot be assumed that all FLEX students have solved everything that is on Moodle [tasks on the Learning Management System]. A misconception’. (SBF15_FS17_7MAC_bad); and ‘Repetition of the material learned in the online phase’. (SGM17_HS17_14MAR_good).

The following student statements also raise the question of optimal allocation of scarce classroom time: ‘The lecturer asks few questions and delivers many monologues. For that, I could actually watch a video instead’. (SGM19_HS19_10WIR_bad) and ‘ The way the classroom sessions are structured is good. At the beginning, a short repetition of the theory and then working on tasks. This helps us to repeat and apply all the material learned’. (SGM18_HS19_3MIK_good).

In the category ‘content delivery’, the compactness of the learning resources and their alignment with the online weeks was mentioned. The linking of instructional texts, PowerPoint slides, and learning videos was brought up in the context of learning resources. In the case of instructional texts, students mentioned their comprehensibility and, in the case of learning videos, their existence, quality, and adequate length: ‘Good structure with linking of book, slides, and videos’. (SBF19_HS19_19MAR_good).

In the ‘activation’ category, the number and variety of exercises and their consistency with the theory learned were mentioned. In addition, the existence of solutions to tasks and exercises was cited as crucial for the online phase in three respects—the solutions must be complete (i.e., solutions to all tasks), sufficiently detailed (i.e., with solution path included), and readily available (i.e., at the time when students solve the tasks); ‘Not having a complete solution script inhibits the learning process very much if I always have to ask for the solution in the forum every time I have [already] finished an assignment. When then the answer finally comes, I am already somewhere else again—very counterproductive’! (SBF16_HS17_14MIK_bad).

In the ‘interaction’ category, the opportunity to ask questions and get a quick answer from the lecturers was frequently mentioned for both the classroom and the online phases. A well-maintained forum (opportunity to place questions in the LMS system) was also mentioned for the online phase. Although there were few comments about peer interaction, it was noticeable that group work was seriously questioned: ‘In general, the obligation to participate in group performance assessments is paradoxical and pointless in the context of the goals of this part-time FLEX course’. (SBF15_FS17_19EBF_bad).

In the ‘performance assessment’ category, formative tests with automatic and immediate feedback were mentioned: ‘I also like the small exams for self-testing because you can check what you have understood’. (SGM18_HS18_1WIR_good).

Discussion and conclusions

Results from the first research question demonstrated that the estimated effect size for a flexible learning study programme in a blended learning design with a 51% reduced on-site classroom time was close to and not significantly different from zero. This result is in line with previous studies (e.g., Müller & Mildenberger, 2021 ), suggesting that a blended learning format with reduced classroom time is not systematically more or less effective than a conventional study format. This study also indirectly confirmed the recommendations of various authors (Hilliard & Stewart, 2019 ; Owston & York, 2018 ) to divide the online and face-to-face portions of blended learning in half. Similar to the results of other studies and reviews on blended learning (Bernard et al., 2014 ; Means et al., 2013 ; Müller & Mildenberger, 2021 ; Spanjers et al., 2015 ; Vo et al., 2017 ), the effect sizes of the courses were broadly scattered around zero, with almost one standard deviation in the minus to over one standard deviation in the plus.

Findings from the second research question addressed the modifying factors for the learning effectiveness of blended learning courses with reduced classroom time. The analysed moderators of ‘study level’, ‘specialisation’, and ‘disciplines’ can be classified as moderating effects of condition (Means et al., 2013 ). The non-significant results for the study level are in line with the findings of systematic reviews by Bernard et al. ( 2014 ) and Means et al. ( 2013 ), who found no moderation effects on the course level (undergraduate vs graduate course). The non-significant result of the moderator ‘discipline’ corroborates the systematic reviews of Müller and Mildenberger ( 2021 ) and Bernard et al. ( 2014 ). However, it is not in line with Vo et al. ( 2017 ), who found a significantly higher effect size for STEM disciplines. Different definitions of these disciplines may explain the differences in these findings.

Based on the results of this study and the systematic reviews conducted in the past, it can be concluded that the heterogeneity of the results is not likely to be attributable to conditional factors such as the study level or discipline. However, significant correlations were reported between the effect sizes of the courses and the educational quality and design evaluated by students, the implementation conditions evaluated by lecturers, and on-site class attendance. There is collinearity between these aspects, and it can be assumed that there is a causal relationship in the sense that on-site attendance is influenced by the educational design and the quality of the course. Furthermore, the latter, in turn, is impacted by the attitude and motivation of the lecturers towards the FLEX programme. However, apart from the educational quality as evaluated by the students, significant direct and indirect effects could not be established with the fitted multiple linear regression model.

The importance of the educational design for the effectiveness of blended learning was supported by the significant moderator analyses of Spanjers et al. ( 2015 ) regarding the use of quizzes. In contrast, no correlation was shown between the number of online learning resources and activities in the courses, such as the number of assignments, forum posts, formative quizzes, or learning videos, on the one hand, and the effect sizes, on the other. This indicates that educational quality goes beyond the mere number of activities or particular learning resources and that an appropriate educational design is decisive (Graham, 2019 ; Nortvig et al., 2018 ).

The qualitative design analysis of the courses with high vs low learning effectiveness identified several crucial design factors for learning-effective blended courses. Regarding educational design, an adequate course structure and guidance for students are recognised as essential. In the context of an undergraduate programme, this means, in particular, that the learning environment has a clear structure, and that sufficient guidance is provided. This factor is significant in blended learning because the combination of online and face-to-face teaching and the partial distance between teachers and students increase the complexity of the learning environment. In this regard, a thoughtful alignment of the online and on-site learning phases was also mentioned; however, the feedback was contradictory concerning the instructional strategy (McKenna et al., 2020 ). While some students prefer to consolidate and deepen the content from the online phase through exercises and discussions, others simply prefer to repeat it. Such feedback must be seen in the context of the flexible learning study programme FLEX, which offers students opportunities to combine their work and personal responsibilities with study and, therefore, possibly attracts students who place a high priority on pedagogic efficiency. The delicate balance between work, private life, and education is, therefore, more keenly felt by these students and could result in insufficient time to complete all the online tasks. Consequently, guidance also means that instructors should explain how the online and on-site phases are integrated and help their students understand that the online environment is an essential part of the blended learning experience (see also Ellis et al., 2016 ; Han & Ellis, 2019 ).

Regarding content delivery, good practice is characterised by learning resources that are well linked and aligned with other elements, such as the tasks in the learning environments. In line with the pilot study (Müller et al., 2018 ) and a recent systematic review (Noetel et al., 2021 ), learning videos are appreciated by students and considered to have many educational benefits.

The relevance of activation was also pointed out in the qualitative analysis. Activating learning tasks enable students to transform the information they have acquired into knowledge and skills and facilitate their ability to apply learned knowledge and skills in new and real-life situations. Going beyond previous studies (Cundell & Sheepy, 2018; Lai et al., 2016; Manwaring et al., 2017; Pilcher, 2017), our results indicate that the instant availability of complete and detailed solutions when students work on tasks and exercises is essential for the learning process and its effectiveness.

Regarding the aspects of interaction and assessment, the results corroborate previous studies, as good practice is associated with the social presence of instructors and their prompt feedback (Goeman et al., 2020; Law et al., 2019; Lowenthal & Snelson, 2017) and with the availability of formative tests with immediate, often automatic feedback (Garcia et al., 2014; Martin et al., 2018). At the same time, the interaction between students is controversial, and group work is questioned. This may result from the previously discussed need for efficiency in a flexible learning study programme. However, other studies (Gillett-Swan, 2017; Vanslambrouck et al., 2018; Vo et al., 2020) have also pointed out that a blended learning design may be associated with specific costs, such as the practical issue of organising group work.

Theoretical and practical implications

This study makes theoretical and practical contributions. Theoretically, it expands the evidence base on the learning effectiveness of blended learning with reduced attendance time in several ways. First, past studies on blended learning with reduced classroom time were, with a few exceptions (e.g., Chingos et al., 2017), designed as single studies with a limited duration of usually one semester (Müller & Mildenberger, 2021). In contrast, this study extended these findings to the study programme level, encompassing 133 courses in different disciplines over more than four years (nine semesters). Additionally, it was not designed as a model project with privileged conditions such as selected lecturers and additional resources but was introduced using existing equipment and regular teaching staff. Accordingly, a high ecological validity can be assumed.

Similar to the meta-analyses on blended learning (Bernard et al., 2014 ; Means et al., 2013 ; Müller & Mildenberger, 2021 ; Vo et al., 2017 ), the observed variance in the learning effectiveness of the individual courses was large. The findings of this study demonstrated that the heterogeneity of the effect sizes could be explained by differences in the implementation quality of the educational design factors. This study is the first we are aware of that investigated design factors based on the relative effect sizes of individual courses and not only on student and lecturer evaluation.

The results of this study provide institutions and administrators with practical guidance for their flexible learning initiatives, especially concerning learning effectiveness and the related design principles of a flexible learning programme in a blended learning format. Based on our findings, we recommend paying particular attention to the following educational design principles when implementing blended learning courses:

Adequate course structure and guidance for students.

Activating learning tasks.

Stimulating interaction and social presence of teachers.

Timely feedback on the learning process and outcomes.

Instructors are responsible for designing and implementing these factors, and this study showed that the quality of the educational design was significantly related to lecturer attitudes towards blended learning with reduced on-site classroom time. Accordingly, when introducing blended learning to an educational institution, it is vital not only to provide the necessary infrastructure and resources and develop the skills needed to teach a blended learning format but also to provide lecturers with incentives for engagement. At the same time, a shared vision of a flexible learning environment in a blended learning design should be developed to initiate and establish a new learning culture.

Finally, the student evaluation of the course quality has a significant correlation with the relative effect sizes of the individual courses. Thus, students seem to have a good sense of what blended learning conditions they require to succeed. Accordingly, we recommend educational institutions actively involve students in developing blended learning designs, even to the extent of forming pedagogical partnerships (Cook-Sather et al., 2019 ).

Limitations and future directions

The design of this study was strictly controlled for a field study in an educational area. Due to identical learning objectives and exams, the framework conditions of the two study formats were comparable, the presence of a control group ensured a quasi-experimental design, and selection bias was controlled. Additionally, as this study was not carried out in a model project with unique resources, support, and incentives, a high ecological validity can be assumed in an authentic university setting with regular lecturers. Nevertheless, the study is subject to the inherent limitations of a real-life setting.

Concerning the data set, because the university had to switch from a mainly on-site format to exclusively hybrid and online formats during the COVID-19 pandemic, cohorts could be surveyed at different study levels, and only one complete cohort could be observed, uninterrupted, from entry to graduation. Accordingly, relatively few courses from the upper semesters of the ‘Main Study’ level were included compared to ‘Assessment’ level courses.

Another limitation of this study is that the flexible learning study programme in a blended learning design we analysed necessarily appeals to a particular student population, namely those with limited time and/or a greater need for spatial flexibility, often because of a demanding job or family commitments. As a result, although the FLEX and PT groups were similar in terms of the control variables and the pre-test, bias due to self-selection could not be ruled out. It should, therefore, be acknowledged that the results concerning the blended learning format are of limited generalisability beyond the context of a flexible learning study programme. It was also shown that the needs of students regarding flexible learning programmes can be highly specific. Therefore, in the future, it would be essential to differentiate research on the design of blended learning depending on the particular study context.

Furthermore, this study identified design factors for blended learning courses based on the relative effect sizes of individual courses. Future studies should verify and differentiate the results of this study to arrive at validated practice guidelines.

Conclusions

This work contributes to the growing literature on the implementation of flexible learning study programmes in a blended learning design. Overall, this study found equivalent learning effectiveness in a blended learning format with classroom time reduced by 51% compared with the conventional study format. The study provides evidence that making education more flexible by offering blended learning with reduced classroom time can improve access to education without compromising learning effectiveness. Additionally, the learning effectiveness of the individual courses was found to be moderated by the implementation quality of the educational design factors. Specifically, an adequate course structure and guidance for students, activating learning tasks, stimulating interaction and social presence of teachers, as well as timely feedback on the learning process and outcomes, were identified as crucial design principles for learning-effective blended learning courses.

The results encourage higher education institutions to offer flexible study programmes in a blended learning format with reduced classroom time but also underscore the importance of the educational design quality.

Availability of data and materials

The datasets are available from the corresponding author on reasonable request.

Abbreviations

BF: Banking & Finance

COVID-19: Coronavirus disease 2019

FLEX: Flexible learning study programme

FT: Full-time study programme

GM: General management

LMS: Learning management system

PT: Part-time study programme

STEM: Science, Technology, Engineering, and Mathematics (subjects)

Alammary, A., Sheard, J., & Carbone, A. (2014). Blended learning in higher education: Three different design approaches. Australasian Journal of Educational Technology, 30 (4). https://doi.org/10.14742/ajet.693

Allen, I. E., Seaman, J., & Garrett, R. (2007). Blending in: The extent and promise of blended education in the United States . Sloan Consortium.

Andrade, M. S., & Alden-Rivers, B. (2019). Developing a framework for sustainable growth of flexible learning opportunities. Higher Education Pedagogies, 4 (1), 1–16. https://doi.org/10.1080/23752696.2018.1564879

Anthony, B., Kamaludin, A., Romli, A., Raffei, A. F., Phon, D. N., Abdullah, A., & Ming, G. L. (2020). Blended learning adoption and implementation in higher education: A theoretical and systematic review. Technology, Knowledge and Learning. https://doi.org/10.1007/s10758-020-09477-z

Asarta, C. J., & Schmidt, J. R. (2015). The choice of reduced seat time in a blended course. The Internet and Higher Education, 27 , 24–31. https://doi.org/10.1016/j.iheduc.2015.04.006

Barnett, R. (2014). Conditions of flexibility: Securing a more responsive higher education system . Higher Education Academy. https://www.heacademy.ac.uk/resource/conditions-flexibility-securing-more-responsive-higher-education-system

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2020). lme4: Linear Mixed-Effects Models using ‘Eigen’ and S4. R package version 1.1-26. https://CRAN.R-project.org/package=lme4

Bernard, R. M., Borokhovski, E., Schmid, R. F., Tamim, R. M., & Abrami, P. C. (2014). A meta-analysis of blended learning and technology use in higher education: From the general to the applied. Journal of Computing in Higher Education, 26 (1), 87–122. https://doi.org/10.1007/s12528-013-9077-3

Bernard, R. M., Borokhovski, E., & Tamim, R. M. (2019). The state of research on distance, online, and blended Learning: Meta-analyses and qualitative systematic reviews. In M. G. Moore & W. C. Diehl (Eds.), Handbook of Distance Education (4th ed., pp. 92–104). Routledge.

Boelens, R., De Wever, B., & Voet, M. (2017). Four key challenges to the design of blended learning: A systematic literature review. Educational Research Review, 22 (Supplement C), 1–18. https://doi.org/10.1016/j.edurev.2017.06.001

Boer, W. D., & Collis, B. (2005). Becoming more systematic about flexible learning: Beyond time and distance. ALT-J, 13 (1), 33–48. https://doi.org/10.1080/0968776042000339781

Brookhart, S. M., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A century of grading research: Meaning and value in the most common educational measure. Review of Educational Research, 86 (4), 803–848. https://doi.org/10.3102/0034654316672069

Bruggeman, B., Tondeur, J., Struyven, K., Pynoo, B., Garone, A., & Vanslambrouck, S. (2021). Experts speaking: Crucial teacher attributes for implementing blended learning in higher education. The Internet and Higher Education, 48 , 100772. https://doi.org/10.1016/j.iheduc.2020.100772

Caskurlu, S., Richardson, J. C., Maeda, Y., & Kozan, K. (2021). The qualitative evidence behind the factors impacting online learning experiences as informed by the community of inquiry framework: A thematic synthesis. Computers & Education, 165 , 104111. https://doi.org/10.1016/j.compedu.2020.104111

Castaño-Muñoz, J., Duart, J. M., & Sancho-Vinuesa, T. (2014). The Internet in face-to-face higher education: Can interactive learning improve academic achievement? British Journal of Educational Technology, 45 (1), 149–159. https://doi.org/10.1111/bjet.12007

Chen, D.-T. (2003). Uncovering the provisos behind flexible learning. Educational Technology & Society, 6 (2), 25–30.

Chingos, M. M., Griffiths, R. J., Mulhern, C., & Spies, R. R. (2017). Interactive online learning on campus: Comparing students’ outcomes in hybrid and traditional courses in the university system of Maryland. The Journal of Higher Education, 88 (2), 210–233. https://doi.org/10.1080/00221546.2016.1244409

Clary, G., Dick, G., Akbulut, A. Y., & Van Slyke, C. (2022). The after times: college students’ desire to continue with distance learning post pandemic. Communications of the Association for Information Systems, 50, 52–85. https://doi.org/10.17705/1CAIS.05003 .

Cook-Sather, A., Bahti, M., & Ntem, A. (2019). Pedagogical Partnerships . Elon University Center for Engaged Learning. https://doi.org/10.36284/celelon.oa1

Cundell, A., & Sheepy, E. (2018). Student perceptions of the most effective and engaging online learning activities in a blended graduate seminar. Online Learning, 22 (3), 87–102. https://doi.org/10.24059/olj.v22i3.1467

Dziuban, C., Graham, C. R., Moskal, P. D., Norberg, A., & Sicilia, N. (2018). Blended learning: The new normal and emerging technologies. International Journal of Educational Technology in Higher Education, 15 (3), 1–16. https://doi.org/10.1186/s41239-017-0087-5

Ellis, R. A., Pardo, A., & Han, F. (2016). Quality in blended learning environments—Significant differences in how students approach learning collaborations. Computers & Education, 102 , 90–102. https://doi.org/10.1016/j.compedu.2016.07.006

Garcia, A., Abrego, J., & Calvillo, M. M. (2014). A study of hybrid instructional delivery for graduate students in an educational leadership course. International Journal of E-Learning & Distance Education, 29 (1), 1–15.

Gherheș, V., Stoian, C. E., Fărcașiu, M. A., & Stanici, M. (2021). E-Learning vs. face-to-face learning: Analyzing students’ preferences and behaviors. Sustainability, 13 (8), 4381. https://www.mdpi.com/2071-1050/13/8/4381 .

Gillett-Swan, J. (2017). The challenges of online learning: Supporting and engaging the isolated learner. Journal of Learning Design, 10 (1), 20–30. https://doi.org/10.5204/jld.v9i3.293

Goeman, K., De Grez, L., van den Muijsenberg, E., & Elen, J. (2020). Investigating the enactment of social presence in blended adult education. Educational Research, 62 (3), 340–356. https://doi.org/10.1080/00131881.2020.1796517

Graham, C. R. (2019). Current research in blended learning. In M. G. Moore & W. C. Diehl (Eds.), Handbook of distance education (4th ed., pp. 173–188). Routledge.

Han, F., & Ellis, R. A. (2019). Identifying consistent patterns of quality learning discussions in blended learning. The Internet and Higher Education, 40 , 12–19. https://doi.org/10.1016/j.iheduc.2018.09.002

Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77 (1), 81–112. https://doi.org/10.3102/003465430298487

Heilporn, G., Lakhal, S., & Bélisle, M. (2021). An examination of teachers’ strategies to foster student engagement in blended learning in higher education. International Journal of Educational Technology in Higher Education, 18 (1), 25. https://doi.org/10.1186/s41239-021-00260-3

Heller, K. A., & Perleth, C. (2000). Kognitiver Fähigkeitstest für 4. bis 12. Klassen, Revision: KFT 4–12+ R . Beltz.

Hilliard, L. P., & Stewart, M. K. (2019). Time well spent: Creating a community of inquiry in blended first-year writing courses. The Internet and Higher Education, 41 , 11–24. https://doi.org/10.1016/j.iheduc.2018.11.002 .

Hodges, C. B., Moore, S., Lockee, B. B., Trust, T., & Bond, M. A. (2020). The difference between emergency remote teaching and online learning. EDUCAUSE Review . https://er.educause.edu/articles/2020/3/the-difference-between-emergency-remote-teaching-and-online-learning .

Hrastinski, S. (2019). What do we mean by blended learning? TechTrends . https://doi.org/10.1007/s11528-019-00375-5

Huang, J., Matthews, K. E., & Lodge, J. M. (2021). ‘The university doesn’t care about the impact it is having on us’: Academic experiences of the institutionalisation of blended learning. Higher Education Research & Development, 41 (5), 1557–1571.  https://doi.org/10.1080/07294360.2021.1915965 .

Kim, J. (2020). Teaching and learning after COVID-19. Inside Higher Ed , 1 . https://www.insidehighered.com/digital-learning/blogs/learning-innovation/teaching-and-learning-after-covid-19

Knoster, T. P., Villa, R. A., & Thousand, J. (2000). A framework for thinking about systems change. In R. A. Villa & J. Thousand (Eds.), Restructuring for caring and effective education: Piecing the puzzle together (pp. 93–128). Paul H. Brookes Publishing.

Kömmetter, S. (2010). Strukturelle Äquivalenz von Skalen zur Messung von studienrelevanten Kompetenzen und Einstellungen [Doctoral dissertation Vienna University]. Wien. http://othes.univie.ac.at/10028/1/2010-05-17_0202045.pdf

Kuckartz, U., & Rädiker, S. (2019). Analyzing qualitative data with MAXQDA. Springer Nature . https://doi.org/10.1007/978-3-030-15671-8

Lai, M., Lam, K. M., & Lim, C. P. (2016). Design principles for the blend in blended learning: A collective case study. Teaching in Higher Education, 21 (6), 716–729. https://doi.org/10.1080/13562517.2016.1183611

Lauche, K., Verbeck, A., & Weber, W. (1999). Multifunktionale Teams in der Produkt-und Prozessentwicklung. In Zentrum für Integrierte Produktionssysteme (Ed.), Optimierung der Produkt- und Prozessentwicklung (pp. 99–118). vdf Hochschulverlag.

Law, K. M. Y., Geng, S., & Li, T. (2019). Student enrollment, motivation and learning performance in a blended learning environment: The mediating effects of social, teaching, and cognitive presence. Computers & Education, 136 , 1–12. https://doi.org/10.1016/j.compedu.2019.02.021

Lenth, R. V. (2021). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.6.0. In https://CRAN.R-project.org/package=emmeans

Li, K. C., & Wong, B. Y. Y. (2018). Revisiting the definitions and implementation of flexible learning. In K. C. Li, K. S. Yuen, & B. T. M. Wong (Eds.), Innovations in open and flexible education (pp. 3–13). Springer. https://doi.org/10.1007/978-981-10-7995-5_1


Lockee, B. B., & Clark-Stallkamp, R. (2022). Pressure on the system: increasing flexible learning through distance education. Distance Education, 43 (2), 342–348.  https://doi.org/10.1080/01587919.2022.2064829 .

Lowenthal, P. R., & Snelson, C. (2017). In search of a better understanding of social presence: An investigation into how researchers define social presence. Distance Education, 38 (2), 141–159. https://doi.org/10.1080/01587919.2017.1324727

Manwaring, K. C., Larsen, R., Graham, C. R., Henrie, C. R., & Halverson, L. R. (2017). Investigating student engagement in blended learning settings using experience sampling and structural equation modeling. The Internet and Higher Education, 35 , 21–33. https://doi.org/10.1016/j.iheduc.2017.06.002

Martin, M., & Godonoga, A. (2020). SDG 4 -Policies for Flexible Learning Pathways in Higher Education Taking Stock of Good Practices Internationally . UNESCO. https://doi.org/10.13140/RG.2.2.31907.81449

Martin, F., Wang, C., & Sadaf, A. (2018). Student perception of helpfulness of facilitation strategies that enhance instructor presence, connectedness, engagement and learning in online courses. The Internet and Higher Education, 37 , 52–65. https://doi.org/10.1016/j.iheduc.2018.01.003

McGee, P., & Reis, A. (2012). Blended course design: A synthesis of best practices. Online Learning, 16 (4). https://doi.org/10.24059/olj.v16i4.239 .

McKenna, K., Gupta, K., Kaiser, L., Lopes, T., & Zarestky, J. (2020). Blended learning: Balancing the best of both worlds for adult learners. Adult Learning, 31 (4), 139–149. https://doi.org/10.1177/1045159519891997

Means, B., Bakia, M., & Murphy, R. (2014). Learning online: What research tells us about whether, when and how . Routledge.


Means, B., Toyama, Y., Murphy, R., & Baki, M. (2013). The effectiveness of online and blended learning: A meta-analysis of the empirical literature. Teachers College Record , 115 (3), 1–47. http://www.tcrecord.org/Content.asp?ContentId=16882

Merrill, M. D. (2018). Using the first principles of instruction to make instruction effective, efficient, and engaging. In R. E. West (Ed.), Foundations of Learning and Instructional Design Technology: The Past, Present, and Future of Learning and Instructional Design Technology . EdTech Books. https://edtechbooks.org/lidtfoundations/using_the_first_principles_of_instruction

Molina, A. I., Jurado, F., de la Cruz, I., Redondo, M. Á., & Ortega, M. (2009). Tools to support the design, execution and visualization of instructional designs. In Y. Luo (Ed.), Cooperative design, visualization, and engineering (pp. 232–235). Springer.

Mueller, C., Mildenberger, T., & Lübcke, M. (2020). Do we always need a difference? Testing equivalence in a blended learning setting. International Journal of Research & Method in Education, 43 (3), 283–295. https://doi.org/10.1080/1743727X.2019.1680621

Müller, C., & Mildenberger, T. (2021). Facilitating flexible learning by replacing classroom time with an online learning environment: A systematic review of blended learning in higher education. Educational Research Review, 34 , 100394. https://doi.org/10.1016/j.edurev.2021.100394

Müller, C., Stahl, M., Alder, M., & Müller, M. (2018). Learning effectiveness and students’ perceptions in a flexible learning course. European Journal of Open, Distance and E-Learning, 21 (2), 44–53. https://doi.org/10.21256/zhaw-3189

Noetel, M., Griffith, S., Delaney, O., Sanders, T., Parker, P., del Pozo Cruz, B., & Lonsdale, C. (2021). Video improves learning in higher education: A systematic review. Review of Educational Research, 91 (2), 204-236. https://doi.org/10.3102/0034654321990713 .

Nortvig, A. M., Petersen, A. K., & Balle, S. H. (2018). A literature review of the factors influencing e-learning and blended learning in relation to learning outcome, student satisfaction and engagement. The Electronic Journal of E-learning , 16 (1), 46–55. www.ejel.org

OECD. (2019). Going digital: shaping policies . OECD Publishing. https://doi.org/10.1787/9789264312012-en

Orr, D., Luebcke, M., Schmidt, J. P., Ebner, M., Wannemacher, K., Ebner, M., & Dohmen, D. (2020). A university landscape for the digital world. In D. Orr, M. Luebcke, J. P. Schmidt, M. Ebner, K. Wannemacher, M. Ebner, & D. Dohmen (Eds.), Higher Education Landscape 2030: A trend analysis based on the AHEAD international horizon scanning (pp. 1–4). Springer International Publishing. https://doi.org/10.1007/978-3-030-44897-4_1

Owston, R., & York, D. N. (2018). The nagging question when designing blended courses: Does the proportion of time devoted to online activities matter? The Internet and Higher Education, 36 (Supplement C), 22–32. https://doi.org/10.1016/j.iheduc.2017.09.001

Pelletier, K., Brown, M., Brooks, D. C., McCormack, M., Reeves, J., Arbino, N., Bozkurt, A., Crawford, S., Czerniewicz, L., & Gibson, R. (2021). 2021 EDUCAUSE Horizon Report Teaching and Learning Edition. https://www.learntechlib.org/p/219489/ .

Pelletier, K., McCormack, M., Reeves, J., Robert, J., Arbino, N., Al-Freih, M., Dickson-Deane, C., Guevara, C., Koster, L., Sanchez-Mendiola, M., Skallerup Bessette, L., & Stine, J. (2022). 2022 EDUCAUSE Horizon Report Teaching and Learning Edition . www.learntechlib.org/p/221033/ .

Peters, M. A., Rizvi, F., McCulloch, G., Gibbs, P., Gorur, R., Hong, M., Hwang, Y., Zipin, L., Brennan, M., Robertson, S., Quay, J., Malbon, J., Taglietti, D., Barnett, R., Chengbing, W., McLaren, P., Apple, R., Papastephanou, M., Burbules, N., … Misiaszek, L. (2020). Reimagining the new pedagogical possibilities for universities post-Covid-19. Educational Philosophy and Theory, 1-44. https://doi.org/10.1080/00131857.2020.1777655 .

Pilcher, S. C. (2017). Hybrid course design: A different type of polymer blend. Journal of Chemical Education, 94 (11), 1696–1701. https://doi.org/10.1021/acs.jchemed.6b00809

Rindermann, H., & Amelang, M. (1994). Entwicklung und Erprobung eines Fragebogens zur studentischen Veranstaltungsevaluation. Empirische Pädagogik, 8 (2), 131–151.

Ryan, R. M. (1982). Control and information in the intrapersonal sphere: An extension of cognitive evaluation theory. Journal of Personality and Social Psychology, 43 (3), 450.

Sadler, D. R. (2009). Grade integrity and the representation of academic achievement. Studies in Higher Education, 34 (7), 807–826. https://doi.org/10.1080/03075070802706553

Saichaie, K. (2020). Blended, flipped, and hybrid learning: Definitions, developments, and directions. New Directions for Teaching and Learning, 2020 (164), 95–104. https://doi.org/10.1002/tl.20428

Schmied, V., & Hänze, M. (2016). Testtheoretische Überprüfung eines Fragebogens zu Kompetenzen der Selbst-und Studienorganisation und lernrelevanten Emotionen bei Studierenden. Die Hochschullehre, 2 (16), 1–16.

Shim, T. E., & Lee, S. Y. (2020). College students’ experience of emergency remote teaching due to COVID-19. Children and Youth Services Review, 119 , 105578. https://doi.org/10.1016/j.childyouth.2020.105578

Smith, K., & Hill, J. (2019). Defining the nature of blended learning through its depiction in current research. Higher Education Research & Development, 38 (2), 383–397. https://doi.org/10.1080/07294360.2018.1517732

Spanjers, I., Könings, K., Leppink, J., Verstegen, D., de Jong, N., Czabanowska, K., & van Merrienboer, J. (2015). The promised land of blended learning: Quizzes as a moderator. Educational Research Review, 15 , 59–74. https://doi.org/10.1016/j.edurev.2015.05.001

Tucker, R., & Morris, G. (2012). By design: Negotiating flexible learning in the built environment discipline. Research in Learning Technology, 20 (1), n1. https://doi.org/10.3402/rlt.v20i0.14404

Vanslambrouck, S., Zhu, C., Lombaerts, K., Philipsen, B., & Tondeur, J. (2018). Students’ motivation and subjective task value of participating in online and blended learning environments. The Internet and Higher Education, 36 , 33–40. https://doi.org/10.1016/j.iheduc.2017.09.002

Vo, H. M., Zhu, C., & Diep, N. A. (2017). The effect of blended learning on student performance at course-level in higher education: A meta-analysis. Studies in Educational Evaluation, 53 (Supplement C), 17–28. https://doi.org/10.1016/j.stueduc.2017.01.002

Vo, H. M., Zhu, C., & Diep, N. A. (2020). Students’ performance in blended learning: Disciplinary difference and instructional design factors. Journal of Computers in Education, 7 (4), 487–510. https://doi.org/10.1007/s40692-020-00164-7

Wade, W. (1994). Introduction. In W. Wade, K. Hodgkinson, A. Smith, & J. Arfield (Eds.), Flexible Learning in Higher Education (pp. 12–17). Routledge.


Acknowledgements

We thank the students who participated in this study.

Author information

Authors and affiliations.

Zurich University of Applied Sciences, St. Georgenplatz 2, 8401, Winterthur, Switzerland

Claude Müller, Thoralf Mildenberger & Daniel Steingruber


Contributions

CM: Conceptualization, Methodology, Data curation, Formal analysis, Writing—Original draft preparation and Reviewing & Editing. TM: Methodology, Data curation, Formal analysis, Software, Writing—Original draft preparation and Reviewing & Editing. DS: Methodology, Data curation, Formal analysis, Software, Writing—Original draft preparation and Reviewing & Editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Claude Müller.

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

See Tables 6 , 7 .

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Müller, C., Mildenberger, T. & Steingruber, D. Learning effectiveness of a flexible learning study programme in a blended learning design: why are some courses more effective than others?. Int J Educ Technol High Educ 20 , 10 (2023). https://doi.org/10.1186/s41239-022-00379-x


Received : 13 September 2022

Accepted : 20 December 2022

Published : 17 February 2023

DOI : https://doi.org/10.1186/s41239-022-00379-x


  • Blended learning
  • Flexible learning
  • Learning effectiveness
  • Higher education
  • Educational design


Open Access

Peer-reviewed

Research Article

Flexible learning spaces facilitate interaction, collaboration and behavioural engagement in secondary school

Contributed equally to this work with: Katharina E. Kariippanon, Dylan P. Cliff, Sarah J. Lancaster, Anthony D. Okely, Anne-Maree Parrish

Roles Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

* E-mail: [email protected]

Affiliation Early Start, School of Health and Society, Faculty of Social Sciences, University of Wollongong, Wollongong, NSW, Australia


Roles Conceptualization, Formal analysis, Methodology, Supervision, Writing – review & editing

Affiliations Early Start, School of Education, Faculty of Social Sciences, University of Wollongong, Wollongong, NSW, Australia, Illawarra Health and Medical Research Institute, University of Wollongong, Wollongong, NSW, Australia

Roles Conceptualization, Methodology, Project administration, Writing – review & editing

Roles Conceptualization, Methodology, Supervision, Writing – review & editing

Affiliation Early Start, School of Education, Faculty of Social Sciences, University of Wollongong, Wollongong, NSW, Australia

  • Katharina E. Kariippanon, 
  • Dylan P. Cliff, 
  • Sarah J. Lancaster, 
  • Anthony D. Okely, 
  • Anne-Maree Parrish

PLOS

  • Published: October 4, 2019
  • https://doi.org/10.1371/journal.pone.0223607

Globally, many schools are replacing traditional classrooms with innovative flexible learning spaces to improve academic outcomes. Little is known about the effect on classroom behaviour. Students from nine secondary schools (n = 60, M age = 13.2±1.0y) were observed via momentary time sampling for a 30 minute period, in both a traditionally furnished and arranged classroom and a flexible learning space containing a variety of furniture options to accommodate different pedagogical approaches and learning styles. The teaching approaches in both conditions were documented. In traditional classrooms the approach was predominantly teacher-led and in the flexible learning space it was student-centred. Students in flexible learning spaces spent significantly more time in large group settings (d = 0.61, p = 0.001), collaborating (d = 1.33, p = 0.001), interacting with peers (d = 0.88, p = 0.001) and actively engaged (d = 0.50, p = 0.001) than students in traditional classrooms. Students also spent significantly less class time being taught in a whole class setting (d = -0.65, p = 0.001), engaged in teacher-led instruction (d = -0.75, p = 0.001), working individually (d = -0.79, p = 0.001), verbally off-task (d = -0.44, p = 0.016), and using technology (d = -0.26, p = 0.022) than in traditional classrooms. The results suggest that the varied, adaptable nature of flexible learning spaces coupled with the use of student-centred pedagogies, facilitated a higher proportion of class time interacting, collaborating and engaging with the lesson content. This may translate into beneficial learning outcomes in the long-term.
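The effect sizes above are reported as d. The abstract does not state which variant was computed, so the following is only a sketch of the common pooled-standard-deviation form of Cohen's d, with the sign chosen so that positive values mean more time in the flexible learning space. The symbols (the subscripts flex and trad, and the sample sizes n) are assumptions introduced for illustration; a cross-over design may instead call for a paired variant.

```latex
d = \frac{\bar{x}_{\text{flex}} - \bar{x}_{\text{trad}}}{s_p},
\qquad
s_p = \sqrt{\frac{(n_{\text{flex}} - 1)\, s_{\text{flex}}^2 + (n_{\text{trad}} - 1)\, s_{\text{trad}}^2}{n_{\text{flex}} + n_{\text{trad}} - 2}}
```

Under this convention, d = 1.33 for collaborating would mean the mean share of time spent collaborating was about 1.33 pooled standard deviations higher in the flexible learning space.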

Citation: Kariippanon KE, Cliff DP, Lancaster SJ, Okely AD, Parrish A-M (2019) Flexible learning spaces facilitate interaction, collaboration and behavioural engagement in secondary school. PLoS ONE 14(10): e0223607. https://doi.org/10.1371/journal.pone.0223607

Editor: Heather Erwin, University of Kentucky, UNITED STATES

Received: July 9, 2019; Accepted: September 24, 2019; Published: October 4, 2019

Copyright: © 2019 Kariippanon et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: There are ethical restrictions on sharing the de-identified data set as the Participant Information Sheets and Consent Forms did not state that the data will be publicly available through a storage depository. The following ethics committees would need to be contacted to request the data: - The University of Wollongong Ethics Committee (contact via [email protected] ); - NSW State Education Research Applications Process (SERAP) (contact via [email protected] ).

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Internationally, the education sector is undergoing a paradigm shift that encompasses both innovative built learning environments and significant reform of the pedagogical core [ 1 , 2 ], to better prepare students across all curriculum areas and learning stages to succeed in a rapidly changing and interconnected world [ 3 , 4 ]. An array of learning environments are emerging across educational institutions as educators strive to adapt their teaching practices and enhance learning outcomes [ 5 ]. These ‘flexible learning spaces’ contain a variety of furniture options in a relatively open space, which can be configured in various ways to facilitate a range of teaching and learning experiences [ 6 ]. They stand in stark contrast to traditional classrooms which are characterized by rows of desks and chairs, facing a teacher at the front who employs predominantly didactic teaching approaches. These traditional environments are now considered inadequate to deliver 21st-century competencies for learners [ 7 – 10 ]. The shift in practice is mirrored by an emerging research focus on the spatiality of education [ 11 ].

In Australia, in parallel to the shift underway in the education sector, a significant policy initiative, Building the Education Revolution (BER), was launched as part of the Federal Government’s response to the global financial crisis. This resulted in funding, managed at a State/Territory level, for schools to develop new learning spaces [ 12 ]. The initial investment has been followed by further financial commitments and broad support at the departmental level for schools to transition to more flexible learning spaces.

The vision for future-focused learning environments is that learning will be enhanced through the increasing use of student-centred pedagogies. In the context of the schools participating in this study, this umbrella term incorporates project-based and personalised learning experiences that support deeper investigation into areas of personal interest beyond what is delivered to the whole class [ 5 ]. Further, these approaches enable students to be engaged as co-creators of the learning experience, both independently and collaboratively. This is much like the secondary education reforms in the Netherlands, where a shift is occurring from learning environments based on knowledge transmission to those designed for knowledge construction [ 13 ]. In addition, the spaces and how they are used facilitate ample opportunities to enhance student creativity, innovation, communication and problem-solving skills, which are deemed increasingly crucial for the workplaces of the future for which schools are preparing students. These learning environments ideally support student choice of where and how to learn and enable easy access to a range of educational technologies designed to facilitate learning [ 6 ]. The incorporation of virtual space into learning environments necessitates additional modifications to both the built environment and the pedagogical approach to capitalise on the affordances of technology [ 14 ].

It is purported that flexible learning spaces inherently support educators to employ student-centred teaching approaches [ 15 ], and that these spaces can accommodate and facilitate learning modes such as collaboration, explicit instruction, independent work, feedback and reflection as well as experiential learning, which are believed to lead to improvements in students’ engagement and motivation [ 16 ]. In turn, the cognitive, social and behavioural domains of student engagement are collectively associated with improved learning outcomes such as retention of knowledge, test scores and grades [ 17 ]. It is broadly assumed that the teaching and learning approach used in flexible learning spaces will ultimately lead to improvements in academic outcomes.

Although an estimated 25% of Australian classrooms are now no longer classified as ‘traditional’ [ 18 ], there is limited empirical evidence on the effects that flexible learning spaces have on adolescent classroom behaviour and ultimately learning outcomes in the secondary school setting [ 19 , 20 ]. Despite the dearth of evidence, significant funds are being invested across Australia at Federal and State levels to both refurbish existing classrooms and fit out new builds [ 21 ]. Changes to the built environment are increasingly accompanied by an array of professional development opportunities for teachers. However, limited inter-disciplinary research exists that draws on learnings from the built environment literature and current understanding of school improvement and educational change processes, to ensure that teachers are effectively prepared and supported to transition to flexible learning spaces [ 19 ].

School educators are now faced with the challenge of navigating evolving teaching landscapes in these innovative environments, and are required to adopt a flexible and adaptive pedagogical approach and provide increasingly personalised support to students. However, previous research has shown that regardless of improvements in spatial configuration, physical features or classroom furnishing, direct instruction remains the dominant pedagogical approach used in schools [ 22 ], highlighting that pedagogical adaptation is not necessarily a natural flow-on from changes to the built environment. This may be attributed to teachers’ environmental competence, with many teachers lacking the ability to manipulate the learning environment to capitalize on the affordances of the space to maximise pedagogical gain [ 23 ]. In addition, the structure of space within buildings is thought to influence the formation of relationships between people [ 24 ], yet little is known about the nature of interactions that occur within these spaces.

While acceptability of flexible learning environments is relatively high, and teachers and students report perceived benefits to teaching, learning and wellbeing [ 5 ], few studies have observed flexible learning spaces in action or have systematically documented student behaviour to determine the impact that flexibility of space and mobility of technology and furniture have on space use [ 19 ]. Effective design of learning spaces has been found to facilitate constructivist pedagogy and student engagement [ 25 , 26 ], and research suggests that how classroom space is arranged has implications for student performance [ 27 ]. However, the modes of learning students engage in, the physical settings they choose, how they interact with their teachers and peers, and the effect of these innovative environments on behavioural engagement remain largely unexplored [ 28 ]. The aim of this study was therefore to objectively measure and compare adolescent classroom behaviour between traditional classrooms and flexible learning spaces and assess the effect of the space and teaching approach on a range of classroom behaviours.

The protocol was approved by the University of Wollongong’s Human Ethics Research Committee (HE16/021) and the New South Wales (NSW) State Education Research Applications Process (SERAP).

Participants

Purposive sampling was used to identify schools that had created at least one flexible learning space within their school, which students used on a regular basis. Changes included modifications to both the physical environment and the pedagogical approaches used in the space. Invited schools had made these changes prior to the launch of the funding initiative by the NSW Education Department, often with limited resources, and prior to this study (independent of the researchers). The study was a school-based cross-over trial, with Grade Seven to Nine classes from 12 public schools in NSW, Australia, invited to participate. Informed parental consent was obtained, and data collection included the students’ age, sex, cultural background and postcode of residence, which was used to determine socioeconomic status.

Learning space conditions

Traditional classrooms–built environment.

Traditional classrooms ( Fig 1 ) were a standard single classroom (M = 50m 2 ), which typically contained a desk and chair for each student, arranged in rows of paired desks or a u-shape facing the front. Students chose their seat upon entering the classroom and generally remained there for the duration of the lesson.


https://doi.org/10.1371/journal.pone.0223607.g001

Flexible learning spaces–built environment.

Flexible learning spaces ( Fig 2 ) were a combination of standard- and double-sized classrooms (M = 83m 2 ) and incorporated a range of furniture such as grouped tables, standing workstations, ottomans, couches, and write-able tables and walls. The majority of flexible learning spaces lacked a distinct front of the classroom, with resources including smart boards and whiteboard walls available around the room, giving the teacher greater flexibility to move around the space.


https://doi.org/10.1371/journal.pone.0223607.g002

Teacher professional development

Prior to the study, teachers had participated in various professional learning experiences ranging from tours of other schools that had transitioned to flexible learning spaces, conferences, short courses on designing and teaching in flexible learning spaces and informal teacher networks, both within and among schools, that gave rise to collective reflexivity and reciprocal learning [ 5 ]. All participating teachers fully embraced the underlying principles of what a flexible learning space could enable and shared a common vision for providing students with a learning environment designed to enhance learning. These teachers were considered change agents within their respective schools [ 5 ].

Students’ in-class behaviour was systematically observed using momentary time-sampling. The instrument was based on a previously validated observational tool, the Classroom Observation System (COS-5; Pianta [ 29 ]), which aims to record the frequency of a range of behaviours and experiences that may typically be observed in a school classroom. In consultation with the research team, child level setting, child academic behaviour, and child social behaviour were deemed behavioural categories relevant to the aims of this study. To provide additional detail and to ensure the tool was able to capture elements of interest that exist within a flexible learning space, a further two categories, namely mode of learning and use of technology, were added to the instrument. Table 1 provides detail on the categories, codes, and descriptions used in this study. To maximise the validity and reliability of the observations, one researcher completed all observations. The researcher received two hours of training in identifying and classifying the behavioural codes and undertook practice observations from video recordings of lessons and during classroom lessons prior to data collection to develop a consistent understanding of the categories and to become familiar with the procedure.


https://doi.org/10.1371/journal.pone.0223607.t001

Schools were required to timetable the same group of students into both a traditional classroom and, on another day, a flexible learning space, for the duration of a double lesson (M = 72 min). Prior to the commencement of data collection in each respective learning environment, a discussion was held with the teacher about the pedagogical approach for the lesson, the structure, content and what activities would be occurring.

A class list featuring consenting students in alphabetical order of surname was used to identify students to be observed. For the first observation in each school, the first three female and male students on the list were selected; if any were absent the next student of that sex was chosen. At the second data collection time point, the same six students were again observed. If a previously observed student was absent, the next student of the same sex on the class list would be selected. Neither the students nor their teacher were aware of who had been selected for observation. The lesson then proceeded as planned.
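The selection rule described above can be sketched in a few lines of Python. This is only an illustrative reading of the procedure, not the authors' tooling: the function names, the `(surname, sex)` tuple layout and the example roster are hypothetical, and the interpretation of "the next student of the same sex" as the next not-yet-selected, present student in list order is an assumption.

```python
def select_students(class_list, absent=frozenset(), per_sex=3):
    """Pick the first `per_sex` present students of each sex from an
    alphabetical class list of (surname, sex) tuples, sex in {"F", "M"}."""
    ordered = sorted(class_list, key=lambda s: s[0])
    selected = []
    for sex in ("F", "M"):
        # Absent students are skipped, so the next student of that sex moves up.
        present = [s for s in ordered if s[1] == sex and s[0] not in absent]
        selected.extend(present[:per_sex])
    return selected


def reselect(class_list, first_selection, absent):
    """Second time point: keep the same students, replacing each absent one
    with the next unused, present student of the same sex (assumed reading)."""
    ordered = sorted(class_list, key=lambda s: s[0])
    chosen = [s for s in first_selection if s[0] not in absent]
    used = {s[0] for s in first_selection}
    for missing in (s for s in first_selection if s[0] in absent):
        for cand in ordered:
            if cand[1] == missing[1] and cand[0] not in used and cand[0] not in absent:
                chosen.append(cand)
                used.add(cand[0])
                break
    return chosen


# Hypothetical roster, for illustration only.
roster = [("Adams", "F"), ("Brown", "M"), ("Clark", "F"), ("Davis", "M"),
          ("Evans", "F"), ("Foster", "M"), ("Green", "F"), ("Hill", "M")]
first = select_students(roster)
second = reselect(roster, first, absent={"Davis"})
```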

The observation began approximately 10 minutes after the lesson commenced, once students had settled into the lesson. The observer used headphones to alert them to observe and categorize the six students’ behaviour at 30-second intervals, on a rotational basis over a 30-minute period. Each student was therefore observed 10 times over the course of the 30 minutes. This procedure has been found to be effective when seeking to describe students’ classroom behaviour [ 30 ].
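The arithmetic behind the rotation can be checked with a short sketch: 30 minutes at 30-second intervals gives 60 observation slots, which shared among six students yields 10 observations each. The function name and the simple round-robin order are assumptions made for illustration; the source states only that students were observed on a rotational basis.

```python
from collections import Counter


def observation_schedule(n_students=6, interval_s=30, duration_min=30):
    """Return (seconds_from_start, student_index) pairs for a round-robin
    momentary time-sampling schedule (round-robin order is assumed)."""
    n_slots = duration_min * 60 // interval_s
    return [(slot * interval_s, slot % n_students) for slot in range(n_slots)]


schedule = observation_schedule()
# Number of observations recorded per student over the whole period.
per_student = Counter(student for _, student in schedule)
```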

The teachers were all familiar with teaching both in their schools’ traditional classrooms and flexible learning spaces and recognized how their teaching approaches varied between the two learning environments [ 5 ]. Teachers were required to teach in a manner typical of how they would normally conduct their lesson in the respective learning spaces. The students and teachers had all spent significant time teaching and learning in both traditional classrooms and their school’s flexible learning space and quickly adjusted to the distinct ways of working in the two different environments. The lesson content and teacher were consistent across both conditions. Observations were conducted in subject areas including English, mathematics, geography, and history.

The two data collection instances in each school took place within 1–2 weeks of one another, between 2016 and 2017. Schools determined which condition was assessed first. Participating teachers were aware of the broad categories of behaviour and experiences being observed but had not seen the tool itself.

Statistical analysis

Ten observations were recorded for each of the six students over a 30-minute period in both conditions. Raw data were entered into an Excel spreadsheet. For each of the six behavioural categories, the number of times each code within the category occurred was converted into a percentage, for each participant, for each observation period. Analyses were conducted in SPSS (Version 21) and STATA (Version 13). Data were analyzed using mixed-effect multi-linear regression to calculate the differences between traditional classrooms and flexible learning spaces for all codes for each behavioural category. The model analyzed the data for within-child differences. To account for clustering, schools were used as a random effect in the model. The mean differences in outcome variables between the two conditions were considered statistically significant at p < 0.05. To demonstrate the magnitude of the difference between the means of the two conditions, standardized effect sizes (Cohen’s d ) were calculated from the means and standard deviations of the two conditions, using the traditional classroom as the denominator. Effect sizes of approximately 0.2, 0.5 and 0.8 were considered small, medium and large respectively [ 31 ].
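The two descriptive steps above, converting code counts to percentages and standardising the mean difference, can be illustrated with a minimal Python sketch. It does not reproduce the mixed-effects model fitted in SPSS and STATA. Several points are assumptions: "using the traditional classroom as the denominator" is read as dividing by the standard deviation of the traditional condition, the difference is taken as flexible minus traditional, the "negligible" label below 0.2 is added for completeness, and all function names and example values are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def code_percentages(observations, codes):
    """Share of observations (in percent) falling under each code."""
    counts = Counter(observations)
    n = len(observations)
    return {code: 100.0 * counts[code] / n for code in codes}


def effect_size(flexible, traditional):
    """Standardised mean difference with the traditional-condition SD
    as the denominator (assumed reading of the source)."""
    return (mean(flexible) - mean(traditional)) / stdev(traditional)


def magnitude(d):
    """Label |d| with the conventional 0.2 / 0.5 / 0.8 cut-offs."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label not used in the source; added here


# Made-up example: ten observations of one student in one lesson.
pct = code_percentages(["whole"] * 4 + ["group"] * 6, ["whole", "group", "individual"])
# Made-up per-student percentages in each condition.
d = effect_size([60, 70, 80], [40, 50, 60])
```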

A total of 243 students from nine participating schools were invited to participate in the study and 203 provided informed consent (83%). Eight schools were co-educational and one school was an all-boys school. Schools typically selected their top academic class of the year level to participate in the research. Of the 54 students who were selected to be observed in the two conditions in each of the nine schools, a total of six students were absent at the second data collection time point, so six additional students were selected. Valid data were therefore obtained from 60 students. Of the sample 45% were female, students had a mean age of 13.2 years (SD = 1.0), and were from a range of socio-economic backgrounds, representing over 13 cultural and ethnic groups ( Table 2 ).


https://doi.org/10.1371/journal.pone.0223607.t002

Two quite distinct pedagogical approaches were evident in the two learning environments across the nine schools.

Traditional classrooms–pedagogical approach

In the traditional classrooms the teaching style was primarily teacher-led, with the teacher largely remaining at the front of the classroom, often near the teacher’s desk. Students generally worked individually on set tasks and received frequent input and additional instruction from the teacher. Students had limited reasons or options to stand or move around the room, or find an alternative place to work throughout the lesson or to engage with one another.

Flexible learning spaces–pedagogical approach

The teaching approach in the flexible learning spaces was student-centred and group-work focused. Students were given instructions from the teacher regarding the lesson plan and objectives at the commencement of the class, and further guidance throughout the lesson as needed. In addition students were afforded considerable freedom to choose how to go about their learning. Together with the furniture available, this teaching approach created opportunities and incentives for students to move throughout the lesson. Students were given the autonomy to choose where in the space to work, what furniture and resource to use and typically formed groups or worked independently out of their own volition.

Classroom behaviours

Significant differences were observed among multiple codes of classroom behaviour in all but the interaction with teacher category ( Table 3 ).


https://doi.org/10.1371/journal.pone.0223607.t003

With respect to the learning setting , students in flexible learning spaces spent less class time working as a whole class (d = -0.65, p = 0.001), more time working in groups of more than six students (d = 0.46, p = 0.004) and in groups of up to six students (d = 0.61, p = 0.001) compared with students in traditional classrooms, resulting in moderate effect sizes. The difference in time spent sitting individually between the two conditions was not significant.

Regarding the modes of learning category, significant differences were seen in 4 of the 6 behaviour codes (all p = 0.001). Students in flexible learning spaces spent significantly more time collaborating than in traditional classrooms (d = 1.33), resulting in a very large effect. Further, students in flexible learning spaces spent less time engaged in teacher-led instruction (d = -0.75) and working independently (d = -0.79), and more time engaged in presentation-based work (d = 0.65), resulting in moderate to large effect sizes. Differences in time spent engaged in reflective learning and research-based work were non-significant between the two conditions.

With respect to students’ type of engagement , students in flexible learning spaces spent significantly more time actively engaged with the lesson content (d = 0.50, p = 0.001) and significantly less time verbally off-task (d = -0.44, p = 0.016) than in traditional classrooms, resulting in moderate and small effect sizes, respectively. Differences in passive engagement and in passive and motor off-task behaviour were non-significant between the two conditions.

In relation to students’ interaction with peers , students in flexible learning spaces spent significantly more time interacting positively (d = 0.88, p = 0.001) and significantly less time not interacting (d = -0.85, p = 0.001) than in traditional classrooms, resulting in large effect sizes. There was no significant difference in time spent engaged in interactions of a negative nature between the two conditions.

With respect to the proportion of time students spent using technology , students in flexible learning spaces spent significantly less time using technology (d = 0.26, p = 0.022) than in traditional classrooms, resulting in a small effect size. The differences in time spent using technology both actively and passively, between conditions, were not significant.

This study evaluated differences in student classroom behaviour between traditional classrooms and flexible learning spaces. Students in flexible learning spaces spent significantly less time in a whole-class setting, and more time working in groups, relative to traditional classrooms. In flexible learning spaces, students spent more time collaborating and interacting positively with their peers, as well as more time presenting work back to the class. Further, students spent less time being taught explicitly and working individually, than in traditional classrooms. Overall students in flexible learning spaces spent a greater proportion of class time actively engaged with the lesson. This was demonstrated through verbal and physical behaviours appropriate to the task set by the teacher, such as raising hands, writing or discussing the activity. If a child looked bored, they were still coded as engaged so long as they were doing what was asked of them by the teacher. Students were less likely to be verbally off-task and spent less time engaging with technology relative to students in traditional classrooms.

Student disengagement and lack of motivation are among the key elements that underpin the narrative around why schools are adapting their pedagogical approaches, rethinking the built classroom environment and creating flexible learning spaces [ 32 ]. Disengagement is of concern not only because it places students at risk of school dropout [ 33 ], but also because low school engagement has been shown to be a correlate and predictor of problem behaviour and poor health among adolescents [ 34 ]. On the flip side, fully engaged students report better mental and physical health in addition to improved academic grades [ 35 ]. Disengagement has been found to be particularly acute during early adolescence and persists into the secondary school years [ 36 ]. While it is recognised that engagement occurs on a cognitive, emotional and behavioural level [ 37 , 38 ], behavioural engagement–which can be classified according to how students interact with the teacher, their peers and the lesson content–is assessed most frequently as it is directly observable [ 39 ]. Student engagement is generally categorised and measured as a binary–engaged or disengaged [ 39 ]. However, engagement is not constant, but context-specific [ 37 ], temporal and thus exists along a continuum [ 39 ]. Students often fluctuate multiple times between being actively or passively engaged to being passively, verbally or motor off-task throughout a lesson. This study measured engagement along this continuum and found no differences in time spent off-task (passive or motor) between the two conditions. Instead, it was found that students spent a greater proportion of class time actively engaged with the lesson content and less time verbally off-task when in the flexible spaces. Further, there was no significant difference in negative interactions among students between the two conditions, but significantly more positive interaction in the flexible learning condition. Since teachers commonly report disruptive behaviour as talking out of turn or disturbing/hindering other students [ 40 ], these findings may be reassuring to educators who are concerned about possible challenges around managing disruptive classroom behaviour in these more autonomy-permissive, interactive environments.

Disengagement and detachment from school have been shown to increase as students progress through the grades [ 41 ]. Further, an association has been found between middle school instructional environments that increasingly include the whole-class setting and classroom disengagement among youth [ 41 ]. In flexible learning spaces students spent a greater proportion of class time working in group settings and collaborating, and were typically given autonomy to interact with one another and discuss academic tasks. It is suggested that this contributed to creating the conditions that fostered the high level of active engagement [ 42 ] that were observed. This is supported by findings that suggest that when peer to peer classroom interaction contributes to the creation of a positive interpersonal environment, student engagement increases [ 43 ]. It is therefore encouraging to note the greater level of positive interaction among students observed in the flexible learning spaces.

Previous research has demonstrated that a strong student-teacher relationship fosters behavioural engagement [ 44 ], and that in classrooms where teachers facilitate dialogue and discussion, student engagement is enhanced [ 45 ]. In addition, it has been shown that among classroom level factors, teacher-student interactions are the greatest predictor of learning outcomes in standardised tests [ 46 ]. While this study did not observe a difference in the proportion of class time spent in teacher-student interaction between the two conditions, the observed interactions that did occur were overwhelmingly positive; i.e., they were related to academic content or rapport building rather than disciplinary in nature. This finding is possibly due to the fact that teachers participating in this study all valued the importance of teacher-student interaction and therefore may have demonstrated greater levels of engagement with their students in both types of conditions than teachers in traditional classrooms typically would.

These outcomes indicate that modifications to the built learning environment of secondary school classrooms, coupled with student-centred pedagogy, can positively influence adolescent behaviour during class time. In this context, student-centred is defined as encouraging students to become active participants, engaged in their own learning experiences. This rationale aligns with research in environmental psychology which has long purported that human behaviour and the built environment are closely interrelated [ 47 ]. A key difference between these two contrasting learning environments is that in flexible learning spaces teachers actively relinquish their control over where and how students work [ 5 ]. This shift in teaching approach, coupled with the affordances of the built environment, facilitates student autonomy and engagement with the space and its users. Previous research suggests that students who perceive their teacher to be autonomy-supportive exhibit higher levels of engagement [ 48 ]. The student-centred approach allows students to capitalize on opportunities created by the variety of furniture and resources such as the group tables, standing workstations, and writeable walls. Greater interaction and collaboration then flow on from breaking up the whole class setting and creating conditions that foster group work.

It would be simplistic, however, to suggest a linear causal relationship between flexible learning spaces and the outcomes being measured in this study. Rather a complex interplay exists between the built learning environment, the pedagogical approach, the subject being studied, and the student. The findings suggest that a teacher with the environmental competency to maximise the affordances of flexible learning spaces is able to achieve the results found in this study. This has implications for the nature of professional development that is offered to teachers as well as the ongoing support provided at the departmental and local school level, as teachers and students transition into flexible learning spaces. This area is currently underdeveloped [ 19 ].

A strength of this study is that the same teacher and students were observed in both conditions. As such, the differences observed in student behaviour can likely be attributed to the changes in the built environment and teaching approach, rather than to differences between cohorts of students. A limitation is that students were only observed on one occasion per school in each of the two conditions, due to the limited time available for researchers to be in the schools. Because this study design is not the gold standard for establishing cause and effect, further experimental research, such as randomized trials, is needed to examine the effects of habitual behaviour over longer time frames in flexible learning spaces. Additionally, it could be important to investigate the effect of employing a student-centred approach in traditional classrooms, since the majority of secondary school classrooms remain traditional and pedagogical changes in themselves may have a beneficial impact on the outcomes measured in this study. This was not measured in the present study as schools were moving from traditional classrooms with teacher-led approaches to flexible learning spaces with student-centred pedagogy, so it was deemed a priority to investigate these two ends of the space/pedagogy spectrum.

Overall, these findings add to the limited research from secondary schools that has shown enhanced engagement among students undertaking lessons in innovative learning environments [ 20 ]. The results suggest that the varied, adaptable nature of flexible learning spaces and the greater use of student-centred pedagogies facilitate students spending a greater proportion of class time engaging, interacting and collaborating. This may translate into beneficial learning outcomes in the long-term. Further research is required to unpack the complexity of the interplay between the built environment and the pedagogical approaches and how best to support teachers’ environmental competencies to maximise the benefits that flexible learning spaces can offer adolescents.

Acknowledgments

We acknowledge the Futures Learning Unit of the NSW Department of Education and Training, especially Kathleen Donohoe and Robert Fraser for their support with identifying eligible schools. We also thank the schools, teachers and students who participated in this research. We thank Byron Kemp for his assistance with the data analysis.

  • 1. Jonassen D., & Land S. Theoretical foundations of learning environments. 2nd ed. Abingdon: Routledge; 2012.
  • 2. Prain V., Cox P., Deed C., Edwards D., Farrelly C., Keeffe M. Characterising personalised learning. In: Prain V., editor. Personalising learning in open-plan schools. 2015.
  • 3. Organisation for Economic Co-operation and Development. The OECD Handbook for Innovative Learning Environments [Internet]. 2017. Available from: https://doi.org/10.1787/9789264277274-en
  • 4. Kuhlthau C. Guided Inquiry: Learning in the 21st Century. In Center for International Scholarship in School Libraries (CISSL), Rutgers University, USA; 2015. p. 1–8.
  • 6. NSW Department of Education. Future-focused learning and teaching [Internet]. 2018. Available from: https://education.nsw.gov.au/teaching-and-learning/curriculum/learning-for-the-future/future-focused-learning-and-teaching
  • 10. Organisation for Economic Co-operation and Development. Education Policy Outlook 2015: Making Reforms Happen [Internet]. 2015. Available from: https://doi.org/10.1787/9789264225442-en
  • 11. Fisher K. The translational design of schools: An evidence-based approach to aligning pedagogy and learning environments. Sense Publishing, Rotterdam; 2016.
  • 18. Imms, W., Mahat, M., Byers, T. & Murphy D. Technical Report: Type and Use of Innovative Learning Environments in Australasian Schools ILETC Survey 1 [Internet]. 2017 [cited 2018 Nov 21]. Available from: http://www.iletc.com.au/publications/reports .
  • 22. Sanoff H. School Building Assessment Methods. 2001.
  • 24. Hillier B, Hanson J. The social logic of space. Cambridge: Cambridge University Press; 1993.
  • 25. Cleveland B. Addressing the spatial to catalyse socio-pedagogical reform in middle years education. In: Ken F, editor. The translational design of schools: An evidence-based approach to aligning pedagogy and learning environments. Rotterdam: Sense Publishers.; 2016.
  • 27. Gifford R. Environmental psychology: Principles and practice. Colville: Optimal Books; 2002.
  • 31. Cohen J. Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press; 1988.
  • 32. MCEETYA. Melbourne Declaration on Educational Goals for Young Australians. Education [Internet]. 2008;(December):20. Available from: http://www.curriculum.edu.au/verve/_resources/national_declaration_on_the_educational_goals_for_young_australians.pdf
  • 36. Wigfield A., Eccles J. S., Schiefele U., Roeser R., & Davis-Kean P. Development of achievement motivation. In: N DW& E, editor. Handbook of child psychology: Vol 3 Social, emotional, and personality development. 6th ed., p. New York: John Wiley.; 2006.
  • 37. Fredricks J. A., McColskey W. The measurement of student engagement: A comparative analysis of various methods and student self-report instruments. In: Christenson SL. Reschly AWC, editor. Handbook of research on student engagement. 2012.
  • 41. Eccles J. S., Midgley C. Stage/environment fit: Developmentally appropriate classrooms for early adolescents. In: Ames RA& C, editor. Research on motivation in education: Goals and cognitions. Vol 3. Academic Press, San Diego, CA:; 1989. p. 139–181.
  • 42. Jones, RD, Marrazo, MJ, & Love C. Student engagement—creating a culture of academic achievement. 2008.
  • 43. Davis M. H., McPartland JM. High school reform and student engagement. In: Christenson A. L. Reschly CW, editor. Handbook of research on student engagement. Boston, MA: Springer.; 2012. p. 515–539.
  • 46. Hattie J. Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge, London; 2009.

Effectiveness of an Online Classroom for Flexible Learning

International Journal of Academic Multidisciplinary Research (IJAMR), Vol. 4, Issue 8, August – 2020, Pages: 100-107

8 Pages Posted: 11 Nov 2020

Christopher Francisco

La Consolacion University Philippines

Date Written: 2020

This study aimed to investigate the role of Eliademy as a web-based classroom in designing an alternative learning tool in times of emergencies. To achieve this aim, qualitative interviews were conducted with selected graduate school students of La Consolacion University Philippines who had experienced three (3) consecutive trimesters of Eliademy in their courses during the academic year 2018-2019. The results revealed that students strongly agreed that Eliademy can be used as an alternative tool for teaching and learning, as evidenced by their perceived advantages and disadvantages of the platform. The study found that Eliademy is accessible, can promote time management and promptness, and poses a challenge for users, although it requires a strong internet connection and can be time-pressured. The researcher offered three easy steps for creating this web-based classroom, to wit: “Signing in”, “Designing it”, and “Managing out”. Finally, the researcher also presented other potential alternative learning tools that teachers may utilize depending on the needs of the learners, since such platforms have their own special features (e.g. CourseSites, iTunes U, Latitude Learning, Myicourse, Schoology, ATutor, Dokeos, Moodle, etc.). It was concluded that in times of calamities, educators and other institutions may consider using Eliademy so as not to compromise classes, and even as part of a regular routine. Doing this may promote schools’ learning management system (LMS), which is required by the different accrediting agencies (e.g. PAASCU, PACUCOA, ISO).

Keywords: online classroom, flexible learning


Christopher Francisco (Contact Author)


Catmon Rd, Capitol View Park Subdivision Malolos, Bulacan, Bulacan 3000 Philippines



Learning effectiveness of a flexible learning study programme in a blended learning design: why are some courses more effective than others?

Claude Müller

Zurich University of Applied Sciences, St. Georgenplatz 2, 8401 Winterthur, Switzerland

Thoralf Mildenberger

Daniel Steingruber

Associated data

The datasets are available from the corresponding author on reasonable request.

Flexible learning addresses students’ needs for more flexibility and autonomy in shaping their learning process, and is often realised through online technologies in a blended learning design. While higher education institutions are increasingly considering replacing classroom time and offering more blended learning, current research is limited regarding its effectiveness and modifying design factors. This study analysed a flexible study programme with 133 courses in a blended learning design in different disciplines over more than 4 years with a mixed-methods approach. In the analysed flexible study programme, classroom instruction time was reduced by 51% and replaced with an online learning environment in a blended learning format ( N students = 278). Student achievement was compared to the conventional study format ( N students = 1068). The estimated summary effect size for the 133 blended learning courses analysed was close to, but not significantly different from, zero ( d  = − 0.0562, p  = 0.3684). Although overall effectiveness was equivalent to the conventional study format, considerable variance in the effect sizes between the courses was observed. Based on the relative effect sizes of the courses and data from detailed analyses and surveys, heterogeneity can be explained by differences in the implementation quality of the educational design factors. Our results indicate that when implementing flexible study programmes in a blended learning design, particular attention should be paid to the following educational design principles: adequate course structure and guidance for students, activating learning tasks, stimulating interaction and social presence of teachers, and timely feedback on learning process and outcomes.

Introduction

Considering the digitalisation of society, there is an increasing need to constantly develop one’s competencies in the sense of continuous lifelong learning (OECD, 2019 ). In this context, higher education should be adapted to the learners' diverse needs and specific life phases (Barnett, 2014 ; Martin & Godonoga, 2020 ) and accessible to broader sections of the population (Dziuban et al., 2018 ; Orr et al., 2020 ). The concept of flexible learning addresses these needs and tries to afford learners more flexibility and autonomy in shaping the learning process regarding when, where, and how they learn (Boer & Collis, 2005 ; Hrastinski, 2019 ; Lockee & Clark-Stallkamp, 2022 ; Smith & Hill, 2019 ; Vanslambrouck et al., 2018 ; Wade, 1994 ). From a pedagogical point of view, different dimensions of flexible learning can be distinguished. Li and Wong ( 2018 ) analysed previous publications and identified the following dimensions of flexible learning: time, content, entry requirement, delivery, instructional approach, performance assessment, resources and support, and orientation or goal. The frequently mentioned dimension of place (e.g. Chen, 2003 ) belongs in this concept to the delivery dimension. When the above dimensions are designed according to learners' needs, students should actually perceive learning as flexible. From a technical perspective, flexible learning has often been attempted through online technologies (Tucker & Morris, 2012 ). According to Allen et al. ( 2007 ), learning environments can be classified according to their proportion of online content delivery as traditional (no online delivery), web-facilitated (an online delivery proportion of between 1 and 29 per cent), blended learning (an online delivery proportion of between 30 and 79 per cent) or online learning (80 per cent or more of content delivered online). Accordingly, flexible learning is often associated with, and used in connection with, blended or online learning (Anthony et al., 2020 ).
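The classification attributed to Allen et al. ( 2007 ) above amounts to a simple threshold rule, which the following Python sketch makes explicit. The function name is hypothetical, and the handling of boundary cases the text leaves open (fractional shares between the stated bands, and a share of exactly 80 per cent, treated here as online learning) is an assumption.

```python
def classify_delivery(online_share_pct):
    """Map the percentage of content delivered online to a delivery category."""
    if not 0 <= online_share_pct <= 100:
        raise ValueError("online share must be between 0 and 100 per cent")
    if online_share_pct == 0:
        return "traditional"
    if online_share_pct < 30:
        return "web-facilitated"
    if online_share_pct < 80:
        return "blended learning"
    return "online learning"  # exactly 80 per cent counted as online (assumption)


# Example shares spanning the four categories.
labels = [classify_delivery(p) for p in (0, 15, 30, 79, 80, 100)]
```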

The COVID-19 pandemic, with its global shift to remote instruction, has accelerated the demand for flexible learning options in higher education (Lockee & Clark-Stallkamp, 2022 ; Pelletier et al., 2022 ). Current student evaluations have shown that the experienced learning flexibility during ‘emergency distance learning’ (Hodges et al., 2020 ) is appreciated (Gherheș et al., 2021 ; Shim & Lee, 2020 ) and students are demanding more flexible learning options in the aftermath of the pandemic as well (Clary et al., 2022 ; Lockee & Clark-Stallkamp, 2022 ). In response, higher education institutions are now considering replacing classroom time and offering more online and blended learning formats (Kim, 2020 ; Pelletier et al., 2021 ; Peters et al., 2020 ; Saichaie, 2020 ).

Despite the apparent popularity of blended learning, academics are often concerned about the effectiveness of blending for student learning (Huang et al., 2021 ), and educational institutions will only be able to offer and expand blended learning formats when they are confident that students will perform as they would in a conventional classroom setting (Owston & York, 2018 ). Meta-analyses (Bernard et al., 2014 ; Means et al., 2013 ; Müller & Mildenberger, 2021 ; Vo et al., 2017 ) point out that blended learning is not systematically more or less effective than conventional classroom learning. At the same time, they have pointed out that the number of controlled studies is still limited and that the studies have examined mostly single courses with a study period of one semester; there is a particular lack of controlled studies at a degree level (i.e., with many courses taught over a longer period). In addition, variance in the learning effectiveness of the courses found in the studies was large, with a shortage of studies on the implementation and design success factors of blended learning based on objective learning achievement rather than student and lecturer evaluation (Bernard et al., 2019 ; Graham, 2019 ; Means et al., 2014 ).

Research questions

This study addressed the above issues of learning effectiveness and modifying factors of blended learning at the study and course levels. The focus of the researched study programme was to give students more flexibility in the learning process, especially regarding time and place, by replacing classroom time with an online learning environment in a blended learning design (see details in the research context). Accordingly, the term ‘flexible learning’ is used in this paper to denote the desired study characteristic at the programme level. The term ‘blended learning’ is used to describe the educational design of the courses under investigation in the experimental condition.

The two research questions (RQ) were:

  • What is the impact on student achievement (measured as exam results) of blended learning with classroom time reduced by half at the course level and study programme level in a flexible learning study programme compared with the conventional study format?
  • What are the modifying factors for the learning effectiveness of blended learning courses with reduced classroom time in a flexible learning study programme?

Literature review

Student achievement.

Several studies have explored the acceptance and effectiveness of blended or online environments with reduced classroom time in recent years. In a study by Asarta and Schmidt ( 2015 ), presence in classroom sessions in a traditional course was compared with an experimental setting where lectures were also made available online. In the two settings, the exams, learning materials, and number of planned classroom sessions were identical, but students could choose whether to attend classroom sessions in the blended learning version. Data analysis showed that students reduced their average attendance to between 49 and 63%. Asarta and Schmidt ( 2015 ) concluded that—in line with the student preferences—the classroom attendance rate in blended learning courses could be reduced by approximately one-half compared with conventional courses. This is one of few studies in which students had control over the blend ratio; usually, the instructor decides and takes responsibility for the proportion of instruction delivered in a blended learning format (Boelens et al., 2017 ).

Owston and York ( 2018 ) investigated the relationship between the proportion of online time spent in blended learning courses and student satisfaction and performance. The clustering was determined by the ratio of time spent on online activities replacing classroom sessions. The results showed that students in courses with high (50%) and medium (between 36 and 50%) online proportions rated their learning environments more positively and performed significantly better than their peers in blended learning courses with low (27–30%) or supplemental online segments. Consequently, Owston and York ( 2018 ) concluded that across a wide variety of subject areas and course levels, student perceptions and performance appeared to be higher when at least one-third to one-half of regular classroom time was replaced with online activities.

Hilliard and Stewart ( 2019 ) came to similar conclusions concerning satisfaction. They examined the student perceptions of the various aspects of the community of inquiry (COI) model, and their findings indicated that students in high blend (50% online) classes perceived higher levels of teaching, social, and cognitive presence than students in medium blend (33% online) classes.

In a recent review, Müller and Mildenberger (2021) examined the impact of replacing classroom time with an online learning environment. Their meta-analysis of blended learning (k = 21 effect sizes) applied strict inclusion criteria concerning research design, learning outcomes measurement, and blended learning implementation. In particular, it was a requirement that the attendance time in the blended learning format was reduced by 30–79% compared to the conventional learning environment, drawing on Allen et al. (2007). In this meta-analysis, the estimated effect size (Hedges' g) was 0.0621, although not significantly different from zero. The 95% confidence interval [−0.13, 0.25] suggests that overall differences between blended and conventional classroom learning were small, and, at best, very small negative or moderate positive effects were plausible. This implies that despite a reduction in classroom time of between 30 and 79 per cent, equivalent learning outcomes were found. However, in line with authors of other blended learning reviews (Bernard et al., 2014; Means et al., 2013; Spanjers et al., 2015; Vo et al., 2017), it was pointed out that the number of controlled studies in the field of blended learning was still limited. More primary studies of the highest methodological quality must be conducted in various disciplines to validate the results further and investigate the effectiveness of blended learning in different disciplines and contexts. Additionally, the authors emphasised considerable heterogeneity in the effect sizes between the various studies. McKenna et al. (2020) also stated that simply offering a blended learning course is not enough to ensure success; research on blended learning design should, therefore, differentiate specific study contexts to derive practice guidelines from it.

Modifying design factors

To explain the considerable differences in the effect sizes of the primary studies, various potential moderators were analysed in the meta-analysis. Out of a total of 41 potential moderators investigated [ N  = 21 in Means et al. ( 2013 ); N  = 6 in Bernard et al. ( 2014 ); N  = 6 in Spanjers et al. ( 2015 ); N  = 2 in Vo et al. ( 2017 ); and N  = 6 in Müller and Mildenberger ( 2021 )], very few have turned out to be significant. In contrast to other meta-analyses, Vo et al. ( 2017 ) found a significantly higher mean effect size in STEM disciplines compared to that of non-STEM. From an educational design perspective, it is interesting to note that the use of quizzes (or regular tests with feedback for students) has a significant and positive influence on the effectiveness and attractiveness of blended learning (Spanjers et al., 2015 ).

Bernard et al. (2019) retrospectively reviewed the moderator analyses of meta-analyses published from 2000 to 2015. They concluded that student interaction, collaboration, and discussion emerged as a moderating influence in several studies. Additionally, practice, feedback, and incremental quizzes (i.e., formative evaluation) also appeared important in several studies. However, they also pointed out that a large body of literature shows these instructional elements to be equally valuable in all educational settings.

The above explanations have shown that the past moderator analyses in meta-analyses could not explain the heterogeneity of the student achievements with the design factors in blended learning, other than confirming that quizzes could enhance effectiveness. Studies based on surveys of students and lecturers—which assess the subjectively perceived learning success and the design factors—can provide further indications for an effective educational design in a blended learning format.

Owston and York ( 2018 ) and Hilliard and Stewart ( 2019 ) emphasised in their student survey-based studies that regardless of the chosen online or face-to-face ratios, care must be taken when designing a learning environment to integrate interactive and cooperative activities between students as well as between students and instructors. Other studies based on student evaluations (Castaño-Muñoz et al., 2014 ; Cundell & Sheepy, 2018 ; McKenna et al., 2020 ) have also emphasised the importance of student interaction in blended learning. According to Cundell and Sheepy ( 2018 ), passive online activities such as videos and readings are not as effective as well-structured activities in which students collaborate with or learn from other students. Content delivery does not equate to a well-designed learning environment or, as Merrill ( 2018 , p. 2) put it, ‘information alone is not instruction’. Thus, students need adequate stimulation, especially in the online part of blended learning (Lai et al., 2016 ; Manwaring et al., 2017 ; Pilcher, 2017 ). Often mentioned is also a thoughtful balance between face-to-face and distance moments (Vanslambrouck et al., 2018 ). Different instructional strategies were proposed for a blended learning format (McKenna et al., 2020 ), but these have not been scientifically analysed (except for the flipped classroom, e.g., Müller and Mildenberger ( 2021 )).

In Cundell and Sheepy ( 2018 ), peer feedback was also found to be effective for learning; students benefit from analysing the work of others and providing feedback to each other. The importance of feedback in the learning process is well known (Hattie & Timperley, 2007 ) and has also been shown as a critical design factor in other blended learning studies (Garcia et al., 2014 ; Martin et al., 2018 ; Vo et al., 2020 ).

In addition, other studies also highlight the importance of the social presence of instructors (Goeman et al., 2020 ; Law et al., 2019 ; Lowenthal & Snelson, 2017 ) and the creation of an affective learning climate (Caskurlu et al., 2021 ; McKenna et al., 2020 ). These aspects should help reduce social isolation (Gillett-Swan, 2017 ) in the online part of blended learning. Further studies (Caskurlu et al., 2021 ; Ellis et al., 2016 ; Han & Ellis, 2019 ; Heilporn et al., 2021 ) have also identified course structure and guidance as important design factors in blended learning.

These last factors, in particular, depend strongly on the teacher’s commitment and understanding of their role. However, implementing a new blended learning format is challenging and time-consuming for instructors and may also provoke resistance (Bruggeman et al., 2021 ; Huang et al., 2021 ). Accordingly, plausible motives need to be presented as to why these changes are necessary, and incentives are required to engage lecturers (Andrade & Alden-Rivers, 2019 ).

Based on the individual studies, the syntheses and reviews (Boelens et al., 2017 ; McGee & Reis, 2012 ; Nortvig et al., 2018 ) come to similar conclusions regarding the key design factors in blended learning. Findings like these indicate which design factors are perceived by students and lecturers as conducive to learning. However, the limitation here is that these factors were surveyed based on subjectively perceived learning success rather than on objectively assessed learning achievement. One such study by Vo et al. ( 2020 ) investigated how design factors assessed by students were related to final grades. Of the eight design factors studied, only ‘clear goals and expectations’ and ‘collaborative learning’ were significant predictors of student performance as measured by final grades in different courses. However, the level of final grades measured in various courses may not only depend on performance or instructional design but be influenced by other factors such as the bell-curve tendency of grading (Brookhart et al., 2016 ), when the grade often represents a student's relative achievement within the whole group (Sadler, 2009 ). It is, therefore, questionable whether course grades alone can be used as an objective measure to compare the effectiveness of different courses. Accordingly, other factors investigated by Vo et al. ( 2020 ), such as instructor feedback, support and facilitation, and face-to-face/online content presentation, may positively affect the quality of the learning environment and student performance; however, they are not adequately captured by comparing grades across courses.

Although research has shown some general patterns across blended learning modalities, the root causes for the learning outcomes in blended learning environments are still not apparent. Graham (2019) suspected that these causes lie in the pedagogical practices of blended learning, which requires research to examine more closely what happens at the activity level in blended learning.

Methodology

Research context.

The Zurich University of Applied Sciences (ZHAW) launched a new flexible learning study programme in a blended learning format (FLEX) in 2015 as part of a comprehensive e-learning strategy (Müller et al., 2018 ). Its Bachelor’s degree programme in Business Administration is a successful, well-established course of study offered both full-time (FT) and part-time (PT). The FLEX format is, therefore, the third study format for this degree programme.

All Bachelor’s programmes have two levels — the ‘Assessment’ level (60 ECTS credits; two semesters for FT students, three semesters for PT and FLEX students) followed by the ‘Main Study’ level (120 ECTS credits; four semesters for FT students, five semesters for PT and FLEX students) with specialisations in Banking & Finance (B&F) and General Management (GM). For the PT and FLEX formats, a part-time job or family commitment of no more than 60%–70% is recommended. The concept for the new blended learning format was developed in 2014 and tested by running a Business Administration FLEX course. After the pilot course was evaluated and found to be effective (Müller et al., 2018 ), a total of 44 courses were transformed for the BSc in Business Administration degree programme (2015–2020). The first cohort of FLEX students graduated in 2019.

The main objective of the new blended learning format FLEX was to offer students the best possible opportunities to combine their work and personal responsibilities with a flexible learning study programme. Regarding the number and distribution of classroom sessions over the 14-week term, compatibility with a distant place of residence was the guiding principle. More specifically, the maximum number of overnight stays away from home that would be acceptable to potential students had to be determined. At the same time, regular physical classroom sessions were also considered essential to enable students to reflect on the online content. As a result of these considerations, face-to-face classes for FLEX were reduced by approximately half (51%) compared with the part-time programme and replaced with a virtual self-study phase. This means that FLEX students attended the campus every three weeks for two days and the interjacent asynchronous self-study phase should allow them to learn flexibly. According to the typology of Allen et al. ( 2007 ) and the inclusion criteria for the meta-analysis of Müller and Mildenberger ( 2021 ), the design can be classified as blended learning. Concerning the dimensions of flexible learning, according to Li and Wong ( 2018 ), the FLEX format offered greater flexibility in terms of time, delivery, instructional approach, resources, and support than the conventional study format; however, the format was the same as a traditional course regarding the dimensions content, entry requirement, orientation or goal, and performance assessment.

After the time structure for the new course of study had been determined, the transition to the blended learning design was carried out at the course level. Considering that the design aspects activation, interaction and formative performance assessment have been found in empirical studies to be important for asynchronous online environments, care was taken to ensure that content was not only delivered (using learning videos, learning texts, etc.), but that students elaborated and reflected on it in the virtual self-study phases. In so-called ‘scripting workshops’ (Müller et al., 2018), the content was sequenced, and the educational design was created from scratch (Alammary et al., 2014), according to a defined process using a specially developed didactic visualisation language (see also Molina et al., 2009). Web-based technologies such as LMS Moodle and other tools were used and the content was delivered in digital form, mainly using learning videos produced in-house. Interaction with the teachers during the three-week self-study phases was possible in asynchronous form using the Moodle tools such as forums and chat, but no scheduled online class sessions via video conferencing tools were provided. Table 1 shows key features of the course designs in terms of the number of activities for the design aspects activation, interaction, and formative performance assessment (feedback) in the self-study phases, per course. Since learning videos are an important element of an asynchronous online learning environment and have proven to be effective for learning in the pilot study (Müller et al., 2018) and a recent meta-analysis (Noetel et al., 2021), the number of learning videos per course was also assessed. The number of pedagogical design factors was collected in the LMS Moodle, and the results show the range of the design characteristics in the FLEX implementation for the levels ‘Assessment’ (semesters 1–3) and ‘Main Study’ (semesters 4–8), and overall (semesters 1–8).

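Table 1 Educational design characteristics of the virtual self-study phases (activities per course)

| Design aspect | Activity | ‘Assessment’ level (Courses = 80): M | SD | Range | ‘Main Study’ level (Courses = 53): M | SD | Range | All courses (Courses = 133): M | SD | Range |
|---|---|---|---|---|---|---|---|---|---|---|
| Content delivery | Learning video | 23.6 | 15.0 | 0–54 | 14.9 | 14.2 | 0–49 | 20.2 | 15.3 | 0–54 |
| Activation | Assignments | 5.5 | 8.5 | 0–31 | 3.7 | 4.9 | 0–15 | 4.8 | 7.3 | 0–31 |
| Interaction | Forum students | 12.2 | 11.9 | 0–52 | 3.3 | 4.4 | 0–20 | 8.6 | 10.5 | 0–52 |
| Interaction | Forum instructors | 8.9 | 10.0 | 0–45 | 2.7 | 4.0 | 0–18 | 6.4 | 8.7 | 0–45 |
| Performance assessment (formative) | Quizzes | 10.8 | 11.7 | 0–40 | 7.4 | 7.5 | 0–30 | 9.4 | 10.3 | 0–40 |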

Research design

The research design consisted of the cohorts of the experimental FLEX group (B&F cohorts 2015–2019 and GM cohorts 2017–2019, N students = 278) with students attending all courses in the new FLEX format and the corresponding cohorts of the control group PT ( N students = 1068). The FLEX format was implemented in a blended learning design with a reduced classroom teaching time, whereas the PT-learning format was implemented conventionally via classroom teaching. Students of the FLEX and PT cohorts were allocated to classes of 30–60 students each. The number of students ( N ) who started the corresponding study programme in the first semester changed over time because of voluntary dropouts, failed exams, transfers between specialisations, and repeaters.

The gender ratio was almost the same in the experimental FLEX cohort as in the control PT cohort (proportion of female FLEX students = 35%; proportion of female PT students = 36%); however, the average age was slightly higher for FLEX students (24.7 years) compared to PT students (22.2 years). Concerning personality traits, various tests were used to investigate whether students differed regarding teamwork affinity (Lauche et al., 1999), ICT literacy (Kömmetter, 2010), general mental ability (Heller & Perleth, 2000, 2017 cohorts only), and the competencies of self-study and study organisation and learning-relevant emotions including motivation (Schmied & Hänze, 2016, 2017 cohorts only). These tests all showed no significant difference between the experimental FLEX group and the PT control group. With the entrance qualification of the vocational baccalaureate, students of a university of applied sciences have similar prior knowledge. To check this assumption, prior knowledge was tested in a pre-test on the topic of business administration for the 2017 cohorts (specialisations B&F and GM). The questions corresponded to the questions on the topic of business administration in past examinations for the vocational baccalaureate. The results of the pre-test showed no significant differences in prior knowledge between students in the FLEX and PT format in either the B&F [t(94) = 0.619, p = 0.537] or the GM [t(69) = 0.182, p = 0.856] specialisation.

The student eligibility requirements, lecture content, exam questions, and grading scales were identical for all students in the experimental FLEX and the control PT conditions. FLEX students took precisely the same examinations and at the same time as students in the conventional PT programme and the exams were not marked by the class teacher but by an independent pool of lecturers, allowing for a comparison of the exam results with high empirical significance.

Analysis methods for student achievement

To assess the effectiveness of the blended learning FLEX format, the exam results of the FLEX students ( N  = 2822 exams) were compared with those of PT students ( N  = 11638) in 133 courses between 2015 and 2019 (nine semesters). The effect size (standardised mean difference, also known as Cohen’s d ) was calculated for each course (i.e., the deviation of the experimental group FLEX test results from the control group PT). A t -test for the difference between the two groups (at α  = 0.05, two-tailed) was performed for each course. Additionally, a test for equivalence with equivalence defined as being between ± 0.5 standard deviations was examined (see also Mueller et al., 2020 ).
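The per-course effect size described above can be illustrated with a minimal sketch. It assumes the pooled-standard-deviation form of Cohen's d and a common large-sample approximation for its standard error; the source does not state which variants were used, and the function names and grades below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(flex, pt):
    """Standardised mean difference (FLEX minus PT) using the pooled SD."""
    n1, n2 = len(flex), len(pt)
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * variance(flex) + (n2 - 1) * variance(pt)) / (n1 + n2 - 2)
    return (mean(flex) - mean(pt)) / sqrt(pooled_var)


def d_standard_error(d, n1, n2):
    """Large-sample approximation of the standard error of d (an assumption here)."""
    return sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))


# Hypothetical exam grades for one course (not data from the study)
flex_grades = [4.5, 5.0, 4.0, 5.5, 4.5]
pt_grades = [4.0, 4.5, 5.0, 4.0, 4.5, 5.0]

d = cohens_d(flex_grades, pt_grades)
se = d_standard_error(d, len(flex_grades), len(pt_grades))
print(round(d, 3), round(se, 3))
```

A positive value means the FLEX group scored higher than the PT group, matching the sign convention used in the results.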

To analyse the overall learning effectiveness of the FLEX study format, the results from each course were aggregated using regression analysis (roughly similar to a meta-analysis). A linear mixed-effects regression analysis was performed with the calculated effect sizes as the response, and potential moderator variables study level, specialisation, and discipline as factors (fixed effects). In addition, a random effect for the cohort was included to control for the dependency arising from the same students attending courses. Assessing the size and significance of the random cohort effect was also of interest. Since good estimates of the standard error of the calculated effect sizes can be calculated from the raw grades, a weighted regression was performed where each effect size was weighted by its inverse estimated variance. This corresponds to the usual weighting scheme in fixed-effect meta-analysis. Using the lme4 package for R (Bates et al., 2020), estimation was performed using restricted maximum likelihood.
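The inverse-variance weighting mentioned above can be sketched in its simplest form: an intercept-only, fixed-effect summary. This is a deliberate simplification, not the authors' model, which additionally includes moderators as fixed effects and a random cohort effect fitted with lme4 in R. The function name and input values are hypothetical.

```python
from math import sqrt


def inverse_variance_summary(effect_sizes, standard_errors):
    """Fixed-effect summary: each effect size is weighted by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, effect_sizes)) / total_weight
    # The variance of the weighted mean is the reciprocal of the summed weights
    return estimate, sqrt(1.0 / total_weight)


# Hypothetical course-level effect sizes and standard errors
estimate, se = inverse_variance_summary([0.2, -0.1, 0.05], [0.10, 0.20, 0.15])
print(round(estimate, 4), round(se, 4))
```

Courses with more precise estimates (smaller standard errors, typically larger classes) therefore contribute more to the summary effect.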

Analysis methods for the modifying factors

An analysis of potential moderating variables that might explain the heterogeneity of the effect sizes was conducted, investigating study level, specialisation of the study programme, disciplines (e.g., quantitative subjects, foreign language, social sciences, or management), and cohorts. As a first step, correlations between various contextual variables (student and lecturer perceptions, educational design characteristics) and the effect sizes of the courses were analysed, and then the critical factors were related to effect size using a multiple linear regression model.

Student perceptions of the new learning design and learning process were analysed through a student course evaluation. At the end of each course, the FLEX group completed a questionnaire consisting of nine items of different instruments—structure, guidance and motivation, coherence (SCEQ), usability (own item), support and learning outcome (HILVE, Rindermann & Amelang, 1994 ), interest/enjoyment (Intrinsic Motivation Inventory, Ryan, 1982 ), and two open-ended questions (‘What do you like about the way the course is designed?’, ‘What do you like less?’). Additionally, student attendance in on-campus classes was determined. The surveys took place after the classes had been completed but before the examination period.

Lecturers also rated the implementation conditions with a specially developed 20-item instrument according to the change dimensions in Knoster et al. (2000). This survey took place at the end of the semester when a course was first implemented. Only courses whose instructors were involved in both the development and the implementation of the courses were included in the correlation. Because instructors for individual courses changed in some cases during the test period, a smaller number of courses was analysed than the total number of courses (see Table 6).

Table 6 Regression coefficients

| Fixed effect | Est. coefficient | Std. error | t value |
|---|---|---|---|
| (Intercept) | −0.07326 | 0.09821 | −0.746 |
| Study Level: “Main Study” vs “Assessment” level | −0.05836 | 0.07463 | −0.782 |
| Specialization: GM vs BF | −0.06639 | 0.11175 | −0.594 |
| Discipline: Foreign Languages vs Quantitative | 0.23490 | 0.11019 | 2.132 |
| Discipline: Social Sciences vs Quantitative | 0.08094 | 0.09946 | 0.814 |
| Discipline: Management vs Quantitative | 0.01827 | 0.08626 | 0.212 |

| Random effect | Variance | Standard deviation |
|---|---|---|
| Cohort (Intercept) | 0.01406 | 0.1186 |
| Residual | 0.12266 | 0.3502 |

The qualitative analysis aimed to discover which factors (especially educational design characteristics) were crucial for the success of a FLEX course. For this purpose, the courses were divided into groups according to their effect size and student evaluation ratings. For the student evaluation criteria (scale 1–5), the courses were divided into three clusters (terciles) with high, medium, and low student ratings. ‘Good practice’ courses were defined as courses with a positive effect size and a high student rating (first tercile). ‘Bad practice’ courses were defined as courses with negative effect size and low student ratings (third tercile). For the qualitative analysis, from a total of 133 FLEX courses, 27 ‘good practice’ courses with a total of 493 student comments (to the question ‘What do you like about the way the course is designed?’), and 30 ‘bad practice’ courses with a total of 429 student comments (to the open-ended questions ‘What do you like less?’ and ‘Do you have ideas on how the course could be developed further?’) were included. These data were imported into MAXQDA, and each student comment was labelled with the study specialisation, semester, student number, course name, and good/bad-practice course designation (e.g., ‘SBF15_HS15_8BWL_good’).
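The selection rule for ‘good practice’ and ‘bad practice’ courses can be sketched as follows. The tercile cut points come from Python's `statistics.quantiles`; how courses exactly on a cut point were treated is not stated in the source, so the inclusive comparisons below are an assumption, and the course names and values are hypothetical.

```python
from statistics import quantiles


def label_courses(courses):
    """courses maps a course name to (effect_size, mean_student_rating)."""
    ratings = [rating for _, rating in courses.values()]
    # Two cut points split the ratings into three terciles
    low_cut, high_cut = quantiles(ratings, n=3)
    labels = {}
    for name, (d, rating) in courses.items():
        if d > 0 and rating >= high_cut:
            labels[name] = "good practice"   # positive effect, high rating
        elif d < 0 and rating <= low_cut:
            labels[name] = "bad practice"    # negative effect, low rating
        else:
            labels[name] = "not selected"
    return labels


# Hypothetical courses (ratings on the 1-5 evaluation scale)
example = {
    "A": (0.3, 4.5), "B": (-0.2, 2.0), "C": (0.1, 3.0),
    "D": (-0.4, 4.0), "E": (0.2, 2.5), "F": (-0.1, 3.5),
}
print(label_courses(example))
```

Courses where effect size and student rating point in different directions are left out of both groups, which is why only a subset of the 133 courses enters the qualitative analysis.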

An initial version of a category system was created, which was theory-driven and based on the principles for designing the FLEX courses. The following five categories were defined—educational design (with subcodes: content sequencing, guidance, blend online/classroom-learning), activation (with subcodes: tasks/exercises, cases, solutions), learning resources (with subcodes: textbooks, learning videos), interaction (with subcodes: with peers, with instructor), and performance assessment.

The entire dataset was coded independently by two coders. Because the category system we developed was being applied for the first time, intercoder agreement checks were started after only a few codings in two iterations to identify weaknesses (Kuckartz & Rädiker, 2019 ). An initial review was based on 10 ‘good practice’ and 10 ‘bad practice’ comments randomly selected from the dataset. A second review took place based on another 15 ‘good practice’ and 15 ‘bad practice’ comments, which were deliberately drawn according to the criterion of completing the theory-based coding guide. In both iteration cycles, the coding was checked for mismatches. The segments where non-matches occurred formed the starting point for a systematic discussion between the two coders about the disagreement, which resulted in an adaptation of the category system and the coding guide (Kuckartz & Rädiker, 2019 ). Comments that belonged to two subcategories were assigned to the main category.

Next, the two coders independently coded the entire data set. The intercoder agreement was checked at the segment level with a setting of 90% overlap, which resulted in a kappa value of 0.57. One of the coders analysed the mismatched segments and standardised them with reference to the coding guide. The coded segments were then analysed. Initially, a frequency analysis (descriptive counting of code frequency) was conducted by counting the individual codes using MAXQDA. Then, the most important aspects of the respective categories were summarised and provided with appropriate quotations.
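For readers unfamiliar with chance-corrected agreement, the following sketch computes Cohen's kappa for two coders. This is an illustrative assumption: the source does not state which kappa variant MAXQDA reported, and the segment-matching step with 90% overlap is not modelled here. The category labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders over the same segments."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Expected agreement if both coders assigned codes independently
    expected = sum(
        (freq_a[c] / n) * (freq_b[c] / n) for c in set(codes_a) | set(codes_b)
    )
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to six segments by two coders
coder_1 = ["activation", "interaction", "activation", "design", "design", "interaction"]
coder_2 = ["activation", "interaction", "design", "design", "design", "activation"]
print(round(cohen_kappa(coder_1, coder_2), 3))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which places the reported 0.57 in between.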

Student achievement at the course level

The FLEX and PT samples were independent, and the sample size and histograms of the test results did not indicate a violation of the requirements of normal distribution and uniformity of variance. The effect size of the students’ exam results (Cohen’s d) was calculated by comparing the FLEX courses with the respective PT courses. The direction was indicated by the sign of the effect size (Cohen’s d); for example, in 61 of the 133 courses examined, the mean values of the FLEX cohort were higher than those of the PT, corresponding to positive values for the effect size (see Table 2).

Table 2 Learning effectiveness of experimental FLEX courses compared with conventional PT courses

| Effect size | ‘Assessment’ level (Courses = 80) | ‘Main Study’ level (Courses = 53) | Total (Courses = 133) |
|---|---|---|---|
| Effect size > 0 | 42 | 19 | 61 |
| Effect size < 0 | 38 | 33 | 71 |
| Effect size = 0 | 0 | 1 | 1 |

The courses were categorised into four subject groups—quantitative subjects (statistics, mathematics, quantitative methods), foreign language (English), social sciences (law, skills, communication, leadership & ethics), and management (e.g., strategy, accounting, marketing). The distribution of the effect sizes according to the study level, course of study (BF or GM), and subject domain is shown in Fig.  1 .

Fig. 1 Standardised mean differences (effect sizes) of analysed courses (N = 133)

The results for the 133 courses in the ‘Assessment’ and the ‘Main Study’ levels showed that there is little difference in the exam scores of students in the FLEX format compared with the PT format (see Table 2). A t-test (α = 0.05, two-tailed) indicated a significant difference in only 24 of the 133 courses; FLEX students showed significantly higher exam scores in 10 courses and PT students in 14 courses. To compare FLEX and PT learning performance, it is important to consider that comparative studies usually aim to demonstrate significant change. More precisely, the goal is to reject the H0 hypothesis (no differences between groups) and confirm the H1 hypothesis (difference between groups exists at a particular significance level). The experimental group (in our case, the FLEX cohort) would, therefore, be expected to perform significantly differently from the control group (PT cohort). However, in the research context, this was not a priority. Due to the changed conditions caused by the reduction of classroom time by more than 50 per cent, the goal was instead to ensure that students achieved equivalent exam results with the self-study assignments in the blended learning format compared with the control group, despite the reduction in classroom time. Where the aim is to prove that there are no differences between the results of the two groups, an equivalence test is used. We regard standardised mean differences as equivalent if they are smaller than 0.5 in absolute value, and a statistical equivalence was found in 36 courses. In 73 courses, the difference was inconclusive (no statement possible about statistical difference or equivalence).
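The three possible outcomes per course (significant difference, equivalence, inconclusive) can be sketched with a two one-sided tests (TOST) logic on the effect size. This is an illustration under stated assumptions, not the authors' exact procedure: it uses a normal approximation rather than the t distribution, checks for a difference before checking equivalence, and takes a hypothetical standard error as input.

```python
from statistics import NormalDist


def classify_course(d, se, margin=0.5, alpha=0.05):
    """Label one course from its effect size d and standard error se."""
    z_two_sided = NormalDist().inv_cdf(1 - alpha / 2)
    z_one_sided = NormalDist().inv_cdf(1 - alpha)
    # Difference test: is d significantly different from zero?
    if abs(d) > z_two_sided * se:
        return "different"
    # TOST: both one-sided tests must reject, i.e. the (1 - 2*alpha)
    # interval must lie entirely inside (-margin, +margin)
    if d - z_one_sided * se > -margin and d + z_one_sided * se < margin:
        return "equivalent"
    return "inconclusive"


# Hypothetical courses
print(classify_course(0.0, 0.1))
print(classify_course(0.8, 0.1))
print(classify_course(0.3, 0.3))
```

The sketch also shows why many courses end up inconclusive: with small classes the standard error is large, so the interval neither excludes zero nor fits inside the ±0.5 equivalence margin.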

Student achievement at the programme level

The estimated coefficients of the linear mixed-effects regression analysis can be found in the Appendix, Table 6. The estimated summary effect size d is close to and not significantly different from zero (see also Table 3). The confidence interval [−0.206, 0.094] suggests that overall differences between the blended learning format FLEX and the conventional classroom format PT are small: at most, moderately negative or very small positive effects are plausible. This means that equivalent learning outcomes were found despite a reduction in classroom time of over 50 per cent for FLEX compared with PT students.

Table 3 Summary effect size for the mixed-effects regression model

| | k | Effect size d | SE | CI lower | CI upper | t | p |
|---|---|---|---|---|---|---|---|
| Overall effect | 133 | −0.0562 | 0.0562 | −0.2060 | 0.0936 | −1.0000 | 0.3684 |
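The reported summary values can be checked for internal consistency with a few lines of arithmetic. The sketch below only uses the numbers printed for the overall effect; the variable names are illustrative, and the remark about the reference distribution is an inference, since the degrees of freedom are not stated here.

```python
# Values as reported for the overall effect.
d = -0.0562                            # estimated summary effect size
se = 0.0562                            # standard error of the estimate
ci_lower, ci_upper = -0.2060, 0.0936   # reported confidence limits

# The test statistic is the estimate divided by its standard error.
t_stat = d / se

# A symmetric interval is centred on the estimate.
midpoint = (ci_lower + ci_upper) / 2

# Half-width divided by SE gives the critical value that was applied.
half_width = (ci_upper - ci_lower) / 2
implied_multiplier = half_width / se

print(f"t = {t_stat:.4f}")
print(f"CI midpoint = {midpoint:.4f}")
print(f"implied critical value = {implied_multiplier:.2f}")
```

The implied critical value of roughly 2.7 is larger than the normal quantile of 1.96, which would be consistent with a t reference distribution with few degrees of freedom.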

Modifying factors

Moderator analysis.

In Table 4, similar to the moderator analysis in a meta-analysis, the results are presented as group means with corresponding standard errors and 95% confidence intervals. These are not averages of the raw data per group but were calculated from the regression results using the emmeans package for R (Lenth, 2021); for each moderator variable, the other factors were held constant at their proportions in the data set. The overall effect was similarly obtained from the regression estimate, not from averaging the original effect sizes. The significance of the effects of potential moderators was assessed using the likelihood ratio test as implemented in lme4 for R (Bates et al., 2020), with none of the variables having a significant effect.

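Table 4 Likelihood ratio tests for moderators

| Moderators | k | Effect size | SE | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|
| Study level (LR = …, df = …, p = …) | | | | | |
| ‘Assessment’ level | 80 | −0.0330 | 0.0582 | −0.1697 | 0.1038 |
| ‘Main Study’ level | 53 | −0.0913 | 0.0799 | −0.2710 | 0.0883 |
| Course of study (LR = …, df = …, p = …) | | | | | |
| Banking and Finance | 95 | −0.0372 | 0.0684 | −0.2260 | 0.1515 |
| General Management | 38 | −0.1036 | 0.0944 | −0.3209 | 0.1137 |
| Subject group (LR = …, df = …, p = …) | | | | | |
| Quantitative subjects | 21 | −0.1155 | 0.0876 | −0.2952 | 0.0642 |
| Foreign language | 20 | 0.1194 | 0.0952 | −0.0755 | 0.3143 |
| Social sciences | 24 | −0.045 | 0.0836 | −0.2078 | 0.1387 |
| Management | 68 | −0.0972 | 0.0638 | −0.2462 | 0.0518 |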

The significance of the random cohort effect was tested by comparing the full model to a classical linear model including all variables except the cohort effect, again using the Likelihood Ratio Test; this was not significant either ( LR  = 2.098, df  = 1, p  = 0.1475). Moreover, the estimated standard deviation for the cohort effect is 0.1186, which is only roughly one-third of the estimated residual standard deviation of 0.3502.
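The p-value reported for the likelihood ratio test of the cohort effect can be reproduced from the test statistic. The sketch below assumes that the statistic is referred to a chi-square distribution with one degree of freedom, for which the upper tail probability has the closed form erfc(sqrt(LR / 2)); the function name is illustrative.

```python
import math


def lr_test_pvalue_df1(lr):
    """Upper tail probability of a chi-square(1) variable at lr."""
    return math.erfc(math.sqrt(lr / 2))


p_cohort = lr_test_pvalue_df1(2.098)
print(f"p = {p_cohort:.4f}")

# Ratio of the cohort standard deviation to the residual standard deviation.
sd_ratio = 0.1186 / 0.3502
print(f"SD ratio = {sd_ratio:.3f}")
```

The ratio of the two standard deviations comes out at about 0.34, which matches the statement that the cohort effect is roughly one-third of the residual variation.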

Correlation and regression analysis of contextual variables

Although the implementation context of the courses (conceptualisation of blended learning, measurement of learning outcomes, and implementation period of one semester) was quite similar, the effect sizes showed a considerable variance between the courses (see Fig.  1 ). A correlation analysis was therefore conducted to examine to what extent the student evaluation of the course quality (including attendance rate), the quantitative educational design characteristics, or the survey on the implementation conditions among the lecturers showed a correlation with the effect sizes.

The results of the correlation analysis (Pearson, two-tailed) indicate that the strongest correlations are those between student course evaluations and effect sizes (see Appendix, Table 7). All items of the student evaluation of course quality correlate significantly with the effect size (e.g., item ‘I like the course’: r = 0.289, p = 0.001). The course quality assessed by the students is, thus, significantly correlated with the learning effectiveness measured as standardised mean differences between blended and conventional courses. This is remarkable because the course evaluation took place after classes had been completed but before the examination period.
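To show how a two-tailed p-value follows from a Pearson correlation and the number of courses, the following sketch uses the Fisher z-transformation with a normal approximation. This is an assumption made for illustration: the source does not state which test was used, and a t-based test gives slightly different values. The function name `pearson_p_fisher` is hypothetical.

```python
import math


def pearson_p_fisher(r, n):
    """Approximate two-tailed p-value for H0: rho = 0.

    Uses Fisher's z = atanh(r) * sqrt(n - 3), treated as standard normal.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-tailed normal tail probability: P(|Z| > |z|).
    return math.erfc(abs(z) / math.sqrt(2))


# Item 'I like the course': r = 0.289 with N = 123 courses.
p_like = pearson_p_fisher(0.289, 123)
print(f"p = {p_like:.4f}")
```

For this item, the approximation gives a p-value close to the reported 0.001.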

Table 7 Correlation analysis of effect sizes for FLEX courses

| Item | Pearson correlation | Sig. (2-tailed) | N |
|---|---|---|---|
| Overall student course evaluation | | | |
| The content structure of the module is logical and comprehensible to me | 0.251** | 0.005 | 123 |
| It is usually clear to me where I stand and what is expected of me | 0.244** | 0.007 | 123 |
| There is good support during the self-study phase | 0.344** | 0.000 | 123 |
| I find the self-study phase motivating | 0.313** | 0.000 | 123 |
| I find the classroom sessions motivating | 0.307** | 0.002 | 102 |
| I learn a lot in the self-study phase | 0.180* | 0.047 | 123 |
| I learn a lot in the classroom sessions | 0.312** | 0.001 | 102 |
| The learning activities in the self-study phase are well aligned with the classroom sessions | 0.349** | 0.000 | 123 |
| I like the course | 0.289** | 0.001 | 123 |
| Attendance | 0.261** | 0.004 | 119 |
| Learning videos | −0.111 | 0.205 | 132 |
| Assignments | −0.017 | 0.845 | 133 |
| Forum students | −0.007 | 0.939 | 132 |
| Forum instructors | −0.021 | 0.810 | 131 |
| Quizzes | −0.050 | 0.570 | 132 |
| Developing teaching quality is of great concern to me | 0.115 | 0.350 | 68 |
| Compensation (in hours) for developing and delivering FLEX courses is appropriate | −0.081 | 0.511 | 68 |
| My engagement in developing and delivering FLEX classes is rewarded in other ways besides compensation (in hours) at the ZHAW | 0.178 | 0.147 | 68 |
| The information technologies available (Moodle, etc.) and related support were adequate for my needs | 0.053 | 0.666 | 68 |
| Sufficient time was available to develop the FLEX module | 0.131 | 0.285 | 68 |
| During the development and implementation of the FLEX module, I was effectively guided and supported as needed | 0.197 | 0.125 | 62 |
| The introductory/continuing education sessions on FLEX were valuable for the development and implementation of the FLEX learning environment | 0.240* | 0.049 | 68 |
| I feel able to realise my didactic ideas with Moodle and other e-learning tools | −0.044 | 0.724 | 68 |
| I feel able to develop good learning resources (e.g., learning videos, etc.) for the online self-study phase | 0.335** | 0.005 | 68 |
| I feel able to design and manage the student learning process in FLEX effectively | 0.227 | 0.062 | 68 |
| I feel able to guide FLEX students well in the self-study phase and provide feedback | 0.205 | 0.094 | 68 |
| I feel able to anticipate the student learning process in a FLEX learning environment and adjust the didactic-methodical design accordingly | 0.290* | 0.017 | 67 |
| The implementation of FLEX is compatible with the desired learning culture at the ZHAW | 0.345** | 0.006 | 61 |
| There is a consensus among the lecturers regarding FLEX goals and didactic implementation | 0.273* | 0.033 | 61 |
| The school’s executive committee supports the introduction of FLEX | 0.406** | 0.001 | 67 |
| The introduction of FLEX is organised and managed well | 0.333** | 0.006 | 67 |
| The timeline for the introduction of FLEX was good | 0.439** | 0.000 | 67 |
| There were sufficient opportunities for the faculty to discuss and reflect on the experience of developing and implementing FLEX courses | 0.327** | 0.007 | 67 |
| I am satisfied with the development and implementation of my FLEX course | 0.303* | 0.013 | 67 |
| I am satisfied with the introduction of FLEX at the ZHAW | 0.303* | 0.013 | 67 |

* p < 0.05. ** p < 0.01

There is also a significant correlation with the reported attendance of the classes; courses whose classroom sessions were attended more frequently show a higher effect size. In contrast, the number of different learning resources and activities in the courses—such as the number of tasks, forum posts, formative quizzes, or learning videos—has no significant correlation with the effect size of the courses.

The correlation between the implementation conditions and the effect size of the courses shows a differentiated picture. For example, the dimensions ‘incentives’ and ‘resources’ do not show a significant correlation with the effect size; however, significant correlations are reported for the dimensions ‘competences’, ‘vision’, ‘action plan’, and ‘satisfaction’ (e.g., item ‘I am satisfied with the introduction of FLEX at the ZHAW’: r = 0.303, p = 0.013).

A multiple linear regression model was used to evaluate the contribution that the data collected from students and lecturers make to the standardised mean difference. Because of substantial correlations between the evaluation variables (‘student course evaluation’ and ‘implementation survey instructors’), the items covering different aspects were averaged to form one aggregated variable for the student evaluation (i.e., ‘student evaluation’) and six aggregated variables for aspects of the instructor evaluation (‘incentives’, ‘resources’, ‘skills’, ‘vision’, ‘action plan’, and ‘satisfaction with the implementation’). To avoid collinearity issues, a stepwise forward procedure was used. Starting from an intercept-only model, all models adding one of the variables were fitted, but only ‘student evaluation’ (F = 11.2449, df = 1, p = 0.0014) and ‘action plan’ (F = 7.2867, df = 1, p = 0.0090) were significant. Starting from a model containing only an intercept and ‘student evaluation’, adding ‘action plan’ did not significantly improve the fit (F = 2.3329, df = 1, p = 0.1320), whereas adding ‘student evaluation’ to a model that included only ‘action plan’ did (F = 5.9408, df = 1, p = 0.0178). In a model including both variables, ‘student evaluation’ was significant (t = 2.437, df = 1, p = 0.0178) while ‘action plan’ was not (t = 1.527, df = 1, p = 0.1320). The optimal model obtained by forward selection therefore contained only an intercept and ‘student evaluation’, although its adjusted R-squared value was not high (0.1438). For this reason, the results are not reported here in detail.
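The t and F statistics reported for the two-variable model are linked: when a single term is added to or dropped from a linear model, the partial F statistic equals the square of that term's t statistic, which is why the p-values coincide. The sketch below checks this relation on the reported values only; the dictionary and variable names are illustrative.

```python
# Reported (t, F) pairs for each term in the model with both variables.
reported = {
    "student evaluation": (2.437, 5.9408),
    "action plan": (1.527, 2.3329),
}

gaps = {}
for term, (t_value, f_value) in reported.items():
    # For a single-degree-of-freedom term, F should equal t squared
    # up to rounding of the published figures.
    gaps[term] = abs(t_value ** 2 - f_value)
    print(f"{term}: t^2 = {t_value ** 2:.4f}, reported F = {f_value:.4f}")

max_gap = max(gaps.values())
```

The small remaining differences are what one would expect from rounding the t values to three decimal places.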

Qualitative analysis of educational design quality

The frequency of coded student comments on educational design quality is reported in Table 5. The student comments contained a vast number of mentions related to educational design in both the ‘good practice’ and the ‘bad practice’ courses (60.4% and 50.0% of all mentions). Within this category, it is also noticeable that many comments referred to the blending of online and classroom components (20.9% and 28.6%). Furthermore, many comments addressed the guidance provided (10.6% and 9.3%). There were a similar number of mentions in the learning videos subcategory (9.9% and 9.3%). Noticeably fewer mentions were related to the textbook/other texts (6.2% and 10.5%), assignments (6.2% and 6.5%), and performance assessment (6.2% and 6.0%). In the case of the ‘bad practice’ courses, the subcategory ‘solutions’ also stands out (8.9%). There were very few mentions related to interaction with peers (0.4% and 1.2%).

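Table 5 Frequency of student comments for each category/subcategory

| Category and subcategory | Good practice (n) | Good practice (%) | Bad practice (n) | Bad practice (%) |
|---|---|---|---|---|
| Guidance | 29 | 10.6 | 23 | 9.3 |
| Content sequencing | 11 | 4.0 | 4 | 1.6 |
| Blending of the online and classroom sessions | 57 | 20.9 | 71 | 28.6 |
| Textbooks | 17 | 6.2 | 26 | 10.5 |
| Learning videos | 27 | 9.9 | 23 | 9.3 |
| Assignments | 17 | 6.2 | 16 | 6.5 |
| Cases | 1 | 0.4 | 6 | 2.4 |
| Solutions | 1 | 0.4 | 22 | 8.9 |
| With peers | 1 | 0.4 | 3 | 1.2 |
| With instructor | 18 | 6.6 | 7 | 2.8 |
| Total comments | 273 | 100.0 | 248 | 100.0 |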
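The percentages in the table are shares of all coded comments in the respective group of courses (273 for ‘good practice’, 248 for ‘bad practice’). As a minimal sketch, the calculation for a few subcategories looks as follows; the helper name `share` and the selection of rows are illustrative.

```python
TOTAL_GOOD, TOTAL_BAD = 273, 248  # total coded comments per group


def share(count, total):
    """Percentage of all coded comments, rounded to one decimal place."""
    return round(100 * count / total, 1)


# (good practice count, bad practice count) for selected subcategories.
counts = {
    "Guidance": (29, 23),
    "Blending of the online and classroom sessions": (57, 71),
    "Solutions": (1, 22),
}

for name, (good, bad) in counts.items():
    print(name, share(good, TOTAL_GOOD), share(bad, TOTAL_BAD))
```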

Student comments indicate that an adequate structure and guidance are essential for the quality of the FLEX blended learning courses. Structure is described as the clear delimitation of topics and their logical sequencing: ‘The exact structuring of the topics’ (SBF15_HS15_8BWL_good) and ‘better delimitation and structuring of individual topics’ (SGM17_FS18_1FAC_bad). Regarding guidance, a focus on exam relevance in the classroom sessions is mentioned: ‘The content is clearly linked to the exams, and it is clear what is expected’ (SBF17_HS17_9MAR_good). This aspect also includes the desire for mock exams or the availability of exams from previous years. In addition, guidance is described as a review and outlook by the lecturers and the indication of the learning progress in the learning management system.

The subcategory ‘blending’ contains the appropriate combination of the online and classroom phase(s) (and vice-versa). This link can be achieved by taking up and deepening certain content from the online phase in the classroom or by linking to it and continuing with it. A diverging picture emerges concerning the design of the classroom phase. While some students would have liked to repeat the content from the online phase and set a focus, others would have preferred to consolidate and deepen the content from the online phase through exercises and discussions. The following statements well illustrate this divide: ‘I did not like the fact that some students came to the lectures unprepared and asked basic questions. In this way, the other students did not benefit. […] I talked to many students, and many of them had done very little preparation before the lecture and then asked many questions in the lecture. That really doesn’t work, in my opinion’. (SBF17_HS17_19MAT_bad); ‘More complex topics are treated in the classroom phase’. (SGM17_HS17_1MAR_good); ‘Teaching could be more efficient. It cannot be assumed that all FLEX students have solved everything that is on Moodle [tasks on the Learning Management System]. A misconception’. (SBF15_FS17_7MAC_bad); and ‘Repetition of the material learned in the online phase’. (SGM17_HS17_14MAR_good).

The following student statements also raise the question of the optimal allocation of scarce classroom time: ‘The lecturer asks few questions and delivers many monologues. For that, I could actually watch a video instead’. (SGM19_HS19_10WIR_bad) and ‘The way the classroom sessions are structured is good. At the beginning, a short repetition of the theory and then working on tasks. This helps us to repeat and apply all the material learned’. (SGM18_HS19_3MIK_good).

In the category ‘content delivery’, the compactness of the learning resources and their alignment with the online weeks was mentioned. The linking of instructional texts, PowerPoint slides, and learning videos was brought up in the context of learning resources. In the case of instructional texts, students mentioned their comprehensibility and, in the case of learning videos, their existence, quality, and adequate length: ‘Good structure with linking of book, slides, and videos’. (SBF19_HS19_19MAR_good).

In the ‘activation’ category, the number and variety of exercises and their consistency with the theory learned were mentioned. In addition, the existence of solutions to tasks and exercises was cited as crucial for the online phase in three respects: the solutions must be complete (i.e., solutions to all tasks), sufficiently detailed (i.e., with solution path included), and readily available (i.e., at the time when students solve the tasks). One student commented: ‘Not having a complete solution script inhibits the learning process very much if I always have to ask for the solution in the forum every time I have [already] finished an assignment. When then the answer finally comes, I am already somewhere else again, very counterproductive!’ (SBF16_HS17_14MIK_bad).

In the ‘interaction’ category, the opportunity to ask questions and get a quick answer from the lecturers was frequently mentioned for both the classroom and the online phases. A well-maintained forum (opportunity to place questions in the LMS system) was also mentioned for the online phase. Although there were few comments about peer interaction, it was noticeable that group work was seriously questioned: ‘In general, the obligation to participate in group performance assessments is paradoxical and pointless in the context of the goals of this part-time FLEX course’. (SBF15_FS17_19EBF_bad).

In the ‘performance assessment’ category, formative tests with automatic and immediate feedback were mentioned: ‘I also like the small exams for self-testing because you can check what you have understood’. (SGM18_HS18_1WIR_good).

Discussion and conclusions

Results from the first research question demonstrated that the estimated effect size for a flexible learning study programme in a blended learning design with a 51% reduced on-site classroom time was close to and not significantly different from zero. This result is in line with previous studies (e.g., Müller & Mildenberger, 2021 ), suggesting that a blended learning format with reduced classroom time is not systematically more or less effective than a conventional study format. This study also indirectly confirmed the recommendations of various authors (Hilliard & Stewart, 2019 ; Owston & York, 2018 ) to divide the online and face-to-face portions of blended learning in half. Similar to the results of other studies and reviews on blended learning (Bernard et al., 2014 ; Means et al., 2013 ; Müller & Mildenberger, 2021 ; Spanjers et al., 2015 ; Vo et al., 2017 ), the effect sizes of the courses were broadly scattered around zero, with almost one standard deviation in the minus to over one standard deviation in the plus.

Findings from the second research question addressed the modifying factors for the learning effectiveness of blended learning courses with reduced classroom time. The analysed moderators of ‘study level’, ‘specialisation’, and ‘disciplines’ can be classified as moderating effects of condition (Means et al., 2013 ). The non-significant results for the study level are in line with the findings of systematic reviews by Bernard et al. ( 2014 ) and Means et al. ( 2013 ), who found no moderation effects on the course level (undergraduate vs graduate course). The non-significant result of the moderator ‘discipline’ corroborates the systematic reviews of Müller and Mildenberger ( 2021 ) and Bernard et al. ( 2014 ). However, it is not in line with Vo et al. ( 2017 ), who found a significantly higher effect size for STEM disciplines. Different definitions of these disciplines may explain the differences in these findings.

Based on the results of this study and the systematic reviews conducted in the past, it can be concluded that the heterogeneity of the results is not likely to be attributable to conditional factors such as the study level or discipline. However, significant correlations were reported between the effect sizes of the courses and the educational quality and design evaluated by students, the implementation conditions evaluated by lecturers, and on-site class attendance. There is collinearity between these aspects, and it can be assumed that there is a causal relationship in the sense that on-site attendance is influenced by the educational design and the quality of the course. Furthermore, the latter, in turn, is impacted by the attitude and motivation of the lecturers towards the FLEX programme. However, apart from the educational quality as evaluated by the students, significant direct and indirect effects could not be established with the fitted multiple linear regression model.

The importance of the educational design for the effectiveness of blended learning was supported by the significant moderator analyses of Spanjers et al. ( 2015 ) regarding the use of quizzes. In contrast, no correlation was shown between the number of online learning resources and activities in the courses, such as the number of assignments, forum posts, formative quizzes, or learning videos, on the one hand, and the effect sizes, on the other. This indicates that educational quality goes beyond the mere number of activities or particular learning resources and that an appropriate educational design is decisive (Graham, 2019 ; Nortvig et al., 2018 ).

The qualitative design analysis of the courses with high vs low learning effectiveness identified several crucial design factors for learning-effective blended courses. Regarding educational design, an adequate course structure and guidance for students are recognised as essential. In the context of an undergraduate programme, this means, in particular, that the learning environment has a clear structure, and that sufficient guidance is provided. This factor is significant in blended learning because the combination of online and face-to-face teaching and the partial distance between teachers and students increase the complexity of the learning environment. In this regard, a thoughtful alignment of the online and on-site learning phases was also mentioned; however, the feedback was contradictory concerning the instructional strategy (McKenna et al., 2020 ). While some students prefer to consolidate and deepen the content from the online phase through exercises and discussions, others simply prefer to repeat it. Such feedback must be seen in the context of the flexible learning study programme FLEX, which offers students opportunities to combine their work and personal responsibilities with study and, therefore, possibly attracts students who place a high priority on pedagogic efficiency. The delicate balance between work, private life, and education is, therefore, more keenly felt by these students and could result in insufficient time to complete all the online tasks. Consequently, guidance also means that instructors should explain how the online and on-site phases are integrated and help their students understand that the online environment is an essential part of the blended learning experience (see also Ellis et al., 2016 ; Han & Ellis, 2019 ).

Regarding content delivery, good practice is characterised by learning resources that are well linked and aligned with other elements, such as the tasks in the learning environments. In line with the pilot study (Müller et al., 2018 ) and a recent systematic review (Noetel et al., 2021 ), learning videos are appreciated by students and considered to have many educational benefits.

The relevance of activation was also pointed out in the qualitative analysis. These learning activities enable students to transform the information they have acquired into knowledge and skills and facilitate their ability to apply learned knowledge and skills in new and real-life situations. In addition to previous studies (Cundell & Sheepy, 2018 ; Lai et al., 2016 ; Manwaring et al., 2017 ; Pilcher, 2017 ), the instant availability of complete and detailed solutions when students learn with tasks and exercises is essential for the learning process and its effectiveness.

Regarding the aspects of interaction and assessment, the results corroborate previous studies: good practice is associated with the social presence of instructors and their prompt feedback (Goeman et al., 2020; Law et al., 2019; Lowenthal & Snelson, 2017) and with the availability of formative tests with immediate, often automatic feedback (Garcia et al., 2014; Martin et al., 2018). At the same time, interaction between students is controversial, and group work is questioned. This may result from the previously discussed need for efficiency in a flexible learning study programme. However, other studies (Gillett-Swan, 2017; Vanslambrouck et al., 2018; Vo et al., 2020) have also pointed out that a blended learning design may be associated with specific costs, such as the practical issue of organising group work.

Theoretical and practical implications

This study makes theoretical and practical contributions. Theoretically, it expands the database on the learning effectiveness of blended learning with reduced attendance time in several ways. First, past studies on blended learning with reduced classroom time were, with a few exceptions (e.g., Chingos et al., 2017), designed as single studies with a limited duration of usually one semester (Müller & Mildenberger, 2021). In contrast, this study extended these findings to the study programme level, encompassing 133 courses in different disciplines over more than four years (nine semesters). Second, it was not designed as a model project with privileged conditions such as selected lecturers and additional resources but was introduced using existing equipment and regular teaching staff. Accordingly, a high ecological validity can be assumed.

Similar to the meta-analyses on blended learning (Bernard et al., 2014 ; Means et al., 2013 ; Müller & Mildenberger, 2021 ; Vo et al., 2017 ), the observed variance in the learning effectiveness of the individual courses was large. The findings of this study demonstrated that the heterogeneity of the effect sizes could be explained by differences in the implementation quality of the educational design factors. This study is the first we are aware of that investigated design factors based on the relative effect sizes of individual courses and not only on student and lecturer evaluation.

The results of this study provide institutions and administrators with practical guidance for their flexible learning initiatives, especially concerning learning effectiveness and the related design principles of a flexible learning programme in a blended learning format. Based on our findings, we recommend paying particular attention to the following educational design principles when implementing blended learning courses:

  • Adequate course structure and guidance for students.
  • Activating learning tasks.
  • Stimulating interaction and social presence of teachers.
  • Timely feedback on the learning process and outcomes.

Instructors are responsible for designing and implementing these factors, and this study showed that the quality of the educational design was significantly related to lecturer attitudes towards blended learning with reduced on-site classroom time. Accordingly, when introducing blended learning to an educational institution, it is vital not only to provide the necessary infrastructure and resources and develop the skills needed to teach a blended learning format but also to provide lecturers with incentives for engagement. At the same time, a shared vision of a flexible learning environment in a blended learning design should be developed to initiate and establish a new learning culture.

Finally, the student evaluation of the course quality has a significant correlation with the relative effect sizes of the individual courses. Thus, students seem to have a good sense of what blended learning conditions they require to succeed. Accordingly, we recommend educational institutions actively involve students in developing blended learning designs, even to the extent of forming pedagogical partnerships (Cook-Sather et al., 2019 ).

Limitations and future directions

The design of this study was strictly controlled for a field study in an educational area. Due to identical learning objectives and exams, the framework conditions of the two study formats were comparable, the presence of a control group ensured a quasi-experimental design, and selection bias was controlled. Additionally, as this study was not carried out in a model project with unique resources, support, and incentives, a high ecological validity can be assumed in an authentic university setting with regular lecturers. Nevertheless, the study is subject to the inherent limitations of a real-life setting.

Concerning the data set, because the university had to switch from a mainly on-site format to exclusively hybrid and online formats during the COVID-19 pandemic, cohorts could be surveyed at different study levels, and only one complete cohort could be observed, uninterrupted, from entry to graduation. Accordingly, relatively few courses from the upper semesters of the ‘Main Study’ level were included compared to ‘Assessment’ level courses.

Another limitation of this study is that the flexible learning study programme in a blended learning design we analysed appeals necessarily to a particular student population, namely those with limited time and/or a greater need for spatial flexibility, often because of a demanding job or family commitments. As a result, although the FLEX and PT groups were similar in terms of the control variables and the pre-test, bias due to self-selection could not be ruled out. It should, therefore, be acknowledged that the results concerning the blended learning format are of limited generalizability beyond a context of a flexible learning study programme. It was also shown that the needs of students regarding flexible learning programmes can be highly specific. Therefore, in the future, it would be essential to differentiate research on the design of blended learning depending on the particular study context.

Furthermore, this study identified design factors for blended learning courses based on the relative effect sizes of individual courses. Future studies should verify and differentiate the results of this study to arrive at validated practice guidelines.

Conclusions

This work contributes to the growing literature on the implementation of flexible learning study programmes in a blended learning design. Overall, this study found equivalent learning effectiveness in a blended learning format with classroom time reduced by 51% compared with the conventional study format. The study provides evidence that making education more flexible by offering blended learning with reduced classroom time can improve access to education without compromising learning effectiveness. Additionally, the learning effectiveness of the individual courses was found to be moderated by the implementation quality of the educational design factors. Specifically, an adequate course structure and guidance for students, activating learning tasks, stimulating interaction and social presence of teachers, as well as timely feedback on the learning process and outcomes, were identified as crucial design principles for learning-effective blended learning courses.

The results encourage higher education institutions to offer flexible study programmes in a blended learning format with reduced classroom time but also underscore the importance of the educational design quality.

Acknowledgements

We thank the students who participated in this study.

Abbreviations

B&F: Banking & Finance
COVID-19: Coronavirus disease 2019
FLEX: Flexible learning study programme
FT: Full-time study programme
GM: General management
LMS: Learning management system
PT: Part-time study programme
STEM: Science, Technology, Engineering, and Mathematics (subjects)

See Tables 6 and 7.

Author contributions

CM: Conceptualization, Methodology, Data curation, Formal analysis, Writing—Original draft preparation and Reviewing & Editing. TM: Methodology, Data curation, Formal analysis, Software, Writing—Original draft preparation and Reviewing & Editing. DS: Methodology, Data curation, Formal analysis, Software, Writing—Original draft preparation and Reviewing & Editing. All authors read and approved the final manuscript.

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

Availability of data and materials

Declarations.

The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Claude Müller, Email: muew@zhaw.ch.

Thoralf Mildenberger, Email: mild@zhaw.ch.

Daniel Steingruber, Email: stid@zhaw.ch.

  • Alammary A, Sheard J, Carbone A. Blended learning in higher education: Three different design approaches. Australasian Journal of Educational Technology. 2014 doi: 10.14742/ajet.693. [ CrossRef ] [ Google Scholar ]
  • Allen, I. E., Seaman, J., & Garrett, R. (2007). Blending in: The extent and promise of blended education in the United States . Sloan Consortium.
  • Andrade MS, Alden-Rivers B. Developing a framework for sustainable growth of flexible learning opportunities. Higher Education Pedagogies. 2019; 4 (1):1–16. doi: 10.1080/23752696.2018.1564879. [ CrossRef ] [ Google Scholar ]
  • Anthony B, Kamaludin A, Romli A, Raffei AF, Phon DN, Abdullah A, Ming GL. Blended learning adoption and implementation in higher education: A theoretical and systematic review. Tech Know Learn. 2020 doi: 10.1007/s10758-020-09477-z. [ CrossRef ] [ Google Scholar ]
  • Asarta CJ, Schmidt JR. The choice of reduced seat time in a blended course. The Internet and Higher Education. 2015; 27 :24–31. doi: 10.1016/j.iheduc.2015.04.006. [ CrossRef ] [ Google Scholar ]
  • Barnett, R. (2014). Conditions of flexibility: Securing a more responsive higher education system . Higher Education Academy. https://www.heacademy.ac.uk/resource/conditions-flexibility-securing-more-responsive-higher-education-system
  • Bates, D., Maechler, M., Bolker, B., & Walker, S. (2020). lme4: Linear Mixed-Effects Models using ‘Eigen’ and S4. R package Version 1.1–26 . In https://CRAN.R-project.org/package=lme4
  • Bernard RM, Borokhovski E, Schmid RF, Tamim RM, Abrami PC. A meta-analysis of blended learning and technology use in higher education: From the general to the applied. Journal of Computing in Higher Education. 2014; 26 (1):87–122. doi: 10.1007/s12528-013-9077-3. [ CrossRef ] [ Google Scholar ]
  • Bernard RM, Borokhovski E, Tamim RM. The state of research on distance, online, and blended Learning: Meta-analyses and qualitative systematic reviews. In: Moore MG, Diehl WC, editors. Handbook of Distance Education. 4. Routledge; 2019. pp. 92–104. [ Google Scholar ]
  • Boelens R, De Wever B, Voet M. Four key challenges to the design of blended learning: A systematic literature review. Educational Research Review. 2017; 22 (Supplement C):1–18. doi: 10.1016/j.edurev.2017.06.001. [ CrossRef ] [ Google Scholar ]
  • Boer WD, Collis B. Becoming more systematic about flexible learning: Beyond time and distance. ALT-J. 2005; 13 (1):33–48. doi: 10.1080/0968776042000339781. [ CrossRef ] [ Google Scholar ]
  • Brookhart, S. M., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A Century of Grading Research: Meaning and Value in the Most Common Educational Measure. Review of Educational Research, 86 (4), 803–848. 10.3102/0034654316672069.
  • Bruggeman B, Tondeur J, Struyven K, Pynoo B, Garone A, Vanslambrouck S. Experts speaking: Crucial teacher attributes for implementing blended learning in higher education. The Internet and Higher Education. 2021; 48 :100772. doi: 10.1016/j.iheduc.2020.100772. [ CrossRef ] [ Google Scholar ]
  • Caskurlu S, Richardson JC, Maeda Y, Kozan K. The qualitative evidence behind the factors impacting online learning experiences as informed by the community of inquiry framework: A thematic synthesis. Computers & Education. 2021; 165 :104111. doi: 10.1016/j.compedu.2020.104111. [ CrossRef ] [ Google Scholar ]
  • Castaño-Muñoz J, Duart JM, Sancho-Vinuesa T. The Internet in face-to-face higher education: Can interactive learning improve academic achievement? British Journal of Educational Technology. 2014; 45 (1):149–159. doi: 10.1111/bjet.12007. [ CrossRef ] [ Google Scholar ]
  • Chen D-T. Uncovering the provisos behind flexible learning. Educational Technology & Society. 2003; 6 (2):25–30. [ Google Scholar ]
  • Chingos MM, Griffiths RJ, Mulhern C, Spies RR. Interactive online learning on campus: Comparing students’ outcomes in hybrid and traditional courses in the university system of Maryland. The Journal of Higher Education. 2017; 88 (2):210–233. doi: 10.1080/00221546.2016.1244409. [ CrossRef ] [ Google Scholar ]
  • Clary G, Dick G, Akbulut AY, Van Slyke C. The after times: college students’ desire to continue with distance learning post pandemic. Communications of the Association for Information Systems. 2022; 50 :52–85. doi: 10.17705/1CAIS.05003. [ CrossRef ] [ Google Scholar ]
  • Cook-Sather, A., Bahti, M., & Ntem, A. (2019). Pedagogical Partnerships . Elon University Center for Engaged Learning. 10.36284/celelon.oa1
  • Cundell A, Sheepy E. Student perceptions of the most effective and engaging online learning activities in a blended graduate seminar. Online Learning. 2018; 22 (3):87–102. doi: 10.24059/olj.v22i3.1467. [ CrossRef ] [ Google Scholar ]
  • Dziuban C, Graham CR, Moskal PD, Norberg A, Sicilia N. Blended learning: The new normal and emerging technologies. International Journal of Educational Technology in Higher Education. 2018; 15 (3):1–16. doi: 10.1186/s41239-017-0087-5. [ CrossRef ] [ Google Scholar ]
  • Ellis RA, Pardo A, Han F. Quality in blended learning environments—Significant differences in how students approach learning collaborations. Computers & Education. 2016; 102 :90–102. doi: 10.1016/j.compedu.2016.07.006. [ CrossRef ] [ Google Scholar ]
  • Garcia A, Abrego J, Calvillo MM. A study of hybrid instructional delivery for graduate students in an educational leadership course. International Journal of E-Learning & Distance Education. 2014; 29 (1):1–15. [ Google Scholar ]
  • Gherheș, V., Stoian, C. E., Fărcașiu, M. A., & Stanici, M. (2021). E-Learning vs. face-to-face learning: Analyzing students’ preferences and behaviors. Sustainability, 13 (8), 4381. https://www.mdpi.com/2071-1050/13/8/4381 .
  • Gillett-Swan J. The challenges of online learning: Supporting and engaging the isolated learner. Journal of Learning Design. 2017; 10 (1):20–30. doi: 10.5204/jld.v9i3.293. [ CrossRef ] [ Google Scholar ]
  • Goeman K, De Grez L, van den Muijsenberg E, Elen J. Investigating the enactment of social presence in blended adult education. Educational Research. 2020; 62 (3):340–356. doi: 10.1080/00131881.2020.1796517. [ CrossRef ] [ Google Scholar ]
  • Graham CR. Current research in blended learning. In: Moore MG, Diehl WC, editors. Handbook of distance education. 4. Routledge; 2019. pp. 173–188. [ Google Scholar ]
  • Han F, Ellis RA. Identifying consistent patterns of quality learning discussions in blended learning. The Internet and Higher Education. 2019; 40 :12–19. doi: 10.1016/j.iheduc.2018.09.002. [ CrossRef ] [ Google Scholar ]
  • Hattie J, Timperley H. The power of feedback. Review of Educational Research. 2007; 77 (1):81–112. doi: 10.3102/003465430298487. [ CrossRef ] [ Google Scholar ]
  • Heilporn G, Lakhal S, Bélisle M. An examination of teachers’ strategies to foster student engagement in blended learning in higher education. International Journal of Educational Technology in Higher Education. 2021 doi: 10.1186/s41239-021-00260-3. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Heller, K. A., & Perleth, C. (2000). Kognitiver Fähigkeitstest für 4. bis 12. Klassen, Revision: KFT 4–12+ R . Beltz.
  • Hilliard LP, Stewart MK. Time well spent: Creating a community of inquiry in blended first-year writing courses. The Internet and Higher Education. 2019; 41 :11–24. doi: 10.1016/j.iheduc.2018.11.002. [ CrossRef ] [ Google Scholar ]
  • Hodges, C. B., Moore, S., Lockee, B. B., Trust, T., & Bond, M. A. (2020). The difference between emergency remote teaching and online learning. EDUCAUSE Review . https://er.educause.edu/articles/2020/3/the-difference-between-emergency-remote-teaching-and-online-learning .
  • Hrastinski S. What do we mean by blended learning? TechTrends. 2019 doi: 10.1007/s11528-019-00375-5. [ CrossRef ] [ Google Scholar ]
  • Huang J, Matthews KE, Lodge JM. ‘The university doesn’t care about the impact it is having on us’: Academic experiences of the institutionalisation of blended learning. Higher Education Research & Development. 2021; 41 (5):1557–1571. doi: 10.1080/07294360.2021.1915965. [ CrossRef ] [ Google Scholar ]
  • Kim, J. (2020). Teaching and learning after COVID-19. Inside Higher Ed , 1 . https://www.insidehighered.com/digital-learning/blogs/learning-innovation/teaching-and-learning-after-covid-19
  • Knoster TP, Villa RA, Thousand J. A framework for thinking about systems change. In: Villa RA, Thousand J, editors. Restructuring for caring and effective education: Piecing the puzzle together. Paul H. Brookes Publishing; 2000. pp. 93–128. [ Google Scholar ]
  • Kömmetter, S. (2010). Strukturelle Äquivalenz von Skalen zur Messung von studienrelevanten Kompetenzen und Einstellungen [Doctoral dissertation Vienna University]. Wien. http://othes.univie.ac.at/10028/1/2010-05-17_0202045.pdf
  • Kuckartz U, Rädiker S. Analyzing qualitative data with MAXQDA. Springer Nature. 2019 doi: 10.1007/978-3-030-15671-8. [ CrossRef ] [ Google Scholar ]
  • Lai M, Lam KM, Lim CP. Design principles for the blend in blended learning: A collective case study. Teaching in Higher Education. 2016; 21 (6):716–729. doi: 10.1080/13562517.2016.1183611. [ CrossRef ] [ Google Scholar ]
  • Lauche, K., Verbeck, A., & Weber, W. (1999). Multifunktionale Teams in der Produkt-und Prozessentwicklung. In Zentrum für Integrierte Produktionssysteme (Ed.), Optimierung der Produkt- und Prozessentwicklung (pp. 99–118). vdf Hochschulverlag.
  • Law KMY, Geng S, Li T. Student enrollment, motivation and learning performance in a blended learning environment: The mediating effects of social, teaching, and cognitive presence. Computers & Education. 2019; 136 :1–12. doi: 10.1016/j.compedu.2019.02.021. [ CrossRef ] [ Google Scholar ]
  • Lenth, R. V. (2021). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.6.0. In https://CRAN.R-project.org/package=emmeans
  • Li KC, Wong BYY. Revisiting the definitions and implementation of flexible learning. In: Li KC, Yuen KS, Wong BTM, editors. Innovations in open and flexible education. Springer; 2018. pp. 3–13. [ Google Scholar ]
  • Lockee BB, Clark-Stallkamp R. Pressure on the system: increasing flexible learning through distance education. Distance Education. 2022; 43 (2):342–348. doi: 10.1080/01587919.2022.2064829. [ CrossRef ] [ Google Scholar ]
  • Lowenthal PR, Snelson C. In search of a better understanding of social presence: An investigation into how researchers define social presence. Distance Education. 2017; 38 (2):141–159. doi: 10.1080/01587919.2017.1324727. [ CrossRef ] [ Google Scholar ]
  • Manwaring KC, Larsen R, Graham CR, Henrie CR, Halverson LR. Investigating student engagement in blended learning settings using experience sampling and structural equation modeling. The Internet and Higher Education. 2017; 35 :21–33. doi: 10.1016/j.iheduc.2017.06.002. [ CrossRef ] [ Google Scholar ]
  • Martin, M., & Godonoga, A. (2020). SDG 4 -Policies for Flexible Learning Pathways in Higher Education Taking Stock of Good Practices Internationally . UNESCO. 10.13140/RG.2.2.31907.81449
  • Martin F, Wang C, Sadaf A. Student perception of helpfulness of facilitation strategies that enhance instructor presence, connectedness, engagement and learning in online courses. The Internet and Higher Education. 2018; 37 :52–65. doi: 10.1016/j.iheduc.2018.01.003. [ CrossRef ] [ Google Scholar ]
  • McGee, P., & Reis, A. (2012). Blended course design: A synthesis of best practices. Online Learning, 16 (4). 10.24059/olj.v16i4.239.
  • McKenna K, Gupta K, Kaiser L, Lopes T, Zarestky J. Blended learning: Balancing the best of both worlds for adult learners. Adult Learning. 2020; 31 (4):139–149. doi: 10.1177/1045159519891997. [ CrossRef ] [ Google Scholar ]
  • Means B, Bakia M, Murphy R. Learning online: What research tells us about whether, when and how. Routledge; 2014. [ Google Scholar ]
  • Means, B., Toyama, Y., Murphy, R., & Baki, M. (2013). The effectiveness of online and blended learning: A meta-analysis of the empirical literature. Teachers College Record , 115 (3), 1–47. http://www.tcrecord.org/Content.asp?ContentId=16882
  • Merrill, M. D. (2018). Using the first principles of instruction to make instruction effective, efficient, and engaging. In R. E. West (Ed.), Foundations of Learning and Instructional Design Technology: The Past, Present, and Future of Learning and Instructional Design Technology . EdTech Books. https://edtechbooks.org/lidtfoundations/using_the_first_principles_of_instruction
  • Molina AI, Jurado F, de la Cruz I, Redondo MÁ, Ortega M. Tools to support the design, execution and visualization of instructional designs. In: Luo Y, editor. Cooperative design, visualization, and engineering. Springer; 2009. pp. 232–235. [ Google Scholar ]
  • Mueller C, Mildenberger T, Lübcke M. Do we always need a difference? Testing equivalence in a blended learning setting. International Journal of Research & Method in Education. 2020; 43 (3):283–295. doi: 10.1080/1743727X.2019.1680621. [ CrossRef ] [ Google Scholar ]
  • Müller C, Mildenberger T. Facilitating flexible learning by replacing classroom time with an online learning environment: A systematic review of blended learning in higher education neu. Educational Research Review. 2021; 34 :100394. doi: 10.1016/j.edurev.2021.100394. [ CrossRef ] [ Google Scholar ]
  • Müller C, Stahl M, Alder M, Müller M. Learning effectiveness and students’ perceptions in a flexible learning course. European Journal of Open, Distance and E-Learning. 2018; 21 (2):44–53. doi: 10.21256/zhaw-3189. [ CrossRef ] [ Google Scholar ]
  • Noetel, M., Griffith, S., Delaney, O., Sanders, T., Parker, P., del Pozo Cruz, B., & Lonsdale, C. (2021). Video improves learning in higher education: A systematic review. Review of Educational Research, 91 (2), 204-236. 10.3102/0034654321990713.
  • Nortvig, A. M., Petersen, A. K., & Balle, S. H. (2018). A literature review of the factors influencing e-learning and blended learning in relation to learning outcome, student satisfaction and engagement. The Electronic Journal of E-learning , 16 (1), 46–55. www.ejel.org
  • OECD . Going digital: shaping policies. OECD Publishing; 2019. [ Google Scholar ]
  • Orr D, Luebcke M, Schmidt JP, Ebner M, Wannemacher K, Ebner M, Dohmen D. A university landscape for the digital world. In: Orr D, Luebcke M, Schmidt JP, Ebner M, Wannemacher K, Ebner M, Dohmen D, editors. Higher Education Landscape 2030: A trend analysis based on the AHEAD international horizon scanning. Springer International Publishing; 2020. pp. 1–4. [ Google Scholar ]
  • Owston R, York DN. The nagging question when designing blended courses: Does the proportion of time devoted to online activities matter? The Internet and Higher Education. 2018; 36 (Supplement C):22–32. doi: 10.1016/j.iheduc.2017.09.001. [ CrossRef ] [ Google Scholar ]
  • Pelletier, K., Brown, M., Brooks, D. C., McCormack, M., Reeves, J., Arbino, N., Bozkurt, A., Crawford, S., Czerniewicz, L., & Gibson, R. (2021). 2021 EDUCAUSE Horizon Report Teaching and Learning Edition. https://www.learntechlib.org/p/219489/ .
  • Pelletier, K., McCormack, M., Reeves, J., Robert, J., Arbino, N., Al-Freih, M., Dickson-Deane, C., Guevara, C., Koster, L., Sanchez-Mendiola, M., Skallerup Bessette, L., & Stine, J. (2022). 2022 EDUCAUSE Horizon Report Teaching and Learning Edition . www.learntechlib.org/p/221033/ .
  • Peters, M. A., Rizvi, F., McCulloch, G., Gibbs, P., Gorur, R., Hong, M., Hwang, Y., Zipin, L., Brennan, M., Robertson, S., Quay, J., Malbon, J., Taglietti, D., Barnett, R., Chengbing, W., McLaren, P., Apple, R., Papastephanou, M., Burbules, N., … Misiaszek, L. (2020). Reimagining the new pedagogical possibilities for universities post-Covid-19. Educational Philosophy and Theory, 1-44. 10.1080/00131857.2020.1777655.
  • Pilcher SC. Hybrid course design: A different type of polymer blend. Journal of Chemical Education. 2017; 94 (11):1696–1701. doi: 10.1021/acs.jchemed.6b00809. [ CrossRef ] [ Google Scholar ]
  • Rindermann H, Amelang M. Entwicklung und Erprobung eines Fragebogens zur studentischen Veranstaltungsevaluation. Empirische Pädagogik. 1994; 8 (2):131–151. [ Google Scholar ]
  • Ryan RM. Control and information in the intrapersonal sphere: An extension of cognitive evaluation theory. Journal of Personality and Social Psychology. 1982; 43 (3):450. doi: 10.1037/0022-3514.43.3.450. [ CrossRef ] [ Google Scholar ]
  • Sadler DR. Grade integrity and the representation of academic achievement. Studies in Higher Education. 2009; 34 (7):807–826. doi: 10.1080/03075070802706553. [ CrossRef ] [ Google Scholar ]
  • Saichaie K. Blended, flipped, and hybrid learning: Definitions, developments, and directions. New Directions for Teaching and Learning. 2020; 2020 (164):95–104. doi: 10.1002/tl.20428. [ CrossRef ] [ Google Scholar ]
  • Schmied V, Hänze M. Testtheoretische Überprüfung eines Fragebogens zu Kompetenzen der Selbst-und Studienorganisation und lernrelevanten Emotionen bei Studierenden. Die Hochschullehre. 2016; 2 (16):1–16. [ Google Scholar ]
  • Shim TE, Lee SY. College students’ experience of emergency remote teaching due to COVID-19. Children and Youth Services Review. 2020; 119 :105578. doi: 10.1016/j.childyouth.2020.105578. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Smith K, Hill J. Defining the nature of blended learning through its depiction in current research. Higher Education Research & Development. 2019; 38 (2):383–397. doi: 10.1080/07294360.2018.1517732. [ CrossRef ] [ Google Scholar ]
  • Spanjers I, Könings K, Leppink J, Verstegen D, de Jong N, Czabanowska K, van Merrienboer J. The promised land of blended learning: Quizzes as a moderator. Educational Research Review. 2015; 15 :59–74. doi: 10.1016/j.edurev.2015.05.001. [ CrossRef ] [ Google Scholar ]
  • Tucker R, Morris G. By design: Negotiating flexible learning in the built environment discipline. Research in Learning Technology. 2012; 20 (1):n1. doi: 10.3402/rlt.v20i0.14404. [ CrossRef ] [ Google Scholar ]
  • Vanslambrouck S, Zhu C, Lombaerts K, Philipsen B, Tondeur J. Students’ motivation and subjective task value of participating in online and blended learning environments. The Internet and Higher Education. 2018; 36 :33–40. doi: 10.1016/j.iheduc.2017.09.002. [ CrossRef ] [ Google Scholar ]
  • Vo HM, Zhu C, Diep NA. The effect of blended learning on student performance at course-level in higher education: A meta-analysis. Studies in Educational Evaluation. 2017; 53 (Supplement C):17–28. doi: 10.1016/j.stueduc.2017.01.002. [ CrossRef ] [ Google Scholar ]
  • Vo HM, Zhu C, Diep NA. Students’ performance in blended learning: Disciplinary difference and instructional design factors. Journal of Computers in Education. 2020; 7 (4):487–510. doi: 10.1007/s40692-020-00164-7. [ CrossRef ] [ Google Scholar ]
  • Wade W. Introduction. In: Wade W, Hodgkinson K, Smith A, Arfield J, editors. Flexible Learning in Higher Education. Routledge; 1994. pp. 12–17. [ Google Scholar ]

Application of the UTAUT model to understand learning behavior using Online Video Conference media for RPL students

Association for Computing Machinery

New York, NY, United States

Author tags

  • Online Learning
  • Online Video Conference
  • Recognition of Prior Learning
  • Technology Acceptance Model
  • Research-article
  • Refereed limited

Funding Sources

  • EQUITY Program, Lembaga Pengelola Dana Pendidikan (LPDP), Ministry of Finance, Indonesia


Knee Osteoporosis Diagnosis Based on Deep Learning

  • Research Article
  • Open access
  • Published: 12 September 2024
  • Volume 17 , article number  241 , ( 2024 )


  • Amany M. Sarhan   ORCID: orcid.org/0000-0002-0151-9619 1 ,
  • Mohamed Gobara   ORCID: orcid.org/0009-0004-0060-0134 2 ,
  • Shady Yasser   ORCID: orcid.org/0009-0006-5857-3938 2 ,
  • Zainab Elsayed   ORCID: orcid.org/0009-0007-2833-9152 2 ,
  • Ghada Sherif   ORCID: orcid.org/0009-0006-2968-2545 2 ,
  • Nada Moataz   ORCID: orcid.org/0009-0003-0921-1621 2 ,
  • Yasmen Yasir   ORCID: orcid.org/0009-0001-0604-3402 2 ,
  • Esraa Moustafa   ORCID: orcid.org/0009-0003-7119-5749 2 ,
  • Sara Ibrahim   ORCID: orcid.org/0009-0007-7657-7139 2 &
  • Hesham A. Ali 2 , 3  

Osteoporosis, a silent yet debilitating disease, presents a significant challenge due to its asymptomatic nature until fractures occur. Rapid bone loss outpaces regeneration, leading to pain, disability, and loss of independence. Early detection is pivotal for effective management and fracture risk reduction, yet current diagnostic methods are time-consuming. Despite its importance, research addressing early diagnosis remains limited. Deep learning, particularly convolutional neural networks (CNNs), has emerged as a potent tool in image analysis. This paper presents a novel approach utilizing transfer learning with CNNs for osteoporosis detection from X-ray images. The proposed approach not only achieves a high accuracy of osteoporosis diagnosis but also offers a revealed feature map that can guide medical professionals for osteoporosis diagnosis. The innovation lies in a dual strategy: (i) a model integrating transfer learning from CNN architectures such as AlexNet, VGG-16, ResNet-50, VGG-19, InceptionNet, XceptionNet, and a custom CNN, and (ii) a dataset collection augmentation mechanism to enhance learning accuracy. The study includes binary and multiclass classification of knee joint X-ray images into normal, osteopenia, and osteoporosis groups, utilizing a dataset of 1947 knee X-rays for training and testing. Performance comparisons against state-of-the-art models reveal the proposed VGG-19 model achieves the highest accuracy at 92.0% for multiclass and 97.5% for binary. These findings underscore the potential of deep learning with transfer learning in aiding early osteoporosis detection, thereby mitigating fracture risks.


1 Introduction

Knee pain is a frequent complaint that affects people of all ages. It is also often an irreversible condition that affects our lives and future. The first type of knee problem is knee osteoarthritis, which is the most common joint disorder. The second is knee osteoporosis, which can progress without symptoms until a bone breaks. The latter begins as osteopenia, a loss of bone mass or bone mineral density, and then escalates to osteoporosis. Osteoporosis is detected when both bone mineral density and bone mass decrease, resulting in a structural defect of the bone tissue.

Numerous studies have addressed the classification of knee osteoporosis, each with its own findings and methodological nuances. For instance, Wani and Arora (2020) reported an accuracy of 91%. However, closer examination of the confusion matrix showed that the actual accuracy was lower. In addition, the models exhibited high validation loss, suggesting limited generalization. In contrast, Kumar et al. (2023) reported a higher accuracy than Wani and Arora. Nevertheless, they noted significant drops in model performance during training, indicating potential instability or overfitting. Similarly, Abubakar et al. (2022) demonstrated promising accuracy in binary classification. However, the study did not report validation loss and had difficulty correctly classifying certain instances, highlighting the need for improved model robustness. Together, these studies reveal several gaps in the current research: limited dataset size, insufficient image quality, and the need for a more robust and comprehensive convolutional neural network (CNN)-based architecture for osteoporosis detection from X-ray images. Our own research has yielded promising results, particularly higher accuracy in both multiclass and binary classification tasks. Addressing the identified gaps presents an opportunity to advance the field and improve knee osteoporosis classification. Methodologies such as constrained multi-dimensional (CM) mathematical algorithms and meta-heuristic algorithms (MAs) [43] show the potential for optimizing complex models in healthcare diagnostics.
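The point about re-deriving accuracy from a confusion matrix can be illustrated with a short sketch. The matrix below is hypothetical (it is not taken from any of the cited studies), and the function names are chosen only for this example. Overall accuracy is the sum of the diagonal divided by the total number of samples, and per-class recall shows which classes pull the figure down.

```python
from typing import List


def accuracy_from_confusion(matrix: List[List[int]]) -> float:
    """Overall accuracy: correctly classified samples (diagonal) / all samples."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recall(matrix: List[List[int]]) -> List[float]:
    """Recall per class, with rows as true labels and columns as predictions."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical 3-class matrix; rows are the true classes
# (normal, osteopenia, osteoporosis), columns are the predicted classes.
example_matrix = [
    [45, 4, 1],
    [6, 38, 6],
    [2, 5, 43],
]

overall_accuracy = accuracy_from_confusion(example_matrix)
recalls = per_class_recall(example_matrix)
```

Recomputing the figure this way makes it easy to check a headline accuracy against the reported matrix and to see whether one class (here the middle one) is classified noticeably worse than the others.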

Osteoporosis is a major public health problem in many countries around the world. It is a silent danger whose incidence increases among the elderly. Statistics collected by the World Health Organization (WHO) indicate that osteoporosis is a health problem of great importance to the elderly, as the rate of bone fractures in women due to this disease ranges between 30 and 40 per 100 people. That is an increase of 132 percent, and experts expect this number to reach one billion by 2050. It affects men and women from all racial groups, and can rob people of their independence and quality of life. However, osteoporosis can be prevented and treated, so its effects can be eliminated if it is detected early [1, 2, 3]. That makes it very important to check our bone health and to detect osteoporosis.

The normal bone remodeling process involves the creation of new bone and the removal of old bone. Osteoporosis disturbs this balance, so bone loss occurs more rapidly than bone formation. Because its symptoms are not obvious, it leads to fractures in many people, especially the elderly, causing pain, disability and loss of free movement. Treating osteoporosis can take a long time, which can be shortened if the disease is detected at its early stage, called osteopenia. A bone density test uses X-rays to determine how many grams of calcium and other bone minerals are packed into a section of bone. Doctors often request this test after a fracture, a noticeable loss of height (about 3.5 cm), or a drop in certain hormones.

Figure 1a–c shows sample X-ray images for three cases: normal, osteopenia and osteoporosis knees. The knee with osteopenia (Fig. 1b) clearly retains more bone mass than the knee with osteoporosis (Fig. 1c). Figure 1d shows areas of decreased bone density in the patella and in the femoral and tibial condyles. Finally, Fig. 1e gives detailed measures of knee densities in an X-ray image. These images illustrate why the disease must be detected in its early stages to enable better treatment results.

Fig. 1 X-ray images for (a) normal, (b) osteopenia, and (c) osteoporosis knees, (d) noticed change in bone density, and (e) detailed measures of knee densities

Machine learning plays an effective role in the diagnosis and treatment of osteoporosis, as its algorithms can analyze large datasets to identify patterns in how individuals respond to different treatments. It can also monitor changes in health over time, and natural language processing (NLP) algorithms can extract valuable information from electronic health records, which facilitates cohesive analysis of patient histories. Convolutional neural network (CNN) models have gained prominence [4, 5] due to their success in diagnosing numerous diseases from images and in many other useful applications, such as brain tumor detection and segmentation [6], COVID detection [7], cancer detection [8], human activity identification [9], and age and face detection [10].

Osteoporosis can be found in many parts of the human body, including the teeth, hip, spine, hand, and knee. Automated systems can detect subtle changes in bone density and structure that may be indicative of osteoporosis. Several works have addressed osteoporosis in the teeth [11], hip [12], and spine [13], while few have been directed at knee osteoporosis, despite the fact that the knee is the most strained joint since it bears the body's weight and is responsible for movement. With an aging population, the prevalence of osteoporotic fractures around the knee is rising, especially in women. X-ray imaging has been the main medium for osteoporosis detection. It is the most common imaging technique used by the medical community to find bone pathologies, and it can also image the whole body. X-ray images greatly help in diagnosing fractures and joint dislocations and in detecting changes in bone density and architecture. Although X-ray imaging is the main tool doctors use to diagnose osteoporosis, the disease is not always easy to detect on it unless the doctor is experienced. Some studies have been dedicated to osteoporosis detection [11, 14, 15, 16, 17, 18, 19, 20, 21]; however, their accuracies are insufficient to rely on.

In this research, we introduce two deep learning models for detecting knee osteoporosis in X-ray images, that is, the presence or absence of osteoporosis. We also introduce several deep learning models that differentiate between knee osteoporosis and osteopenia in X-ray images as a multiclass classification problem. Thanks to their layered architecture, our models not only classify well but also learn intricate abstract representations from raw inputs. Our study addresses pivotal challenges in knee osteoporosis detection, with the following distinctive contributions:

Model architecture: We investigate various convolutional neural network models that use a transfer learning approach to ensure detection accuracy and efficiency, including VGG-19, VGG-16, AlexNet, ResNet-50, XceptionNet, and InceptionNet, as well as a CNN specifically designed for accurate knee osteoporosis detection.

Detection aim: Binary classification is used for osteoporosis detection and multiclass classification for osteoporosis and osteopenia detection. As such, our work serves both directions.

Dataset use: The model is trained and evaluated on an assembled dataset [22], which is a combination of four independent datasets [23, 24, 25, 26]. After combining the datasets, data augmentation is employed to solve the unbalanced dataset problem.
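As a rough illustration of how augmentation can offset class imbalance, the sketch below oversamples the minority classes with flipped copies until every class matches the largest one. The horizontal flip, the function names, and the toy 2×2 "images" are assumptions made for this example; the specific augmentation operations applied to the assembled dataset are not stated here.

```python
import random
from collections import Counter, defaultdict


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def balance_by_augmentation(samples, seed=0):
    """Return the samples plus flipped copies so that all classes are equally sized.

    `samples` is a list of (image, label) pairs. Flipping is only one
    possible augmentation and is used here purely for illustration.
    """
    if not samples:
        return []
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for image, label in samples:
        by_label[label].append(image)
    target = max(len(images) for images in by_label.values())
    balanced = list(samples)
    for label, images in by_label.items():
        # Add augmented copies until this class reaches the largest class size.
        for _ in range(target - len(images)):
            balanced.append((horizontal_flip(rng.choice(images)), label))
    return balanced


def class_counts(samples):
    """Count how many samples each label has."""
    return Counter(label for _, label in samples)


# Toy, deliberately unbalanced dataset (hypothetical 2x2 "X-rays").
tiny_image = [[1, 2], [3, 4]]
toy_samples = (
    [(tiny_image, "normal")] * 4
    + [(tiny_image, "osteopenia")] * 2
    + [(tiny_image, "osteoporosis")] * 1
)
balanced_samples = balance_by_augmentation(toy_samples)
```

In practice, augmented copies should be generated only for the training split, so that evaluation images remain untouched originals.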

This research paper is structured as follows: an introduction followed by background and related work, and then a full discussion of the suggested models. The experiments, findings, and analysis are presented in order, followed by the conclusion and list of references.

2 Background and Related Work

Deep neural networks, the latest trend in complex and powerful learning models, are designed to model geometric transformations of the input and to respond strongly to patterns that recur throughout the dataset, which helps us understand how the model maps deep features to ground-truth labels. Convolutional neural networks (CNNs), a network type inspired by the visual functions of animal brains, were first proposed by Fukushima in the late 1970s, but the current network framework was developed by LeCun in 1998. The development of CNNs is based on the backpropagation algorithm, which is needed to adjust model parameters repeatedly and automatically. Various CNN designs have been developed that differ in structure, learning method, number of training epochs, and hyperparameters, and CNN models can maintain deep networks for complex learning tasks [ 30 – 34 ].

Transfer learning (TL) and deep learning (DL) are two important concepts in machine learning, and understanding both is essential in the field of artificial intelligence (AI). The motivation behind transfer learning is the assumption that knowledge gained from learning in one domain can be usefully reused to solve learning problems in other related, or even unrelated, tasks. In the context of deep learning, such reuse is typically achieved by transferring the early or middle layers of deep models, which requires choosing a transfer strategy and deciding how much knowledge and complexity is transferred to the structure of the target model [ 37 , 38 ].

2.1 Significance of Knee Osteoporosis Diagnosis

Achieving more accurate measurements of bone density in the knee is of great importance in reducing the serious consequences mentioned above. Dual-energy X-ray absorptiometry (DXA) examinations or whole-leg models are of limited use for accurately estimating knee mineral density, because BMD over the entire knee area is not of direct clinical significance and most BMD measurements are reported to contain error. In clinical practice, the diagnosis of osteoarthritis of the knee is mainly based on X-ray radiographs, according to a qualitative evaluation of bone tissue. Furthermore, there is a real unmet demand for a rapid, non-invasive and accurate diagnostic tool for knee osteoarthritis. Our study aims to develop a rapid diagnostic tool to aid widespread screening for early detection of knee osteoporosis and to support decision-making in clinical practice.

The pathology underlying knee osteoarthritis is a bone mineral density disorder. Loss of bone mineral density in the region of the subarticular trabecular bone located in the intercondylar notch beneath the thickened cartilage has a negative correlation with the loss of mechanical properties. Therefore, it is considered a major indicator of knee osteoarthritis. However, accurately estimating knee mineral density in this discrete region is a difficult task, as the subarticular region of the knee is considered to respond differently in postmenopausal women than bone mineral density in other parts of the human skeleton. The literature reports significant and complex challenges, such as knee degeneration, delayed treatment, increased duration of post-fracture hospitalization, and major depressive disorder secondary to femoral neck fractures, which are particularly severe when the patient is an elderly woman.

2.2 Transfer Learning in Deep Learning

Deep learning (DL), or more precisely the class of machine learning methods called artificial neural networks, is the dominant tool for solving high-dimensional prediction and control problems in computer vision, language, and the control of agents interacting with relatively simple environments. In the context of computer vision, a deep learning model is given an image as input, is expected to produce a vector of class probabilities, and ranks the identified key elements present in the image according to their probability of presence. Each layer of the network computes a linear combination of its inputs and weights, and the class probability vector is generated by a softmax transformation applied to the output of the final layer [ 37 , 38 , 42 ].

In the deep learning field, deep structures are mostly learned on a dataset that consists of pairs of data and labels. Deep learning systems typically require large labeled datasets to perform well. In some cases, however, large labeled datasets are not available, and labeling is costly. To this end, the features learned in one deep network can be reused to improve the performance of another neural network whose task differs from the original one. This practice is called transfer learning. Because a pre-trained network is used for the new task, a good model can also be obtained with less training.

Matching the right task and model in the right way is a complex and resource-intensive process. One way to make this work easier is to take known representations from other learning processes and use them as a starting point. Transfer learning serves this purpose: it takes advantage of the features obtained in the source task to improve learning in the target task. The criteria for selecting the source and target tasks in transfer learning are not clear-cut, which is why there are many ways to apply transfer learning in deep learning.

Osteoporosis can be described as a medical condition characterized by weakened bones and increased fracture risk due to a loss of bone density. It commonly affects older adults, particularly postmenopausal women. As a silent disease that causes weak and fragile bones, osteoporosis often shows no evident symptoms until fractures occur, causing pain and disability. The traditional diagnostic methods are time-consuming, and early detection is crucial for effective management. Motivations to address osteoporosis stem from the desire to improve health outcomes, prevent debilitating fractures in an aging population, reduce the economic burden on healthcare systems, and advance scientific knowledge and innovations in medical science. The goal is to enhance overall well-being, emphasizing the importance of prevention, early detection, and effective management of osteoporosis.

2.3 Osteoporosis Diagnosis Methodologies

Generally, osteoporosis is diagnosed using one of the following four most common methodologies, as depicted in Fig.  2 . These four methodologies are:

X-ray and DXA, where important signs of osteoporosis, such as cortical thinning and increased radiolucency, may appear in the X-ray image. They give very precise measurements at clinically relevant skeletal sites; however, the major disadvantages of DXA are that the machine is large (not portable) and expensive, and that it uses ionizing radiation.

Numerical laboratory tests: postmenopausal women with low BMD (T-score below − 2.5) and/or a fragility fracture undergo basic tests such as a biochemistry profile, 25-hydroxyvitamin D (25[OH]D), and a complete blood count (CBC).

Computed tomography (CT) scan, an imaging technique that provides detailed cross-sectional images of bones to assess bone density and structure and to identify fractures.

Quantitative ultrasound (QUS), a non-invasive test that measures bone density using sound waves. It is a quick procedure with no radiation exposure, suitable for certain populations.

figure 2

The most common osteoporosis diagnosis methodologies

Several studies have observed that the sensitivity of X-ray and DXA in diagnosing osteoporosis is higher than that of other methods, and the American College of Radiology (ACR) has issued guidance that X-ray and DXA are accurate diagnostic tools. Insights gained from studying bone health can potentially influence a range of medical disciplines, from orthopedics to geriatrics, opening doors to novel treatments and preventive strategies for a variety of age-related conditions. Moreover, a concerted effort to address osteoporosis aligns with broader public health initiatives.

In exploring the landscape of AI applications in medical imaging, previous studies have predominantly focused on image recognition, feature extraction, and diagnostic assistance. In this section, we briefly review the most recent works in this field. In [ 17 ], Wani & Arora addressed osteoporosis diagnosis in knee X-rays using transfer learning with convolutional neural networks on a multiclass dataset from Mendeley Data [ 26 ]. Four models were used: AlexNet, VGG-16, ResNet and VGG-19, with accuracies of 91%, 86.30%, 86.30% and 84.20%, respectively. In Ref. [ 18 ], Kumar et al. utilized a fuzzy rank-based ensemble model for accurate diagnosis of knee osteoporosis, built from three models: Inception v3, Xception and ResNet 18. A multiclass dataset of knee X-ray images from Mendeley Data [ 26 ] was used. The model accuracies were: Inception v3 (89.8%), Xception (90.9%), and ResNet 18 (91.4%). In [ 21 ], Abubakar et al. used transfer learning models for osteoporosis classification on RGB and grayscale knee radiographs, using a binary knee X-ray image dataset from Kaggle [ 23 ]. Two models were employed: GoogleNet with an accuracy of 90.0% and VGG-16 with an accuracy of 87%.

Yang 2022 [ 27 ] aimed to identify and classify knee diseases (osteoporosis and osteoarthritis) from X-ray images using deep learning. A binary-class dataset of knee X-ray images from Kaggle [ 23 ] was employed. Three models were used: a custom CNN with an accuracy of 77%, a late-fusion model with an accuracy of 71%, and VGG-16 with an accuracy of 82%. In Ref. [ 16 ], Dodamani and Danti worked with a binary dataset from Zydus Hospital [ 16 ] of spine, hand, leg, and knee X-ray images, utilizing transfer learning for osteoporosis classification. Five models were trained: VGG-16, VGG-19, DenseNet-121, ResNet-50 and Inception V3, where the reported accuracies were 78%, 86%, 93%, 89% and 90%, respectively. A summary of the previous X-ray-based knee osteoporosis diagnosis studies using deep learning models, along with their accuracies, is given in Table  1 .

From these studies, which investigated deep learning models for osteoporosis diagnosis using knee X-ray images, we conclude that the performance (reported as model accuracy) is poor. More advanced models are required to enhance the detection accuracy of knee osteoporosis.

2.4 Mathematical Background and Formulations for Deep Learning Models

Deep learning models are associated with several building blocks that enable them to perform the feature extraction and classification process they are built to perform. Each deep learning model is built using some or all of these blocks, where each model has different architecture and, therefore, different performance. In the area of neural networks and deep learning, understanding the mathematical underpinnings of various activation functions and normalization techniques is crucial [ 35 , 36 , 39 ]. This section delves into the fundamentals of softmax, sigmoid, rectified linear unit (ReLU), and batch normalization, elucidating their roles in shaping the behavior and performance of neural networks.

2.4.1 Softmax Function

The softmax function is a cornerstone in multiclass classification tasks, particularly in the output layer of neural networks. It transforms the raw output scores into probabilities, ensuring that the sum of these probabilities across all classes equals one. Mathematically, given a vector of raw scores \(\mathbf{z}\), the softmax function computes the probability distribution \(\sigma(\mathbf{z})\) as follows:

\[ \sigma(\mathbf{z})_{i} = \frac{e^{z_{i}}}{\sum_{j=1}^{K} e^{z_{j}}}, \quad i = 1, \dots, K \]

where \(z_{i}\) represents the raw score for class \(i\), and \(K\) denotes the total number of classes. The softmax function ensures that the network outputs a valid probability distribution, enabling intuitive interpretation and facilitating tasks such as classification.
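The softmax definition can be illustrated with a minimal plain-Python sketch. The raw scores below are hypothetical, and subtracting the maximum score before exponentiating is a common numerical-stability step that is assumed here rather than taken from the paper; it does not change the result.

```python
import math


def softmax(scores):
    """Return softmax probabilities for a list of raw scores."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp().
    shift = max(scores)
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the classes Normal, Osteopenia, Osteoporosis.
probs = softmax([2.0, 1.0, 0.1])
```

The returned values are positive, sum to one, and preserve the ordering of the raw scores.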

2.4.2 Sigmoid Function

The sigmoid function is a widely used activation function, especially in binary classification tasks. It squashes the input values to the range (0, 1), effectively interpreting them as probabilities. The sigmoid function is defined as

\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]

where z represents the input to the function. Despite its vanishing gradient issue for extreme input values, the sigmoid function remains valuable in scenarios where binary decisions are required, such as determining the probability of a given input belonging to a certain class.
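As a small illustration, the sketch below evaluates the sigmoid and its derivative, \(\sigma(z)(1 - \sigma(z))\), in plain Python. The branch on the sign of the input is an implementation choice assumed here to avoid overflow; the derivative values show why gradients vanish for extreme inputs.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written so that exp() never receives a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)


gradient_at_zero = sigmoid_grad(0.0)    # largest possible gradient
gradient_at_ten = sigmoid_grad(10.0)    # close to zero: the vanishing-gradient regime
```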

2.4.3 Rectified Linear Unit (ReLU)

ReLU is a popular activation function known for its simplicity and effectiveness in mitigating the vanishing gradient problem. It replaces negative input values with zero while leaving positive values unchanged. Mathematically, the ReLU function is defined as

\[ \mathrm{ReLU}(z) = \max(0, z) \]

ReLU’s simplicity and computational efficiency make it a preferred choice in many deep learning architectures, promoting faster convergence during training and alleviating issues associated with vanishing gradients.
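A minimal sketch of ReLU and its gradient follows. Setting the gradient at exactly zero to 0 is a common convention assumed here; the function is not differentiable at that point.

```python
def relu(z):
    """Replace negative inputs with zero and pass positive inputs through."""
    return max(0.0, z)


def relu_grad(z):
    """Gradient is 1 for positive inputs and 0 otherwise, so it does not shrink for large z."""
    return 1.0 if z > 0 else 0.0


activations = [relu(v) for v in [-2.0, 0.0, 3.5]]
gradients = [relu_grad(v) for v in [-2.0, 0.0, 3.5]]
```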

2.4.4 Batch Normalization

Batch normalization is a normalization technique that addresses the internal covariate shift problem during training by normalizing the activations of each layer. The process involves the following steps:

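Calculate mini-batch mean: compute the mean \(\mu_{B}\) of the mini-batch activations \(x_{1}, \dots, x_{m}\), where \(m\) is the mini-batch size

\[ \mu_{B} = \frac{1}{m} \sum_{i=1}^{m} x_{i} \]

Calculate mini-batch variance: compute the variance \(\sigma_{B}^{2}\) of the mini-batch

\[ \sigma_{B}^{2} = \frac{1}{m} \sum_{i=1}^{m} \left( x_{i} - \mu_{B} \right)^{2} \]

Normalize: the activations within the mini-batch using the mean and variance

\[ \hat{x}_{i} = \frac{x_{i} - \mu_{B}}{\sqrt{\sigma_{B}^{2} + \epsilon}} \]

Scale and shift: the normalized activations using learnable parameters γ and β

\[ y_{i} = \gamma \hat{x}_{i} + \beta \]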

Here, ϵ is a small constant added to the variance to prevent division by zero, and γ and β are learnable parameters per feature dimension. Batch normalization not only accelerates convergence but also acts as a regularizer, reducing the reliance on techniques like dropout.
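The four steps can be sketched for a single feature dimension in plain Python. The default values for γ, β, and ϵ are illustrative assumptions, and the sketch covers only the training-time computation (no running statistics for inference).

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    m = len(batch)
    mean = sum(batch) / m                              # step 1: mini-batch mean
    var = sum((x - mean) ** 2 for x in batch) / m      # step 2: mini-batch variance
    normalized = [(x - mean) / math.sqrt(var + eps) for x in batch]  # step 3
    return [gamma * x_hat + beta for x_hat in normalized]            # step 4


out = batch_norm([1.0, 2.0, 3.0, 4.0])
```

With γ = 1 and β = 0 the output has approximately zero mean and unit variance; during training, γ and β would be updated together with the other network weights.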

2.5 Convolutional Neural Network (CNN) Models

Convolutional Neural Networks (CNNs) are structured with three primary layers: convolutional, pooling, and fully connected. The convolutional layer employs filters to extract features from input images, while pooling reduces spatial dimensionality, promoting shift-invariance. Stacking multiple convolutional and pooling layers facilitates the extraction of higher-level feature maps. Fully connected layers typically follow this stack in a CNN, occurring before the output layer to perform reasoning tasks [ 28 ].
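To make the roles of the convolutional and pooling layers concrete, the following plain-Python sketch applies one filter to a tiny image and then max-pools the result. The 5 × 5 image and the edge filter are made up for illustration; real CNN layers learn their filters and operate on multi-channel tensors.

```python
def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1), as CNN layers do."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# Toy image with a bright vertical stripe in columns 2 and 3.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]               # responds to dark-to-bright vertical edges
features = conv2d_valid(image, edge_kernel)    # 4 x 4 feature map
pooled = max_pool2d(features)                  # 2 x 2 map after pooling
```

The feature map is positive where the stripe begins and negative where it ends, and pooling keeps the strongest response in each 2 × 2 window while halving the spatial size.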

In the realm of deep learning, transfer learning involves two main approaches: feature extraction and fine-tuning. Feature extraction takes a model pre-trained on a standard dataset such as ImageNet and removes its top classification layer. The remaining model serves as a feature extractor, capturing relevant features for a new dataset. A new classification model is then trained on the extracted features. Fine-tuning, on the other hand, starts from the pre-trained model weights and updates them as the training process progresses. During fine-tuning, the weights learned from the original dataset serve as a starting point and are adjusted to capture more precise features of the new dataset, aiming to adapt general features to specific ones [ 29 ].
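The difference between the two approaches comes down to which weights are allowed to change during training. The framework-free sketch below makes this explicit; the layer names, parameter counts, and the `unfreeze_last` argument are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    n_params: int
    trainable: bool = True


def make_backbone():
    """Hypothetical pre-trained backbone; names and sizes are illustrative only."""
    return [
        Layer("conv_block_1", 40_000),
        Layer("conv_block_2", 220_000),
        Layer("conv_block_3", 1_500_000),
    ]


def with_new_head(backbone, n_classes, n_features=512, unfreeze_last=0):
    """Freeze the backbone, optionally unfreeze its last blocks, and attach a new classifier."""
    for layer in backbone:
        layer.trainable = False
    for layer in backbone[len(backbone) - unfreeze_last:]:
        layer.trainable = True
    head = Layer("new_classifier", (n_features + 1) * n_classes)  # weights plus biases
    return backbone + [head]


def trainable_params(model):
    return sum(layer.n_params for layer in model if layer.trainable)


feature_extractor = with_new_head(make_backbone(), n_classes=3)               # head only
fine_tuned = with_new_head(make_backbone(), n_classes=3, unfreeze_last=3)     # all layers
```

In feature extraction only the new classifier is trained, whereas fine-tuning also updates some or all of the pre-trained layers, which is more flexible but requires more data and care to avoid overfitting.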

To mitigate overfitting due to limited training data, this study employed pre-trained weights from ImageNet and implemented a transfer learning strategy. For both multiclass and binary classification, the final layers of advanced models such as VGG-16, VGG-19, AlexNet, XceptionNet, InceptionNet, and ResNet-50 underwent fine-tuning [ 30 , 32 , 40 , 41 ]. This involved replacing the original fully connected layers with a new classification head and adding a dropout of 0.5 for regularization. Finally, a dense layer with a SoftMax activation function was added, producing either two probability outputs for binary classification (‘Normal’ and ‘Osteoporosis’) or three probability outputs for multiclass classification (‘Normal’, ‘Osteopenia’ and ‘Osteoporosis’).
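A minimal sketch of such a classification head (dropout of 0.5 followed by a dense layer with softmax) is given below in plain Python. The feature vector, weights, and biases are hypothetical, and the "inverted dropout" rescaling is an assumption about the implementation that matches the behaviour of common deep learning libraries.

```python
import math
import random


def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate` and rescale the survivors."""
    if not training:
        return list(activations)   # dropout is disabled at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def dense_softmax(features, weights, biases):
    """Dense layer (one weight row per class) followed by a softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, biases)]
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases, training=False, rng=random):
    return dense_softmax(dropout(features, 0.5, training, rng), weights, biases)


# Hypothetical 2-dimensional feature vector and 3-class head.
features = [1.0, 2.0]
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
biases = [0.0, 0.1, -0.1]
probabilities = classify(features, weights, biases)
```

For the binary models, the same head would use two weight rows instead of three.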

2.5.1 Feature Extraction

Feature extraction in our models involves using pre-trained convolutional layers to extract important features from images. These layers apply filters to the input image, creating feature maps that capture visual patterns. Activation functions and max pooling then further enhance and reduce the features. The extracted features can then be used for tasks like image classification. By using pre-trained layers, our models leverage knowledge learned from a large dataset to extract meaningful features effectively.

2.5.2 Classification

The classification process in our models refers to how the extracted features are used to assign a label or category to an input image. After the feature extraction step, the output from the last convolutional layer is typically flattened into a vector. This vector is then passed through one or more fully connected layers, also known as the classifier. These layers are responsible for making the final predictions based on the extracted features. The fully connected layers in our models usually consist of one or more dense layers, followed by a SoftMax activation function. The dense layers have learnable weights that are adjusted during the training process to map the extracted features to specific classes.

The SoftMax activation function then normalizes the output of the dense layers into a probability distribution over the possible classes. During training, the model is presented with a labeled dataset, and the weights of the fully connected layers are adjusted using techniques such as backpropagation and gradient descent to minimize the difference between the predicted probabilities and the true labels. During inference or testing, an input image is passed through the feature extraction layers of our models, and the resulting features are fed into the fully connected layers.
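The "difference between the predicted probabilities and the true labels" is commonly measured with the cross-entropy loss. The paper does not name its loss function in this section, so cross-entropy is an assumption in the sketch below, which also shows the well-known gradient of that loss with respect to the pre-softmax scores.

```python
import math


def cross_entropy(probabilities, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[true_index])


def grad_wrt_scores(probabilities, true_index):
    """For softmax followed by cross-entropy, the gradient is (prediction - one-hot label)."""
    return [p - (1.0 if k == true_index else 0.0) for k, p in enumerate(probabilities)]


# Hypothetical prediction for an image whose true class is index 1 (Osteopenia).
predicted = [0.25, 0.5, 0.25]
loss = cross_entropy(predicted, true_index=1)
gradient = grad_wrt_scores(predicted, true_index=1)
```

Gradient descent moves the scores against this gradient, raising the score of the correct class and lowering the others; backpropagation carries the same signal back to the layer weights.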

The output of the SoftMax layer represents the probabilities of the image belonging to each class. The class with the highest probability is considered the predicted class for the input image. Generally speaking, classification’s objective can be binary or multiclass. Binary classification is a method where the diagnostic outcome is divided into two categories or classes. In the case of knee osteoporosis diagnosis, a binary classification approach may involve categorizing patients into two groups: those with knee osteoporosis and those without knee osteoporosis. This approach simplifies the classification into a yes/no or positive/negative outcome.

Multiclass classification, on the other hand, involves classifying the diagnostic outcomes into more than two categories. In the context of knee osteoporosis diagnosis, a multiclass classification approach may involve categorizing patients into multiple groups based on the severity of osteoporosis or different subtypes of knee-related bone disorders. For example, patients may be classified as having normal bone density, mild osteoporosis, moderate osteoporosis, or severe osteoporosis affecting the knee joint.
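Selecting the predicted class from the SoftMax output works the same way in the binary and the multiclass setting, as the short sketch below shows. The probability vectors are hypothetical; the class names follow those used in this paper.

```python
MULTICLASS_NAMES = ["Normal", "Osteopenia", "Osteoporosis"]
BINARY_NAMES = ["Normal", "Osteoporosis"]


def predict_label(probabilities, class_names):
    """Return the class with the highest probability together with that probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best], probabilities[best]


multiclass_prediction = predict_label([0.1, 0.7, 0.2], MULTICLASS_NAMES)
binary_prediction = predict_label([0.2, 0.8], BINARY_NAMES)
```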

2.6 Problem Formulation

Osteoporosis, identified by weakened and brittle bones, is frequently referred to as a “silent disease” as it remains symptom-free until fractures manifest. The usual bone regeneration cycle is disturbed in osteoporosis, causing faster bone loss than formation. Symptoms are subtle, resulting in fractures that induce pain, disability, and a decline in independence. Existing diagnostic procedures are lengthy, underscoring the significance of early identification to effectively manage and diminish the risk of fractures. Deep learning has gained widespread acceptance in image analysis, representing a notable progress in recent decades. Existing research papers on osteoporosis have previously implemented algorithms, but many of them have reported relatively low accuracy rates [ 16 , 17 , 18 , 21 , 27 ]. In response to this limitation, our study focuses on introducing an algorithm that achieves notably higher accuracy in the context of knee osteoporosis diagnosis. By incorporating advanced deep learning techniques, specifically employing transfer learning with VGG-19 and exploring various architectures such as VGG-16 and ResNet, our model aims to surpass the accuracy levels documented in these earlier studies. This novel approach highlights our commitment to addressing the limitations of prior algorithms and pushing the boundaries of accuracy in osteoporosis detection.

2.7 Plan of Solution

Our ongoing research is focused on the creation of an advanced algorithm tailored for osteoporosis diagnosis, relying on a sophisticated blend of neural network architectures. These include renowned models such as AlexNet, VGG-16, ResNet, VGG-19, CNN, InceptionNet, and XceptionNet. This comprehensive integration of diverse architectures underscores our commitment to elevating the accuracy and efficacy of osteoporosis detection. By harnessing the capabilities of these cutting-edge models, we aim to not only enhance diagnostic precision but also contribute significantly to the evolution of methodologies in medical image analysis. Through this extensive exploration, our goal is to propel advancements in the understanding and application of osteoporosis diagnosis techniques, ultimately benefiting the broader landscape of healthcare research. In addition, we prepare a new dataset, which is a collection of available datasets [ 23 , 24 , 25 , 26 ] used in binary and multiclass classification for osteoporosis in previous work [ 16 , 17 , 18 , 21 , 27 ].

3 Proposed Detection Models

The purpose of this study is to construct a deep learning model based on CNN-empowered TL. In the proposed strategy, a specialized model is first used to mimic the significant projection in DL and thereby affirm the importance of the indicators associated with the ground truth, and transfer learning is then performed on widely used convolutional neural networks. Subsequently, we demonstrate the clinical value of the selected feature map in the CNN learning process from another perspective, to show the guiding properties of the CNN-derived feature representation for knee osteoporosis.

Over the past decade, diverse artificial intelligence (AI) methods empowered by machine learning and deep learning (DL) have significantly contributed to more efficient and accurate decision support systems in the biomedical field. Convolutional neural networks (CNNs), a type of DL method, have been extensively applied in ensemble learning techniques such as boosting and bagging. Transfer learning (TL) is also a strategy to improve the performance of CNNs, based on the combination of the CNN’s feature extraction power and knowledge transfer across different datasets or domains. Threshold imitation-volumetric fractal dimension is closely associated with bone microarchitecture geometry and is also reported as a reliable index for knee osteoporosis diagnosis. However, the utility of deep learning empowered by TL and the guiding significance of the CNN-derived feature map for osteoporosis diagnosis have not been well explored in the literature.

Figure  3 depicts the components and phases of the proposed architecture. First, the data are collected via a bone scanner (or from available datasets), and each image is then labeled. The collected images are split into training data and testing data. Data augmentation and preprocessing are applied to the training data, which are then split into training and validation sets. The model is trained on the training data, after which we calculate its performance. In the testing phase, we test the model after applying the same preprocessing steps to the test images. The goal of the testing phase is to predict the knee condition from the knee image and classify it into one of three classes: normal, osteopenia, or osteoporosis.
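The two-stage split described in this pipeline (train/test first, then train/validation) can be sketched in plain Python. The file names, class counts, split fractions, and the use of stratification are all illustrative assumptions; the paper's actual proportions are given in its dataset section.

```python
import random
from collections import defaultdict


def stratified_split(samples, holdout_fraction, seed=0):
    """Split (item, label) pairs so that every label keeps roughly the same share in both parts."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    rng = random.Random(seed)
    kept, held_out = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_holdout = round(len(group) * holdout_fraction)
        held_out.extend(group[:n_holdout])
        kept.extend(group[n_holdout:])
    return kept, held_out


# Hypothetical file names and class counts, for illustration only.
labels = ["normal"] * 50 + ["osteopenia"] * 30 + ["osteoporosis"] * 20
samples = [(f"knee_{i:03d}.png", label) for i, label in enumerate(labels)]

train_val, test = stratified_split(samples, holdout_fraction=0.2)           # hold out the test set
train, val = stratified_split(train_val, holdout_fraction=0.25, seed=1)     # then the validation set
```

Holding out the test set before any augmentation keeps augmented variants of a test image from leaking into the training data.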

figure 3

Proposed architecture for knee osteoporosis diagnosis

The proposed methodology initiates with meticulous data curation, involving the collection of datasets and precise labeling of knee images, as detailed in Sect.  4 . This forms the groundwork for subsequent model training. The data are then stratified into training and testing subsets, followed by a comprehensive treatment of the training data. This treatment includes sophisticated techniques such as augmentation and preprocessing, as outlined in Sect.  4.1 , aimed at enhancing model generalization.

To address the challenge of training deep learning models from scratch, which requires a vast amount of data, we opt for transfer learning, a technique widely used in deep learning. Transfer learning constructs new models by reusing previously learned ones, leveraging prior task expertise to improve generalization. Knee X-ray images, sourced from various channels as discussed in Sect.  4.1 , undergo data augmentation to expand the image dataset and enhance model training accuracy. The dataset is subsequently divided into training and testing components. The training dataset is employed to train CNN classification models, while the testing dataset assesses the performance of these trained models. The training data are further divided into training and validation sets, and training culminates in a model for predicting knee conditions. This classification task distinguishes among three classes: normal knee, osteopenia knee, and osteoporosis knee, providing nuanced diagnostic insights. The following subsections first present the multiclass classification models and then the binary classification models. The framework’s robustness is then validated through rigorous performance evaluation on the test dataset, ensuring the model’s effectiveness in accurately classifying knee X-ray images across diverse cases. Detailed results are presented in Sect.  5 , establishing the model’s reliability in clinical applications.
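As a rough illustration of how augmentation can both expand a dataset and even out class sizes, the sketch below adds flipped copies of minority-class images until every class matches the largest one. The horizontal flip and the balancing rule are assumptions chosen for brevity; the transformations actually used are described in the dataset section.

```python
import random
from collections import Counter


def horizontal_flip(image):
    """Mirror a 2D image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def balance_by_augmentation(dataset, augment, seed=0):
    """Append augmented copies of minority-class images until all classes have equal size."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in dataset)
    target = max(counts.values())
    balanced = list(dataset)
    for label, count in counts.items():
        originals = [image for image, lab in dataset if lab == label]
        for _ in range(target - count):
            balanced.append((augment(rng.choice(originals)), label))
    return balanced


# Toy dataset: three "normal" images and one "osteoporosis" image (1 x 2 pixels each).
toy = [([[1, 2]], "normal")] * 3 + [([[3, 4]], "osteoporosis")]
balanced = balance_by_augmentation(toy, horizontal_flip)
```

In practice a randomized mix of transformations (small rotations, shifts, zooms) would be used, so that the added copies are not identical to one another.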

3.1 Utilizing Transfer Learning in Our Models

Transfer learning is a pivotal strategy in knee osteoporosis diagnosis, in which deep learning models transcend their original training domains and carry their knowledge over to the medical imaging context. This methodology capitalizes on the wealth of information encoded within pre-trained neural network architectures, particularly convolutional neural networks (CNNs), which have been extensively trained on large and diverse image datasets such as ImageNet.

The essence of transfer learning lies in its transformative impact on model adaptation and generalization. By fine-tuning pre-trained models on knee X-ray datasets specific to osteoporosis, we imbue these models with a nuanced understanding of bone density variations and related pathologies observed in knee imaging. This nuanced understanding is crucial for accurate and sensitive detection of osteoporotic manifestations, facilitating early diagnosis and intervention.

Moreover, transfer learning serves us in accelerating the learning process and convergence during model training. The transfer of knowledge from source tasks, such as bone density analysis or musculoskeletal imaging, empowers the model to discern subtle yet clinically significant features in knee X-rays, thereby enhancing diagnostic precision and efficacy. Beyond its immediate impact on diagnostic performance, transfer learning fosters a dynamic learning paradigm characterized by continual adaptation and refinement.

In essence, transfer learning stands as a cornerstone in our approach to knee osteoporosis diagnosis using deep learning, embodying a fusion of domain-specific expertise and broad-based knowledge derived from diverse medical imaging domains. This fusion not only elevates diagnostic accuracy but also instills resilience and adaptability in our models, positioning them at the forefront of precision medicine in musculoskeletal health.

3.2 VGG-19 for Multiclass Model

VGG-19 is a convolutional neural network (CNN) architecture that was developed by researchers at the Visual Geometry Group (VGG) at the University of Oxford [ 40 , 41 ]. It is an extension of the VGG-16 architecture and was introduced in the 2014 paper titled “Very Deep Convolutional Networks for Large-Scale Image Recognition”. VGG-19 is known for its simplicity and uniformity in design. It consists of 19 weight layers: 16 convolutional layers and 3 fully connected layers. All the convolutional layers in VGG-19 have a 3 × 3 filter size and a stride of 1, and they are followed by rectified linear unit (ReLU) activation functions. A max pooling operation with a 2 × 2 window and a stride of 2 is applied after each of the five convolutional blocks, which contain 2, 2, 4, 4, and 4 convolutional layers, respectively, reducing the spatial dimensions of the feature maps. In our system, we used VGG-19 for multiclass classification, where the classification result is normal, osteopenia, or osteoporosis, as shown in Fig.  4 .
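The layer counts can be checked against the standard published VGG-19 configuration, in which max pooling follows convolutional blocks of 2, 2, 4, 4, and 4 layers. The sketch below walks through that configuration; the 224 × 224 RGB input size is the usual ImageNet setting and is an assumption here, since the input resolution is not stated in this section.

```python
# Standard VGG-19 feature extractor: integers are the output channels of 3x3 convolutions,
# "M" marks a 2x2 max pooling layer with stride 2.
VGG19_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
                512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def summarize(config, input_size=224, in_channels=3):
    """Count convolutional layers and parameters, and track the feature-map size."""
    size, channels, n_conv, n_params = input_size, in_channels, 0, 0
    for item in config:
        if item == "M":
            size //= 2                                   # pooling halves height and width
        else:
            n_params += (3 * 3 * channels + 1) * item    # 3x3 kernel weights plus one bias
            channels = item
            n_conv += 1
    return n_conv, size, channels, n_params


n_conv, final_size, final_channels, conv_params = summarize(VGG19_CONFIG)
```

With these assumptions the feature extractor ends in a 7 × 7 × 512 feature map; the 16 convolutional layers plus the 3 fully connected layers give the 19 weight layers in the name.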

figure 4

Architecture of VGG-19 for multiclass

3.3 VGG-16 for Multiclass Model

VGG-16 is a convolutional neural network (CNN) architecture recognized as one of the most influential vision model designs. Widely acknowledged, VGG-16 serves as a foundational architecture for diverse computer vision applications, encompassing tasks such as image classification, object detection, and image segmentation. Its pre-trained weights are often used as initialization for other models or as a feature extractor for transfer learning. The details of the VGG-16 model for the multiclass problem are given in Fig.  5 .

figure 5

Architecture of VGG-16 for multiclass model

3.4 InceptionNet for Multiclass Model

InceptionNet, introduced by Google in 2014, is a convolutional neural network architecture distinguished by its use of inception modules, which are collections of layers adept at capturing both local and global features from input data [ 34 ]. Engineered for greater efficiency and quicker training than alternative deep convolutional neural networks, InceptionNet has found application in tasks such as image classification, object detection, and face recognition, and has been the basis for popular neural network architectures such as Inception-v4 and Inception-ResNet. The details of the InceptionNet model for the multiclass problem are given in Fig.  6 .

figure 6

Architecture of InceptionNet for multiclass model

3.5 XceptionNet for Multiclass Model

XceptionNet, also known as Extreme Inception, is a deep convolutional neural network architecture that was introduced in 2016 by François Chollet, the creator of the Keras deep learning library [ 35 ]. XceptionNet is designed to perform image classification tasks and is known for its exceptional performance and efficiency. The name “Xception” is a combination of “Extreme” and “Inception” because the architecture is an extension of the Inception architecture, which was introduced by Google researchers. The Inception architecture was originally designed to improve computational efficiency using multiple parallel convolutional layers of different sizes. The details of the XceptionNet model for multiclass problem are given in Fig.  7 .

figure 7

Architecture of XceptionNet for multiclass model

3.6 ResNet for Multiclass Model

ResNet-50 is a convolutional neural network characterized by its depth of 50 layers. A pre-trained version can be loaded, which has been trained on an extensive dataset comprising over a million images sourced from the ImageNet database [ 35 ]. The term “ResNet” is derived from “Residual Network,” a specific category of convolutional neural networks (CNNs) detailed in the 2015 paper “Deep Residual Learning for Image Recognition” by He Kaiming, Zhang Xiangyu, Ren Shaoqing, and Sun Jian. ResNet-50 is commonly employed in computer vision applications and features a structure of 50 layers, including 48 convolutional layers, one MaxPool layer, and one average pool layer. Residual neural networks, a subtype of artificial neural networks (ANNs), are constructed by stacking residual blocks. The details of the ResNet-50 model for the multiclass problem are given in Fig.  8 .
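The idea of a residual block, in which the block's input is added back to its output through a skip connection, can be sketched in a few lines of plain Python. For brevity the sketch uses small fully connected transformations instead of the convolutions and batch normalization found in ResNet-50; the weights are hypothetical.

```python
def linear(weights, bias, inputs):
    """Fully connected transformation: one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two stacked transformations."""
    hidden = relu(linear(w1, b1, x))
    residual = linear(w2, b2, hidden)
    return relu([r + xi for r, xi in zip(residual, x)])   # skip connection adds the input back


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
no_bias = [0.0, 0.0]

passthrough = residual_block(x, zeros, no_bias, zeros, no_bias)       # F(x) = 0
doubled = residual_block(x, identity, no_bias, identity, no_bias)     # F(x) = x
```

When the learned transformation contributes nothing, the block simply passes its (non-negative) input through, which is what makes very deep stacks of such blocks easier to train.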

figure 8

Architecture of ResNet for multiclass model

3.7 AlexNet for Multiclass Model

The model comprises convolutional layers with ReLU activation and max pooling, capturing hierarchical features. An adaptive average pooling layer ensures consistent input size, and a sequence of fully connected layers with dropout and ReLU forms the classifier [ 33 ]. The forward method orchestrates the flow of input data through these layers. This implementation is versatile for image classification tasks, with the flexibility to adjust parameters such as the number of classes for specific use cases. The details of the AlexNet model for multiclass problem are given in Fig.  9 .

figure 9

Architecture of AlexNet for multiclass model

3.8 Custom CNN for Multiclass Model

A custom CNN refers to a CNN architecture that is tailored to solve a specific task or problem. It involves designing the network architecture by choosing the appropriate number and type of layers, as well as their configurations, based on the characteristics of the input data and the objectives of the task. The proposed model is a deep convolutional neural network (CNN) for image classification. It consists of 13 convolutional layers, 4 max pooling layers, and 3 fully connected layers. The convolutional layers have filters ranging from 128 to 512, with kernel sizes from 1 × 1 to 8 × 8. Batch normalization is applied after each convolutional layer. The output layer has 3 units with softmax activation. The model has a total of approximately 15 million parameters. The model structure is depicted in Fig.  10 .

figure 10

Architecture of custom CNN for multiclass model

3.9 VGG-19 for Binary Model

VGG-19 is also employed in our work for binary class classification, where the result is normal and osteoporosis. The details of the VGG-19 model for binary class problem are given in Fig.  11 .

figure 11

Architecture of VGG-19 for binary classes’ model

3.10 Custom CNN for Binary Classes

A custom CNN is also built for the binary classification problem. The details of the custom CNN model for the binary-class problem are given in Fig.  12 .

figure 12

Architecture of custom CNN for binary classes

4 Dataset Description

Instead of utilizing a single dataset to train our models, we construct our own dataset [ 22 ] by merging four datasets [ 23 , 24 , 25 , 26 ]. Figure  13 depicts the method we used to acquire the multiclass dataset. Our collection currently includes 793 images for osteoporosis, 374 for osteopenia, and 780 for healthy (normal) patients. The details of the source datasets are freely accessible via the URLs in [ 23 , 24 , 25 , 26 ].

figure 13

Building our dataset from 4 different datasets

In our work, we use two levels of data augmentation. The goal of the first level is to increase the number of images in the osteopenia class, which is currently smaller than the other two classes (normal and osteoporosis), as shown in Fig.  13 . The second level of data augmentation is applied to the entire dataset in order to increase its size to support the learning process of the deep learning models.
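The size of the first augmentation level can be estimated with simple arithmetic. The sketch below uses the class counts reported above; bringing the smaller classes up to the size of the largest class is an assumption made for illustration, since the text does not state the target count, and the function name `augmentation_plan` is hypothetical.

```python
import math

# Class counts reported for the merged dataset.
counts = {"osteoporosis": 793, "osteopenia": 374, "normal": 780}


def augmentation_plan(counts, target=None):
    """Per class: images missing to reach `target`, and how many augmented
    variants per original image would be enough to cover that gap."""
    if target is None:
        # Assumption: balance every class to the size of the largest one.
        target = max(counts.values())
    plan = {}
    for label, n in counts.items():
        missing = max(target - n, 0)
        plan[label] = {
            "missing": missing,
            "variants_per_image": math.ceil(missing / n),
        }
    return plan


plan = augmentation_plan(counts)
for label, info in plan.items():
    print(label, info)
```

Under this assumption the osteopenia class needs 419 additional images, which at most two augmented variants per original image would cover.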

figure a

Proposed dataset collection algorithm

4.1 Data Augmentation

In this section, we describe the practical implementation of image data augmentation using the Python Imaging Library (Pillow). Figure  14 illustrates the results of the three fundamental augmentation techniques used: flipping, scaling, and rotation. The implementation emphasizes the simplicity and versatility of data augmentation. These transformations, applied individually or in combination, create a more diverse training dataset. The resulting images show the variations introduced by each augmentation technique, highlighting their potential to improve a neural network’s ability to handle diverse scenarios.

figure 14

a Dataset images augmentation using flipping, scaling, and rotation functions, b examples of dataset images

It is worth noting that the parameters, such as the scaling factor and rotation angle, can be adjusted based on the specific requirements of the dataset and the targeted application. Fine-tuning these parameters allows for a tailored augmentation strategy that aligns with the characteristics of the data.
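The three transformations can be sketched with Pillow as follows. This is a minimal illustration, not the authors' code: the horizontal flip direction, the default scaling factor and rotation angle, the helper name `augment`, and the synthetic test image are all assumptions.

```python
from PIL import Image, ImageOps


def augment(img, scale=1.2, angle=15):
    """Return flipped, scaled, and rotated variants of a PIL image."""
    flipped = ImageOps.mirror(img)  # horizontal (left-right) flip
    w, h = img.size
    scaled = img.resize((round(w * scale), round(h * scale)))
    # expand=True enlarges the canvas so the rotated corners are not cropped.
    rotated = img.rotate(angle, expand=True)
    return {"flip": flipped, "scale": scaled, "rotate": rotated}


# Synthetic grayscale image standing in for a knee X-ray (hypothetical data),
# with one white pixel in the top-left corner to make the flip visible.
sample = Image.new("L", (64, 32), color=0)
sample.putpixel((0, 0), 255)

variants = augment(sample, scale=1.5, angle=90)
for name, image in variants.items():
    print(name, image.size)
```

Passing different `scale` and `angle` values is how the tailoring described above would be done in this sketch.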

5 Experiments and Results

This section presents an evaluation of the results achieved by the proposed system in osteoporosis detection using knee X-ray images. The evaluation includes metrics such as accuracy, AUC, precision, F1-score, recall, support, and error rate. The classification models in this paper were implemented using Python 3 and the Keras framework. We used the Google Colab Pro version with sufficient RAM and graphics processing unit (GPU) acceleration. The Python Imaging Library (Pillow) was used to increase the number of images in each class after merging the datasets from the different sources. All images were processed by the ImageDataGenerator class in Keras to perform preprocessing operations such as resizing and normalization. The generated images were fed into our proposed binary and multiclass classification deep learning models.

We conducted two types of experiments according to the output of the models: binary classification and multiclass classification. For training and validating our models, the ADAMAX optimizer and suitable fitness functions were used; each model ran for around 14–100 epochs depending on the model type (as shown in the results figures) with a batch size of 64. Only for the custom CNN models did we use a different number of epochs (up to 20) and different dropout and batch size values to adjust model performance. It was compiled with categorical cross-entropy loss and the SGD optimizer with a learning rate of 0.001.

In the multiclass models, which classify the images into one of three classes (normal, osteopenia, and osteoporosis), the dataset was split into 80% for training and 20% for testing, with a learning rate of 0.001. In the binary models, in which the classes are normal and osteoporosis, the dataset comprises a total of 1573 images, with 793 for osteoporosis and 780 for healthy cases. This dataset was also split into 80% for training and 20% for testing, with a learning rate of 0.001. The architecture of the models is summarized in Table  2 . The prediction steps are given in detail in Algorithm 2.
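An 80/20 split of the binary dataset can be sketched as below. The text does not say whether the split was stratified by class or performed before or after augmentation; the per-class (stratified) split, the fixed seed, and the function name `stratified_split` are assumptions for illustration, and the counts are the pre-augmentation figures quoted above.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test while keeping class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Binary dataset: 793 osteoporosis and 780 normal images.
labels = ["osteoporosis"] * 793 + ["normal"] * 780
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the osteoporosis-to-normal ratio nearly identical in the training and test sets, which matters more for the imbalanced three-class case than for this almost balanced binary one.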

figure b

Proposed prediction algorithm

5.1 Evaluation Metrics

The following classification metrics were employed to gain deeper insight into the models’ performance across the two classification approaches: binary and multiclass. In the experiments below, accuracy, error, recall, and precision are calculated first. Then the F1-score and the micro- and macro-averages of precision and recall are measured. The confusion matrix is used to calculate the values of these parameters, and the formulas that summarize the confusion matrix are listed in Table  3 . Finally, the run time of the knee osteoporosis detection algorithms should be measured in seconds. We used several of the metrics in Table  3 , such as accuracy, recall, precision, and F-measure, to confirm the effectiveness of the models we developed.
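As a minimal sketch of how these metrics follow from a confusion matrix, the function below uses the standard textbook definitions, which are assumed to match Table  3. The function name `classification_report` and the example matrix are hypothetical and are not taken from the reported results.

```python
def classification_report(cm):
    """Metrics from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    per_class = []
    for i in range(k):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # true class i, predicted otherwise
        fp = sum(cm[r][i] for r in range(k)) - tp   # predicted i, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append(
            {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
        )
    accuracy = correct / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "per_class": per_class,
        "macro_precision": sum(c["precision"] for c in per_class) / k,
        "macro_recall": sum(c["recall"] for c in per_class) / k,
        "macro_f1": sum(c["f1"] for c in per_class) / k,
        # In single-label classification, micro-averaged precision and
        # recall both reduce to overall accuracy.
        "micro_precision": accuracy,
        "micro_recall": accuracy,
    }


# Hypothetical 3-class matrix (rows: true class, columns: predicted class).
cm = [[50, 3, 2], [4, 40, 6], [1, 5, 49]]
report = classification_report(cm)
print(round(report["accuracy"], 4), round(report["macro_f1"], 4))
```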

5.2 Models Classification Results

Figures  15 , 16 , 17 , 18 , 19 , 20 , and 21 illustrate the performance results of our trained models for multiclass models on the collected dataset. The figures present the performance of various deep learning models on classifying knee X-ray images. The models were trained and tested on the collected dataset [ 22 ]. The performance of each model is measured by its classification accuracy. In the multiclass setting, the models used are VGG-19, custom CNN, VGG-16, InceptionNet, XceptionNet 3, ResNet, and AlexNet. These models are arranged from the highest to the lowest accuracy, showing their performance in descending order.

figure 15

VGG-19 results for multiclass classification

figure 16

Custom CNN results for multiclass classification

figure 17

VGG-16 results for multiclass classification

figure 18

InceptionNet results for multiclass classification

figure 19

XceptionNet results for multiclass classification

figure 20

ResNet-50 results for multiclass classification

figure 21

AlexNet results for multiclass classification

Figures  22 and 23 also illustrate the performance results of VGG-19 and custom CNN trained models for binary models on the collected dataset. These binary models are arranged from the highest to the lowest accuracy, showcasing their performance in descending order.

figure 22

VGG-19 results for binary class classification

figure 23

Custom CNN results for binary class classification

Table 4 summarizes the accuracy, precision, F1-score, and recall of our models for binary and multiclass classification, while Table  5 compares the accuracies of our binary and multiclass models against state-of-the-art models [ 16 , 17 , 18 , 21 , 27 ].

From the figures and the tables, we can conclude that the highest classification accuracy is achieved by VGG-19 at 92.0%, followed by custom CNN at 90.4%, and VGG-16 at 89.9%. The other models have relatively lower performance, with InceptionNet at 89.7%, XceptionNet 3 at 87.9%, ResNet at 81.4%, and AlexNet at 78.4%.

In the binary classification setting, only two models are compared: VGG-19 and custom CNN. VGG-19 achieves an impressive classification accuracy of 97.5%, while custom CNN performs slightly lower at 95.6%. Based on these figures, it can be concluded that VGG-19 is the best-performing model for classifying knee X-ray images in both multiclass and binary classification settings, while custom CNN also demonstrates competitive performance.

5.3 Evaluation of Training Results

As well as the accuracy comparison, Fig.  24 displays the confusion matrix for the six classification models, depicting the system’s predictions for our multiclass models on the dataset. These models are arranged from the highest to the lowest accuracy, providing a visual representation of their performance.

figure 24

Confusion matrices for the 6 multiclass classification models

Figure  25 displays the confusion matrix for the two classification models, depicting the system’s predictions for our binary models on the dataset. These binary models are arranged from the highest to the lowest accuracy, providing a clear visual representation of their performance in binary classification.

figure 25

Confusion matrices for VGG-19 and custom CNN binary classification models

Tables 6 and 7 show the training and testing results of multiclass custom CNN with different dropout and batch size values, while Tables 8 and 9 show the training and testing results of binary class custom CNN with different dropout and batch size values. We can see the consistency of the model across the various values of dropout and batch size values, which indicates the robustness of the model. The best model training and testing parameters are:

Training results of multiclass

No. of epochs = 20, without dropout, batch size = 4, which yields loss = 0.0025 and accuracy = 99.94%

Testing results of multiclass

No. of epochs = 20, dropout = 0.5, batch size = 16, which yields loss = 0.3346 and accuracy = 91.03%

Training results of binary class

No. of epochs = 15, without dropout, batch size = 4, which yields loss = 9.9635e-04 and accuracy = 100%

Testing results of binary class

No. of epochs = 15, without dropout, batch size = 4, which yields loss = 0.2054 and accuracy = 94.29%

The custom CNN models (both multiclass and binary) have a smaller architecture than the VGG-19 model and therefore fewer trainable parameters. They needed only 20 epochs to reach reliable accuracies, compared to 100 epochs for VGG-19 model training.

5.4 Results’ Analysis

This study compares the performance of several classification models for knee osteoporosis diagnosis using deep learning techniques. Our approach, featuring transfer learning with VGG-19, achieves 92.0% accuracy for multiclass classification, outperforming the 84.20% accuracy of the previous work [ 17 ]. Our VGG-16 achieves 89.9% accuracy, surpassing the 86.30% reported in Ref. [ 17 ], while our ResNet reaches 81.4% compared to 86.30% in Ref. [ 17 ]. Our AlexNet accuracy of 78.4% is lower than the 91% reported in Ref. [ 17 ]. We also note that the results in Ref. [ 17 ] are taken as the best results in each epoch rather than the average over all epochs, which is what we report. That was the reason to report our results in Table  7 .

Notably, our research extends beyond multiclass scenarios to include binary classifications, showcasing the efficacy of custom CNNs with accuracies of 90.4% and 95.6% in multiclass and binary contexts, respectively. InceptionNet attains a robust 89.7% accuracy in multiclass scenarios. Diverging from Ref. [ 17 ], our explicit consideration of binary classification scenarios, particularly with VGG-19 for binary classes (97.5%) and custom CNN for binary classes (95.6%), underscores the versatility and discriminative capabilities of our proposed models.

This analysis shows our methods to be superior in multiclass scenarios and emphasizes the adaptability and effectiveness of our proposed models in addressing both multiclass and binary classification challenges. As deep learning in medical image analysis advances, our research contributes insights and benchmarks for knee osteoporosis diagnosis. On examining the confusion matrix calculations detailed in Ref. [ 17 ], we found discrepancies between the reported accuracy figures and the values derived from the confusion matrices for several key models.

In the case of VGG-19, the reported accuracy stands at 84.30%, but our scrutiny of the confusion matrix reveals a more accurate figure of 81.81%. Similarly, for VGG-16, the reported accuracy of 86.30% is contradicted by our calculations, which yield a more accurate value of 81.81%. While ResNet aligns closely with the reported accuracy of 86.36%, a significant discrepancy is apparent in the case of AlexNet. Their reported accuracy of 91% contrasts starkly with the rigorously derived accuracy of 77.27% from our in-depth examination of the confusion matrix.
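A check of this kind amounts to recomputing accuracy as the sum of the confusion matrix diagonal divided by the total count and comparing it with the published figure. The sketch below is illustrative only: the matrix is hypothetical and is not the one published in Ref. [ 17 ], and the function names and the tolerance are assumptions.

```python
def accuracy_from_confusion(cm):
    """Accuracy in percent: diagonal sum over the total number of samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return 100.0 * correct / total


def matches_reported(cm, reported_pct, tol=0.05):
    """True if the reported accuracy agrees with the matrix within `tol` points."""
    return abs(accuracy_from_confusion(cm) - reported_pct) <= tol


# Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
cm = [[12, 1, 1], [2, 10, 2], [0, 1, 11]]
derived = accuracy_from_confusion(cm)
print(derived, matches_reported(cm, 84.3), matches_reported(cm, 82.5))
```

A small tolerance absorbs rounding in the published percentage while still flagging gaps of several points.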

Consequently, based on the results of the conducted experiments and after validating the accuracy of the proposed models, we recommend the VGG-19 model for both binary and multiclass classification, as it achieved 92.0% accuracy, 92.0% recall, 92.0% F1-score, and 92.0% precision for multiclass classification, and 97.5% accuracy, 97.0% recall, 97.0% F1-score, and 98.0% precision for binary classification. However, the proposed custom CNN achieves comparable results with fewer parameters and fewer training epochs. Given the high accuracy reached by the models, we expect that the presented deep models and their results may serve as a first step toward developing a system for diagnosing knee problems from X-ray images.

6 Conclusion

The diagnosis of knee osteoarthritis using X-ray images currently faces difficulties in automated and accurate classification. In this paper, we propose a new deep learning method to classify different osteoarthritis diseases of the knee. The main contribution of this work is to present an automated procedure for osteoarthritis diagnosis utilizing deep learning. In this research, a learning and classification model was presented, which is based on integrating transfer learning from deep CNN architectures. A new mechanism for collecting the dataset was also presented, based on the image enlargement method applied to images from different sources. The detection has been applied to both binary and multiple classes to classify X-ray images of the knee.

In this study, we evaluated and compared the performance of popular CNN architectures, namely VGG-19 for binary, custom CNN for binary, VGG-19 for multiclass, custom CNN for multiclass, VGG-16 for multiclass, InceptionNet, XceptionNet, ResNet, and AlexNet, in diagnosing osteoporosis from knee X-ray images. The X-ray images were taken from the custom dataset, which was classified into normal, osteopenia, and osteoporosis groups with medical assistance. Our demonstration highlighted the successful application of transfer learning from the object detection domain to achieve effective knee joint segmentation. The best performance was achieved by VGG-19, with 97.5% accuracy for binary classification and 92.0% accuracy for multiclass classification. All convolutional neural networks (CNNs) exhibited commendable diagnostic performance. These findings suggest that transfer learning with CNNs for osteoporosis diagnosis from knee X-rays can provide a cost-effective and readily accessible diagnostic tool. In future work, additional data collection, particularly from normal and osteoporotic subjects, could make the dataset more comprehensive. Furthermore, exploring the correlation between knee osteoporosis and osteoporosis at other anatomical sites could contribute to the development of a universal osteoporosis diagnostic system. Lastly, there is potential to construct a system capable of detecting osteoporosis by integrating clinical factors with image analysis. Importantly, our study underscores that, with careful model structuring and parameterization, a deep convolutional network can reduce computational complexity without compromising robustness, particularly when handling extensive datasets.

Data Availability

The dataset used in this paper is publicly available online.

Cooper, C., Campion, G., Melton, L.: Hip fractures in the elderly: a world-wide projection. Osteoporos Int. 2 (6), 285–289 (1992)


https://www.who.int/ar/data/gho/publications/world-health-statistics

https://www.Osteoporosis.foundation/wod2023-survey

Li, Q., Cai, W., Wang, X., Zhou, Y., Feng, D.D., Chen, M.: "Medical image classification with convolutional neural network," Proceedings of the IEEE 13th International Conference on Control Automation Robotics & Vision (ICARCV), pp. 844–848, (2014)

Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. J. Big Data 6 (1), 1–18 (2019)

Mahmud, I., Mamun, M., Abdelgawad, A.: A deep analysis of brain tumor detection from mr images using deep learning networks. Algorithms 16 (4), 176 (2023). https://doi.org/10.3390/a16040176

Arafa, D.A., Moustafa, H.E.D., Ali, H.A., Ali-Eldin, A.M.T., Saraya, S.: A deep learning framework for early diagnosis of Alzheimer’s disease on MRI images. In: Multimedia Tools and Applications, pp. 1–33. Springer, Cham (2022)


Balaha, H.M., Balaha, M.H., Ali, H.A.: Hybrid COVID-19 segmentation and recognition framework (HMB-HCF) using deep learning and genetic algorithms. Artif. Intell. Med. 119 , 102–156 (2021)

Shalaby, E., ElShennawy, N., Sarhan, A.: Utilizing deep learning models in CSI-based human activity recognition. In: Neural computing and applications, pp. 1–18. Springer, Cham (2022)

Sathyavathi, S., Baskaran, K.R.: An intelligent human age prediction from face image framework based on deep learning algorithms. Inform. Technol. Control 52 (1), 245–257 (2023)

Lee, K., Jung, S.K., Ryu, J.J., Shin, S.W., Choi, J.: Evaluation of transfer learning with deep convolutional neural networks for screening osteoporosis in dental panoramic radiographs. J. Clin. Med. 9 (2), 392 (2020)

Feng, S., Lin, S., Chiang, Y., Lu, M., Chao, Y.: Deep learning-based hip X-ray image analysis for predicting osteoporosis. Appl. Sci. 14 (1), 133 (2023)

Zhang, B., Yu, K., Ning, Z., Wang, K., Dong, Y., Liu, X., Liu, S., Wang, J., Zhu, C., Yu, Q., Duan, Y.: Deep learning of lumbar spine X-ray for osteopenia and Osteoporosis screening: a multicenter retrospective cohort study. Bone 140 , 115561 (2020)

Chen, Z., Zheng, H., Duan, J., Wang, X.: GLCM-based FBLS: a novel broad learning system for knee osteopenia and osteoporosis screening in athletes. Appl. Sci., MDPI 13 (20), 11150 (2023)

Sebro, R., De la Garza-Ramos, C.: Machine learning for opportunistic screening for Osteoporosis from CT scans of the wrist and forearm. Diagnostics, MDPI 12 (3), 691 (2022)

Dodamani, P.S., Danti, A.: Transfer learning-based osteoporosis classification using simple radiographs. Int. J. Onl. Biomed. Eng. 19 (8), 66 (2023)

Wani, M.I., Arora, S.: Osteoporosis diagnosis in knee X-rays by transfer learning based on convolution neural network. Multim. Tools Appl. 82 (9), 14193–14217 (2023)

Kumar, S., Goswami, P., Batra, S.: Fuzzy rank-based ensemble model for accurate diagnosis of osteoporosis in knee radiographs. IJACSA Int. J. Adv. Comput. Sci. Appl. (2023). https://doi.org/10.14569/IJACSA.2023.0140430

Ashames, M.M., Ceylan, M., Jennane, R.: Deep transfer learning and majority voting approaches for Osteoporosis classification. Int. J. Intell. Syst. Appl. Eng. (2021). https://doi.org/10.18201/ijisae.2021473646

Dzierżak, R., Omiotek, Z.: Application of deep convolutional neural networks in the diagnosis of Osteoporosis. Sens., MDPI 22 (21), 8189 (2022)

Abubakar, U.B., Boukar, M.M., Adeshina, S., Dane, S.: Transfer learning model training time comparison for Osteoporosis classification on knee radiograph of RGB and grayscale images. WSEAS Trans. Electron. 13 , 45–51 (2022)

https://kaggle.com/datasets/866059b7930a5c49cd77d94c1761840a19d88074cad74e8f0e0cfa2b236a6904

https://www.kaggle.com/datasets/mrmann007/Osteoporosis

https://www.kaggle.com/datasets/sachinkumar413/Osteoporosis-knee-dataset-preprocessed128x256

https://www.kaggle.com/datasets/stevepython/Osteoporosis-knee-xray-dataset

https://data.mendeley.com/datasets/fxjm8fb6mw/2

Yang, T.S.: "Recognition and classification of knee osteoporosis and osteoarthritis severity using deep learning techniques," (Doctoral dissertation, Dublin, National College of Ireland), (2022)

Vishnu, T., Saranya, K., Arunkumar, R., Devi, M.G.: "Efficient and early detection of Osteoporosis using trabecular region," Proceedings of the IEEE Online International Conference on Green Engineering and Technologies (IC-GET), pp. 1–5, (2015)

Bengio, Y.: "Deep learning of representations for unsupervised and transfer learning," Proceedings of ICML workshop on unsupervised and transfer learning, pp. 17–36, (2012)

Hosny, K.M., Kassem, M.A., Foaud, M.M.: Classification of skin lesions using transfer learning and augmentation with Alexnet. PLoS ONE 14 (5), e0217293 (2019)

Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 25 , 1097–1105 (2012)

Lu, S., Wang, S.-H., Zhang, Y.-D.: Detection of abnormal brain in MRI via improved AlexNet and ELM optimized by chaotic bat algorithm. Appl. MDPI (2020). https://doi.org/10.1007/s00521-020-05082-4

Salih, S.Q., Hawre, Kh., et al.: Modified Alexnet convolution neural network for Covid-19 detection using chest X-ray images. KJAR (2020). https://doi.org/10.24017/covid.14

Guo, M., Du, Y.: "Classification of thyroid ultrasound standard plane images using ResNet-18 Networks," Proceedings of the IEEE 13th International Conference on Anti-counterfeiting, Security, and Identification (ASID), pp. 324–328, (2019)

He, K, Zhang, X, Ren, S, Sun, J.: "Deep residual learning for image recognition," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, (2016)

Liu, D., Liu, Y., Dong, L.: G-ResNet: Improved ResNet for brain tumor classification. In: Proceedings of the International Conference on Neural Information Processing, pp. 535–545. Springer, Cham (2019)


Yu, X., Wang, S.-H.: Abnormality diagnosis in mammograms by transfer learning based on ResNet18. Fund. Inform. 168 (2), 219–230 (2019)

Ebrahimi-Ghahnavieh, A., Luo, S., Chiong, R.: "Transfer learning for Alzheimer's disease detection on MRI images," Proceedings of the IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), pp. 133–138, (2019)

Howard, J., Gugger, S.: A layered API for deep learning. Inform., MDPI 11 (2), 108 (2020)

Khan, Z., Khan, F.G., Khan, A., Rehman, Z., Shah, S., Qummar, S., Pack, S.: Diabetic retinopathy detection using VGG-NIN a deep learning architecture. IEEE Access 9 , 61408–61416 (2021)

Militante, S.V.: "Malaria disease recognition through adaptive deep learning models of convolutional neural network," Proceedings of the IEEE 6th International Conference on Engineering Technologies and Applied Sciences (ICETAS), pp. 1–6, (2019)

Simonyan, K., Zisserman, A.: "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556 , (2014)

Tutsoy, O.: Graph theory based large-scale machine learning with multi-dimensional constrained optimization approaches for exact epidemiological modeling of pandemic diseases. IEEE Trans. Pattern Anal. Mach. Intell. 45 (8), 9836–9845 (2023). https://doi.org/10.1109/TPAMI.2023.3256421


Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Author information

Authors and affiliations.

Department of Computer and Control Engineering, Tanta University, Tanta, Egypt

Amany M. Sarhan

Department of Artificial Intelligence, Faculty of Artificial Intelligence, Delta University, Gamassa, Egypt

Mohamed Gobara, Shady Yasser, Zainab Elsayed, Ghada Sherif, Nada Moataz, Yasmen Yasir, Esraa Moustafa, Sara Ibrahim & Hesham A. Ali

Department of Computer and Systems Engineering, Mansoura University, Mansoura, Egypt

Hesham A. Ali


Contributions

AS and HAA: conceptualization of this study, adding the basic ideas, writing and editing the manuscript. DS and YY: formal analysis, methodology, software, validation, writing original draft preparation. SY, ZE, GS, NM, EM, and SI: investigation, methodology, validation.

Corresponding author

Correspondence to Amany M. Sarhan .

Ethics declarations

Conflict of interest.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Sarhan, A.M., Gobara, M., Yasser, S. et al. Knee Osteoporosis Diagnosis Based on Deep Learning. Int J Comput Intell Syst 17 , 241 (2024). https://doi.org/10.1007/s44196-024-00615-4


Received : 09 January 2024

Accepted : 22 July 2024

Published : 12 September 2024

DOI : https://doi.org/10.1007/s44196-024-00615-4


  • Knee osteoporosis
  • Deep learning
  • Convolutional neural network
  • Medical imaging
  • Image preprocessing

IMAGES

  1. (PDF) Exploring the Online Learning Self-efficacy of Teacher Education

    research paper about flexible learning

  2. Essay on flexible learning

    research paper about flexible learning

  3. Framework for flexible learning.

    research paper about flexible learning

  4. FLEXIBLE LEARNING AS NEW LEARNING DESIGN IN … / flexible-learning-as

    research paper about flexible learning

  5. (PDF) Effectiveness of an Online Classroom for Flexible Learning

    research paper about flexible learning

  6. (PDF) Flexible Learning Strategies in First through Fourth-Year Courses

    research paper about flexible learning

VIDEO

  1. Kraft paper flexible packaging roll printer

  2. Sony's new A4-sized digital paper notepad prototype

  3. Flexible silicon solar cells with high power-to-weight ratios

  4. Follow @FloreCakes

  5. DIY Paper Flexible Fish #shorts #youtubeshorts #craftshorts #origami

  6. Prepare Managerial Economics for HPSC AP Commerce with our platform. #hpsc #hpscassistantprofessor

COMMENTS

  1. Learning effectiveness of a flexible learning study programme in a

    Flexible learning addresses students' needs for more flexibility and autonomy in shaping their learning process, and is often realised through online technologies in a blended learning design. While higher education institutions are increasingly considering replacing classroom time and offering more blended learning, current research is limited regarding its effectiveness and modifying ...

  2. Facilitating flexible learning by replacing classroom time with an

    Most flexible learning initiatives focus on aspects of temporal and spatial flexibility in learning, ... For 19 papers, the same effect sizes were calculated (86.36 per cent agreement), and in three cases deviating results were found. ... Research on blended learning would greatly benefit from more meta-analysis-friendly reporting. The ...

  3. PDF Flexible Learning Adaptabilities in the New Normal: E-Learning ...

    Abstract: The Covid-19 pandemic has forced the educational systems to shift from traditional learning to flexible learning. Flexible learning is a combination of digital and non-digital technology that ensures the continuity of inclusive and accessible education in the form of online, offline, or blended modes of teaching and learning processes.

  4. Learning Effectiveness and Students' Perceptions in A Flexible Learning

    With flexible learning, students gain access and flexibility with r egard to at least one of t he following. dimensions: tim e, place, pace, le arning style, co ntent, assessm ent or learning path ...

  5. PDF Learning Effectiveness and Students' Perceptions in A Flexible Learning

    The paper is structured as follows: First, the flexible learning programme FLEX is used as an example to illustrate objectives and considerations when implementing flexible learning in a blended learning design. Then, the research design of the pilot FLEX course is introduced. Finally, the results are presented and discussed.

  6. Flexible learning spaces facilitate interaction, collaboration and

    Globally, many schools are replacing traditional classrooms with innovative flexible learning spaces to improve academic outcomes. Little is known about the effect on classroom behaviour. Students from nine secondary schools (n = 60, M age = 13.2±1.0y) were observed via momentary time sampling for a 30 minute period, in both a traditionally furnished and arranged classroom and a flexible ...

  7. PDF FLEXIBLE LEARNING AS NEW LEARNING DESIGN IN CLASSROOM PROCESS TO ...

    Flexible learning. According to Shurville et al. (2008) "Flexible Learning is a set of educational philosophies and systems, concerned with providing learners with increased choice, convenience, and personalisation to suit the learner. In particular, flexible learning provides learners with choices about where, when, and how learning occurs".

  8. Revisiting the Definitions and Implementation of Flexible Learning

    During the following 5 years, another 409 papers on flexible learning were published. The number of papers in 2006-2010 increased to 1301 and then 1943 in 2011-2015. ... The past three decades have witnessed increasing growth in research on flexible learning. Efforts to pursue flexible learning have been made by not only researchers but ...

  9. Hybrid flexible (HyFlex) teaching and learning: climbing the mountain

    In 2020, King's College London introduced HyFlex teaching as a means to supplement online and face-to-face teaching and to respond to Covid-19 restrictions. This enabled teaching to a mixed cohort of students (both online and on campus). This article provides an outline of how such an approach was conceptualized and implemented in a higher-education institution during an intense three-month ...

  10. An analysis of flexible learning and flexibility over the last 40 years

    ABSTRACT. In this paper, we report the major themes we identified in the literature surrounding flexible learning that has been published in Distance Education over the last 40 years. We identified six themes: the qualities of flexibility as affording "anytime, anyplace" learning; flexibility as pedagogy; liberatory or service-oriented aspects of flexibility; limitations of flexibility ...

  11. How flexible is flexible learning, who is to decide and what are its

    As such flexible learning, in itself, is not a mode of study. It is a value principle, like diversity or equality are in education and society more broadly. Flexibility in learning and teaching is relevant in any mode of study including campus-based face-to-face education. At USP, a unique institution that is owned and run by 12 independent ...

  12. Review of Flexible Learning Spaces in Education

    The review of Flexible Learning Spaces' research revealed school life's inner workings, particularly on teachers' and students' work. This review's findings were grouped into three ...

  13. Challenges Encountered by Students in Flexible Learning: the Case of

    The COVID-19 Pandemic has led Higher Education Institutions (HEIs) in the Philippines to replace on-campus learning with flexible learning. This paper explores the students' challenges on flexible ...

  14. Full article: Becoming more systematic about flexible learning: beyond

    Flexibility can occur within the individual course, as a result of choices made by the instructor. Flexibility can involve options in course resources, in types of learning activities, in media to support learning, in options for communication and social interaction, and many other possibilities (Ling et al., 2001; Zimitat, 2002). However ...

  15. Flexible learning spaces facilitate interaction, collaboration and

    It is therefore encouraging to note the greater level of positive interaction among students observed in the flexible learning spaces. Previous research has demonstrated that a strong student-teacher relationship fosters behavioural engagement, and that in classrooms where teachers facilitate dialogue and discussion, student engagement is ...

  16. Effectiveness of an Online Classroom for Flexible Learning

    International Journal of Academic Multidisciplinary Research (IJAMR), Vol. 4, Issue 8, August 2020, Pages: 100-107.

  17. PDF Flexibility in e-Learning: Modelling its Relation to Behavioural

    From a pedagogically student-centered perspective, it is stated that students should be given flexibility in terms of time, space, learning at their own pace, changing learning strategies, and choosing learning resources and evaluation activities (Flannery & McGarr, 2014; Nikolov, Lai, Sendova & Jonker, 2018). Table 1.

  18. PDF Academic Stress and Coping Strategies of Students in Flexible Learning

    Research article by Jamillah Martin, University of Saint Louis, Philippines. Received: October 7, 2022; accepted: January 20, 2023. Keywords: flexible learning, online learning, modular learning, COVID-19, academic stress, coping strategies.

  19. Learning effectiveness of a flexible learning study programme in a

    Accordingly, the term 'flexible learning' is used in this paper as desired study characteristics at the programme level. The term 'blended learning' is used to describe the educational design of the courses under investigation in the experimental condition. ... Research in Learning Technology. 2012; 20 (1):n1. doi: ...

  20. Application of the UTAUT model to understand learning behavior using

    The research questionnaire was created using Google Form and the results of respondents' answers were analyzed using SEM-AMOS. The research results show that perceived ease of use influences RPL students' perceived usefulness of online lectures using online video conferencing, which also influences online learning intentions.

  21. Perceived interplay between flexible learning spaces and teaching

    The research contributes to an understanding of how flexible learning spaces are used and with what effect, thereby addressing a present gap in the literature. A flexible learning space in a ...

  22. Learning to Reason with LLMs

    Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute).

  23. PDF Readiness of Students in Flexible Learning Modality: A Convergent ...

    ... of difficulties among students in all dimensions of flexible learning. This research places the thrust of improving curriculum delivery by addressing flexible learning policies in the local context. Keywords: flexible learning, the readiness of students, Covid-19 pandemic, convergent parallel mixed-methods study, Philippines.

  24. Dual‐Mode Flexible Electrophoretic E‐Paper with Integration of

    Reflective e-paper displays show promising prospects with the increasing demand for energy-efficient and sustainable technologies. However, the intrinsic incapability to display in dark environments hinders their broader application, and it is significantly important to enable reflective displays to achieve good display effects under various lighting conditions.

  25. (PDF) A Flexible Learning Framework Implementing ...

    This paper provides a framework for local universities and colleges in implementing flexible learning procedures. The asynchronous course delivery consists of the design of outcomes-based teaching ...

  26. Review: Developments and challenges of advanced flexible electronic

    Flexible sensors, made from flexible electronic materials, are of great importance in the medical field due to the rising prevalence of cardiovascular and cerebrovascular diseases. Studies have demonstrated that timely diagnosis and continuous monitoring of relevant physiological signals can be beneficial in preventing such conditions. Although traditional rigid monitoring sensors are still ...

  27. PDF The Effect of Flexible Learning Schedule on Online Learners' Learning

    While traditional classroom instruction requires learner to follow certain sequence bounded by time, content, and place, online instruction allows flexible learning modes so students can control their learning path, pace, and contingencies of instruction (Hannafin, 1984). The more the learners can control individual learning environment, the ...

  28. (PDF) Systematic review of inquiry-based learning ...

    Background: Inquiry-based learning (IBL) is a student-centred pedagogical approach that promotes critical thinking, creativity, and active...

  29. Knee Osteoporosis Diagnosis Based on Deep Learning

    Deep learning has gained widespread acceptance in image analysis, representing a notable progress in recent decades. Existing research papers on osteoporosis have previously implemented algorithms, but many of them have reported relatively low accuracy rates [16,17,18, 21, 27]. In response to this limitation, our study focuses on introducing an ...