1. Introduction
The Internet of Things (IoT) and related technologies are important parts of the new generation of information technologies. Typical IoT application scenarios include the Internet of Vehicles, intelligent transportation, smart factories, and smart homes. The rapid development of communication, computation, and networking technologies has connected ever more IoT devices. Besides typical fixed equipment (e.g., sensors and cameras), the IoT also includes a huge number of mobile user devices (e.g., cell phones, cars, and UAVs). There is also high demand for mobile traffic and many time-sensitive applications (e.g., autonomous driving and telemedicine). The high speed, low delay, and ubiquitous coverage of 5G networks support the Internet of Everything, which is the critical guarantee for high-quality communication services and big data business in IoT application scenarios.
The 5G low-band, midband, and LTE (Long-Term Evolution) small cell techniques cannot meet the requirements of massive device access, high data rates, and huge volumes of mobile traffic in next generation wireless networks [1]. Therefore, we adopt the high-frequency bands and the ultradense deployment technique of 5G networks in our research. Millimeter-wave technology [2] is one of the critical 5G techniques in ultradense networks (UDN). Through the ultradense deployment of small cells, the network throughput and the number of access users in the two-layer cellular architecture are improved [3–5], and the QoS (quality of service) requirements of mobile users are satisfied. However, the small coverage and access limitations of small cells bring about frequent handover and the ping-pong effect, which directly degrade the quality and continuity of communication services in 5G ultradense networks [6–8]. The traditional handover decision methods depend on the handover threshold and measurement report, which cannot efficiently resolve the frequent handover and ping-pong effect.
To reduce unnecessary handover and improve the QoS, we propose the SA-PER handover decision method, which combines a state aware approach with the analysis of dwell time. The handover management process in wireless networks includes three steps: information collection, handover decision, and handover execution [9]. Most research works focus on improving handover decision methods [10]. In the handover decision process, the optimal candidate cell is determined by multiple handover decision criteria and efficient handover decision strategies [11], with the handover rate, ping-pong effect, radio link failure rate, throughput, and so on selected as evaluation criteria. In this paper, the dwell time and prioritized experience replay are selected as the new handover criterion and handover strategy, respectively.
As Figure 1 shows, 5G ultradense networks consist of a two-layer cellular architecture comprising macro base stations (MBS) and small base stations (SBS) [9]. The communication services and data transmission of mobile users are realized through connections to a macro cell or a small cell. Because of the ultradense deployment of small cells, the overlapping coverage of macro cells and small cells is considerable. The small coverage and access-user limits of small cells lead to frequent handover and the ping-pong effect [10]. In our study, the complex handover decision problem includes vertical handover (MBS-SBS) and horizontal handover (MBS-MBS and SBS-SBS). How do ordinary mobile users choose between horizontal handover and vertical handover? How do we improve the performance and efficiency of deep reinforcement learning-based handover decision methods? The traditional weighted multiple-attribute handover decision method is easily affected by the training process of its weighting coefficients and thus cannot maintain stable performance. The handover threshold and prior knowledge cannot completely eliminate the ping-pong effect. Therefore, we select the cell dwell time as a handover decision criterion and prefer the cell that provides a long connection time over the cell that provides the optimal network services. If we always selected the cell with the optimal network service, the frequent changes of the optimal cell would lead to frequent handover and degrade the QoS of mobile users [3]. To deal with the overestimation of the DQN-based handover decision method, DDQN is selected as the base method. To improve the learning efficiency, convergence rate, and handover performance, the prioritized experience replay mechanism is added to DDQN. Combining the analysis of cell dwell time with the PER method, a state aware-based prioritized experience replay handover decision method is proposed to deal with the frequent handover and communication interruption problems in 5G ultradense networks.
[figure(s) omitted; refer to PDF]
Our proposed method has good handover performance and meets the demands of mobile communication services. Our contributions are summarized as follows:
(1) The handover threshold and periodic measurement report cannot efficiently solve the frequent handover and ping-pong effect, and ultradense deployment exacerbates these handover problems in 5G UDN. Aiming at the above problems, we propose the SA-PER handover decision method to deal with frequent handover and communication interruption and to reduce the ping-pong effect
(2) The dwell time of mobile users in cellular networks is analysed and computed in detail. The proposed state aware method includes the state aware sequence, max-min normalization, and the normalized state decision matrix, which support the preprocessing of data and assist the handover decision
(3) The handover decision problems of MBS-MBS, MBS-SBS, and SBS-SBS are carefully studied. Moreover, the competitive and collaborative relationships between vertical handover and horizontal handover in 5G UDN are considered and analysed. Our analysis and discussion help mobile users better balance the choice between vertical handover and horizontal handover
The rest of this paper is organized as follows. The main research works of handover decision and existing challenges are introduced in Section 2. The system model is described in Section 3. The SA-PER handover decision method is proposed in Section 4. Simulation setups and experimental results are provided in Section 5. Finally, Section 6 concludes this paper. We summarize the definitions of the acronyms in this paper in Table 1.
Table 1
List of acronyms.
Symbol | Description |
5G | 5th generation |
AHP | Analytic hierarchy process |
A3C | Asynchronous advantage actor-critic |
DDQN | Double deep Q-network |
DNN | Deep neural networks |
DQN | Deep Q-network |
DRL | Deep reinforcement learning |
ES | Evolution strategy |
GRA | Grey relational analysis |
HetNets | Heterogeneous networks |
HOF | Handover failure rate |
HOR | Handover rate |
IoT | Internet of Things |
KPIs | Key performance indicators |
LTE | Long-Term Evolution |
MBS | Macro base station |
MDP | Markov decision process |
PER | Prioritized experience replay |
QoS | Quality of service |
RL | Reinforcement learning |
SAW | Simple additive weighting |
SBS | Small base station |
SDN | Software-defined network |
SA-PER | State aware-based prioritized experience replay |
TOPSIS | Technique for Order Preference by Similarity to Ideal Solution |
UDN | Ultradense networks |
2. Related Work
5G networks support the Internet of Everything, providing ubiquitous communication services for fixed IoT devices and mobile user devices. The mobility management of connected mobile devices is a critical challenge for continuous communications and high QoS. Therefore, many researchers focus on the handover problem of mobile devices. In high-mobility IoT application scenarios, such as UAVs, continuous communication connection and handover management are vital and cannot be ignored [12]. Sharma et al. [12] proposed a media independent handover-based fast handover security protocol for heterogeneous IoT networks. The CoAP protocol is widely used in IoT networks; Chun and Park [13] proposed a CoAP-based mobility management protocol that realizes mobility management in the IoT through a location management function. An SDN-based method realizes mobility management in urban IoT heterogeneous networks [14]. Machine learning [15, 16] and reinforcement learning [17] have been widely applied to handover management. As a new artificial intelligence method, DRL [18] is used in communications and networking to deal with many decision problems, e.g., handover decision. The high performance, online learning, and decision ability of DRL have attracted much attention from academia and industry.
The traditional handover decision methods in cellular networks include multiattribute-based [19], decision function-based [15, 19], and context aware-based [20] handover decision methods. Bastidas-Puga et al. [19] proposed a predicted SINR-based handover decision method to deal with frequent handover and the ping-pong effect. Singh and Singh [15] adopted the multiattribute decision method to obtain the weights of decision factors; using the simple additive weighting (SAW), TOPSIS (Technique for Order Preference by Similarity to Ideal Solution), and grey relational analysis (GRA) methods, the candidate cells are decided. Hu et al. [20] proposed a velocity aware-based handover prediction method in which the handover decision problem is formalized as a shortest path problem in a time expansion diagram. In [21], Goyal and Kaushal combined the analytic hierarchy process (AHP), TOPSIS, and reinforcement learning to optimize the selection of the candidate cell. In addition, many studies adopt state awareness in the handover decision process, including context awareness [22, 23], mobility awareness [6, 24], velocity awareness [4, 20], and load awareness [25]. The state aware method provides the necessary data support and decision basis for handover decision. In this paper, we adopt the state aware method and cell dwell time to solve the performance fluctuation problem of traditional weighted multiple-attribute handover decision methods.
Many research works focus on the frequent handover, ping-pong effect, and handover failure problems in 5G ultradense networks. Sun et al. [6] combined the cell dwell time and movement state of users to match candidate cells; using a movement aware handover decision method, the trade-off between dwell time and well-connected cells is balanced. In [26], with the assistance of unmanned aerial vehicles, the authors analysed the handover rate and dwell time of users in cellular networks: when the dwell time increases, the average number of handovers decreases, and the quality and continuity of communication services improve. Aiming at frequent handover and increasing network load, Liu et al. [7] proposed a Q-learning-based handover decision method. The SDN (software-defined network) and 5G techniques were combined, and an entropy-based SAW handover decision method was proposed [8]. In recent research, base stations in cellular networks are selected as edge computing nodes; considering the migration of communication, data, and computing services, the researchers proposed a joint handover and offloading decision method [27]. Huang et al. [16] first transformed the handover decision problem into a classification problem: considering the changes of the SINR parameter, a deep neural network (DNN) method realizes the handover decision. Hasan et al. [28] classified users into high-speed users and ping-pong users and proposed a method to eliminate frequent handover. The energy cost of periodic measurements in 5G ultradense networks has also been considered [5].
Reinforcement learning-based handover decision methods have good decision ability and handover performance and are popular in handover decision research for heterogeneous networks (HetNets) and UDN. Guidolin et al. [23] proposed an MDP-based handover decision method; by modelling the handover decisions of mobile users, the optimal context handover decision criteria were obtained. In [29], an MDP-based vertical handover method maximized the total expected reward of handover, with the AHP method computing the weight coefficients for the power, mobility, and energy cost decision factors. Yang et al. [30] and Sun et al. [31] adopted multiarmed bandit handover decision methods to produce handover decision strategies and rewards and to determine the optimal candidate cell. Tabrizi et al. [17] considered the state of networks and user devices and adopted the Q-learning method to select candidate cells in the handover decision process. Q-learning-based handover decision methods are widely used to solve handover decision problems in terrestrial and satellite networks, and they and their improved variants outperform the existing multiple attribute-based, decision function-based, and handover threshold-based methods. However, the Q-learning method needs to search the Q table for the optimal action in each iteration, which incurs high search cost for high-dimensional state spaces; it is therefore unsuitable for decision problems with high-dimensional state spaces. The DQN method replaces the Q table with a DNN that represents the action-value function, which makes it applicable to decision problems with high-dimensional state spaces [32].
The Google DeepMind team proposed the DRL method and obtained superior performance in Atari 2600 games, which attracted wide attention from academia [33]. This new artificial intelligence method has been used in communications and networking to deal with dynamic network access, data rate control, wireless caching, data offloading, and resource management [18]. In [34], a DQN-based handover decision method deals with the frequent handover issue in UDN, with the handover decision formalized as a discrete Markov decision process. In [35], Sun et al. used an evolution strategy (ES) to optimize the convergence speed and accuracy of the backhaul network, and the DQN method was used for the vertical handover decision problem in HetNets. Wang et al. [36] creatively adopted the duelling network in reinforcement learning (RL): the proposed network architecture contains two separate estimators representing the state value function and the state-dependent action advantage function, respectively. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying RL algorithm. To reduce signalling overhead and solve frequent handover, a double DRL method was proposed for 5G UDN in [37], which reduces the number of handovers; through a trajectory aware optimization method, the optimal candidate cell is determined from the trajectory of the UE and the topology of the network, and the increased UE-BS connection time reduces the handover overhead. Considering the handover decision problem in ultradense heterogeneous networks, Song et al. [38] proposed a distributed DRL decision method that accounts for the energy costs of transmission and handover load and minimizes the total energy cost. In [39], the mobility patterns of users were classified, and an asynchronous multiagent DRL method was used in the handover decision process. In [40], prior knowledge and supervised learning are used to initialize the DNN, which offsets the bad effects of random exploration; the frequent handover issue caused by the deployed handover policy is solved by an asynchronous advantage actor-critic- (A3C-) based handover method. In [41], the joint problem of handover and power allocation is formalized as a fully cooperative multiagent task and solved by a proximal policy optimization-based multiagent reinforcement learning method, with global information used in the training of the decentralized policies deployed on UEs. In [32], Wu et al. proposed a load balancing-based double deep Q-network (LB-DDQN) method for handover decision: a load coefficient is defined to express the load condition of each base station, and a supplementary load balancing evaluation function evaluates the performance of the load balancing strategy. The comparisons of different handover methods for cellular networks are shown in Table 2.
Table 2
Comparisons of different handover methods according to their characteristics.
Ref. | Problems and scenarios | Method | Contributions | Simulations | KPIs |
[6] | Coordinated multipoint handover in 5G UDN | User-centric CoMP handover schemes | Characterize the movement trend through dwell time | Numerical simulation | HOR; Th |
[7] | Handover triggering policy in 5G UDN | Clustering-based RL | Multiple decision criteria-based handover triggering mechanism | MATLAB; SD | HON; HOF; PPE; Th; latency |
[12] | Handover failures and ping-pong effect in HetNet | HO triggers mechanism | Recursive least squares-based SINR prediction method | SD | HOF; PPE |
[15] | Handover decision issue in LTE-A | AHP-TOPSIS; Q-learning | UE rank; the optimal triggering points of HO | MATLAB; SD | HOR; PPE |
[17] | Handover in HetNets | Markov-based handover strategy | Context-aware handover policies | Monte Carlo simulations | Capacity |
[22] | Handover in 5G | DNN | Reduce the handover problem to a classification problem | SD | RLF; PPE |
[23] | Frequent handover in 5G UDN | Frequent handover mitigation algorithm | Dwell time estimation; user detection | NS3 | HON; Th |
[27] | Handover decision in HetNets | Q-learning | Q-learning-based handover decision | SD | Cost; utility |
[30] | Frequent handover in UDN | DQN | SDN-based UDN architecture; DQN-based handover decision | Mininet | HOR; Th |
[31] | VHO in HetNets | ES-DQN | Training the parameters of main Q-network with ES | Python; SD | HOF; Th; delay |
[33] | Frequent handover in 5G | Double DRL | Trajectory-aware HO optimization approach | Wireless Insite software; SD | HON; Th |
[34] | Handover decision in HetNets | Distributed DRL | MDP formulation; distributed DRL | SD | HON; energy cost |
[36] | Handover in UDN | A3C | Mobility pattern-based user clustering; A3C-based HO policy | SD | HOR |
[37] | Handover and power allocation in HetNets | Multiagent DRL | Proximal policy optimization; cooperative multiagent DRL | SD | HOR; Th |
[43] | Load balancing and handover in 5G UDN | LB-DDQN | Load balancing strategy; load coefficient; load balancing evaluation function | Python; SD | HOR; Th |
This paper | Handover decision in 5G UDN | SA-PER | State aware; analysis of dwell time; the relationships between VHO and HHO | Python; SD | RLF; HON; PPE; Th |
VHO: vertical handover; HHO: horizontal handover; HON: handover number; HOR: handover rate; Th: throughput; HOF: handover failure rate; RLF: radio link failure; PPE: ping-pong rate; SD: simulated data.
3. System Model
3.1. Network Model
In our research, the 5G UDN has a two-layer cellular architecture comprising macro base stations and small base stations, as shown in Figure 2.
[figure(s) omitted; refer to PDF]
3.2. Channel Model
The channel model of the MBS and SBS in 5G UDN describes the characteristics of the wireless channel [7], including the path loss and shadow fading of the wireless link connecting a cell and a mobile user.
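Since the paper's path-loss equations are not reproduced in this extract, the following minimal sketch illustrates how such a two-layer channel model can be evaluated. The LOS formulas follow the 3GPP TR 38.901 UMa/UMi models referenced in Section 5.1; the function names and the choice of a single below-breakpoint LOS branch are simplifying assumptions.

```python
import math
import random

def path_loss_db(d_m: float, fc_ghz: float, macro: bool) -> float:
    """LOS path loss (dB); constants follow the 3GPP TR 38.901 UMa/UMi
    LOS models below the breakpoint distance (an assumed simplification)."""
    if macro:  # UMa LOS, e.g., the 2 GHz macro layer
        return 28.0 + 22.0 * math.log10(d_m) + 20.0 * math.log10(fc_ghz)
    # UMi street-canyon LOS, e.g., the 28 GHz small-cell layer
    return 32.4 + 21.0 * math.log10(d_m) + 20.0 * math.log10(fc_ghz)

def rx_power_dbm(tx_dbm, gain_dbi, d_m, fc_ghz, macro, shadow_sigma_db):
    """Received power = Tx power + antenna gain - path loss - shadow fading."""
    shadowing = random.gauss(0.0, shadow_sigma_db)  # log-normal shadowing (dB)
    return tx_dbm + gain_dbi - path_loss_db(d_m, fc_ghz, macro) - shadowing

# Example: macro cell at 2 GHz (46 dBm, 15 dBi, sigma = 7.8 dB), user at 200 m
print(rx_power_dbm(46.0, 15.0, 200.0, 2.0, True, 7.8))
```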
3.3. Movement Model of Users
Figure 3 shows the simulated smart city scenario, which has multiple crossing roads on which many users move randomly. The MBS and SBS are deployed on both sides of the roads and provide wireless network access, communication services, and data transmission for the covered users. In this city, a number of mobile users move along the roads.
[figure(s) omitted; refer to PDF]
3.4. Problem Formulation and Algorithm Elements
In this paper, the handover decision problem in 5G UDN is formalized as a discrete Markov decision process, expressed by the tuple ⟨S, A, R⟩, where S denotes the state space, A the action space, and R the reward function. These elements are defined as follows.
3.4.1. State Space
In 5G UDN, the network state is obtained by the state aware method. The state aware sequence consists of the SINR, the dwell time, and the load coefficient of each candidate cell.
3.4.2. Action Space
In network time slot t, the action a_t of a mobile user is to select one cell from the candidate cell set as the handover target.
3.4.3. Reward Function
The value of the reward function is the immediate reward r_t obtained after action a_t is executed in state s_t.
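To make the MDP elements concrete, here is a minimal sketch that assembles the per-cell state aware sequence and evaluates a simple immediate reward. The weighting in `reward` is hypothetical, since the paper's Eq. (6) is not reproduced in this extract.

```python
import numpy as np

def build_state(sinr_db, dwell_s, load_coeff):
    """State aware sequence for one time slot: per-candidate-cell rows of
    (SINR, dwell time, load coefficient), stacked into a decision matrix."""
    return np.column_stack([sinr_db, dwell_s, load_coeff])

def reward(sinr_db, dwell_s, load_coeff, w=(0.4, 0.4, 0.2)):
    """Hypothetical immediate reward: larger SINR and dwell time are good,
    higher load is bad (the paper's actual Eq. (6) may differ)."""
    return w[0] * sinr_db + w[1] * dwell_s - w[2] * load_coeff

state = build_state(np.array([12.0, 7.5]), np.array([30.0, 6.0]), np.array([0.4, 0.9]))
print(state.shape)             # (num_candidate_cells, 3)
print(reward(12.0, 30.0, 0.4)) # reward for connecting to the first cell
```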
4. The State Aware-Based Prioritized Experience Replay Handover Decision Method
4.1. Analysis of Dwell Time in Cellular
According to the coverage areas of the heterogeneous cells and the coordinates and speeds of mobile users, the dwell time in a cell is computed [6]. Because the dwell time reflects how long a user can remain connected to a cell, it is selected as a handover decision criterion.
[figure(s) omitted; refer to PDF]
When a user moves in the positive direction of the x-axis, the dwell time is the distance from the user's current position to the cell boundary along the movement direction, divided by the user's speed.
When a user moves in the negative direction of the x-axis, the dwell time is computed symmetrically.
The dwell time therefore depends on the user's position and speed and on the cell's center and radius, as the sketch below illustrates.
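The geometric computation described above can be sketched as follows, assuming a circular cell and movement parallel to the x-axis; `dwell_time` and its arguments are illustrative names, not the paper's notation.

```python
import math

def dwell_time(x0, y0, v, cx, cy, r, positive_dir=True):
    """Dwell time of a user at (x0, y0) moving parallel to the x-axis at
    speed v inside a circular cell of center (cx, cy) and radius r."""
    dy = y0 - cy
    if abs(dy) > r or v <= 0:
        return 0.0  # the trajectory never crosses the cell coverage
    half_chord = math.sqrt(r * r - dy * dy)
    x_exit = cx + half_chord if positive_dir else cx - half_chord
    dist = (x_exit - x0) if positive_dir else (x0 - x_exit)
    return max(dist, 0.0) / v  # seconds, given metres and m/s

# Example: user at (100, 20) m moving at 10 m/s through a 500 m macro cell at the origin
print(dwell_time(100.0, 20.0, 10.0, 0.0, 0.0, 500.0))
```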
4.2. State Aware Decision Matrix
In the state aware decision matrix, the state aware sequence is a vital input, which includes the SINR, dwell time, and load coefficient of each candidate cell.
The normalized state decision matrix is obtained by applying max-min normalization to each criterion column of the state decision matrix, so that the SINR, dwell time, and load coefficient become comparable.
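A minimal sketch of the column-wise max-min normalization follows; whether cost criteria such as the load coefficient are additionally inverted is not specified in this extract, so the sketch applies the plain max-min form.

```python
import numpy as np

def max_min_normalize(ms: np.ndarray) -> np.ndarray:
    """Column-wise max-min normalization of the state decision matrix:
    each criterion is rescaled to [0, 1] so that SINR, dwell time, and
    load become directly comparable."""
    col_min = ms.min(axis=0)
    col_max = ms.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid /0
    return (ms - col_min) / span

# Rows: candidate cells; columns: SINR (dB), dwell time (s), load coefficient
ms = np.array([[12.0, 30.0, 0.4], [7.5, 6.0, 0.9], [9.0, 14.0, 0.2]])
print(max_min_normalize(ms))
```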
4.3. The Prioritized Experience Replay Based on DDQN Method
By the state aware method and the normalization operation, the normalized state decision matrix is obtained, which assists the handover decision. Combined with the state aware method, the proposed SA-PER handover decision method adopts rank-based prioritization and importance sampling, which ensure the learning efficiency and convergence of the algorithm. The rank-based prioritization method computes the priority of a transition as $p_i = 1/\text{rank}(i)$, where $\text{rank}(i)$ is the rank of transition $i$ when the replay buffer is sorted by the absolute TD error.
The sampling probability of transition $i$ is then $P(i) = p_i^{\alpha} / \sum_k p_k^{\alpha}$, where the exponent $\alpha$ controls how strongly the prioritization is applied.
To correct the bias introduced by prioritized sampling, the importance sampling weight $w_i = (N \cdot P(i))^{-\beta}$ is applied to each sampled transition; when the maximum value of $w_i$ is used to normalize the weights, the update magnitudes stay bounded and training remains stable.
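The rank-based prioritization and importance sampling described above follow the standard PER formulation of Schaul et al.; the sketch below is a buffer-level illustration with assumed values of α and β, not the paper's exact Eqs. (18)-(20).

```python
import numpy as np

def per_sample(td_errors: np.ndarray, batch_size: int, alpha=0.7, beta=0.5):
    """Rank-based PER: priority p_i = 1/rank(i) on |TD error|, sampling
    probability P(i) = p_i^alpha / sum_k p_k^alpha, and importance weights
    w_i = (N * P(i))^-beta normalized by max_i w_i."""
    n = len(td_errors)
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(-np.abs(td_errors))] = np.arange(1, n + 1)  # rank 1 = largest error
    p = (1.0 / ranks) ** alpha
    probs = p / p.sum()
    idx = np.random.choice(n, size=batch_size, p=probs, replace=False)
    w = (n * probs[idx]) ** (-beta)
    return idx, w / w.max()  # buffer indices and normalized weights

idx, w = per_sample(np.array([0.1, 2.0, 0.5, 1.2]), batch_size=2)
print(idx, w)
```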
The loss function of the DDQN method is the squared difference between the target value $y_t = r_t + \gamma Q_t\big(s_{t+1}, \arg\max_a Q_m(s_{t+1}, a)\big)$ and the estimated value $Q_m(s_t, a_t)$, where $Q_m$ and $Q_t$ denote the main Q-network and the target Q-network, respectively.
In the training process of handover decision, the loss function returns the gradient to update the parameters of the main Q-network at each iteration. With the updates of the parameters, the value of the loss function decreases, and the handover performance improves. The loss function of the DDQN method is optimized by the stochastic gradient descent method; its gradient is $\nabla_{\theta} L(\theta) = \mathbb{E}\big[\big(y_t - Q_m(s_t, a_t; \theta)\big)\, \nabla_{\theta} Q_m(s_t, a_t; \theta)\big]$.
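As a small illustration of the double-Q target that the loss above is built from, the sketch below separates action selection (main network) from action evaluation (target network), which is what curbs the overestimation of plain DQN; the discount factor is an assumed value.

```python
import numpy as np

def ddqn_target(r, q_main_next, q_target_next, gamma=0.95, done=False):
    """Double DQN target: the main network selects the best next action,
    the target network evaluates it."""
    if done:
        return r
    a_star = int(np.argmax(q_main_next))        # action selection (main Q)
    return r + gamma * q_target_next[a_star]    # action evaluation (target Q)

# Q-values over 3 candidate cells at s_{t+1}, one vector from each network
print(ddqn_target(1.0, np.array([0.2, 0.9, 0.4]), np.array([0.3, 0.7, 0.5])))
```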
In Figure 5, the framework of the state aware-based prioritized experience replay method is illustrated. In the network environment, the necessary information and data collected periodically by the UE are input to the state aware method, and the obtained state decision matrix is normalized. Then, the current state aware sequence s_t is fed into the main Q-network, the action a_t is selected, and the resulting transition (s_t, a_t, r_t, s_{t+1}) is stored in the replay buffer for prioritized sampling, as detailed in Algorithm 1.
Algorithm 1: SA-PER handover decision algorithm.
Input: Iteration number NUM_EPISODES, step number MAX_STEPS, node number node_num, measurement information SINR, length of update step D.
Output: Handover decision matrix A.
1: Initialize action-value function Q, replay buffer B and handover decision matrix A. The initialized parameters of the main Q-network and target Q-network are consistent.
2: for i = 1, NUM_EPISODES do
3:  for j = 1, MAX_STEPS do
4:   for k = 1, node_num do
5: According to Eq. (6), the immediate reward rt is computed.
6: According to Eq. (11), the dwell time is computed. According to Eq. (14), the load coefficient Load is obtained. By the state aware method, the network state st in time slot t is constructed. According to Eqs. (16) and (17), the state decision matrix Ms is normalized.
7: By the ε-greedy method, the action at corresponding to state st is determined and the handover decision matrix A is updated.
8: The next state st+1 is produced and the transition (st, at, rt, st+1) is stored in buffer B.
9: In the PER method, according to Eqs. (18) and (19), the priority and probability of each sample are computed. According to Eq. (20), the weight of the importance sampling method is computed. The sampled data are the input of the main Q-network, and the action-value function Qm(st, at) is computed.
10: According to Eq. (22), the action am corresponding to the maximum value of Qm is obtained and input the target Q-network Qt. And the action-value Qt(st+1, am) is computed.
11: Adopt the stochastic gradient descent method, according to Eq. (24), the parameters θx of main Q-network are updated.
12: end for
13: Every D steps, the parameters of target Q-network are updated by the parameters of main Q-network.
14: end for
15: end for
16: Return the handover decision matrix A.
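A compact Python skeleton of Algorithm 1 is given below. The environment, reward, and Q-networks are stubbed with stand-ins (a linear Q-vector, uniform sampling in place of PER), so it shows the control flow of the loop rather than the full SA-PER implementation; all names are illustrative.

```python
import numpy as np

# Minimal skeleton of the SA-PER loop in Algorithm 1 (stubs throughout).
rng = np.random.default_rng(0)
NUM_EPISODES, MAX_STEPS, N_CELLS, D = 5, 10, 4, 3

q_main = rng.normal(size=(N_CELLS,))   # stand-in for the main Q-network
q_target = q_main.copy()               # target network starts identical
buffer, eps, gamma, lr = [], 0.1, 0.95, 0.05

def observe_state():
    """State aware step: per-cell (SINR, dwell time, load), max-min normalized."""
    m = rng.random((N_CELLS, 3))
    return (m - m.min(0)) / np.ptp(m, 0).clip(1e-9)

for ep in range(NUM_EPISODES):
    for step in range(MAX_STEPS):
        s = observe_state()
        # eps-greedy action selection over candidate cells
        a = int(rng.integers(N_CELLS)) if rng.random() < eps else int(np.argmax(q_main))
        r = s[a].sum() - 1.0                      # placeholder immediate reward
        s_next = observe_state()
        buffer.append((s, a, r, s_next))
        # Double-Q update with a sampled transition (uniform stand-in for PER)
        _, a_b, r_b, _ = buffer[rng.integers(len(buffer))]
        a_star = int(np.argmax(q_main))           # selection by main network
        td = r_b + gamma * q_target[a_star] - q_main[a_b]
        q_main[a_b] += lr * td                    # gradient-step stand-in
        if step % D == 0:
            q_target = q_main.copy()              # periodic target sync

print(int(np.argmax(q_main)))                     # index of the preferred cell
```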
[figure(s) omitted; refer to PDF]
5. Experimental Results and Discussions
5.1. Simulation Environment Setups
The targets of this research are to solve frequent handover and communication interruption. The simulation experiments are carried out on a PC with a 3.2 GHz quad-core i5-1570 and 16 GB of RAM; the OS is 64-bit Windows 10, and the simulation platform is Python 3. The simulated virtual city scenario is shown in Figure 3. The simulated area of the city is 2.5 km wide and 2 km long. This scenario includes 7 roads; buildings, hills, rivers, and so on are not modelled. It contains 10 macro cells and 34 small cells, deployed along the roads to cover as much area as possible; note that the overlapping coverage is also evident. The movement model of the UE is described in Section 3.3. The starting point of each mobile user is randomly selected from 11 initial points, and the speed is randomly selected from 5 km/h, 25 km/h, 50 km/h, 70 km/h, and 120 km/h. Each mobile user moves at a constant speed in a straight line. The number of mobile users is 50, 100, 200, and 300, respectively. The simulation environment of the wireless heterogeneous cellular networks is implemented in Python. The system bandwidth of the macro cell and small cell is set to 20 MHz and 500 MHz, respectively. The wireless channels of the macro cell and small cell are modelled with reference to 3GPP TR 38.901 V16.1.0; the standard deviations of shadow fading are 7.8 dB and 8.2 dB, respectively. For the handover settings, TTT and the A3 offset are set to 450 ms and 3 dB. If the SINR stays below -3 dB for 500 ms, the radio link is considered to have failed. The communication radii of the macro cell and small cell are 500 m and 50 m, and the upper limits of connected users are 100 and 275, respectively. One user occupies at most one resource block, and the subchannel bandwidths of the macro cell and small cell are 180 kHz and 1.75 MHz, respectively [43].
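The A3 and radio link failure rules just stated can be expressed directly in code. The sketch below assumes the 0.1 s sampling interval from Table 3 and treats the TTT window as the nearest whole number of samples; the function names are illustrative.

```python
def a3_triggered(sinr_target_log, sinr_serving_log, offset_db=3.0,
                 ttt_ms=450, interval_ms=100):
    """A3 event check: the target cell must exceed the serving cell by the
    A3 offset for the whole time-to-trigger window."""
    need = ttt_ms // interval_ms  # consecutive samples covering the TTT
    recent = list(zip(sinr_target_log, sinr_serving_log))[-need:]
    return len(recent) == need and all(t > s + offset_db for t, s in recent)

def rlf(sinr_log, thresh_db=-3.0, hold_ms=500, interval_ms=100):
    """Radio link failure: SINR stays below -3 dB for 500 ms."""
    need = hold_ms // interval_ms
    return len(sinr_log) >= need and all(v < thresh_db for v in sinr_log[-need:])

print(a3_triggered([9, 9, 9, 9, 9], [5, 5, 5, 5, 5]))  # True: offset held over the TTT
```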
The handover rate (HOR), radio link failure (RLF) rate, and ping-pong rate (PPR) are selected as the evaluation criteria; a sketch of how such metrics can be computed from a handover log follows.
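As an illustration, the sketch below computes HOR and PPR from a simple handover event log (RLF is tracked separately via the rule above). The one-second ping-pong window is an assumption, since this extract does not state the paper's ping-pong definition.

```python
def handover_metrics(events, duration_s, pingpong_window_s=1.0):
    """HOR and PPR from a handover log; each event is (time_s, from_cell,
    to_cell). A ping-pong is counted when a user returns to the previous
    cell within a short window (window length assumed)."""
    hor = len(events) / duration_s
    pp = sum(1 for (t0, f0, _), (t1, _, to1) in zip(events, events[1:])
             if to1 == f0 and t1 - t0 <= pingpong_window_s)
    ppr = pp / len(events) if events else 0.0
    return hor, ppr

print(handover_metrics([(1.0, "M1", "S3"), (1.6, "S3", "M1")], 600.0))
```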
Referring to [39, 41], the simulation parameters of the network are shown in Table 3.
Table 3
Simulation parameters of the network.
Parameters | Macro cell | Small cell |
Total number of cell | 10 | 34 |
Cell radius | 500 m | 50 m |
Carrier frequency | 2 GHz | 28 GHz |
System bandwidth | 20 MHz | 500 MHz |
RB’s bandwidth | 180 kHz | 1.75 MHz |
Number of RBs | 100 | 275 |
Thermal noise | -174 dBm/Hz | |
Shadowing | 7.8 dB | 8.2 dB |
Antenna gain | 15 dBi | 5 dBi |
Cell transmit power | 46 dBm | 35 dBm |
Path loss model | 3GPP TR 38.901 | 3GPP TR 38.901 |
Number of users | 50, 100, 200, 300 | |
Speed of UE (km/h) | 5, 25, 50, 70, 120 | |
Duration of simulation | 600 seconds | |
Sampling interval | 0.1 second |
5.2. Analysis and Discussion of Experimental Results
5.2.1. Average Handover Numbers of UE
Figure 6 shows the average handover numbers of the different handover decision methods when the numbers of users are 50, 100, 200, and 300, respectively. As the number of users increases, the number of handovers increases. The proposed SA-PER handover decision method has excellent performance, with the DuelingNet method close behind. When the number of users is 50, 100, 200, and 300, the average handover numbers of SA-PER are 6.82, 10.76, 13.12, and 13.36, respectively.
[figure(s) omitted; refer to PDF]
In the proposed SA-PER method, the state aware method makes full use of the state aware data and provides the decision basis for the handover decision. Moreover, the PER method improves the sampling procedure, optimizing the learning efficiency and accuracy of the DRL algorithm. In the DDQN method, the main Q-network trains the network coefficients, and the target Q-network periodically copies them; the learning performance of the DDQN method is better than that of the traditional DQN method. Based on DDQN, the DuelingNet method updates the network structure and improves the learning ability. According to the comparative analysis, the proposed SA-PER handover decision method solves the frequent handover problem, and the average handover numbers decrease markedly, which meets the communication demands of mobile users.
Figure 7 shows the average handover numbers of the SA-PER method for different speeds and numbers of users. When the number of users is fixed, an increase in user speed leads to a decrease in handover numbers: at higher speeds, fewer samples are taken, so fewer handover requests are issued. When the user speed is fixed, an increase in the number of users leads to an increase in the average handover number, because the load coefficient is one of the handover decision factors. During their movement, mobile users prefer to connect to candidate cells with low load coefficients.
[figure(s) omitted; refer to PDF]
Figure 8 shows the vertical handover (MBS-SBS) and horizontal handover (MBS-MBS and SBS-SBS) performance of the SA-PER method for different numbers of users. As the number of users increases, the total number of handovers increases, because the number of users directly affects the load of each cell. In the SA-PER method, the number of vertical handovers is smaller than the number of horizontal handovers: under the ultradense deployment of small cells, the overlapping coverage between macro cells and small cells is considerable, and the macro cell is mostly selected as the candidate cell in the handover decision process. Since the dwell time is also a decision factor, a longer dwell time means fewer handovers, and the total number of vertical handovers changes little. When the coverage of the cellular network is poor, the mobile user can only connect to an MBS or an SBS, and the collaborative relationship between horizontal and vertical handover dominates. When the coverage is good, the candidate cell set is large, and the competitive relationship between horizontal and vertical handover dominates. When the speed of the UE increases, the UE selects the macro cell, which offers a long dwell time, as the handover target. Our analysis of the relations between vertical and horizontal handover provides good preparation for real deployments and increases the successful handover rate.
[figure(s) omitted; refer to PDF]
5.2.2. Handover Rate, Radio Link Failure Rate, and Ping-Pong Rate
Figure 9 shows the average handover rate, radio link failure rate, and ping-pong rate of the different handover decision methods.
[figure(s) omitted; refer to PDF]
Smaller values of HOR, RLF rate, and PPR indicate a better handover decision method. Because of the random motion of the UE, the measured rates fluctuate with the users' trajectories.
5.2.3. The Throughput of Networks
Figure 10 shows the average network throughput of the different handover decision methods when the number of users is 100. In this comparison, the proposed SA-PER handover decision method achieves the highest throughput, 0.5465 Mbps. The network throughput of the Q-learning method ranks second; the Q-learning method is usually suited to discrete rather than continuous problems, whereas the state aware and PER methods optimize the data collection and batch sampling. Therefore, the proposed method meets the communication service demands of mobile users.
[figure(s) omitted; refer to PDF]
5.2.4. Average Dwell Time of User
The average dwell times of the different handover decision methods for different numbers of users are shown in Figure 11. When the number of users increases, the average dwell time decreases, and the SA-PER method maintains a longer dwell time than the others because the state aware and PER methods improve the learning efficiency and accuracy. According to Equation (12), when the total dwell time is fixed, a decrease in the handover number and the number of connected cells leads to an increase in the average dwell time. The proposed SA-PER method has the longest dwell time, which means fewer handovers, and it thus meets the communication continuity demands of mobile users.
[figure(s) omitted; refer to PDF]
5.2.5. The Convergence of SA-PER Method
Figure 12 shows the convergence behaviour of the SA-PER method when the number of users is 100; the average handover numbers are plotted for each generation. In the proposed SA-PER method, the Q-network coefficients are randomly initialized, which leads to a high handover number at first. As training proceeds, the handover performance of our method becomes stable, and the handover number becomes small. At generation 100, the convergence of our method is evident, with a handover number of 30.54; when the number of generations increases to 1000, the minimum handover number is 8.88. The proposed method has good handover performance and improves the efficiency of handover management.
[figure(s) omitted; refer to PDF]
6. Conclusions
In this research, the proposed SA-PER handover decision method reduces frequent handover and the ping-pong effect in 5G ultradense networks, improving the quality and continuity of communication services. The state aware method and the analysis of cell dwell time reduce frequent handover and the ping-pong effect, while the prioritized experience replay method improves the learning efficiency and convergence rate of the DDQN-based handover decision method. The analysis of the competitive and collaborative relationships between the different handover types helps network operators balance resource efficiency and QoS. In addition, by means of the decision ability of the DDQN method, the online learning of handover decisions is better adapted to the dynamics of networks and the mobility of users.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61772385).
[1] M. I. Rochman, V. Sathya, N. Nunez, D. Fernandez, M. Ghosh, A. S. Ibrahim, W. Payne, "A comparison study of cellular deployments in Chicago and Miami using apps on smartphones," Proceedings of the 15th ACM Workshop on Wireless Network Testbeds, Experimental evaluation & CHaracterization, pp. 61-68, DOI: 10.1145/3477086.3480843.
[2] S. Khosravi, H. S. Ghadikolaei, M. Petrova, "Learning-based load balancing handover in mobile millimeter wave networks," GLOBECOM 2020 - 2020 IEEE Global Communications Conference.
[3] V. Sathya, "Evolution of small cell from 4G to 6G: past, present, and future," 2021, https://arxiv.org/abs/2101.10451.
[4] R. Arshad, H. ElSawy, S. Sorour, T. Y. Al-Naffouri, M.-S. Alouini, "Velocity-aware handover management in two-tier cellular networks," IEEE Transactions on Wireless Communications, vol. 16 no. 3, pp. 1851-1867, DOI: 10.1109/TWC.2017.2655517, 2017.
[5] Y. Z. H. Wang, X. Yang, C. Wei, "METRE: measurement task recommendation for energy-efficient handover in dense networks," GLOBECOM 2020 - 2020 IEEE Global Communications Conference.
[6] L. W. W. Sun, J. Liu, N. Kato, Y. Zhang, "Movement aware CoMP handover in heterogeneous ultra-dense networks," IEEE Transactions on Communications, vol. 69 no. 1, pp. 340-352, DOI: 10.1109/TCOMM.2020.3019388, 2021.
[7] Q. Liu, C. F. Kwong, S. Wei, L. Li, S. Zhang, "Intelligent handover triggering mechanism in 5G ultra-dense networks via clustering-based reinforcement learning," Mobile Networks and Applications, vol. 26, pp. 27-39, DOI: 10.1007/s11036-020-01718-w, 2021.
[8] M. Cicioğlu, "Multi-criteria handover management using entropy-based SAW method for SDN-based 5G small cells," Wireless Networks, vol. 27 no. 4, pp. 2947-2959, DOI: 10.1007/s11276-021-02625-y, 2021.
[9] G. Gódor, Z. Jakó, Á. Knapp, S. Imre, "A survey of handover management in LTE-based multi-tier femtocell networks: requirements, challenges and solutions," Computer Networks, vol. 76, pp. 17-41, DOI: 10.1016/j.comnet.2014.10.016, 2015.
[10] D. Xenakis, N. Passas, L. Merakos, C. Verikoukis, "Mobility management for femtocells in LTE-advanced: key aspects and survey of handover decision algorithms," IEEE Communications Surveys & Tutorials, vol. 16 no. 1, pp. 64-91, DOI: 10.1109/SURV.2013.060313.00152, 2014.
[11] A. Stamou, N. Dimitriou, K. Kontovasilis, S. Papavassiliou, "Autonomic handover management for heterogeneous networks in a future Internet context: a survey," IEEE Communications Surveys & Tutorials, vol. 21 no. 4, pp. 3274-3297, DOI: 10.1109/COMST.2019.2916188, 2019.
[12] V. Sharma, J. Guan, J. Kim, S. Kwon, I. You, F. Palmieri, M. Collotta, "MIH-SPFP: MIH-based secure cross-layer handover protocol for Fast Proxy Mobile IPv6-IoT networks," Journal of Network and Computer Applications, vol. 125, pp. 67-81, DOI: 10.1016/j.jnca.2018.09.002, 2019.
[13] S.-M. Chun, J.-T. Park, "Mobile CoAP for IoT mobility management," 2015 12th Annual IEEE Consumer Communications and Networking Conference (CCNC), pp. 283-289.
[14] D. Wu, D. I. Arkhipov, E. Asmare, Z. Qin, J. A. McCann, "UbiFlow: mobility management in urban-scale software defined IoT," 2015 IEEE Conference on Computer Communications (INFOCOM), pp. 208-216, DOI: 10.1109/infocom.2015.7218384.
[15] N. P. Singh, B. Singh, "Vertical handoff decision in 4G wireless networks using multi attribute decision making approach," Wireless Networks, vol. 20 no. 5, pp. 1203-1211, DOI: 10.1007/s11276-013-0670-1, 2014.
[16] Z. H. Huang, Y. L. Hsu, P.-K. Chang, M.-J. Tsai, "Efficient handover algorithm in 5G networks using deep learning," GLOBECOM 2020 - 2020 IEEE Global Communications Conference.
[17] H. Tabrizi, G. Farhadi, J. Cioffi, "Dynamic handoff decision in heterogeneous wireless systems: a Q-learning approach," 2012 IEEE International Conference on Communications (ICC), pp. 3217-3222, 2012.
[18] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, D. I. Kim, "Applications of deep reinforcement learning in communications and networking: a survey," IEEE Communications Surveys & Tutorials, vol. 21 no. 4, pp. 3133-3174, DOI: 10.1109/COMST.2019.2916583, 2019.
[19] E. R. Bastidas-Puga, Á. G. Andrade, G. Galaviz, D. H. Covarrubias, "Handover based on a predictive approach of signal-to-interference-plus-noise ratio for heterogeneous cellular networks," IET Communications, vol. 13 no. 6, pp. 672-678, DOI: 10.1049/iet-com.2018.5126, 2019.
[20] X. Hu, H. Song, S. Liu, W. Wang, "Velocity-aware handover prediction in LEO satellite communication networks," International Journal of Satellite Communications and Networking, vol. 36 no. 6, pp. 451-459, DOI: 10.1002/sat.1250, 2018.
[21] T. Goyal, S. Kaushal, "Handover optimization scheme for LTE-advance networks based on AHP-TOPSIS and Q-learning," Computer Communications, vol. 133, pp. 67-76, DOI: 10.1016/j.comcom.2018.10.011, 2019.
[22] A. Stamou, N. Dimitriou, K. Kontovasilis, S. Papavassiliou, "Context-aware handover management for HetNets: performance evaluation models and comparative assessment of alternative context acquisition strategies," Computer Networks, vol. 176, article 107272, DOI: 10.1016/j.comnet.2020.107272, 2020.
[23] F. Guidolin, I. Pappalardo, A. Zanella, M. Zorzi, "Context-aware handover policies in HetNets," IEEE Transactions on Wireless Communications, vol. 15 no. 3, pp. 1895-1906, DOI: 10.1109/TWC.2015.2496958, 2016.
[24] J. Liu, X. Tao, J. Lu, "Mobility-aware centralized reinforcement learning for dynamic resource allocation in HetNets," 2019 IEEE Global Communications Conference (GLOBECOM), DOI: 10.1109/GLOBECOM38437.2019.9013191.
[25] S. He, T. Wang, S. Wang, "Load-aware satellite handover strategy based on multi-agent reinforcement learning," GLOBECOM 2020 - 2020 IEEE Global Communications Conference, DOI: 10.1109/GLOBECOM42002.2020.9322449.
[26] M. Salehi, E. Hossain, "Handover rate and sojourn time analysis in mobile drone-assisted cellular networks," IEEE Wireless Communications Letters, vol. 10 no. 2, pp. 392-395, DOI: 10.1109/LWC.2020.3032596, 2021.
[27] W. Nasrin, J. Xie, "A joint handoff and offloading decision algorithm for mobile edge computing," 2019 IEEE Global Communications Conference (GLOBECOM), DOI: 10.1109/GLOBECOM38437.2019.9013932.
[28] M. M. Hasan, S. Kwon, S. Oh, "Frequent-handover mitigation in ultra-dense heterogeneous networks," IEEE Transactions on Vehicular Technology, vol. 68 no. 1, pp. 1035-1040, DOI: 10.1109/TVT.2018.2874692, 2019.
[29] J. Chen, X. Ge, Q. Ni, "Coverage and handoff analysis of 5G fractal small cell networks," IEEE Transactions on Wireless Communications, vol. 18 no. 2, pp. 1263-1276, DOI: 10.1109/TWC.2018.2890662, 2019.
[30] B. Yang, X. Wang, Z. Qian, "A multi-armed bandit model-based vertical handoff algorithm for heterogeneous wireless networks," IEEE Communications Letters, vol. 22 no. 10, pp. 2116-2119, DOI: 10.1109/LCOMM.2018.2861731, 2018.
[31] L. Sun, J. Hou, T. Shu, "Optimal handover policy for mmWave cellular networks: a multi-armed bandit approach," 2019 IEEE Global Communications Conference (GLOBECOM), DOI: 10.1109/GLOBECOM38437.2019.9014079.
[32] D.-F. Wu, C. Huang, Y. Yin, S. Huang, M. W. A. Ashraf, Q. Guo, L. Zhang, "LB-DDQN for handover decision in satellite-terrestrial integrated networks," Wireless Communications and Mobile Computing, vol. 2021, DOI: 10.1155/2021/5871114, 2021.
[33] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, "Human-level control through deep reinforcement learning," Nature, vol. 518 no. 7540, pp. 529-533, 2015.
[34] M. Wu, W. Huang, K. Sun, H. Zhang, "A DQN-based handover management for SDN-enabled ultra-dense networks," 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), DOI: 10.1109/VTC2020-Fall49728.2020.9348779.
[35] J. Sun, Z. Qian, X. Wang, X. Wang, "ES-DQN-based vertical handoff algorithm for heterogeneous wireless networks," IEEE Wireless Communications Letters, vol. 9 no. 8, pp. 1327-1330, DOI: 10.1109/LWC.2020.2990713, 2020.
[36] Z. Wang, T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, N. Freitas, "Dueling network architectures for deep reinforcement learning," Proceedings of The 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, pp. 1995-2003.
[37] M. S. Mollel, A. I. Abubakar, M. Ozturk, S. Kaijage, M. Kisangiri, A. Zoha, M. A. Imran, Q. H. Abbasi, "Intelligent handover decision scheme using double deep reinforcement learning," Physical Communication, vol. 42, article 101133, DOI: 10.1016/j.phycom.2020.101133, 2020.
[38] Y. Song, S. H. Lim, S. W. Jeon, "Distributed online handover decisions for energy efficiency in dense HetNets," GLOBECOM 2020 - 2020 IEEE Global Communications Conference, DOI: 10.1109/GLOBECOM42002.2020.9348215.
[39] Z. Wang, L. Li, Y. Xu, H. Tian, S. Cui, "Handover control in wireless systems via asynchronous multiuser deep reinforcement learning," IEEE Internet of Things Journal, vol. 5 no. 6, pp. 4296-4307, DOI: 10.1109/JIOT.2018.2848295, 2018.
[40] Z. Wang, L. Li, Y. Xu, H. Tian, S. Cui, "Handover optimization via asynchronous multi-user deep reinforcement learning," 2018 IEEE International Conference on Communications (ICC), DOI: 10.1109/ICC.2018.8422824.
[41] D. Guo, L. Tang, X. Zhang, Y.-C. Liang, "Joint optimization of handover control and power allocation based on multi-agent deep reinforcement learning," IEEE Transactions on Vehicular Technology, vol. 69 no. 11, pp. 13124-13138, DOI: 10.1109/TVT.2020.3020400, 2020.
[42] V. M. Nguyen, C. S. Chen, L. Thomas, "A unified stochastic model of handover measurement in mobile networks," IEEE/ACM Transactions on Networking, vol. 22 no. 5, pp. 1559-1576, DOI: 10.1109/TNET.2013.2283577, 2014.
[43] M. T. Nguyen, S. Kwon, H. Kim, "Mobility robustness optimization for handover failure reduction in LTE small-cell networks," IEEE Transactions on Vehicular Technology, vol. 67 no. 5, pp. 4672-4676, DOI: 10.1109/TVT.2017.2787602, 2018.
[44] A. D. D. M. Sana, E. C. Strinati, A. Clemente, "Multi-agent deep reinforcement learning for distributed handover management in dense mmWave networks," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8976-8980, DOI: 10.1109/ICASSP40776.2020.9052936.
[45] J. Cai, C. Wang, M. Lei, M. J. Zhao, "An intelligent routing algorithm based on prioritized replay double DQN for MANET," 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), DOI: 10.1109/VTC2020-Fall49728.2020.9348471.
Copyright © 2022 Dong-Fang Wu et al. This work is licensed under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/).
Abstract
The traditional handover decision methods depend on the handover threshold and measurement reports, which cannot efficiently resolve the frequent handover issue and ping-pong effect in 5G (5th generation) ultradense networks. To reduce unnecessary handover and improve the QoS (quality of service), we propose a state aware-based prioritized experience replay (SA-PER) handover decision method that incorporates the analysis of dwell time. First, the cell dwell time is computed by geometrical analysis of the real-time locations of mobile users in cellular networks. The constructed state aware sequence, including SINR, load coefficient, and dwell time, is normalized by the max-min normalization method. Then, the handover decision problem in 5G ultradense networks is formalized as a discrete Markov decision process (MDP). Since random sampling and small-batch sampling affect the performance of deep reinforcement learning methods, we adopt the prioritized experience replay (PER) method to resolve the learning efficiency problems. The state space, action space, and reward functions are designed, and the normalized state aware decision matrix is input to the DDQN (double deep Q-network) method. The competitive and collaborative relationships between vertical handover and horizontal handover in 5G ultradense networks are discussed in detail. The high average network throughput and long average cell dwell time ensure the communication quality for mobile users.
1 School of Computer Science, Wuhan University, Wuhan 430072, China; Hubei LuoJia Laboratory, Wuhan 430072, China
2 School of Information Engineering, Zhengzhou Institute of Finance and Economics, Zhengzhou 450053, China
3 Wuhan Maritime Communication Research Institute, Wuhan 430072, China