A Novel Framework For The Segmentation Of Breast Cancer Images

Abstract

Breast cancer is the second leading cause of cancer death in women. Mammography remains the best method for the early detection of breast cancer, as it can identify small lesions up to two years before they grow large enough to be detected by physical examination. X-ray images of the breast must be evaluated accurately to identify early signs of cancerous growth. Segmenting, or partitioning, radiographic images into regions of similar texture is usually performed as part of image analysis and interpretation. The relative lack of structural definition in mammographic images and the subtle transition from one texture to another make segmentation remarkably difficult. Analyzing the different texture regions can be considered a form of exploratory analysis, since the number of distinct regions in the image is not known a priori. This paper presents a segmentation method based on the Self-Organizing Map (SOM).

Introduction

According to the American Cancer Society, breast cancer is the second most common type of cancer afflicting women, but it remains the leading cause of cancer mortality in women between the ages of 40 and 55. In a recent year in the United States, approximately 180,200 women were expected to be diagnosed with invasive breast cancer, and about 44,190 women were expected to die from the disease that same year. Although the number of new breast cancer cases rose by an average of 4 percent between 1982 and 2017, the rate of increase has since tapered off to just over one percent. Much of this welcome decrease in the rate of new diagnoses has been attributed to the increased use of mammography to detect the early stages of the disease. Although significant progress has been made in mammography technology, much work remains to be done to improve overall detection accuracy.

Segmentation is the process of partitioning an image into multiple regions such that the pixels within a region are similar with respect to some characteristic, such as color, intensity, or texture. Artificial neural networks are parallel computational models composed of densely interconnected adaptive processing units. An essential feature of these networks is that they learn by example. Their adaptive nature makes them well suited to applications where the problem to be solved is understood poorly or incompletely but training data is readily available.

Self-Organizing Map

The Self-Organizing Map (SOM) is used to project high-dimensional data onto a two-dimensional map. This dimensionality reduction makes it easier to visualize important relationships among the data. The SOM shares the topology-preserving structure observed in the brain, a property not found in other artificial neural networks. It is called topology preserving because it preserves the neighborhood relations of the input patterns: units that are physically located next to each other respond to classes of input vectors that are likewise located next to each other. The basic SOM model consists of an input layer and an output layer whose neurons are analogous to neurons in the brain. Each input is fully connected to all output units. The number of neurons in the output layer depends on the number of clusters in the image to be segmented; that is, the number of clusters equals the number of output neurons.
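The architecture described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation. It assumes a rectangular grid, weights initialized uniformly in [0, 1), and Euclidean distance in the input space. `make_som` and `project` are hypothetical names. Each output unit stores one weight per input component, which models the full connection from the input layer. Projecting an input onto the 2-D map means returning the grid position of its closest unit.

```python
import math
import random


def make_som(height, width, dim, seed=0):
    """Create a height x width map; each output unit holds a weight vector of length dim.

    One weight per input component models the input layer being fully
    connected to every output unit.
    """
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(width)]
            for _ in range(height)]


def project(som, x):
    """Return the (row, col) grid position of the unit whose weights are closest to x."""
    best, best_dist = None, math.inf
    for r, row in enumerate(som):
        for c, w in enumerate(row):
            dist = math.dist(x, w)  # Euclidean distance in the input space
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


# Usage: a hand-set 1 x 2 map with a "dark" unit and a "bright" unit (RGB in [0, 1]).
som = [[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]]
print(project(som, [0.9, 0.8, 1.0]))  # (0, 1)
```

Under this sketch, nearby pixels in color space land on the same or neighboring grid cells once the map has been trained.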

Color is one of the essential features used for image segmentation. Here, the SOM maps patterns from a three-dimensional color space to a two-dimensional space. The SOM learns through competition: for each input vector, only one neuron in the network responds. Once a neuron is elected as the winner, the weights of that neuron and of the neurons in its neighborhood are updated. The neighborhood scheme for the SOM may be rectangular, hexagonal, or circular.

The multicomponent (color) values are given as input for training. Initially, the learning rate is set to 0.1, and the neighborhood size is initialized to half of the larger of the network's height and width. The weight vectors of the neurons are initialized randomly. In every iteration, the input vectors to be clustered are presented to the network in random order. The neuron whose weight vector best matches the input vector is elected as the winner, or best matching unit (BMU). The winner is selected using the Euclidean distance:

$$ i^{*}(k) = \arg\min_{i} \lVert x(k) - W_i(k) \rVert $$

where $x$ is the input vector and $W_i$ is the weight vector of unit $i$ at iteration $k$. The winning neuron and the neurons within its neighborhood are updated so that their weights move closer to the input vector being presented:

$$ W_i(k+1) = W_i(k) + \alpha(k)\, H_i(k)\, \left[ x(k) - W_i(k) \right] $$

where $H$ is the smoothing kernel centered on the winning neuron. The kernel can be written as a Gaussian function:

$$ H_i(k) = \exp\left( -\frac{d_i^2}{2\sigma(k)^2} \right) $$

where $d_i$ is the distance on the map between the winning neuron and neuron $i$, $\sigma(k)$ is the neighborhood size, and $\alpha(k)$ is the learning rate at iteration $k$.
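A single competitive-learning update with a Gaussian neighborhood might look like the minimal sketch below. It assumes a rectangular grid stored as nested lists of weight vectors, with the neighborhood distance $d$ measured as the Euclidean distance between grid positions. `train_step` is a hypothetical name. It first selects the BMU, then pulls every unit toward the input by the learning rate times the Gaussian kernel.

```python
import math


def train_step(som, x, alpha, sigma):
    """Apply one SOM update for input x and return the BMU's (row, col).

    som is a list of rows of weight vectors and is modified in place.
    """
    rows, cols = len(som), len(som[0])
    # Competition: the unit with the smallest Euclidean distance to x wins.
    bmu = min(((r, c) for r in range(rows) for c in range(cols)),
              key=lambda rc: math.dist(x, som[rc[0]][rc[1]]))
    for r in range(rows):
        for c in range(cols):
            w = som[r][c]
            d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2  # squared grid distance to BMU
            h = math.exp(-d2 / (2 * sigma ** 2))         # Gaussian neighborhood kernel
            for j in range(len(w)):
                w[j] += alpha * h * (x[j] - w[j])        # move toward x, scaled by alpha * h
    return bmu


som = [[[0.0], [1.0]]]  # 1 x 2 map with scalar weights
print(train_step(som, [0.0], alpha=0.5, sigma=1.0))  # (0, 0)
```

In a full training run, this step would be called once per input presented in random order, with `alpha` and `sigma` shrinking between iterations.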

The learning rate and the neighborhood size are updated after each iteration; as the number of iterations increases, both decrease. The learning rate is reduced exponentially:

$$ \alpha(k) = \alpha_0 \exp\left( -\frac{k}{T} \right) $$

where $\alpha_0$ is the initial learning rate and $T$ is the total number of iterations, which is set to 1000. The neighborhood size decreases as

$$ \sigma(k) = \sigma_0 \exp\left( -\frac{k \ln \sigma_0}{T} \right) $$

where $\sigma_0$ is the initial neighborhood size and $\sigma(k)$ is the neighborhood size at iteration $k$. The neighborhood therefore shrinks until, at $k = T$, it encompasses a single unit. Once the SOM converges, the input is mapped from the high-dimensional color space to a two-dimensional map. The final result of the SOM depends on the initial weights, the data used for training, and the characteristics of the map, such as the number of nodes in the network, the learning rate, and the neighborhood size. The SOM tends to over-segment the image, so an optimization method such as a genetic algorithm is used to identify the optimal number of clusters. The data identified by the SOM is given as input to the optimization method, which identifies the cluster centers.
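The two decay schedules can be sketched as follows, using the values stated in the text: $\alpha_0 = 0.1$, $T = 1000$, and $\sigma_0 = \max(\text{height}, \text{width})/2$. The neighborhood formula is an assumption. `neighborhood_size` uses the form sigma0 * exp(-k * ln(sigma0) / T) because it starts at $\sigma_0$ and reaches exactly one unit at $k = T$. Both function names are hypothetical.

```python
import math


def learning_rate(k, alpha0=0.1, T=1000):
    """alpha(k) = alpha0 * exp(-k / T): exponential decay of the learning rate."""
    return alpha0 * math.exp(-k / T)


def neighborhood_size(k, sigma0, T=1000):
    """Assumed schedule sigma(k) = sigma0 * exp(-k * ln(sigma0) / T).

    It starts at sigma0 and reaches 1 (a single unit) at k = T; assumes sigma0 >= 1.
    """
    return sigma0 * math.exp(-k * math.log(sigma0) / T)


height, width = 10, 8
sigma0 = max(height, width) / 2  # initial neighborhood: half the larger map side
for k in (0, 500, 1000):
    print(k, round(learning_rate(k), 4), round(neighborhood_size(k, sigma0), 3))
```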

Proposed Framework for the classification of Breast Cancer Images

The proposed framework for the classification of breast cancer images consists of four steps: pre-processing, segmentation, feature extraction, and classification. In this work, noise is removed from the image in the pre-processing step using the Riotous Clustering algorithm, and the SOM is used in the segmentation step.

Pre-Processing with Riotous Clustering Algorithm

Image representation based on superpixels has become indispensable for improving efficiency in computer vision systems. Object recognition, segmentation, depth estimation, and body model estimation are some important problems where superpixels can be applied. However, superpixels can affect the performance of the system positively or negatively, depending on how well they respect the object boundaries in the image. In this work, a novel pre-processing algorithm is proposed to remove noise from the image and to improve its segmentation. The proposed algorithm combines Simple Linear Iterative Clustering (SLIC) with a fusing optimization step.
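The details of Riotous Clustering and its fusing optimization are not given here. As a rough illustration of the SLIC component only, the following is a minimal standard-library sketch of SLIC for a grayscale image stored as a list of rows. `slic_gray`, the compactness `m`, and the iteration count are assumptions. The fusing optimization and the connectivity-enforcement step of full SLIC are omitted. Cluster centers are seeded on a regular grid with spacing S. Each center searches only a 2S x 2S window and combines intensity distance with spatial distance scaled by m / S. Centers are then moved to the mean of their pixels.

```python
import math


def slic_gray(img, n_segments, m=10.0, iters=10):
    """Minimal SLIC for a grayscale image (list of rows); returns a label map."""
    H, W = len(img), len(img[0])
    S = max(1, int(math.sqrt(H * W / n_segments)))  # grid interval between seeds
    # Seed centers (row, col, intensity) on a regular grid.
    centers = [[float(r), float(c), float(img[r][c])]
               for r in range(min(S // 2, H - 1), H, S)
               for c in range(min(S // 2, W - 1), W, S)]
    labels = [[-1] * W for _ in range(H)]
    for _ in range(iters):
        dist = [[math.inf] * W for _ in range(H)]
        # Assignment: each center only searches a window of +/- S pixels.
        for k, (cr, cc, ci) in enumerate(centers):
            for r in range(max(0, int(cr) - S), min(H, int(cr) + S + 1)):
                for c in range(max(0, int(cc) - S), min(W, int(cc) + S + 1)):
                    dc = img[r][c] - ci                # intensity distance
                    ds = math.hypot(r - cr, c - cc)    # spatial distance
                    d = math.sqrt(dc ** 2 + (ds / S) ** 2 * m ** 2)
                    if d < dist[r][c]:
                        dist[r][c], labels[r][c] = d, k
        # Update: move each center to the mean position and intensity of its pixels.
        acc = [[0.0, 0.0, 0.0, 0] for _ in centers]
        for r in range(H):
            for c in range(W):
                k = labels[r][c]
                if k < 0:
                    continue
                acc[k][0] += r
                acc[k][1] += c
                acc[k][2] += img[r][c]
                acc[k][3] += 1
        for k, (sr, sc, si, n) in enumerate(acc):
            if n:
                centers[k] = [sr / n, sc / n, si / n]
    return labels


# Usage: an 8 x 8 image, dark on the left and bright on the right.
img = [[0] * 4 + [100] * 4 for _ in range(8)]
labels = slic_gray(img, n_segments=4)
```

For real images, scikit-image provides a full implementation as `skimage.segmentation.slic`.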

Segmentation using SOM

Clustering in this study uses the gray-scale value of each pixel as the input to the SOM. The neighborhood topology of the SOM is a linear array, also known as a one-dimensional (1-D) topology. The SOM algorithm is computed in two stages: a learning stage and a recognition stage. Instead of the standard Euclidean Distance, this research measures distance with the Normalized Euclidean Distance.
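The two stages can be sketched for a 1-D SOM over gray levels. This is an illustrative sketch with hypothetical names, not the paper's code. It assumes the learning-rate and neighborhood schedules described for the SOM above and samples pixels at random (with replacement). The distance used is the plain absolute difference, which is Euclidean distance in one dimension. The paper's Normalized Euclidean Distance is not reproduced because how its normalization applies to scalar gray levels is not specified here. The learning stage fits the unit weights. The recognition stage labels each pixel with its nearest unit, which is its cluster.

```python
import math
import random


def train_som_1d(values, n_units, T=1000, alpha0=0.1, seed=0):
    """Learning stage: fit a 1-D (linear array) SOM to scalar gray levels."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    weights = [rng.uniform(lo, hi) for _ in range(n_units)]  # random initial weights
    sigma0 = max(n_units / 2, 1.0)
    for k in range(T):
        alpha = alpha0 * math.exp(-k / T)
        sigma = sigma0 * math.exp(-k * math.log(sigma0) / T)
        x = rng.choice(values)  # present a randomly chosen pixel
        bmu = min(range(n_units), key=lambda i: abs(x - weights[i]))
        for i in range(n_units):
            h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))  # distance along the array
            weights[i] += alpha * h * (x - weights[i])
    return weights


def recognize(values, weights):
    """Recognition stage: label each gray level with the index of its nearest unit."""
    return [min(range(len(weights)), key=lambda i: abs(v - weights[i])) for v in values]


pixels = [20] * 50 + [200] * 50  # a flattened two-tone image
w = train_som_1d(pixels, n_units=2)
labels = recognize(pixels, w)
```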

The Normalized Euclidean Distance is a modified form of the Euclidean Distance [8]. The Normalized Euclidean Distance between two vectors $u$ and $v$ is given by

$$ d(u, v) = \sqrt{ \sum_{i=1}^{n} \left( \frac{u_i}{\lVert u \rVert} - \frac{v_i}{\lVert v \rVert} \right)^2 } $$

where $\lVert v \rVert$ is the normalization value of vector $v$, expressed as

$$ \lVert v \rVert = \sqrt{ \sum_{i=1}^{n} v_i^2 } $$
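A minimal sketch of this distance follows. It assumes $\lVert v \rVert$ denotes the Euclidean norm and that each vector is divided by its own norm before the ordinary Euclidean distance is taken. `norm` and `normalized_euclidean` are hypothetical names. Because both vectors are scaled to unit length, the result depends only on their directions and lies between 0 and 2.

```python
import math


def norm(v):
    """||v||: the Euclidean length of v."""
    return math.sqrt(sum(vi * vi for vi in v))


def normalized_euclidean(u, v):
    """Euclidean distance between u and v after scaling each to unit length."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        raise ValueError("normalized distance is undefined for a zero vector")
    return math.sqrt(sum((ui / nu - vi / nv) ** 2 for ui, vi in zip(u, v)))


print(normalized_euclidean([3, 4], [6, 8]))  # 0.0: same direction, different length
```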

Davies-Bouldin Index (DBI)

The DBI was introduced in 1979 by David L. Davies and Donald W. Bouldin and is applied here to evaluate the clustering results. The DBI measures the ratio of the within-cluster scatter (the spread of each cluster) to the between-cluster separation (the distance between clusters). The distance between clusters is the Euclidean distance between the center $z_i$ of the $i$th cluster and the center $z_j$ of the $j$th cluster:

$$ d_{ij} = \lVert z_i - z_j \rVert $$

$R_{ij}$ is the ratio between the $i$th and $j$th clusters:

$$ R_{ij} = \frac{S_i + S_j}{d_{ij}} $$

where $S_i$ is the scatter of cluster $C_i$, the average distance of its members to $z_i$. The maximum ratio for each cluster, $D_i = \max_{j \neq i} R_{ij}$, is then used to compute the DBI over the $k$ clusters:

$$ DBI = \frac{1}{k} \sum_{i=1}^{k} D_i $$

A lower DBI indicates compact, well-separated clusters.
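The index can be computed directly from cluster memberships. This minimal sketch assumes each cluster is given as a list of points (tuples), takes the scatter $S_i$ as the mean Euclidean distance of members to their center, and requires at least two clusters. `davies_bouldin` is a hypothetical name.

```python
import math


def davies_bouldin(clusters):
    """Davies-Bouldin Index for a list of clusters, each a list of points (tuples)."""
    centers = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    # S_i: mean Euclidean distance of cluster members to their center.
    scatter = [sum(math.dist(p, z) for p in c) / len(c)
               for c, z in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # D_i: the largest ratio R_ij = (S_i + S_j) / d_ij over all other clusters j.
        total += max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


print(davies_bouldin([[(0,), (2,)], [(10,), (12,)]]))  # 0.2
```

When choosing the number of clusters with this index, the clustering with the smallest DBI would be preferred.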

The Validity Measure (VM) is another index for testing the validity of clustering results and is commonly used in clustering-based image segmentation. VM is calculated as

$$ VM = y \cdot \frac{intra}{inter} $$

where $intra$ is the intra-cluster distance, $inter$ is the inter-cluster distance, and $y$ is a function of the number of clusters formed. The intra-cluster distance is

$$ intra = \frac{1}{N} \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - z_i \rVert^2 $$

where $N$ is the total number of pixels in the image, $k$ is the number of clusters, and $z_i$ is the center of cluster $C_i$. To calculate VM, the inter-cluster distance is taken as the minimum distance between cluster centers:

$$ inter = \min_{i \neq j} \lVert z_i - z_j \rVert^2 $$
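A minimal sketch of VM follows. It assumes squared Euclidean distances in both terms, as in the common intra/inter formulation. The exact form of $y$ is not given here, so it is passed in as a parameter with a default of 1. `validity_measure` is a hypothetical name, and at least two clusters are required.

```python
import math


def validity_measure(clusters, y=1.0):
    """VM = y * intra / inter for a list of clusters, each a list of points (tuples)."""
    centers = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    n = sum(len(c) for c in clusters)  # N: total number of points (pixels)
    # intra: mean squared distance of every point to its own cluster center.
    intra = sum(math.dist(p, z) ** 2
                for c, z in zip(clusters, centers) for p in c) / n
    # inter: smallest squared distance between any two cluster centers.
    inter = min(math.dist(centers[i], centers[j]) ** 2
                for i in range(len(centers)) for j in range(i + 1, len(centers)))
    return y * intra / inter


print(validity_measure([[(0,), (2,)], [(10,), (12,)]]))  # 0.01
```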

Conclusion

In the proposed framework, clustering with the Normalized Euclidean Distance performs well and yields segmentation results consistent with human perception. With this approach, breast cancer images can be segmented automatically and without supervision by using cluster validity measures. The Davies-Bouldin Index (DBI) and Validity Measure (VM) yield different optimal numbers of clusters: for the breast cancer images, the optimal number of clusters selected by DBI is, on average, smaller than that obtained by VM.

15 July 2020