Data mining and machine learning studies on the subject “defect prediction.”
\\n"}]',published:!0,mainMedia:null},components:[{type:"htmlEditorComponent",content:'
Preparation of Space Experiments, edited by leading international expert Dr. Vladimir Pletser, Director of Space Training Operations at Blue Abyss, is the 5,000th Open Access book published by IntechOpen and our milestone publication!
\n\n"This book presents some of the current trends in space microgravity research. The eleven chapters introduce various facets of space research in physical sciences, human physiology and technology developed using the microgravity environment not only to improve our fundamental understanding in these domains but also to adapt this new knowledge for application on earth," says the editor. Listen to what else Dr. Pletser has to say...
\n\n\n\nDr. Pletser’s experience includes 30 years of working with the European Space Agency as a Senior Physicist/Engineer, coordinating its parabolic flight campaigns. He holds the Guinness World Record for the most aircraft flown in parabolas (12) and has personally logged more than 7,300 parabolas.
\n\nSeeing the 5,000th book published makes us proud, happy, humble, and grateful all at once. This is a great opportunity to stop and celebrate what we have done so far, but it is also an opportunity to engage even more, grow, and succeed. We would not have reached this point without the hard work of our team members and the authors and editors who devote their time and expertise to Open Access book publishing with us.
\n\nOver these years, we have gone from pioneering the scientific Open Access book publishing field to being the world’s largest Open Access book publisher. Nonetheless, our vision has remained the same: to meet the challenges of making relevant knowledge available to the worldwide community under the Open Access model.
\n\nWe are excited about the present, and we look forward to sharing many more successes in the future.
\n\nThank you all for being part of the journey. 5,000 times thank you!
\n\nNow with 5,000 titles available Open Access, which one will you read next?
\n\nRead, share and download for free: https://www.intechopen.com/books
\n\n\n\n
\n'}],latestNews:[{slug:"stanford-university-identifies-top-2-scientists-over-1-000-are-intechopen-authors-and-editors-20210122",title:"Stanford University Identifies Top 2% Scientists, Over 1,000 are IntechOpen Authors and Editors"},{slug:"intechopen-authors-included-in-the-highly-cited-researchers-list-for-2020-20210121",title:"IntechOpen Authors Included in the Highly Cited Researchers List for 2020"},{slug:"intechopen-maintains-position-as-the-world-s-largest-oa-book-publisher-20201218",title:"IntechOpen Maintains Position as the World’s Largest OA Book Publisher"},{slug:"all-intechopen-books-available-on-perlego-20201215",title:"All IntechOpen Books Available on Perlego"},{slug:"oiv-awards-recognizes-intechopen-s-editors-20201127",title:"OIV Awards Recognizes IntechOpen's Editors"},{slug:"intechopen-joins-crossref-s-initiative-for-open-abstracts-i4oa-to-boost-the-discovery-of-research-20201005",title:"IntechOpen joins Crossref's Initiative for Open Abstracts (I4OA) to Boost the Discovery of Research"},{slug:"intechopen-hits-milestone-5-000-open-access-books-published-20200908",title:"IntechOpen hits milestone: 5,000 Open Access books published!"},{slug:"intechopen-books-hosted-on-the-mathworks-book-program-20200819",title:"IntechOpen Books Hosted on the MathWorks Book Program"}]},book:{item:{type:"book",id:"8686",leadTitle:null,fullTitle:"Direct Torque Control Strategies of Electrical Machines",title:"Direct Torque Control Strategies of Electrical Machines",subtitle:null,reviewType:"peer-reviewed",abstract:"This book deals with the design and analysis of Direct Torque Control (DTC). It introduces readers to two major applications of electrical machines: speed drive and position control and gives the readers a comprehensive overview of the field of DTC dedicated to AC machines. It includes new DTC approaches with and without control of commutation frequency. It also covers DTC applications using artificial intelligence. 
The book combines theoretical analysis, simulation, and experimental concepts. To make the content as accessible as possible, the book follows a clear progression in each chapter, moving from the background to numerical development and finally to case studies and illustrations. The book is a wide-ranging reference source for graduate students, researchers, and professors from related fields, and it will benefit practicing engineers and experts from industry.",isbn:"978-1-83880-296-7",printIsbn:"978-1-83880-295-0",pdfIsbn:"978-1-83880-359-9",doi:"10.5772/intechopen.80103",price:119,priceEur:129,priceUsd:155,slug:"direct-torque-control-strategies-of-electrical-machines",numberOfPages:172,isOpenForSubmission:!1,isInWos:null,hash:"b6ad22b14db2b8450228545d3d4f6b1a",bookSignature:"Fatma Ben Salem",publishedDate:"January 20th 2021",coverURL:"https://cdn.intechopen.com/books/images_new/8686.jpg",numberOfDownloads:1108,numberOfWosCitations:0,numberOfCrossrefCitations:1,numberOfDimensionsCitations:1,hasAltmetrics:0,numberOfTotalCitations:2,isAvailableForWebshopOrdering:!0,dateEndFirstStepPublish:"April 10th 2019",dateEndSecondStepPublish:"June 3rd 2019",dateEndThirdStepPublish:"August 2nd 2019",dateEndFourthStepPublish:"October 21st 2019",dateEndFifthStepPublish:"December 20th 2019",currentStepOfPublishingProcess:5,indexedIn:"1,2,3,4,5,6,7",editedByType:"Edited by",kuFlag:!1,editors:[{id:"295623",title:"Associate Prof.",name:"Fatma",middleName:null,surname:"Ben Salem",slug:"fatma-ben-salem",fullName:"Fatma Ben Salem",profilePictureURL:"https://mts.intechopen.com/storage/users/295623/images/system/295623.jpeg",biography:"Fatma Ben Salem was born in Sfax, Tunisia, in 1978. She received her BS, MSc, Ph.D., and HCR degrees in 2002, 2003, 2010, and 2015, respectively, all in electrical engineering from the National Engineering School of Sfax, University of Sfax, Tunisia. 
She is an associate professor of electrical engineering at the High Institute of Industrial Management of Sfax, Tunisia. She is a member of the Control Energy Management Laboratory (CEMLab) of the University of Sfax. She is the author of several journal and international conference papers. She has participated in writing book chapters and in the organization of international conferences and workshops. She is an IEEE member. Her main research interests cover several aspects related to the control and the diagnostics of electrical machine drives and generators used in automotive as well as renewable energy systems.",institutionString:"University of Sfax",position:null,outsideEditionCount:0,totalCites:0,totalAuthoredChapters:"2",totalChapterViews:"0",totalEditedBooks:"1",institution:{name:"University of Sfax",institutionURL:null,country:{name:"Tunisia"}}}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,coeditorOne:null,coeditorTwo:null,coeditorThree:null,coeditorFour:null,coeditorFive:null,topics:[{id:"737",title:"Electromagnetism",slug:"electrical-and-electronic-engineering-electromagnetism"}],chapters:[{id:"71268",title:"Improved Direct Torque Control Based on Neural Network of the Double-Star Induction Machine Using Deferent Multilevel Inverter",doi:"10.5772/intechopen.89877",slug:"improved-direct-torque-control-based-on-neural-network-of-the-double-star-induction-machine-using-de",totalDownloads:89,totalCrossrefCites:0,totalDimensionsCites:0,signatures:"Mohamed Haithem Lazreg and Abderrahim Bentaallah",downloadPdfUrl:"/chapter/pdf-download/71268",previewPdfUrl:"/chapter/pdf-preview/71268",authors:[{id:"293688",title:"Ph.D. 
Student",name:"Mohamed Haithem",surname:"Lazreg",slug:"mohamed-haithem-lazreg",fullName:"Mohamed Haithem Lazreg"},{id:"293689",title:"Prof.",name:"Abderrahim",surname:"Bentaallah",slug:"abderrahim-bentaallah",fullName:"Abderrahim Bentaallah"}],corrections:null},{id:"70427",title:"Direct Torque Control Strategies of Induction Machine: Comparative Studies",doi:"10.5772/intechopen.90199",slug:"direct-torque-control-strategies-of-induction-machine-comparative-studies",totalDownloads:400,totalCrossrefCites:1,totalDimensionsCites:1,signatures:"Cherifi Djamila and Miloud Yahia",downloadPdfUrl:"/chapter/pdf-download/70427",previewPdfUrl:"/chapter/pdf-preview/70427",authors:[{id:"306238",title:"Dr.",name:"Cherifi",surname:"Djamila",slug:"cherifi-djamila",fullName:"Cherifi Djamila"},{id:"310114",title:"Prof.",name:"Miloud",surname:"Yahia",slug:"miloud-yahia",fullName:"Miloud Yahia"}],corrections:null},{id:"74056",title:"DTC-SVM Approaches of an Induction Motor Dedicated to Position Control Applications",doi:"10.5772/intechopen.94436",slug:"dtc-svm-approaches-of-an-induction-motor-dedicated-to-position-control-applications",totalDownloads:86,totalCrossrefCites:0,totalDimensionsCites:0,signatures:"Fatma Ben Salem",downloadPdfUrl:"/chapter/pdf-download/74056",previewPdfUrl:"/chapter/pdf-preview/74056",authors:[{id:"295623",title:"Associate Prof.",name:"Fatma",surname:"Ben Salem",slug:"fatma-ben-salem",fullName:"Fatma Ben Salem"}],corrections:null},{id:"72125",title:"Flux Reversal Machine Design",doi:"10.5772/intechopen.92428",slug:"flux-reversal-machine-design",totalDownloads:186,totalCrossrefCites:0,totalDimensionsCites:0,signatures:"Yuting Gao and Yang Liu",downloadPdfUrl:"/chapter/pdf-download/72125",previewPdfUrl:"/chapter/pdf-preview/72125",authors:[{id:"305837",title:"Dr.",name:"Yuting",surname:"Gao",slug:"yuting-gao",fullName:"Yuting Gao"},{id:"310020",title:"MSc.",name:"Yang",surname:"Liu",slug:"yang-liu",fullName:"Yang 
Liu"}],corrections:null},{id:"70396",title:"Predictive Direct Torque Control Strategy for Doubly Fed Induction Machine for Torque and Flux Ripple Minimization",doi:"10.5772/intechopen.89979",slug:"predictive-direct-torque-control-strategy-for-doubly-fed-induction-machine-for-torque-and-flux-rippl",totalDownloads:165,totalCrossrefCites:0,totalDimensionsCites:0,signatures:"Gopala Venu Madhav and Y.P. Obulesu",downloadPdfUrl:"/chapter/pdf-download/70396",previewPdfUrl:"/chapter/pdf-preview/70396",authors:[{id:"230750",title:"Dr.",name:"Dr. Venu Madhav",surname:"Gopala",slug:"dr.-venu-madhav-gopala",fullName:"Dr. Venu Madhav Gopala"},{id:"232357",title:"Dr.",name:"Obulesu",surname:"Y. P.",slug:"obulesu-y.-p.",fullName:"Obulesu Y. P."}],corrections:null},{id:"74487",title:"Study of the Parameters of the Planner with a Screw Working Body",doi:"10.5772/intechopen.93308",slug:"study-of-the-parameters-of-the-planner-with-a-screw-working-body",totalDownloads:63,totalCrossrefCites:0,totalDimensionsCites:0,signatures:"Juraev Tojiddin Khayrullaevich, Norov Sobirjon Negmurodovich and Musulmanov Furqat Shodiyevich",downloadPdfUrl:"/chapter/pdf-download/74487",previewPdfUrl:"/chapter/pdf-preview/74487",authors:[{id:"268891",title:"Ph.D.",name:"Tojiddin",surname:"Juraev",slug:"tojiddin-juraev",fullName:"Tojiddin Juraev"},{id:"319875",title:"MSc.",name:"Sobir",surname:"Norov",slug:"sobir-norov",fullName:"Sobir Norov"},{id:"341386",title:"Dr.",name:"Musulmanov",surname:"Furqat Shodiyevich",slug:"musulmanov-furqat-shodiyevich",fullName:"Musulmanov Furqat Shodiyevich"}],corrections:null},{id:"73820",title:"Torque Ripple Reduction in DTC Induction Motor Drive",doi:"10.5772/intechopen.94225",slug:"torque-ripple-reduction-in-dtc-induction-motor-drive",totalDownloads:124,totalCrossrefCites:0,totalDimensionsCites:0,signatures:"Adhavan Balashanmugham, Maheswaran Mockaisamy and Sathiyanathan 
Murugesan",downloadPdfUrl:"/chapter/pdf-download/73820",previewPdfUrl:"/chapter/pdf-preview/73820",authors:[{id:"303946",title:"Dr.",name:"Adhavan",surname:"Balashanmugham",slug:"adhavan-balashanmugham",fullName:"Adhavan Balashanmugham"},{id:"304330",title:"Dr.",name:"Maheswaran",surname:"Mockaisam",slug:"maheswaran-mockaisam",fullName:"Maheswaran Mockaisam"},{id:"329199",title:"Dr.",name:"Sathiyanathan",surname:"Murugesan",slug:"sathiyanathan-murugesan",fullName:"Sathiyanathan Murugesan"}],corrections:null}],productType:{id:"1",title:"Edited Volume",chapterContentType:"chapter",authoredCaption:"Edited by"}},relatedBooks:[{type:"book",id:"83",title:"Properties and Applications of Silicon Carbide",subtitle:null,isOpenForSubmission:!1,hash:"7e94cc189847633076ad5b2d23117c2f",slug:"properties-and-applications-of-silicon-carbide",bookSignature:"Rosario Gerhardt",coverURL:"https://cdn.intechopen.com/books/images_new/83.jpg",editedByType:"Edited by",editors:[{id:"19005",title:"Prof.",name:"Rosario",surname:"Gerhardt",slug:"rosario-gerhardt",fullName:"Rosario Gerhardt"}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"166",title:"Electromagnetic Waves",subtitle:null,isOpenForSubmission:!1,hash:"6561a39a2e8aaffc6cde23ecd65cdfde",slug:"electromagnetic-waves",bookSignature:"Vitaliy Zhurbenko",coverURL:"https://cdn.intechopen.com/books/images_new/166.jpg",editedByType:"Edited by",editors:[{id:"3721",title:"Prof.",name:"Vitaliy",surname:"Zhurbenko",slug:"vitaliy-zhurbenko",fullName:"Vitaliy Zhurbenko"}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"2431",title:"Dielectric Material",subtitle:null,isOpenForSubmission:!1,hash:"70942e6b7ab8fb1bfa75537709d3910d",slug:"dielectric-material",bookSignature:"Marius Alexandru 
Silaghi",coverURL:"https://cdn.intechopen.com/books/images_new/2431.jpg",editedByType:"Edited by",editors:[{id:"128198",title:"Dr.",name:"Marius Alexandru",surname:"Silaghi",slug:"marius-alexandru-silaghi",fullName:"Marius Alexandru Silaghi"}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"3706",title:"Wave Propagation in Materials for Modern Applications",subtitle:null,isOpenForSubmission:!1,hash:null,slug:"wave-propagation-in-materials-for-modern-applications",bookSignature:"Andrey Petrin",coverURL:"https://cdn.intechopen.com/books/images_new/3706.jpg",editedByType:"Edited by",editors:[{id:"7760",title:"Dr.",name:"Andrey",surname:"Petrin",slug:"andrey-petrin",fullName:"Andrey Petrin"}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"1647",title:"Trends in Electromagnetism",subtitle:"From Fundamentals to Applications",isOpenForSubmission:!1,hash:"2c0ce5e84f67194c32ed9659512218c3",slug:"trends-in-electromagnetism-from-fundamentals-to-applications",bookSignature:"Victor Barsan and Radu P. 
Lungu",coverURL:"https://cdn.intechopen.com/books/images_new/1647.jpg",editedByType:"Edited by",editors:[{id:"100805",title:"Dr.",name:"Victor",surname:"Barsan",slug:"victor-barsan",fullName:"Victor Barsan"}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"414",title:"Electromagnetic Waves",subtitle:"Propagation in Complex Matter",isOpenForSubmission:!1,hash:"d251c3cb1734bca598d8abbed939a1be",slug:"electromagnetic-waves-propagation-in-complex-matter",bookSignature:"Ahmed Kishk",coverURL:"https://cdn.intechopen.com/books/images_new/414.jpg",editedByType:"Edited by",editors:[{id:"73920",title:"Prof.",name:"Ahmed",surname:"Kishk",slug:"ahmed-kishk",fullName:"Ahmed Kishk"}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"415",title:"Behaviour of Electromagnetic Waves in Different Media and Structures",subtitle:null,isOpenForSubmission:!1,hash:"8496cd6f3c63a2b4d0b69076ec095343",slug:"behavior-of-electromagnetic-waves-in-different-media-and-structures",bookSignature:"Ali Akdagli",coverURL:"https://cdn.intechopen.com/books/images_new/415.jpg",editedByType:"Edited by",editors:[{id:"76005",title:"Prof.",name:"Ali",surname:"Akdagli",slug:"ali-akdagli",fullName:"Ali Akdagli"}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"10093",title:"Electromagnetic Propagation and Waveguides in Photonics and Microwave Engineering",subtitle:null,isOpenForSubmission:!1,hash:"1aa3bf83f471bb1591950efa117c6fec",slug:"electromagnetic-propagation-and-waveguides-in-photonics-and-microwave-engineering",bookSignature:"Patrick Steglich",coverURL:"https://cdn.intechopen.com/books/images_new/10093.jpg",editedByType:"Edited 
by",editors:[{id:"223128",title:"Dr.",name:"Patrick",surname:"Steglich",slug:"patrick-steglich",fullName:"Patrick Steglich"}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"7617",title:"Electromagnetic Fields and Waves",subtitle:null,isOpenForSubmission:!1,hash:"d87c09ddaa95c04479ffa2579e9f16d2",slug:"electromagnetic-fields-and-waves",bookSignature:"Kim Ho Yeap and Kazuhiro Hirasawa",coverURL:"https://cdn.intechopen.com/books/images_new/7617.jpg",editedByType:"Edited by",editors:[{id:"126825",title:"Dr.",name:"Kim Ho",surname:"Yeap",slug:"kim-ho-yeap",fullName:"Kim Ho Yeap"}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"1591",title:"Infrared Spectroscopy",subtitle:"Materials Science, Engineering and Technology",isOpenForSubmission:!1,hash:"99b4b7b71a8caeb693ed762b40b017f4",slug:"infrared-spectroscopy-materials-science-engineering-and-technology",bookSignature:"Theophile Theophanides",coverURL:"https://cdn.intechopen.com/books/images_new/1591.jpg",editedByType:"Edited by",editors:[{id:"37194",title:"Dr.",name:"Theophanides",surname:"Theophile",slug:"theophanides-theophile",fullName:"Theophanides Theophile"}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}}],ofsBooks:[]},correction:{item:{id:"68579",slug:"corrigendum-to-industrial-heat-exchanger-operation-and-maintenance-to-minimize-fouling-and-corrosion",title:"Corrigendum to: Industrial Heat Exchanger: Operation and Maintenance to Minimize Fouling and 
Corrosion",doi:null,correctionPDFUrl:"https://cdn.intechopen.com/pdfs/68579.pdf",downloadPdfUrl:"/chapter/pdf-download/68579",previewPdfUrl:"/chapter/pdf-preview/68579",totalDownloads:null,totalCrossrefCites:null,bibtexUrl:"/chapter/bibtex/68579",risUrl:"/chapter/ris/68579",chapter:{id:"52929",slug:"industrial-heat-exchanger-operation-and-maintenance-to-minimize-fouling-and-corrosion",signatures:"Teng Kah Hou, Salim Newaz Kazi, Abu Bakar Mahat, Chew Bee Teng,\nAhmed Al-Shamma’a and Andy Shaw",dateSubmitted:"March 23rd 2016",dateReviewed:"October 10th 2016",datePrePublished:null,datePublished:"April 26th 2017",book:{id:"6080",title:"Heat Exchangers",subtitle:"Advanced Features and Applications",fullTitle:"Heat Exchangers - Advanced Features and Applications",slug:"heat-exchangers-advanced-features-and-applications",publishedDate:"April 26th 2017",bookSignature:"S M Sohel Murshed and Manuel Matos Lopes",coverURL:"https://cdn.intechopen.com/books/images_new/6080.jpg",licenceType:"CC BY 3.0",editedByType:"Edited by",editors:[{id:"24904",title:"Prof.",name:"S. M. Sohel",middleName:null,surname:"Murshed",slug:"s.-m.-sohel-murshed",fullName:"S. M. 
Sohel Murshed"}],productType:{id:"1",title:"Edited Volume",chapterContentType:"chapter",authoredCaption:"Edited by"}},authors:[{id:"93483",title:"Dr.",name:"Salim Newaz",middleName:null,surname:"Kazi",fullName:"Salim Newaz Kazi",slug:"salim-newaz-kazi",email:"salimnewaz@um.edu.my",position:null,institution:{name:"University of Malaya",institutionURL:null,country:{name:"Malaysia"}}},{id:"187135",title:"Ph.D.",name:"Kah Hou",middleName:null,surname:"Teng",fullName:"Kah Hou Teng",slug:"kah-hou-teng",email:"alex_teng1989@hotmail.com",position:null,institution:{name:"Liverpool John Moores University",institutionURL:null,country:{name:"United Kingdom"}}},{id:"194347",title:"Prof.",name:"Abu Bakar",middleName:null,surname:"Mahat",fullName:"Abu Bakar Mahat",slug:"abu-bakar-mahat",email:"ir_abakar@um.edu.my",position:null,institution:null},{id:"194348",title:"Dr.",name:"Bee Teng",middleName:null,surname:"Chew",fullName:"Bee Teng Chew",slug:"bee-teng-chew",email:"chewbeeteng@um.edu.my",position:null,institution:null},{id:"194349",title:"Prof.",name:"Ahmed",middleName:null,surname:"Al-Shamma'A",fullName:"Ahmed Al-Shamma'A",slug:"ahmed-al-shamma'a",email:"A.Al-Shamma'a@ljmu.ac.uk",position:null,institution:null},{id:"194350",title:"Prof.",name:"Andy",middleName:null,surname:"Shaw",fullName:"Andy Shaw",slug:"andy-shaw",email:"A.Shaw@ljmu.ac.uk",position:null,institution:null}]}},chapter:{id:"52929",slug:"industrial-heat-exchanger-operation-and-maintenance-to-minimize-fouling-and-corrosion",signatures:"Teng Kah Hou, Salim Newaz Kazi, Abu Bakar Mahat, Chew Bee Teng,\nAhmed Al-Shamma’a and Andy Shaw",dateSubmitted:"March 23rd 2016",dateReviewed:"October 10th 2016",datePrePublished:null,datePublished:"April 26th 2017",book:{id:"6080",title:"Heat Exchangers",subtitle:"Advanced Features and Applications",fullTitle:"Heat Exchangers - Advanced Features and Applications",slug:"heat-exchangers-advanced-features-and-applications",publishedDate:"April 26th 2017",bookSignature:"S M Sohel 
Murshed and Manuel Matos Lopes",coverURL:"https://cdn.intechopen.com/books/images_new/6080.jpg",licenceType:"CC BY 3.0",editedByType:"Edited by",editors:[{id:"24904",title:"Prof.",name:"S. M. Sohel",middleName:null,surname:"Murshed",slug:"s.-m.-sohel-murshed",fullName:"S. M. Sohel Murshed"}],productType:{id:"1",title:"Edited Volume",chapterContentType:"chapter",authoredCaption:"Edited by"}},authors:[{id:"93483",title:"Dr.",name:"Salim Newaz",middleName:null,surname:"Kazi",fullName:"Salim Newaz Kazi",slug:"salim-newaz-kazi",email:"salimnewaz@um.edu.my",position:null,institution:{name:"University of Malaya",institutionURL:null,country:{name:"Malaysia"}}},{id:"187135",title:"Ph.D.",name:"Kah Hou",middleName:null,surname:"Teng",fullName:"Kah Hou Teng",slug:"kah-hou-teng",email:"alex_teng1989@hotmail.com",position:null,institution:{name:"Liverpool John Moores University",institutionURL:null,country:{name:"United Kingdom"}}},{id:"194347",title:"Prof.",name:"Abu Bakar",middleName:null,surname:"Mahat",fullName:"Abu Bakar Mahat",slug:"abu-bakar-mahat",email:"ir_abakar@um.edu.my",position:null,institution:null},{id:"194348",title:"Dr.",name:"Bee Teng",middleName:null,surname:"Chew",fullName:"Bee Teng Chew",slug:"bee-teng-chew",email:"chewbeeteng@um.edu.my",position:null,institution:null},{id:"194349",title:"Prof.",name:"Ahmed",middleName:null,surname:"Al-Shamma'A",fullName:"Ahmed Al-Shamma'A",slug:"ahmed-al-shamma'a",email:"A.Al-Shamma'a@ljmu.ac.uk",position:null,institution:null},{id:"194350",title:"Prof.",name:"Andy",middleName:null,surname:"Shaw",fullName:"Andy Shaw",slug:"andy-shaw",email:"A.Shaw@ljmu.ac.uk",position:null,institution:null}]},book:{id:"6080",title:"Heat Exchangers",subtitle:"Advanced Features and Applications",fullTitle:"Heat Exchangers - Advanced Features and Applications",slug:"heat-exchangers-advanced-features-and-applications",publishedDate:"April 26th 2017",bookSignature:"S M Sohel Murshed and Manuel Matos 
Lopes",coverURL:"https://cdn.intechopen.com/books/images_new/6080.jpg",licenceType:"CC BY 3.0",editedByType:"Edited by",editors:[{id:"24904",title:"Prof.",name:"S. M. Sohel",middleName:null,surname:"Murshed",slug:"s.-m.-sohel-murshed",fullName:"S. M. Sohel Murshed"}],productType:{id:"1",title:"Edited Volume",chapterContentType:"chapter",authoredCaption:"Edited by"}}},ofsBook:{item:{type:"book",id:"8640",leadTitle:null,title:"Cancer Chemoresistance",subtitle:null,reviewType:"peer-reviewed",abstract:"
\r\n\tNeoplasia is considered to be the result of aberrations in the homeostatic mechanisms regulating cell turnover. This may be due to a combination of genetic, epigenetic and stochastic factors. Drug resistance is one of the major reasons for treatment failure and tumor relapse. Hence, an improved understanding of the mechanisms of neoplastic growth is possible based on systematic cataloging of the druggable targets for each of the major hallmarks of cancer. This approach can also aid in the refinement and validation of in vitro and in vivo models, as well as provide pointers for the development of novel systems. Genomic instability and mutation, as well as tumor-promoting inflammation, should be included as underlying factors influencing the process of neoplasia, along with factors dysregulating energetics. The focus of this book is to give the reader an updated, integrated perspective, including epigenetics data and corroborative mechanistic evidence from multiple model systems, including humans. This will enable the reader to better comprehend the current scenario in terms of the pharmacological aspects pertaining to cancer chemoresistance. The challenges for molecular oncology will also be discussed, along with probable strategies and a roadmap for cancer chemotherapeutic drug development.
",isbn:null,printIsbn:null,pdfIsbn:null,doi:null,price:0,priceEur:null,priceUsd:null,slug:null,numberOfPages:0,isOpenForSubmission:!1,hash:"2a80fe34c552bb6ca76ef9cd8f21e377",bookSignature:"Dr. Suresh P.K.",publishedDate:null,coverURL:"https://cdn.intechopen.com/books/images_new/8640.jpg",keywords:"Molecular Mechanisms of Aberrations, Proliferation Mechanisms, Types of Cell Death, Signal Transduction Pathways, Angiogenesis in Cancer, Angiogenesis in Wound Healing, Molecular Mechanisms of Invasion, Molecular Mechanisms of Metastasis, Telomeres, Stem Cells, Delivery Challenges, Evasion Mechanisms",numberOfDownloads:null,numberOfWosCitations:0,numberOfCrossrefCitations:0,numberOfDimensionsCitations:null,numberOfTotalCitations:null,isAvailableForWebshopOrdering:!0,dateEndFirstStepPublish:"November 5th 2018",dateEndSecondStepPublish:"November 26th 2018",dateEndThirdStepPublish:"January 25th 2019",dateEndFourthStepPublish:"April 15th 2019",dateEndFifthStepPublish:"June 14th 2019",remainingDaysToSecondStep:"2 years",secondStepPassed:!0,currentStepOfPublishingProcess:5,editedByType:null,kuFlag:!1,biosketch:null,coeditorOneBiosketch:null,coeditorTwoBiosketch:null,coeditorThreeBiosketch:null,coeditorFourBiosketch:null,coeditorFiveBiosketch:null,editors:[{id:"190244",title:"Dr.",name:"Suresh",middleName:null,surname:"P.K.",slug:"suresh-p.k.",fullName:"Suresh P.K.",profilePictureURL:"https://mts.intechopen.com/storage/users/190244/images/system/190244.jpeg",biography:"P.K. Suresh is a Professor at the School of Biosciences & Technology, VIT, Vellore (2009 to date). He has approximately 18.5 years of teaching, research and administrative experience in Biotechnology & Industrial Biotechnology and allied disciplines. He had also headed Biotechnology & Industrial Biotechnology departments and has handled a wide variety of theory papers. He has 45 research publications in SCOPUS-indexed journals and has completed 3 funded projects as the Principal Investigator. 
He has made several presentations at International conferences. He has guided 5 doctoral students (2 as co-guide) and is currently guiding 5 students in the doctoral program. He has organized and participated actively in several Faculty Development Programs in areas as diverse as Stem Cells, Bio-inspired Design and Pharmacokinetics and was also a resource person in these events.",institutionString:"School of Biosciences & Technology",position:null,outsideEditionCount:0,totalCites:0,totalAuthoredChapters:"0",totalChapterViews:"0",totalEditedBooks:"0",institution:{name:"Vellore Institute of Technology University",institutionURL:null,country:{name:"India"}}}],coeditorOne:null,coeditorTwo:null,coeditorThree:null,coeditorFour:null,coeditorFive:null,topics:[{id:"6",title:"Biochemistry, Genetics and Molecular Biology",slug:"biochemistry-genetics-and-molecular-biology"}],chapters:null,productType:{id:"1",title:"Edited Volume",chapterContentType:"chapter",authoredCaption:"Edited by"},personalPublishingAssistant:{id:"177731",firstName:"Dajana",lastName:"Pemac",middleName:null,title:"Ms.",imageUrl:"https://mts.intechopen.com/storage/users/177731/images/4726_n.jpg",email:"dajana@intechopen.com",biography:"As a Commissioning Editor at IntechOpen, I work closely with our collaborators in the selection of book topics for the yearly publishing plan and in preparing new book catalogues for each season. This requires extensive analysis of developing trends in scientific research in order to offer our readers relevant content. Creating the book catalogue is also based on keeping track of the most read, downloaded and highly cited chapters and books and relaunching similar topics. I am also responsible for consulting with our Scientific Advisors on which book topics to add to our catalogue and sending possible book proposal topics to them for evaluation. 
Once the catalogue is complete, I contact leading researchers in their respective fields and ask them to become possible Academic Editors for each book project. Once an editor is appointed, I prepare all necessary information required for them to begin their work, as well as guide them through the editorship process. I also assist editors in inviting suitable authors to contribute to a specific book project and each year, I identify and invite exceptional editors to join IntechOpen as Scientific Advisors. I am responsible for developing and maintaining strong relationships with all collaborators to ensure an effective and efficient publishing process and support other departments in developing and maintaining such relationships."}},relatedBooks:[{type:"book",id:"6694",title:"New Trends in Ion Exchange Studies",subtitle:null,isOpenForSubmission:!1,hash:"3de8c8b090fd8faa7c11ec5b387c486a",slug:"new-trends-in-ion-exchange-studies",bookSignature:"Selcan Karakuş",coverURL:"https://cdn.intechopen.com/books/images_new/6694.jpg",editedByType:"Edited by",editors:[{id:"206110",title:"Dr.",name:"Selcan",surname:"Karakuş",slug:"selcan-karakus",fullName:"Selcan Karakuş"}],productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"1591",title:"Infrared Spectroscopy",subtitle:"Materials Science, Engineering and Technology",isOpenForSubmission:!1,hash:"99b4b7b71a8caeb693ed762b40b017f4",slug:"infrared-spectroscopy-materials-science-engineering-and-technology",bookSignature:"Theophile Theophanides",coverURL:"https://cdn.intechopen.com/books/images_new/1591.jpg",editedByType:"Edited by",editors:[{id:"37194",title:"Dr.",name:"Theophanides",surname:"Theophile",slug:"theophanides-theophile",fullName:"Theophanides Theophile"}],productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"3161",title:"Frontiers in Guided Wave Optics and 
Optoelectronics",subtitle:null,isOpenForSubmission:!1,hash:"deb44e9c99f82bbce1083abea743146c",slug:"frontiers-in-guided-wave-optics-and-optoelectronics",bookSignature:"Bishnu Pal",coverURL:"https://cdn.intechopen.com/books/images_new/3161.jpg",editedByType:"Edited by",editors:[{id:"4782",title:"Prof.",name:"Bishnu",surname:"Pal",slug:"bishnu-pal",fullName:"Bishnu Pal"}],productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"3092",title:"Anopheles mosquitoes",subtitle:"New insights into malaria vectors",isOpenForSubmission:!1,hash:"c9e622485316d5e296288bf24d2b0d64",slug:"anopheles-mosquitoes-new-insights-into-malaria-vectors",bookSignature:"Sylvie Manguin",coverURL:"https://cdn.intechopen.com/books/images_new/3092.jpg",editedByType:"Edited by",editors:[{id:"50017",title:"Prof.",name:"Sylvie",surname:"Manguin",slug:"sylvie-manguin",fullName:"Sylvie Manguin"}],productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"371",title:"Abiotic Stress in Plants",subtitle:"Mechanisms and Adaptations",isOpenForSubmission:!1,hash:"588466f487e307619849d72389178a74",slug:"abiotic-stress-in-plants-mechanisms-and-adaptations",bookSignature:"Arun Shanker and B. 
Venkateswarlu",coverURL:"https://cdn.intechopen.com/books/images_new/371.jpg",editedByType:"Edited by",editors:[{id:"58592",title:"Dr.",name:"Arun",surname:"Shanker",slug:"arun-shanker",fullName:"Arun Shanker"}],productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"72",title:"Ionic Liquids",subtitle:"Theory, Properties, New Approaches",isOpenForSubmission:!1,hash:"d94ffa3cfa10505e3b1d676d46fcd3f5",slug:"ionic-liquids-theory-properties-new-approaches",bookSignature:"Alexander Kokorin",coverURL:"https://cdn.intechopen.com/books/images_new/72.jpg",editedByType:"Edited by",editors:[{id:"19816",title:"Prof.",name:"Alexander",surname:"Kokorin",slug:"alexander-kokorin",fullName:"Alexander Kokorin"}],productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"314",title:"Regenerative Medicine and Tissue Engineering",subtitle:"Cells and Biomaterials",isOpenForSubmission:!1,hash:"bb67e80e480c86bb8315458012d65686",slug:"regenerative-medicine-and-tissue-engineering-cells-and-biomaterials",bookSignature:"Daniel Eberli",coverURL:"https://cdn.intechopen.com/books/images_new/314.jpg",editedByType:"Edited by",editors:[{id:"6495",title:"Dr.",name:"Daniel",surname:"Eberli",slug:"daniel-eberli",fullName:"Daniel Eberli"}],productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"57",title:"Physics and Applications of Graphene",subtitle:"Experiments",isOpenForSubmission:!1,hash:"0e6622a71cf4f02f45bfdd5691e1189a",slug:"physics-and-applications-of-graphene-experiments",bookSignature:"Sergey Mikhailov",coverURL:"https://cdn.intechopen.com/books/images_new/57.jpg",editedByType:"Edited by",editors:[{id:"16042",title:"Dr.",name:"Sergey",surname:"Mikhailov",slug:"sergey-mikhailov",fullName:"Sergey Mikhailov"}],productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"1373",title:"Ionic Liquids",subtitle:"Applications and 
Perspectives",isOpenForSubmission:!1,hash:"5e9ae5ae9167cde4b344e499a792c41c",slug:"ionic-liquids-applications-and-perspectives",bookSignature:"Alexander Kokorin",coverURL:"https://cdn.intechopen.com/books/images_new/1373.jpg",editedByType:"Edited by",editors:[{id:"19816",title:"Prof.",name:"Alexander",surname:"Kokorin",slug:"alexander-kokorin",fullName:"Alexander Kokorin"}],productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"2270",title:"Fourier Transform",subtitle:"Materials Analysis",isOpenForSubmission:!1,hash:"5e094b066da527193e878e160b4772af",slug:"fourier-transform-materials-analysis",bookSignature:"Salih Mohammed Salih",coverURL:"https://cdn.intechopen.com/books/images_new/2270.jpg",editedByType:"Edited by",editors:[{id:"111691",title:"Dr.Ing.",name:"Salih",surname:"Salih",slug:"salih-salih",fullName:"Salih Salih"}],productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}}]},chapter:{item:{type:"chapter",id:"40609",title:"Non-Linear Energy Harvesting with Random Noise and Multiple Harmonics",doi:"10.5772/50727",slug:"non-linear-energy-harvesting-with-random-noise-and-multiple-harmonics",body:'Harvesting energy from background mechanical vibrations in the environment has been proposed as a possible method to provide power in situations where battery usage is impractical or inconvenient. The most commonly used method for energy harvesting is to generate power from the vibrations of a piezoelectric material [1-3]; other methods include electromagnetic inductive coupling [4-6] and charge pumping across vibrating capacitive plates [7-10]. It has been shown that a piezoelectric cantilever attached to a vibrating structure can be used to power wireless transmission nodes for sensing applications [9]. In order to generate sufficient power, the frequency of the vibration source must match the resonant frequency of the piezoelectric cantilever. 
If the source vibrates at a fixed, known frequency, the dimensions of the cantilever and the proof mass can be adjusted to ensure frequency matching. Many naturally occurring vibration sources, however, do not have a fixed frequency spectrum and vibrate over a broad range of frequencies. Because the piezoelectric cantilever couples poorly to off-resonance vibrations, only a small amount of the available power can be harvested.
Recent reports have shown that the resonant frequency of a simply supported beam [11] or a piezoelectric cantilever [12] can be tuned by applying an axial force. Research also shows that the resonant frequency of a cantilever can be manipulated by applying a transverse force on the cantilever [13,14]. (In all these cases, the cantilever's response remained within the linear regime.) In principle, this effect could be developed into an active tuning scheme which matches the cantilever resonance to the maximum vibrational output of the environment at any particular time. Calculations indicate, however, that the power consumed by active tuning completely offsets any improvement obtained in the scavenging efficiency [15]. More promising are passive tuning schemes in which a fixed force modifies the frequency response of the cantilever beam without requiring additional power input. For example, an attractive magnetic force acting above the cantilever beam reduces the spring constant of the cantilever and lowers the resonance frequency [13,14], while an attractive force acting along the axis of the cantilever applies axial tension and increases the resonance frequency [12]. While this can be used to tune the resonant frequency, there is no increase in output power, and the cantilever motion can even be damped by the magnetic force, reducing the resulting power output [12,13].
The use of a magnetic force to introduce non-linear oscillation in cantilever motion has recently been reported [16-18]. A pendulum made with piezoelectric material [16] was used to study the energy output under different strengths of random Gaussian noise. An improvement of between 400% and 600% was observed compared to a standard linear oscillator. A piezomagnetoelastic structure [17] with two external magnets was studied, in which chaotic motion was observed outside the resonance frequency. It was further reported [18] that the softening response of a cantilever due to a magnetic attractor expands the response bandwidth and also increases the off-resonance amplitude significantly.
Stochastic motions have long been observed with a pendulum in a repulsive magnetic field [19-20]. In a generalization effort, the optimal relationship among the physical parameters for a coupling enhancement was provided in [16] [Cottone et al., 2009] using a Duffing oscillator. Improvements for the non-linear system have been attributed to an advantage in the amplification of the vibration response from energy harvesters in the stochastic regime [17-18].
Here, we will first demonstrate how this capability can be used to improve power output from a broadband vibration source, having a 1/f frequency dependence (pink noise) [21]. Note that a 1/f vibration spectrum describes a vibration source in which the power spectral density of the vibration is inversely proportional to frequency. Since many naturally occurring vibration sources display a 1/f dependence, this provides evidence that the magnetic coupling could be used for more efficient energy harvesting in practical settings.
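To make the 1/f description concrete, the following is a minimal sketch of one common way to synthesize pink noise numerically: white Gaussian noise is shaped in the frequency domain so that its power falls off as 1/f. This is an illustration only and is not claimed to be the generator used for the shaker table; the function names `pink_noise` and `psd_slope` are hypothetical.

```python
import numpy as np

def pink_noise(n_samples, rng):
    """Return n_samples of unit-variance noise with an approximately 1/f power spectrum."""
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples)        # cycles per sample, freqs[0] == 0
    shaping = np.zeros_like(freqs)            # zero gain at DC removes the mean
    shaping[1:] = 1.0 / np.sqrt(freqs[1:])    # amplitude ~ f^(-1/2) gives power ~ 1/f
    shaped = np.fft.irfft(spectrum * shaping, n=n_samples)
    return shaped / np.std(shaped)

def psd_slope(signal):
    """Least-squares slope of log(power) versus log(frequency), DC bin excluded."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal))
    return np.polyfit(np.log(freqs[1:]), np.log(power[1:]), 1)[0]

noise = pink_noise(2 ** 14, np.random.default_rng(0))
slope = psd_slope(noise)  # close to -1 for pink noise, close to 0 for white noise
```

A fitted log-log slope near -1 is the signature of a power spectral density inversely proportional to frequency.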
The second part of this chapter provides an in-depth study of the response of a magnetically coupled cantilever at different frequencies [22-23]. We observe that amplification of the cantilever output occurs not only under stochastic motion but also through subharmonic and ultraharmonic resonances in the vicinity of the main resonant frequency. The subharmonic and ultraharmonic partial solutions are intrinsically embedded in the magnetically coupled equation of motion, as derived for forced oscillations of weakly nonlinear systems [24]. For the particular weakly coupled cantilever studied in this chapter, maximum output is maintained at the resonant frequency through a combination of ultraharmonic components. In a scan of the voltage produced by the non-linear piezoelectric cantilever over a single excitation parameter, four distinct types of efficiency improvement are observed, in which the signal is amplified above the linear cantilever operation: (1) ultraharmonic amplification below resonance; (2) stochastic amplification in multi-frequency and multi-amplitude oscillations; (3) ultra-sub-harmonic amplification at multiple quarter frequencies; (4) subharmonic amplification at one-third frequencies. For data analysis, a 1-D non-linear system coupled with piezoelectric charge production is modeled to illustrate the dynamic functions.
Figure 1 shows the set-up for the magnetically coupled piezoelectric cantilever measurements. The cantilever is manufactured using commercially available unimorph piezoelectric discs composed of a 0.9 mm thick PZT layer deposited on a 1 mm thick brass shim (APC International, MFT-50T-1.9A1). The disc is cut into a 13 mm wide by 50 mm long strip, and clamped at one end to produce a 44 mm long cantilever. The PZT layer extends 25 mm along the length of the cantilever, and the remainder is brass only. The proof mass (including the magnet and an additional fixture that holds the magnet) weighs 2.4 g, while the cantilever itself weighs 0.8 g. The electrical leads are carefully soldered with thin lead wires (134 AWP, Vishay) to the top side of the PZT and the bottom side of the shim [21].
Figure 1. The experimental set-up for the magnetically coupled (non-linear) piezoelectric cantilever. The magnetic force is repulsive and bi-directional.
Vibration is generated by a shaker table (Labwork ET-126) driven by an amplified pink noise source (Labwork Pa-13 amplifier). The pink noise is generated numerically, with amplitude and crest factor set to -4 dB and 1.41, respectively. The average shaker table acceleration is 7.5 m/s², independent of the magnetic coupling. A custom LabVIEW data acquisition program measures the output voltage from the cantilever beam and the acceleration of the shaker table once every second. The peak-to-peak voltage (Vpp) is measured by an oscilloscope (Agilent 54624A), and the dc voltage is detected with a digital multimeter (Yokogawa 7561). A 5 mm diameter round rare earth magnet (Radio Shack model 64-1895) is attached to the vibrating tip of the cantilever beam, while a similar opposing magnet is attached directly to the shaker table frame, oriented so that the two magnets repel. The distance between the magnets is adjusted to 5.5 mm to make the magnetic force comparable to the spring force of the cantilever.
The voltage generated by the cantilever in response to the pink noise source is measured using three different circuits (shown in Figures 2(a), 3(a), and 4(a)). In each case, the output from the coupled cantilever is compared with the output from the same cantilever in the uncoupled situation (with the opposing magnet removed). In Figure 2, the piezoelectric cantilever beam is wired directly to an oscilloscope with a 1 MΩ input impedance and the peak-to-peak output voltage, Vpp, is measured. As shown in Figure 2(b), the cantilever output fluctuates as a function of time, reflecting the random nature of the vibrations. For much of the time, the outputs from the coupled and uncoupled cantilevers are similar. Occasionally, however, very large voltage spikes are observed in the output from the coupled cantilever that are not observed in the uncoupled case. The peak-to-peak voltage spans 5.7 V (min. 0.7 V, max. 6.4 V) with the coupled setup and only 2.2 V (min. 0.9 V, max. 3 V) with the uncoupled cantilever. The overall RMS power is 3.95 µW for the uncoupled cantilever and 4.85 µW for the coupled case. The ratio of the maximum voltage output of the coupled cantilever to that of the uncoupled cantilever is 2.1.
In Figure 3, the voltage generated by the piezoelectric cantilever beam is rectified, using 0.4 V forward biased diodes, and detected across a 22 µF capacitor and a 1 MΩ resistor in parallel. As shown in Figure 3(b), the voltage output with this measurement circuit is higher in the coupled case than in the uncoupled case most of the time. This is because the RC decay time of the circuit is longer than the time between the large amplitude deflections of the cantilever. The average voltage measured across the capacitor, or equivalently the voltage integrated over time, is approximately 50% higher in the coupled case.
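The RC decay time mentioned above follows directly from the component values quoted in the text. The short sketch below works through the arithmetic; the helper names are hypothetical, and the free exponential decay is the usual idealization for a capacitor discharging through a parallel resistor, not a model of the full rectifier circuit.

```python
import math

R_OHMS = 1.0e6       # 1 MΩ load resistor, from the text
C_FARADS = 22.0e-6   # 22 µF capacitor, from the text

def rc_time_constant(r_ohms, c_farads):
    """Time constant tau = R * C in seconds."""
    return r_ohms * c_farads

def fraction_remaining(t_seconds, tau_seconds):
    """Fraction of the capacitor voltage left after t seconds of free decay."""
    return math.exp(-t_seconds / tau_seconds)

tau = rc_time_constant(R_OHMS, C_FARADS)       # 22 s
left_after_1s = fraction_remaining(1.0, tau)   # most of the voltage is retained
```

With a time constant of 22 s, the capacitor loses only a few percent of its voltage per second, so each large deflection of the coupled cantilever keeps the rectified output elevated well after the deflection has passed.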
Figure 2. (a) The open circuit measurement of Vpp directly from the piezoelectric cantilever, and (b) the higher voltage swing generated by the coupled setup, reflecting larger cantilever motions.
Figure 3. (a) The schematic of the rectified circuit with a 1 MΩ resistor, and (b) the voltage fluctuations, indicating that more power is generated by the magnetically coupled cantilever.
In Figure 4, the rectified voltage is measured directly across the 22 µF capacitor without the 1 MΩ resistor. As shown in Figure 4(b), the voltage across the capacitor increases with time until a maximum charging voltage is reached. The maximum voltage measured across the capacitor is approximately 50% higher in the coupled case than in the uncoupled case. Note that there is a time delay before the coupled cantilever achieves a higher voltage than the uncoupled cantilever. This is the time that passes before the first large amplitude deflection occurs. The random nature of the motion means that this time will vary from run to run; on average, however, the coupled cantilever output will be consistently higher than the uncoupled output. Note that in addition to producing more power, the higher voltage output enables circuit operation without a step-up transformer, eliminating the power loss in the transformer.
Figure 4. (a) The schematic of the storage circuit, and (b) the DC voltage measured on the storage capacitor, indicating that more charge is stored with the magnetic coupling setup.
It is instructive to compare the force exerted on the cantilever in the coupled and uncoupled cases. To do this, an empirical measure of the magnetic force is obtained using the experimental set-up shown in Figure 5.
Figure 5. The magnetic force component, Fz, is measured with an electronic scale as a function of the manual deflection of the cantilever.
The opposing magnet is mounted onto a measurement scale, and the position of the magnetized cantilever is manipulated by pushing up and down at the end of the cantilever beam, simulating flexure movement. The deflection z is measured using a micrometer, while the reading on the scale provides the force between the two magnets. The details of the force measurements are given in [22]. Only the magnetic force in the z direction, Fz, contributes to the resultant spring force. At z=0, the force is zero in the z direction because the two magnets repel each other only in the longitudinal direction. Fz increases as the angle between the two magnets increases, until the overlap between the two magnets is zero. Beyond this point, Fz decreases with increasing distance because the force is inversely proportional to the distance squared.
The spring force, the magnetic force and the resultant force (spring plus magnetic) are plotted in Figure 6.
Figure 6. The plot shows the magnitude of the magnetic forces exerted on the cantilever beam, the spring forces and the resultant forces.
The resultant force is significantly reduced compared to the bare spring force near z=0. The coupled system has three equilibrium points where the resultant force is zero, compared to the single equilibrium point of the bare spring force. Because the resultant force in the region of the three equilibrium points is relatively small, transitions between the three points occur relatively easily. Note that the middle equilibrium is unstable; therefore, when the piezoelectric cantilever is set up for the coupling experiment, the cantilever rests off the middle equilibrium point, toward the ground, in the static state, as shown in Figure 1. In Figure 7 the potential energy is plotted for both the uncoupled and coupled systems. The potential energy is calculated by direct integration of the force with respect to the displacement, z. This gives the spring potential for the uncoupled case, and the resultant (spring plus magnetic) potential for the coupled case. For the coupled case, the resultant potential is raised, with two local minima symmetric about z=0. This double well structure allows easy movement of the cantilever beam even when it is excited by non-resonant forces. Once the beam passes the local potential maximum, it drifts to the other side of the balance, resulting in an increased total deflection distance. This can be seen by considering the possible motion of a cantilever beam having a kinetic energy, h, which is large enough to surmount the potential barrier at z=0. With the same random acceleration background, the coupled cantilever can travel a greater distance than the uncoupled one. The voltage output, which depends on the movement of the cantilever, therefore increases. The ratio of the maximum displacement in the coupled and uncoupled systems determined from Figure 7 is 2.4. This is comparable to the ratio of maximum voltage output in the coupled and uncoupled systems, which was seen in Figure 2(b) to be 2.1.
Figure 7. Direct integration of the measured force functions in Figure 6 gives the magnetic potential, the spring potential and the resultant potential. The response range of the coupled and the uncoupled cantilever is defined by the same potential height, h.
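The integration step can be sketched numerically as follows. Since the measured force data are not reproduced here, the magnetic force below is a purely hypothetical smooth function chosen only to give the qualitative picture described in the text (repulsion away from z=0 that fades with distance); the width `W` and the factor of 2 are illustrative assumptions, not fitted values. Only the spring constant is taken from the chapter.

```python
import numpy as np

def potential_from_force(z, force):
    """U(z) = -integral of F dz using the trapezoid rule, with U(z[0]) = 0."""
    steps = 0.5 * (force[1:] + force[:-1]) * np.diff(z)
    return -np.concatenate(([0.0], np.cumsum(steps)))

def count_equilibria(force):
    """Count sign changes of the force (assumes no sample is exactly zero)."""
    return int(np.sum(np.diff(np.sign(force)) != 0))

K = 8.55                            # spring constant in N/m, quoted in the modelling section
z = np.linspace(-0.01, 0.01, 2000)  # even sample count, so z = 0 itself is not sampled

spring = -K * z                     # restoring force of the bare cantilever
W = 0.004                           # hypothetical width of the magnetic interaction, in m
magnetic = 2.0 * K * z * np.exp(-(z / W) ** 2)  # hypothetical repulsion away from z = 0
resultant = spring + magnetic

u_spring = potential_from_force(z, spring)        # single parabolic well
u_resultant = potential_from_force(z, resultant)  # double well with a barrier near z = 0

n_spring = count_equilibria(spring)        # one equilibrium for the bare spring
n_resultant = count_equilibria(resultant)  # three equilibria for the coupled system
```

With real data, `spring` and `magnetic` would be replaced by the measured force arrays; the same two helpers then give the potential curves and the number of equilibrium points.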
The magnetic coupling (although a passive force requiring no energy) introduces a symmetric force which acts in the opposite direction to the spring force around z=0. Being comparable in magnitude to the spring force, the magnetic force compensates the spring potential and introduces a double valley in the potential energy profile. Under the influence of the modified spring potential, the magnetically coupled cantilever responds to a random vibration source (such as pink noise) by moving chaotically between the two minima in the potential energy profile. Compared with the non-chaotic motion of the uncoupled cantilever around the single potential minimum at z=0, this produces larger cantilever deflection and more voltage output from the piezoelectric cantilever. The oscillations around the resonance frequency are unstable and chaotic, but persistent. The modified spring potential is higher and flatter than the bare spring potential, making the magnetically coupled cantilever easier to excite in the random frequency region. The experiments show that the ratio of the open circuit peak-to-peak voltage outputs is closely related to the shape of the potential well. Future work includes the design and implementation of modified potential wells and further analysis of the gain they provide.
The experimental set-up is the same as in Figure 1. In all measurements, the shaker table acceleration is set to approximately 4.2 m/s² at the resonant frequency, and the frequency is swept from 0 to 30 Hz in 0.5 Hz steps. The opposing magnet, facing the magnet fitted at the free end of the cantilever, supplies a symmetrical, repulsive force about the balance point of the cantilever during vibration. The horizontal separation between the magnets (designated by η) is adjusted to be between approximately 6 and 6.5 mm. This separation is found to provide the best compensation for the spring force, and makes the effective restoring force as small as possible near the equilibrium point.
Figure 8 shows the output of the piezoelectric cantilever as a function of shaker table vibration frequency for both the linear and non-linear cases. The voltage generated by the piezoelectric cantilever beam is measured directly by an oscilloscope, treated as an open circuit. At the resonance frequency (measured to be 9.5 Hz) the output of the cantilever was 53 V, and the peak height, resonance frequency and line width are all approximately the same for the linear and non-linear states (here linear refers to the non-coupled state, while non-linear refers to the magnetically coupled state). On either side of the main resonance, however, additional output is observed for the non-linear cantilever which is not observed in the linear state. As can be seen from a comparison of the linear and the non-linear runs, the overall amplitude profile of the non-linear run is much larger in the sense of a broadband distribution, although there are gaps between peaks in the overall pattern of the non-linear output.
Figure 9 shows the output of both the linear and non-linear cantilevers measured as a function of time at selected frequencies, to compare the linear and non-linear dynamics. The voltage output of the non-linear cantilever evolves with frequency, while being amplified close to the resonance frequency. The spectrum shows a variety of amplified motions and harmonics. For example, at a driving frequency as low as 6.5 Hz (within the 6 to 7.5 Hz band) (Figure 9(a)), both the linear and non-linear cantilever motions follow the vibrations of the shaker table, producing periodic oscillations. The amplitude of the oscillations for the non-linear cantilever, however, is 5 times larger than for the linear cantilever. At the resonant frequency (Figure 9(b)) both linear and non-linear cantilevers oscillate at the driving frequency with equal amplitudes. At 13 Hz (Figure 9(c)) the linear cantilever motion continues to follow the vibrations of the shaker table, producing low amplitude periodic oscillations. The non-linear cantilever motion is aperiodic and has a magnitude which is on average 3 times larger than that of the linear cantilever. At 16 Hz (Figure 9(d)) the non-linear cantilever produces a peak-to-peak amplitude 3 times larger than that of the linear cantilever, and shows multiple, periodic "half-way" vibrations. At 20 Hz (Figure 9(e)) the non-linear cantilever shows an amplitude at 6.7 Hz that is 5 times larger than the linear output at 20 Hz.
Figure 8. The voltage output (peak to peak) of the piezoelectric cantilever measured as a function of frequency (dashed line for the linear state and solid line for the non-linear state).
Note that there are two unexpected small peaks at 12.5 Hz and 17 Hz in the linear response. These peaks in the experimental data come from torsion and standing wave oscillations. They result from the way the magnet and its fixture were attached to the piezoelectric cantilever as the proof mass. The cantilever is relatively thin and droops naturally by a few millimeters into a curve under the weight of the proof mass (as shown in Figure 1). The L-shaped fixture that holds the magnet was bolted with a screw on one side, parallel to the brass shim. The magnet was then attached to the other side of the L-shaped fixture, perpendicular to the brass shim, so as to provide the magnetic coupling. During this process, the cantilever was deformed and twisted slightly. As a result, the combined proof mass is located slightly off the center of the cantilever beam, resulting in a weight imbalance and a torsion mode resonance. The fixture also creates a region where the free end is rigid with the fixture, which acts like a semi-fixed end and allows a standing wave vibration when the cantilever is excited. Finite Element Analysis (FEA) simulating the structure and dimensions confirms that the first 3 modes of vibration include bending, torsion and standing wave oscillations.
Figure 9. The output of the linear (dashed line) and non-linear (solid line) system in the time domain: (a) 6.5 Hz; (b) 9.5 Hz, at resonance; (c) 13 Hz; (d) 16 Hz; (e) 20 Hz.
The dynamics of the piezoelectric cantilever is modeled by a 1-D driven spring-mass system coupled with the piezoelectric effect under the influence of a magnetic force Fm(z) [17-18]:
with mass m = 0.0024 kg, damping coefficient d = 0.0075 kg/s, spring constant k = 8.55 N/m, and angular frequency ω. Here, z is the vertical deflection of the cantilever, V is the generated voltage, σ = 5×10⁻⁶ N/V is the coupling coefficient, and A is the acceleration of the shaker table (A = 4.2 m/s², measured at the resonance frequency). The voltage output is related to the deflection of the piezoelectric cantilever through:
where Rl is the equivalent resistance, Cl is the equivalent capacitance and 1/ (Rl Cl)= 0.01, and θ=1250 is the piezoelectric coupling coefficient in the measured circuit. The transverse magnetic force (in the z direction) is determined from the force between two magnetic dipoles (Kraftmakher, 2007):
where M is the dipole magnetization, μ0 is the permeability of air, and η is the horizontal separation between the magnets at z=0. The correction factors a and b are included to compensate for the flexure motion of the cantilever and the magnetic force along the cantilever axis [16]. The magnetization M is determined by direct measurement of the axial force between the cantilever and a fixed magnet using a reference scale [22].
The solution to the coupled differential equations (1) and (2) is determined using Maple software to give the voltage output versus time for a given driving frequency, magnetic force function, and separation η. In order to fit our experimental data, the magnetic force Fm(z) was modified by the a and b parameters, with M = 0.011 A·m², η = 6.5 mm, a = 1.04 and b = 1.21. As in the experiment, the output is calculated for t = 0 to 10 seconds, and the maximum peak-to-peak output over the last 2 seconds is obtained. The result in the frequency domain is shown in Figure 10, which resembles the experimental result seen in Figure 8.
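As a rough cross-check of the quoted parameters, the sketch below integrates a driven spring-mass model with a fourth-order Runge-Kutta scheme and reports the peak-to-peak voltage over the last 2 seconds of a 10 second run, mirroring the procedure described above. The specific form of the two equations written in the docstring is an assumption consistent with the listed symbols and is not copied from the chapter; the magnetic force is left as a user-supplied callable `fm` (defaulting to zero, i.e. the uncoupled case) because the dipole expression and its correction factors are not reproduced here. The function names are hypothetical.

```python
import math

M = 0.0024      # kg, proof mass
D = 0.0075      # kg/s, damping coefficient
K = 8.55        # N/m, spring constant
SIGMA = 5e-6    # N/V, coupling coefficient
THETA = 1250.0  # piezoelectric coupling coefficient in the measured circuit
INV_RC = 0.01   # 1/(Rl*Cl)
ACC = 4.2       # m/s^2, shaker acceleration at resonance

def natural_frequency_hz(k=K, m=M):
    """Undamped natural frequency sqrt(k/m) / (2*pi)."""
    return math.sqrt(k / m) / (2.0 * math.pi)

def simulate_vpp(freq_hz, fm=lambda z: 0.0, t_end=10.0, dt=5e-4, window=2.0):
    """Peak-to-peak voltage over the last `window` seconds.

    Assumed model (an illustration, not the chapter's exact equations):
        m z'' + d z' + k z + sigma V = fm(z) + m A cos(w t)
        V' + V / (Rl Cl) = theta z'
    """
    w = 2.0 * math.pi * freq_hz

    def deriv(t, state):
        z, vel, volt = state
        drive = M * ACC * math.cos(w * t)
        acc = (fm(z) + drive - D * vel - K * z - SIGMA * volt) / M
        return (vel, acc, THETA * vel - INV_RC * volt)

    def shifted(state, slope, h):
        return tuple(s + h * k for s, k in zip(state, slope))

    state, t = (0.0, 0.0, 0.0), 0.0
    v_min, v_max = math.inf, -math.inf
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(t, state)
        k2 = deriv(t + dt / 2, shifted(state, k1, dt / 2))
        k3 = deriv(t + dt / 2, shifted(state, k2, dt / 2))
        k4 = deriv(t + dt, shifted(state, k3, dt))
        state = tuple(s + dt / 6 * (a + 2 * b + 2 * c + e)
                      for s, a, b, c, e in zip(state, k1, k2, k3, k4))
        t += dt
        if t >= t_end - window:          # only track the steady-state tail
            v_min = min(v_min, state[2])
            v_max = max(v_max, state[2])
    return v_max - v_min

f0 = natural_frequency_hz()        # about 9.5 Hz for the quoted m and k
vpp_resonance = simulate_vpp(9.5)  # uncoupled cantilever driven at resonance
vpp_low = simulate_vpp(6.5)        # uncoupled cantilever driven below resonance
```

The quoted mass and spring constant give a natural frequency of about 9.5 Hz, which agrees with the measured resonance. Passing a non-zero `fm` would turn the same routine into a sketch of the coupled (non-linear) case.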
Both the experimental and simulated figures show broadband vibration for the non-linear configuration between 6 and 20 Hz. The simulation in Figures 11(a)-15(a) reproduces many of the features observed in the experiment in Figures 9(a)-(e). The remaining panels of Figures 11-15 reveal more about the complexity of the multiple harmonics in the non-linear system. The time domain simulations at the frequencies selected from the experiment are shown in Figures 11(a)-15(a). Figures 11(b)-15(b) illustrate the velocity vs. voltage output of the piezoelectric cantilever in both the linear and non-linear cases. Figures 11(c)-15(c) are the Fourier transforms of the coupled cantilever cases in Figures 11(a)-15(a), respectively, showing the frequency components of the non-linear states. The following section discusses the multiple harmonic components derived directly from the non-linear dynamics simulations.
Figure 10. The simulated voltage output (peak to peak) of the piezoelectric cantilever plotted in the frequency domain (dashed line for the linear state and solid line for the non-linear state).
At a driving frequency of 6.5 Hz, as seen in Figure 11(a), both the linear and non-linear cantilever motions follow the vibrations of the shaker table, producing periodic oscillations. The amplitude of the oscillations for the coupled cantilever, however, is approximately 5 times larger than for the linear cantilever, as seen in the experiment in Figure 9(a). The velocity vs. voltage plot in Figure 11(b) shows that the coupled cantilever has a non-linear component in its voltage production. Further analysis through Fourier transformation indicates that the non-linear cantilever shows a combination of the excited 6.5 Hz harmonic (dominant and high amplitude) and the ultraharmonic near 20 Hz (3 times the excited frequency), as seen in Figure 11(c).
At the resonant frequency of 9.5 Hz (Figure 12(a)) both non-linear and linear cantilevers oscillate at the driving frequency with equal voltage amplitude. The responses of the coupled and uncoupled cantilevers at the resonant frequency are almost identical in voltage output. The velocity vs. voltage plot in Figure 12(b) shows a little non-linearity at 90° and -90° of the vibration cycle. Through Fourier transformation, as seen in Figure 12(c), the non-linear cantilever shows vibration components at the excited 9.5 Hz harmonic (dominant) and the ultraharmonic near 29 Hz (3 times the excited frequency).
Figure 11. The theoretical analysis of excited frequency at 6.5 Hz. (a) the time domain voltage output, dash line for linear and solid line for non-linear states; (b) the velocity vs. voltage output, dark line for linear and light line for non-linear state; (c) the Fourier transform of the non-linear state from the data of Figure 11(a).
Figure 12. The theoretical analysis of excited frequency at 9.5 Hz. (a) the time domain voltage output, dash line for linear and solid line for non-linear state; (b) the velocity vs. voltage output, light line for linear and dark line for non-linear state; (c) the Fourier transform of the non-linear state from the data of Figure 12(a).
The response of the non-linear cantilever is chaotic at 13 Hz, as seen in Figure 13(a), but with a magnitude on average 3 times larger than that of the linear one. The velocity vs. voltage relation in Figure 13(b) shows chaotic motion for the coupled cantilever. The Fourier transform of Figure 13(a), shown in Figure 13(c), indicates that the coupled cantilever combines a small linear response component at 13 Hz with a large amplitude distribution at lower frequencies that is attributed to the chaotic motion. Note that the small peaks at 12.5 Hz and 17 Hz, seen and discussed in the experiment section, are not observed in the simulation. These small torsion and standing wave resonances are not accounted for by the simplified 1-D spring-mass-damper model, which represents an ideal cantilever.
At 16 Hz, the response of the non-linear cantilever is periodic (Figure 14(a)) and is 3 times larger (peak to peak) in magnitude than that of the uncoupled one, with a double prong of low-frequency motion in the upper half of the cycle. The response is evidently composed of several frequencies and multiple harmonic motions, with a larger magnitude than the uncoupled motion. This is also evident in the velocity vs. voltage relationship in Figure 14(b), where 3 different cyclic loops are identifiable. The Fourier transform of the time data in Figure 14(a) shows that the non-linear cantilever delivers ultra-sub-harmonic vibration at n*(16/4) Hz, where n is an integer (Figure 14(c)).
Figure 13. The theoretical analysis of excited frequency at 13 Hz. (a) the time domain voltage output, dash line for linear and solid line for non-linear states; (b) the velocity vs. voltage output, light line for linear and dark line for non-linear state; (c) the Fourier transform of the non-linear state from the data of Figure 13(a).
Figure 14. The theoretical analysis of excited frequency at 16 Hz. (a) the time domain voltage output, dash line for linear and solid line for non-linear states; (b) the velocity vs. voltage output, light line for linear and dark line for non-linear state; (c) the Fourier transform of the non-linear state from the data of Figure 14(a).
At 20 Hz, the response of the non-linear cantilever is periodic and again three times larger in peak-to-peak magnitude than the linear one, as seen in Figure 15(a). The velocity vs. voltage plot in Figure 15(b) shows a combination of cyclic motions for the non-linear cantilever. The Fourier transform in Figure 15(c) shows that the coupled cantilever has a dominant subharmonic at 6.7 Hz (one-third of the excitation frequency) together with a 20 Hz component.
The combination of the stochastic and various harmonic features produces three to five times greater voltage than the standard linear narrow-band piezoelectric cantilever. Together with the un-damped resonant response, these features enhance the performance well beyond that of a standard energy harvester.
The theoretical analysis at an excitation frequency of 20 Hz. (a) The time domain voltage output, dashed line for the linear and solid line for the non-linear state; (b) the velocity vs. voltage output, light line for the linear and dark line for the non-linear state; (c) the Fourier transform of the non-linear state from the data of Figure 15(a).
Figure 16(a) shows the output of another PZT cantilever with similar specifications as a function of shaker table vibration frequency for the case where the opposing magnet is fixed to the shaker table. The voltage generated by the piezoelectric cantilever beam is rectified and detected across a 22 µF capacitor and a 1 MΩ resistor in parallel, using the circuit shown in Figure 3(a). The results from two measurement runs in the coupled state are shown, together with the output of the cantilever measured in the uncoupled state (obtained by removing the opposing magnet). At the resonance frequency (measured to be approximately 10 Hz), the output of the cantilever exceeds 16 V, and the peak height, resonance frequency, and linewidth are all approximately the same for the coupled and uncoupled states. On either side of the main resonance, however, additional output is observed for the coupled cantilever that is not observed in the uncoupled state. As can be seen from a comparison of the two coupled runs, the frequency distribution of the peaks is the result of the multiple harmonics, as predicted in the open circuit.
Voltage output of the piezoelectric cantilever as a function of shaker table frequency for (a) single cantilever (b) double cantilever. Integrated voltage output as a function of frequency for (c) single cantilever and (d) double cantilever.
A double cantilever system was also measured (Fig. 16(b)), in which the second magnet is attached to an opposing cantilever (with a resonant frequency of around 60 Hz) rather than to a fixed point. As shown in Fig. 16(b), the results are similar to those of the single cantilever system, except that the double cantilever system shows a larger overall increase in off-resonance output. The overall improvement in harvesting efficiency can be illustrated by plotting the integrated voltage output of the cantilever beam as a function of frequency. For both the single (Fig. 16(c)) and double (Fig. 16(d)) cantilever systems, the integrated voltage output over the 0-30 Hz bandwidth shows a substantial increase in the coupled versus the uncoupled case. The total improvement ranges from 31% to 87%, with some variation between measurement runs.
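As a rough illustration of how such an improvement figure can be computed, the sketch below integrates voltage over frequency with the trapezoidal rule and reports the relative gain of the coupled over the uncoupled case. The frequency grid and voltage values are invented placeholders, not the measured curves of Fig. 16, and the function names are hypothetical.

```python
def trapezoid_integral(xs, ys):
    """Integrate ys over xs with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(xs)):
        total += 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1])
    return total


def percent_improvement(freqs, coupled, uncoupled):
    """Relative gain (%) of the integrated coupled output over the uncoupled one."""
    base = trapezoid_integral(freqs, uncoupled)
    return 100.0 * (trapezoid_integral(freqs, coupled) - base) / base


# Assumption: placeholder values for illustration only (volts vs. hertz).
freqs_hz = [0, 5, 10, 15, 20, 25, 30]
uncoupled_v = [0.0, 1.0, 16.0, 1.0, 0.5, 0.2, 0.0]
coupled_v = [0.0, 3.0, 16.0, 4.0, 2.0, 1.0, 0.0]

improvement = percent_improvement(freqs_hz, coupled_v, uncoupled_v)
print(f"Integrated-output improvement: {improvement:.1f}%")
```

Because the resonance peak is identical in both placeholder curves, the whole gain in this sketch comes from the off-resonance output, which mirrors the qualitative behaviour described in the text.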
Piezoelectric cantilevers have been widely studied for energy scavenging applications, but suffer from poor output power outside of a narrow frequency range near the cantilever resonance. In this chapter, we have demonstrated how power output can be enhanced by applying a simple passive external force. When a symmetrical and repulsive magnetic force is applied to a piezoelectric cantilever beam to compensate the cantilever spring force, this lowers the spring potential and increases the output when driven by a random pink noise vibrational source. The principle may be applied to other vibration energy harvesting devices such as electromagnetic and capacitive types in random naturally pink noise environments.
In the parametrically excited piezoelectric cantilever experiments, linear and non-linear performances were compared. Overall, four distinct types of efficiency improvement appear in the non-linear configuration, in which the signal is amplified above the linear cantilever response: low-frequency ultraharmonic amplification; stochastic amplification in multi-frequency and multi-amplitude oscillations; ultra-sub-harmonic amplification at multiples of one quarter of the excitation frequency; and subharmonic amplification at one-third of the excitation frequency. Taken together, the stochastic, sub-harmonic, and ultra-harmonic responses produce an average three- to five-fold increase in voltage production. For energy harvesting purposes, the combination of the four features together with the un-damped resonant response enhances the performance well beyond that of a standard energy harvester. Furthermore, an analytical model of the bi-stable dynamics produces results consistent with those observed experimentally. The simulation tool could be used in future investigations of non-linear energy harvester design for broadband applications and for applications beyond the natural harmonic.
The effort was funded by the Department Of Energy DE-FC26-06NT42795 and the U.S. Navy under Contract DAAB07-03-D-B010/TO-0198. Technical program oversight under the Navy contract was provided by Naval Surface Warfare Center, Crane Division.
In recent years, researchers in the software engineering (SE) field have turned their interest to data mining (DM) and machine learning (ML)-based studies since collected SE data can be helpful in obtaining new and significant information. Software engineering presents many subjects for research, and data mining can give further insight to support decision-making related to these subjects.
Figure 1 shows the intersection of three main areas: data mining, software engineering, and statistics/math. A large amount of data is collected from organizations during software development and maintenance activities, such as requirement specifications, design diagrams, source codes, bug reports, program versions, and so on. Data mining enables the discovery of useful knowledge and hidden patterns from SE data. Math provides the elementary functions, and statistics determines probability, relationships, and correlation within collected data. Data science, in the center of the diagram, covers different disciplines such as DM, SE, and statistics.
The intersection of data mining and software engineering with other areas of the field.
This study presents a comprehensive literature review of existing research and offers an overview of how to approach SE problems using different mining techniques. Up to now, review studies have either introduced SE data descriptions [1], explained the tools and techniques most used by researchers for SE data analysis [2], discussed the role of software engineers [3], or focused only on a specific problem in SE such as defect prediction [4], design patterns [5], or effort estimation [6]. Some existing review articles with the same target [7] are dated, and some are not comprehensive. In contrast to the previous studies, this article provides a systematic review of several SE tasks, gives a comprehensive list of available studies in the field, clearly states the advantages of mining SE data, and answers “how” and “why” questions in the research area.
The novelties and main contributions of this review paper are fivefold.
First, it provides a general overview of several SE tasks that have been the focus of studies using DM and ML, namely, defect prediction, effort estimation, vulnerability analysis, refactoring, and design pattern mining.
Second, it comprehensively discusses existing data mining solutions in software engineering according to various aspects, including methods (clustering, classification, association rule mining, etc.), algorithms (k-nearest neighbor (KNN), neural network (NN), etc.), and performance metrics (accuracy, mean absolute error, etc.).
Third, it points to several significant research questions that are unanswered in the recent literature as a whole or the answers to which have changed with the technological developments in the field.
Fourth, some statistics related to the studies between the years of 2010 and 2019 are given from different perspectives: according to their subjects and according to their methods.
Fifth, it focuses on different types of machine learning (supervised and unsupervised learning), with particular attention to ensemble learning and deep learning.
This paper addresses the following research questions:
RQ1. What kinds of SE problems can ML and DM techniques help to solve?
RQ2. What are the advantages of using DM techniques in SE?
RQ3. Which DM methods and algorithms are commonly used to handle SE tasks?
RQ4. Which performance metrics are generally used to evaluate DM models constructed in SE studies?
RQ5. Which types of machine learning techniques (e.g., ensemble learning, deep learning) are generally preferred for SE problems?
RQ6. Which SE datasets are popular in DM studies?
The remainder of this paper is organized as follows. Section 2 explains the knowledge discovery process that aims to extract interesting, potentially useful, and nontrivial information from software engineering data. Section 3 provides an overview of current work on data mining for software engineering grouped under five tasks: defect prediction, effort estimation, vulnerability analysis, refactoring, and design pattern mining. In addition, some machine learning studies are divided into subgroups, including ensemble learning- and deep learning-based studies. Section 4 gives statistical information about the number of well-validated studies conducted in the last decade. Related works considered fundamental by highly reputed journals are listed, and the specific methods they used, their categories, and their purposes are clearly stated. In addition, widely used datasets related to SE are given. Finally, Section 5 offers concluding remarks and suggests future scientific and practical efforts that might improve the efficiency of SE actions.
This section basically explains the consecutive critical steps that should be followed to discover beneficial knowledge from software engineering data. It outlines the order of necessary operations in this process and explains how related data flows among them.
The software development life cycle (SDLC) describes a process to improve the quality of a product in project management. The main phases of the SDLC are planning, requirement analysis, designing, coding, testing, and maintenance of a project. In every phase of software development, some software problems (e.g., software bugs, security, or design problems) may occur. Correcting these problems in the early phases leads to more accurate and timely delivery of the project. Therefore, software engineers broadly apply data mining techniques for different SE tasks to solve SE problems and to enhance programming efficiency and quality.
Figure 2 presents the data mining and knowledge discovery process of SE tasks, including data collection, data preprocessing, data mining, and evaluation. In the data collection phase, data are obtained from software project sources such as bug reports, historical data, version control data, and mailing lists that include various information about the project’s versions, status, or improvement. In the data preprocessing phase, the collected data are preprocessed by using different methods such as feature selection (dimensionality reduction), feature extraction, missing data elimination, class imbalance analysis, normalization, discretization, and so on. In the next phase, DM techniques such as classification, clustering, and association rule mining are applied to discover useful patterns and relationships in software engineering data and thereby to address a software engineering problem such as defective or vulnerable systems, reused patterns, or parts of code changes. Mining and obtaining valuable knowledge from such data helps prevent errors and allows software engineers to deliver the project on time. Finally, in the evaluation phase, validation techniques such as k-fold cross-validation for classification are used to assess the data mining results. The commonly used evaluation measures are accuracy, precision, recall, F-score, and area under the curve (AUC) for classification, and sum of squared errors (SSE) for clustering.
KDD process for software engineering.
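The evaluation phase described above can be sketched without any external library. The following minimal example, written under simplifying assumptions, splits sample indices into k contiguous folds and derives accuracy, precision, recall, and F-score from binary predictions. The function names and the toy labels are hypothetical; real studies typically shuffle or stratify the folds and rely on an established toolkit.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if not start <= i < start + size]
        yield train_idx, test_idx
        start += size


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F-score for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Assumption: toy labels (1 = defective module, 0 = clean module).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)

folds = list(k_fold_indices(10, 3))
print(metrics)
print([len(test) for _, test in folds])
```

In a full cross-validation run, a model would be trained on each `train_idx` set, evaluated on the matching `test_idx` set, and the per-fold metrics averaged.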
In this review, we examine data mining studies in various SE tasks and evaluate commonly used algorithms and datasets.
A defect means an error, failure, flaw, or bug that causes incorrect or unexpected results in a system [8]. A software system is expected to be free of defects, since software quality reflects how defect-free the product is [9]. However, software projects often do not have enough time or people to remove errors before a product is released. In such a situation, defect prediction methods can help to detect and remove defects in the initial stages of the SDLC and to improve the quality of the software product. In other words, the goal of defect prediction is to produce robust and effective software systems. Hence, software defect prediction (SDP) is an important topic for software engineering because early prediction of software defects could help to reduce development costs and produce more stable software systems.
Various studies have been conducted on defect prediction using different metrics such as code complexity, history-based metrics, object-oriented metrics, and process metrics to construct prediction models [10, 11]. These models can be considered on a cross-project or within-project basis. In within-project defect prediction (WPDP), a model is constructed and applied on the same project [12]. The within-project strategy needs a large amount of historical defect data. Hence, in new projects that do not have enough data for training, the cross-project strategy may be preferred [13]. Cross-project defect prediction (CPDP) is a method that involves applying a prediction model from one project to another, meaning that models are prepared by utilizing historical data from other projects [14, 15]. Studies in the field of CPDP have increased in recent years [10, 16]. However, prior studies are difficult to compare because they cannot be replicated, owing to differences in the evaluation metrics used or in the way training data are prepared. Therefore, Herbold et al. [16] replicated different previously proposed CPDP methods to find which approach performed best in terms of metrics such as F-score, area under the curve (AUC), and Matthews correlation coefficient (MCC). Results showed that approaches that are 7 or 8 years old may perform better. Another study [17] replicated prior work to examine whether the choice of classification technique is important. Both noisy and cleaned datasets were used, and the same results were obtained from the two datasets. However, the new dataset gave better results for some classification algorithms. For this reason, the authors claimed that the selection of classification techniques affects the performance of the model.
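Of the metrics named above, the Matthews correlation coefficient is perhaps the least familiar, so a short sketch may help. The example below uses the standard confusion-matrix definition of MCC; the function names and the counts are hypothetical and chosen only to show why MCC is informative when defective modules are rare.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when the denominator is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified modules."""
    return (tp + tn) / (tp + tn + fp + fn)


# Assumption: a degenerate model that labels every module as clean
# on a dataset with 10 defective and 90 clean modules.
trivial = dict(tp=0, tn=90, fp=0, fn=10)
print(accuracy(**trivial), matthews_corrcoef(**trivial))

# Assumption: a balanced toy case with some errors in both directions.
print(matthews_corrcoef(tp=3, tn=3, fp=1, fn=1))
```

The trivial model reaches 90% accuracy yet scores an MCC of zero, which illustrates why replication studies often report MCC and AUC alongside accuracy and F-score.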
Numerous defect prediction studies have been conducted using DM techniques. In the following subsections, we will explain these studies in terms of whether they apply ensemble learning or not. Some defect prediction studies in SE are compared in Table 1. The objective of the studies, the year they were conducted, algorithms, ensemble learning techniques and datasets in the studies, and the type of data mining tasks are shown in this table. The bold entries in Table 1 have better performance than other algorithms in that study.
Ref. | Year | Task | Objective | Algorithms | Ensemble learning | Dataset | Evaluation metrics and results |
---|---|---|---|---|---|---|---|
[18] | 2011 | Classification | Comparative study of various ensemble methods to find the most effective one | NB | Bagging, boosting, RT, RF, RS, AdaBoost, Stacking, and Voting | NASA datasets: CM1 JM1 KC1 KC2 KC3 KC4 MC1 MC2 MW1 PC1 PC2 PC3 PC4 PC5 | 10-fold CV, ACC, and AUC Vote 88.48% random forest 87.90% |
[19] | 2013 | Classification | Comparative study of class imbalance learning methods and proposed dynamic version of AdaBoost.NC | NB, RUS, RUS-bal, THM, SMB, BNC | RF, SMB, BNC, AdaBoost.NC | NASA and PROMISE repository: MC2, KC2, JM1, KC1, PC4, PC3, CM1, KC3, MW1, PC1 | 10-fold CV Balance, G-mean and AUC, PD, PF |
[20] | 2014 | Classification | Comparative study to deal with imbalanced data | Base Classifiers: C4.5, NB Sampling: ROS, RUS, SMOTE | AdaBoost, Bagging, boosting, RF | NASA datasets: CM1, JM1, KC1, KC2, KC3, MC1, MC2, MW1, PC1, PC2, PC3, PC4, PC5 | 5 × 5 CV, MCC, ROC, results change according to characteristics of datasets |
[17] | 2015 | Clustering/classification | To show that the selection of classification technique has an impact on the performance of software defect prediction models | Statistical: NB, Simple Logistic Clustering: KM, EM Rule based: Ripper, Ridor NNs: RBF Nearest neighbor: KNN DTs: J48, LMT | Bagging, AdaBoost, rotation forest, random subspace | NASA: CM1, JM1, KC1, KC3, KC4, MW1, PC1, PC2, PC3, PC4 PROMISE: Ant 1.7, Camel 1.6, Ivy 1.4, Jedit 4, Log4j 1, Lucene 2.4, Poi 3, Tomcat 6, Xalan 2.6, Xerces 1.3 | 10 × 10-fold CV AUC > 0.5 Scott-Knott test α = 0.05, simple logistic, LMT, and RF + base learner outperforms KNN and RBF |
[21] | 2015 | Classification | Average probability ensemble (APE) learning module is proposed by combining feature selection and ensemble learning | APE system combines seven classifiers: SGD, weighted SVMs (W-SVMs), LR, MNB and Bernoulli naive Bayes (BNB) | RF, GB | NASA: CM1, JM1, KC1, KC3, KC4, MW1, PC1, PC2, PC3, PC4 PROMISE (RQ2): Ant 1.7, Camel 1.6, Ivy 1.4, Jedit 4, Log4j 1, Lucene 2.4, Poi 3, Tomcat 6, Xalan 2.6, Xerces 1.3 | 10 × 10-fold CV, AUC > 0.5 Scott-Knott test α = 0.05, simple logistic, LMT, and RF + base learner outperforms KNN and RBF |
[22, 23] | 2016 | Classification | Comparative study of 18 ML techniques using OO metrics on six releases of Android operating system | LR, NB, BN, MLP, RBF SVM, VP, CART, J48, ADT, Nnge, DTNB | Bagging, random forest, Logistic model trees, Logit Boost, Ada Boost | 6 releases of Android app: Android 2.3.2, Android 2.3.7, Android 4.0.4, Android 4.1.2, Android 4.2.2, Android 4.3.1 | 10-fold, inter-release validation AUC for NB, LB, MLP is >0.7 |
[24] | 2016 | Classification | Caret is applied to examine whether parameter settings can have a large impact on the performance of defect prediction models | NB, KNN, LR, partial least squares, NN, LDA, rule based, DT, SVM | Bagging, boosting | Cleaned NASA JM1, PC5 Proprietary from Prop-1 to Prop-5 Apache Camel 1.2, Xalan 2.5–2.6 Eclipse Platform 2.0–2.1–3.0, Debug 3.4, SWT 3.4, JDT, Mylyn, PDE | Out-of-sample bootstrap validation technique, AUC; Caret improves AUC performance by up to 40 percentage points |
[25] | 2017 | Regression | Aim is to validate the source code metrics and identify a suitable set of source code metrics | 5 training algorithms: GD, GDM, GDX, NM, LM | Heterogeneous linear and nonlinear ensemble methods | 56 open-source Java projects from PROMISE Repository | 10-fold CV, t-test, ULR analysis Neural network with Levenberg Marquardt (LM) is the best |
[16] | 2017 | Classification | Replicate 24 CPDP approaches, and compare on 5 different datasets | DT, LR, NB, SVM | LE, RF, BAG-DT, BAG-NB, BOOST-DT, BOOST-NB | 5 available datasets: JURECZKO, NASA MDP, AEEEM, NETGENE, RELINK | Recall, PR, ACC, G-measure, F-score, MCC, AUC |
[26] | 2017 | Classification | Just-in-time defect prediction (TLEL) | NB, SVM, DT, LDA, NN | Bagging, stacking | Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL | 10-fold CV, F-score |
[13] | 2017 | Classification | Adaptive Selection of Classifiers in bug prediction (ASCI) method is proposed. | Base classifiers: LOG (binary logistic regression), NB, RBF, MLP, DT | Voting | Ginger Bread (2.3.2 and 2.3.7), Ice Cream Sandwich (4.0.2 and 4.0.4), and JellyBean (4.1.2, 4.2.2 and 4.3.1) | 10-fold, inter-release validation AUC for NB, LB, MLP is >0.7 |
[27] | 2018 | Classification | MULTI method for JIT-SDP (just-in-time software defect prediction) | EALR, SL, RBFNet Unsupervised: LT, AGE | Bagging, AdaBoost, Rotation Forest, RS | Bugzilla, Columba, Eclipse JDT, Eclipse Platform, Mozilla, PostgreSQL | CV, timewise-CV, ACC, and POPT; MULTI performs significantly better than all the baselines |
[28] | 2007 | Classification | To find pre- and post-release defects for every package and file | LR | — | Eclipse 2.0, 2.1, 3.0 | PR, recall, ACC |
[8] | 2014 | Clustering | Cluster ensemble with PSO for clustering the software modules (fault-prone or not fault-prone) | PSO clustering algorithm | KM-E, KM-M, PSO-E, PSO-M and EM | NASA MDP, PROMISE | |
[29] | 2015 | Classification | Defect identification by applying DM algorithms | NB, J48, MLP | — | PROMISE, NASA MDP dataset: CM1, JM1, KC1, KC3, MC1, MC2, MW1, PC1, PC2, PC3 | 10-fold CV, ACC, PR, F; MLP is the best |
[30] | 2015 | Classification | To show the attributes that predict the defective state of software modules | NB, NN, association rules, DT | Weighted voting rule of the four algorithms | NASA datasets: CM1, JM1, KC1, KC2, PC1 | PR, recall, ACC, F-score NB > NN > DT |
[31] | 2016 | Classification | Authors proposed a model that finds fault-proneness | NB, LR, LibSVM, MLP, SGD, SMO, VP, LR Logit Boost, Decision Stump, RT, REP Tree | RF | Camel 1.6, Tomcat 6.0, Ant 1.7, jEdit 4.3, Ivy 2.0, arc, e-learning, berek, forrest 0.8, zuzel, Intercafe, and Nieruchomosci | 10-fold CV, AUC; AUC = 0.661 |
[32] | 2016 | Classification | GA to select suitable source code metrics | LR, ELM, SVML, SVMR, SVMP | — | 30 open-source software projects from PROMISE repository from DS1 to DS30 | 5-fold CV, F-score, ACC, pairwise t-test |
[33] | 2016 | — | Weighted least-squares twin support vector machine (WLSTSVM) to find misclassification cost of DP | SVM, NB, RF, LR, KNN, BN, cost-sensitive neural network | — | PROMISE repository: CM1, KC1, PC1, PC3, PC4, MC2, KC2, KC3 | 10-fold CV, PR, recall, F-score, G-mean Wilcoxon signed rank test |
[34] | 2016 | — | A multi-objective naive Bayes learning techniques MONB, MOBNN | NB, LR, DT, MODT, MOLR, MONB | — | Jureczko datasets obtained from PROMISE repository | AUC, Wilcoxon rank test CP MO NB (0.72) produces the highest value |
[35] | 2016 | Classification | A software defect prediction model to find faulty components of a software | Hybrid filter approaches FISHER, MR, ANNIGMA. | — | KC1, KC2, JM1, PC1, PC2, PC3, and PC4 datasets | ACC, ent filters, ACC 90% |
[36] | 2017 | Classification | Propose a hybrid method called TSC-RUS + S | A random undersampling based on two-step cluster (TSC) | Stacking: DT, LR, kNN, NB | NASA MDP: i.e., CM1, KC1, KC3, MC2, MW1, PC1, PC2, PC3, PC4 | 10-fold CV, AUC, (TSC-RUS + S) is the best |
[37] | 2017 | Classification | Analyze five popular ML algorithms for software defect prediction | ANN, PSO, DT, NB, LC | — | NASA and PROMISE datasets: CM1, JM1, KC1, KC2, PC1, KC1-LC | 10-fold CV ANN < DT |
[38] | 2018 | Classification | Three well-known ML techniques are compared. | NB, DT, ANN | — | Three different datasets DS1, DS2, DS3 | ACC, PR, recall, F, ROC ACC 97% DT > ANN > NB |
[10] | 2018 | Classification | ML algorithms are compared with CODEP | LR, BN, RBF, MLP, alternating decision tree (ADTree), and DT | Max, CODEP, Bagging J48, Bagging NB, Boosting J48, Boosting NB, RF | PROMISE: Ant, Camel, ivy, Jedit, Log4j, Lucene, Poi, Prop, Tomcat, Xalan | F-score, PR, AUC ROC Max performs better than CODEP |
Data mining and machine learning studies on the subject “defect prediction.”
Ensemble learning combines several base learning models to obtain better performance than individual models. These base learners can be acquired with:
Different learning algorithms
Different parameters of the same algorithm
Different training sets
The commonly used ensemble techniques bagging, boosting, and stacking are shown in Figure 3 and briefly explained in this part. Bagging (which stands for bootstrap aggregating) is a kind of parallel ensemble. In this method, each model is built independently on its own training dataset, generated from the original dataset by random sampling of instances with replacement (bootstrap sampling); thus, it aims to decrease variance. It combines the outputs of the ensemble members by a voting mechanism. Boosting can be described as a sequential ensemble. First, the same weights are assigned to all data instances; after each training round, the weights of wrongly predicted instances are increased, and this process is repeated for as many rounds as the ensemble size. Finally, it uses a weighted voting scheme, and in this way, it aims to decrease bias. Stacking is a technique that combines the predictions of multiple models via a meta-classifier.
Common ensemble learning methods: (a) Bagging, (b) boosting, (c) stacking.
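To make the bagging idea concrete, the sketch below trains several one-feature threshold classifiers (decision stumps), each on a bootstrap sample drawn with replacement, and combines them by majority vote. It is a minimal illustration, not the setup of any cited study: the single feature stands in for a hypothetical code metric, the data are invented, and the names `train_stump`, `bagging_fit`, and `bagging_predict` are assumptions.

```python
import random
from collections import Counter


def train_stump(xs, ys):
    """Fit a threshold classifier that predicts 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold in sorted(set(xs)):
        errors = sum(1 for x, y in zip(xs, ys) if (1 if x > threshold else 0) != y)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(xs, ys, n_models, rng):
    """Train n_models stumps, each on a bootstrap sample of the data."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine the stump outputs by majority vote."""
    votes = Counter(1 if x > threshold else 0 for threshold in models)
    return votes.most_common(1)[0][0]


# Assumption: invented metric values; label 1 marks a defective module.
metric_values = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

models = bagging_fit(metric_values, labels, n_models=11, rng=random.Random(0))
print(bagging_predict(models, 0), bagging_predict(models, 20))
```

An odd number of models avoids tied votes in the binary case. Boosting would differ by training the models one after another with reweighted instances, and stacking by feeding the base predictions to a further learner.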
Some software defect prediction studies have compared ensemble techniques to determine the best performing one [10, 18, 21, 39, 40]. In a study conducted by Wang et al. [18], different ensemble techniques such as bagging, boosting, random tree, random forest, random subspace, stacking, and voting were compared to each other and a single classifier (NB). According to the results, voting and random forest clearly exhibited better performance than others. In a different study [39], ensemble methods were compared with more than one base learner (NB, BN, SMO, PART, J48, RF, random tree, IB1, VFI, DT, NB tree). For boosted SMO, bagging J48, and boosting and bagging RT, performance of base classifiers was lower than that of ensemble learner classifiers.
In study [21], a new method was proposed that combines feature selection and ensemble learning for defect classification. Results showed that random forests and the proposed algorithm are not affected by poor features, and the proposed algorithm outperforms existing single and ensemble classifiers in terms of classification performance. Another comparative study [10] used seven composite algorithms (Ave, Max, Bagging C4.5, Bagging naive Bayes (NB), Boosting J48, Boosting naive Bayes, and RF) and one state-of-the-art composite approach for cross-project defect prediction. The Max algorithm yielded the best F-score.
Bowes et al. [40] compared RF, NB, Rpart, and SVM algorithms to determine whether these classifiers obtained the same results. The results demonstrated that a unique subset of defects can be discovered by specific classifiers. However, whereas some classifiers are steady in the predictions they make, other classifiers change in their predictions. As a result, ensembles with decision-making without majority voting can perform best.
One of the main problems of SDP is the imbalance between the defect and non-defect classes of the dataset. Generally, the number of defective instances is much smaller than the number of non-defective instances in the collected data. This situation causes machine learning algorithms to perform poorly. Wang and Yao [19] compared five class-imbalance learning methods (RUS, RUS-bal, THM, BNC, SMB) and the NB and RF algorithms and proposed a dynamic version of AdaBoost.NC. They utilized balance, G-mean, and AUC measures for comparison. Results showed that AdaBoost.NC and naive Bayes are better than the other seven algorithms in terms of evaluation measures. Dynamic AdaBoost.NC showed a better defect detection rate and overall performance than the original AdaBoost.NC. To handle the class imbalance problem, study [20] compared different methods (sampling, cost sensitive, hybrid, and ensemble) by taking into account evaluation metrics such as MCC and receiver operating characteristic (ROC).
As shown in Table 1, the most common datasets used in the defect prediction studies [17, 18, 19, 39] are the NASA MDP dataset and PROMISE repository datasets. In addition, some studies utilized open-source projects such as Bugzilla, Columba, and Eclipse JDT [26, 27], and other studies used Android application data [22, 23].
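Two ingredients of such comparisons can be sketched briefly: random undersampling (RUS) of the majority class, and the G-mean and balance measures computed from the probability of detection (PD) and the probability of false alarm (PF). The code below is an illustrative sketch using the common definitions G-mean = sqrt(PD × (1 − PF)) and balance = 1 − sqrt((0 − PF)² + (1 − PD)²) / sqrt(2). The function names and the toy data are hypothetical and do not reproduce the cited experiments.

```python
import math
import random


def random_undersample(samples, labels, rng):
    """Randomly drop majority-class samples until both classes have equal size."""
    by_class = {0: [], 1: []}
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    minority_size = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        balanced.extend((sample, label) for sample in rng.sample(items, minority_size))
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]


def g_mean(pd, pf):
    """Geometric mean of the detection rate and the true-negative rate."""
    return math.sqrt(pd * (1.0 - pf))


def balance(pd, pf):
    """Closeness to the ideal point PD = 1, PF = 0 (1.0 is best)."""
    return 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)


# Assumption: 5 defective and 45 clean modules, identified only by an index.
samples = list(range(50))
labels = [1] * 5 + [0] * 45
balanced_samples, balanced_labels = random_undersample(samples, labels, random.Random(42))

print(balanced_labels.count(1), balanced_labels.count(0))
print(g_mean(0.8, 0.2), balance(0.8, 0.2))
```

Undersampling discards information from the majority class, which is one reason the studies above also consider oversampling, cost-sensitive, and ensemble alternatives.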
Although use of ensemble learning techniques has dramatically increased recently, studies that do not use ensemble learning are still conducted and successful. For example, in study [32], prediction models were created using source code metrics as in ensemble studies but by using different feature selection techniques such as genetic algorithm (GA).
To overcome the class imbalance problem, Tomar and Agarwal [33] proposed a prediction system that assigns lower cost to non-defective data samples and higher cost to defective samples to balance the data distribution. In the absence of enough data within a project, the required data can be obtained from other projects; however, this may cause class imbalance. To solve this problem, Ryu and Baik [34] proposed multi-objective naïve Bayes learning for cross-project environments. To obtain significant software metrics in cloud computing environments, Ali et al. used a combination of filter and wrapper approaches [35]. Other studies compared different machine learning algorithms such as NB, DT, and MLP [29, 37, 38, 41].
Software effort estimation (SEE) is critical for a company because hiring more employees than required will cause loss of revenue, while hiring fewer employees than necessary will result in delays in software project delivery. The estimation analysis helps to predict the amount of effort (in person hours) needed to develop a software product. Basic steps of software estimation can be itemized as follows:
Determine project objectives and requirements.
Design the activities.
Estimate product size and complexity.
Compare and repeat estimates.
Besides predicting effort, SEE also covers requirements and testing [42]. Many research and review studies have been conducted in the field of SEE. Recently, a survey [43] analyzed effort estimation studies that concentrated on ML techniques and compared them with studies focused on non-ML techniques. According to the survey, case-based reasoning (CBR) and artificial neural network (ANN) were the most widely used techniques. In 2014, Dave and Dutta [44] examined existing studies that focus only on neural networks.
The current effort estimation studies using DM and ML techniques are available in Table 2. This table summarizes the prominent studies in terms of aspects such as year, data mining task, aim, datasets, and metrics. Table 2 indicates that neural network is the most widely used technique for the effort estimation task.
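Several of the evaluation metrics that recur in Table 2 are simple functions of the magnitude of relative error, MRE = |actual − predicted| / actual. The sketch below computes MMRE (mean MRE), MdMRE (median MRE), and PRED(25), the fraction of projects whose MRE is at most 0.25. The effort values are invented for illustration and the function names are hypothetical.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error for one project."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean magnitude of relative error."""
    return mean(mre(a, p) for a, p in zip(actuals, predictions))


def mdmre(actuals, predictions):
    """Median magnitude of relative error."""
    return median(mre(a, p) for a, p in zip(actuals, predictions))


def pred(actuals, predictions, level=0.25):
    """Fraction of projects whose MRE does not exceed `level`."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)


# Assumption: invented effort values in person-hours.
actual_effort = [100, 200, 400, 800]
predicted_effort = [110, 150, 400, 1200]

print(mmre(actual_effort, predicted_effort))
print(mdmre(actual_effort, predicted_effort))
print(pred(actual_effort, predicted_effort))
```

Lower MMRE and MdMRE and higher PRED(25) indicate better estimates; PRED is often reported as a percentage rather than a fraction.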
Ref. | Year | Task | Objective | Algorithms | Ensemble learning | Dataset | Evaluation metrics and results |
---|---|---|---|---|---|---|---|
[45] | 2008 | Regression | Ensemble of neural networks with associative memory (ENNA) | NN, MLP, KNN | Bagging | NASA, NASA 93, USC, SDR, Desharnais | MMRE, MdMRE and PRED(L) For ENNA PRED(25) = 36.4 For neural network PRED(25) = 8 |
[46] | 2009 | Regression | Authors proposed the ensemble of neural networks with associative memory (ENNA) | NN, MLP, KNN | Bagging | NASA, NASA 93, USC, SDR, Desharnais | Random subsampling, t-test MMRE, MdMRE, and PRED(L) ENNA is the best |
[47] | 2010 | Regression | To show the effectiveness of SVR for SEE | SVR, RBF | — | Tukutuku | LOOCV, MMRE, Pred(25), MEMRE, MdEMRE SVR outperforms others |
[48] | 2011 | Regression | To evaluate whether readily available ensemble methods enhance SEE | MLP, RBF, RT | Bagging | 5 datasets from PROMISE: cocomo81, nasa93, nasa, sdr, and Desharnais 8 datasets from ISBSG repository | MMRE, MdMRE, PRED(25) RTs and Bagging with MLPs perform similarly |
[49] | 2012 | Regression | To show how the measures behave in SEE and to create good ensembles | MLP, RBF, REPTree | Bagging | cocomo81, nasa93, nasa, cocomo2, desharnais, ISBSG repository | MMRE, PRED(25), LSD, MdMRE, MAE, MdAE Pareto ensemble for all measures, except LSD. |
[50] | 2012 | Regression | To use cross-company models to create diverse ensembles able to dynamically adapt to changes | WC RTs, CC-DWM | WC-DWM | 3 datasets from ISBSG repository (ISBSG2000, ISBSG2001, ISBSG) 2 datasets from PROMISE (CocNasaCoc81 and CocNasaCoc81Nasa93) | MAE, Friedman test Only DCL could improve upon RT CC data potentially beneficial for improving SEE |
[51] | 2012 | Regression | To generate estimates from ensembles of multiple prediction methods | CART, NN, LR, PCR, PLSR, SWR, ABE0-1NN, ABE0-5NN | Combining top M solo methods | PROMISE | MAR, MMRE, MdMRE, MMER, MBRE, MIBRE. Combinations perform better than 83% |
[52] | 2012 | Classification/regression | DM techniques to estimate software effort. | M5, CART, LR, MARS, MLPNN, RBFNN, SVM | — | Coc81, CSC, Desharnais, Cocnasa, Maxwell, USP05 | MdMRE, Pred(25), Friedman test Log + OLS > LMS, BC + OLS, MARS, LS-SVM |
[53] | 2013 | Clustering/classification | Estimation of software development effort | NN, ABE, C-means | — | Maxwell | 3-fold CV and LOOCV, RE, MRE, MMRE, PRED |
[54] | 2014 | Regression | ANNs are examined using COCOMO model | MLP, RBFNN, SVM, PSO-SVM Extreme learning Machines | — | COCOMO II Data | MMRE, PRED PSO-SVM is the best |
[55] | 2014 | — | A hybrid model based on GA and ACO for optimization | GA, ACO | — | NASA datasets | MMRE, the proposed method is the best |
[56] | 2015 | Regression | To display the effect of data preprocessing techniques on ML methods in SEE | CBR, ANN, CART Preprocessing tech.: MDT, LD, MI, FS, CS, FSS, BSS | — | ISBSG, Desharnais, Kitchenham, USPFT | CV, MBRE, PRED (0.25), MdBRE |
[57] | 2016 | Regression | Four neural network models are compared with each other. | MLP, RBFNN, GRNN, CCNN | — | ISBSG repository | 10-fold CV, MAR The CCNN outperforms the other three models |
[58] | 2016 | Regression | To propose a model based on Bayesian network | GA and PSO | — | COCOMO NASA Dataset | DIR, DRM The proposed model is best |
[59] | 2016 | Classification/regression | A hybrid model using SVM and RBNN compared against previous models | SVM, RBNN | — | Dataset1 = 45 industrial projects Dataset2 = 65 educational projects | LOOCV, MAE, MBRE, MIBRE, SA The proposed approach is the best |
[60] | 2017 | Classification | To estimate software effort by using ML techniques | SVM, KNN | Boosting: kNN and SVM | Desharnais, Maxwell | LOOCV, k-fold CV ACC = 91.35% for Desharnais ACC = 85.48% for Maxwell |
Data mining and machine learning studies on the subject “effort estimation.”
Several studies have compared ensemble learning methods with single learning algorithms [45, 46, 48, 49, 51, 60] and examined them on cross-company (CC) and within-company (WC) datasets [50]. The authors observed that ensemble methods obtained by a proper combination of estimation methods achieved better results than single methods. Various ML techniques such as neural network, support vector machine (SVM), and k-nearest neighbor are commonly used as base classifiers for ensemble methods such as bagging and boosting in software effort estimation. Moreover, their results indicate that CC data can increase performance over WC data for estimation techniques [50].
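As a rough illustration of how bagging combines base estimators for effort estimation, the following minimal sketch trains several nearest-neighbor estimators on bootstrap samples and averages their outputs. It is not the method of any cited study: the function names, the single numeric feature (for example, project size), and the toy data are assumptions made only for this example.

```python
import random
from statistics import mean


def knn_predict(train, x, k=1):
    """Estimate effort for feature value x as the mean effort of the k nearest projects.

    train is a list of (feature, effort) pairs; the feature is a single number here.
    """
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return mean(effort for _, effort in nearest)


def bagging_predict(train, x, n_models=25, k=1, seed=0):
    """Bagging: average the estimates of base learners built on bootstrap samples."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    estimates = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(train) projects with replacement.
        sample = [rng.choice(train) for _ in train]
        estimates.append(knn_predict(sample, x, k))
    return mean(estimates)


# Hypothetical historical projects: (size, effort in person-hours).
projects = [(10, 100.0), (20, 210.0), (30, 290.0), (40, 405.0)]
print(bagging_predict(projects, 25))
```

Because each base learner sees a different resample of the historical projects, the averaged estimate is less sensitive to any single project than one nearest-neighbor estimate would be.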
In addition to the abovementioned studies, researchers have conducted studies without using ensemble techniques. The general approach is to investigate which DM technique has the best effect on performance in software effort estimation. For instance, Subitsha and Rajan [54] compared five different algorithms—MLP, RBFNN, SVM, ELM, and PSO-SVM—and Nassif et al. [57] investigated four neural network algorithms—MLP, RBFNN, GRNN, and CCNN. Although neural networks are widely used in this field, missing values and outliers frequently encountered in the training set adversely affect neural network results and cause inaccurate estimations. To overcome this problem, Khatibi et al. [53] split software projects into several groups based on their similarities. In their studies, the C-means clustering algorithm was used to determine the most similar projects and to decrease the impact of unrelated projects, and then analogy-based estimation (ABE) and NN were applied. Another clustering study by Azzeh and Nassif [59] combined SVM and bisecting k-medoids clustering algorithms; an estimation model was then built using RBFNN. The proposed method was trained on historical use case points (UCP).
Zare et al. [58] and Maleki et al. [55] utilized optimization methods for accurate cost estimation. In the former study, a model based on a Bayesian network was proposed together with a genetic algorithm (GA) and particle swarm optimization (PSO). The latter study used GA to optimize the weights of the effective factors, and the model was then trained with ant colony optimization (ACO). Besides conventional effort estimation studies, researchers have utilized machine learning techniques for web applications. Since web-based software projects are different from traditional projects, the effort estimation process for these projects is more complex.
PRED(25) and MMRE are the most popular evaluation metrics in effort estimation. MMRE stands for the mean magnitude of relative error, and PRED(25) is the percentage of predictions that fall within 25% of the actual values.
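The two metrics can be written down in a few lines. The sketch below assumes the usual definition of the magnitude of relative error, MRE = |actual − predicted| / actual, and the function names and sample values are illustrative only.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error for one project (actual effort must be non-zero)."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean magnitude of relative error over all projects."""
    return mean(mre(a, p) for a, p in zip(actuals, predictions))


def mdmre(actuals, predictions):
    """Median magnitude of relative error, less sensitive to outliers than MMRE."""
    return median(mre(a, p) for a, p in zip(actuals, predictions))


def pred(actuals, predictions, level=0.25):
    """PRED(level): percentage of estimates whose MRE is at most `level`."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return 100.0 * hits / len(actuals)


# Hypothetical actual and estimated efforts for four projects.
actual_effort = [100, 200, 400, 50]
estimated_effort = [110, 150, 400, 80]
print(mmre(actual_effort, estimated_effort))
print(pred(actual_effort, estimated_effort))
```

Lower MMRE and higher PRED(25) both indicate better estimates, which is why the two are often reported together.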
Vulnerability analysis is becoming the focal point of system security, since it aims to prevent weaknesses in a software system that an attacker can exploit. Software vulnerability is defined in different ways across many resources [61]. The most popular and widely utilized definition appears in the Common Vulnerabilities and Exposures (CVE) 2017 report as follows:
Vulnerability is a weakness in the computational logic found in software and some hardware components that, when exploited, results in a negative impact to confidentiality, integrity or availability.
Vulnerability analysis may require many different operations to identify defects and vulnerabilities in a software system. Vulnerabilities, which are a special kind of defect, are more critical than other defects because attackers exploit system vulnerabilities to perform unauthorized actions. A defect is a normal problem that can be encountered frequently in the system, easily found by users or developers, and fixed promptly, whereas vulnerabilities are subtle mistakes in large code bases [62, 63]. Wijayasekara et al. claim that some bugs have been identified as vulnerabilities after being publicly announced in bug databases [64]. These bugs are called “hidden impact vulnerabilities” or “hidden impact bugs.” Therefore, the authors proposed a hidden impact vulnerability identification methodology that utilizes text mining techniques to determine which bugs in bug databases are vulnerabilities. In the proposed method, a bug report is taken as input, and text mining is applied to produce a feature vector. A classifier is then applied to this vector to decide whether the report describes a bug or a vulnerability. The results given in [64] demonstrate that a large proportion of discovered vulnerabilities were first described as hidden impact bugs in public bug databases. While bug reports were taken as input in that study, many other studies take source code as input. Text mining is a highly preferred technique for obtaining features directly from source code, as in the studies [65, 66, 67, 68, 69]. Several studies [63, 70] have compared text mining-based models and software metrics-based models.
In the security area of software systems, several studies have been conducted related to DM and ML. Some of these studies are compared in Table 3, which shows the data mining task and explanation of the studies, the year they were performed, the algorithms that were used, the type of vulnerability analysis, evaluation metrics, and results. In this table, the best performing algorithms according to the evaluation criteria are shown in bold.
Ref. | Year | Task | Objective | Algorithms | Type | Dataset description | Evaluation metrics and results |
---|---|---|---|---|---|---|---|
[71] | 2011 | Clustering | Obtaining software vulnerabilities based on RDBC | RDBC | Static | Database is built by RD-Entropy | FNR, FPR |
[42] | 2011 | Classification/regression | To predict the time to next vulnerability | LR, LMS, MLP, RBF, SMO | Static | NVD, CPE, CVSS | CC, RMSE, RRSE |
[65] | 2012 | Text mining | Analysis of source code as text | RBF, SVM | Static | K9 email client for the Android platform | ACC, PR, recall ACC = 0.87, PR = 0.85, recall = 0.88 |
[64] | 2012 | Classification/text mining | To identify vulnerabilities in bug databases | — | Static | Linux kernel MITRE CVE and MySQL bug databases | BDR, TPR, FPR 32% (Linux) and 62% (MySQL) of vulnerabilities |
[72] | 2014 | Classification/regression | Combine taint analysis and data mining to obtain vulnerabilities | ID3, C4.5/J48, RF, RT, KNN, NB, Bayes Net, MLP, SVM, LR | Hybrid | A version of WAP to collect the data | 10-fold CV, TPD, ACC, PR, KAPPA ACC = 90.8%, PR = 92%, KAPPA = 81% |
[73] | 2014 | Clustering | Identify vulnerabilities from source codes using CPG | — | Static | Neo4J and InfiniteGraph databases | — |
[63] | 2014 | Classification | Comparison of software metrics with text mining | RF | Static | Vulnerabilities from open-source web apps (Drupal, Moodle, PHPMyAdmin) | 3-fold CV, recall, IR, PR, FPR, ACC. Text mining provides benefits overall |
[69] | 2014 | Classification | To create model in the form of a binary classifier using text mining | NB, RF | Static | Applications from the F-Droid repository and Android | 10-fold CV, PR, recall PR and recall ≥ 80% |
[74] | 2015 | Classification | A new approach (VCCFinder) to obtain potentially dangerous codes | SVM-based detection model | — | The database contains 66 GitHub projects | k-fold CV, false alarms <99% at the same level of recall |
[70] | 2015 | Ranking/classification | Comparison of text mining and software metrics models | RF | — | Vulnerabilities from open-source web apps (Drupal, Moodle, PHPMyAdmin) | 10-fold CV Metrics: ER-BCE, ERBPP, ER-AVG |
[75] | 2015 | Clustering | Search patterns for taint-style vulnerabilities in C code | Hierarchical clustering (complete-linkage) | Static | 5 open-source projects: Linux, OpenSSL, Pidgin, VLC, Poppler (Xpdf) | Correct source, correct sanitization, number of traversals, generation time, execution time, reduction, amount of code review <95% |
[76] | 2016 | Classification | Static and dynamic features for classification | LR, MLP, RF | Hybrid | Dataset was created by analyzing 1039 test cases from the Debian Bug Tracker | FPR, FNR Detect 55% of vulnerable programs |
[77] | 2017 | Classification | 1. Employ a deep neural network 2. Combine N-gram analysis and feature selection | Deep neural network | — | Feature extraction from 4 applications (BoardGameGeek, Connectbot, CoolReader, AnkiDroid) | 10 times using 5-fold CV ACC = 92.87%, PR = 94.71%, recall = 90.17% |
[67] | 2017 | Text mining | To analyze characteristics of software vulnerability from source files | — | — | CVE, CWE, NVD databases | PR = 70%, recall = 60% |
[68] | 2017 | Text mining | Deep learning (LSTM) is used to learn semantic and syntactic features in code | RNN, LSTM, DBN | — | Experiments on 18 Java applications from the Android OS platform | 10-fold CV, PR, recall, and F-score Deep Belief Network PR, recall, and F-score > 80% |
[66] | 2018 | Classification | Identify bugs by extracting text features from C source code | NB, KNN, K-means, NN, SVM, DT, RF | Static | NVD, Cat, Cp, Du, Echo, Head, Kill, Mkdir, Nl, Paste, Rm, Seq, Shuf, Sleep, Sort, Tail, Touch, Tr, Uniq, Wc, Whoami | 5-fold CV ACC, TP, TN ACC = 74% |
[78] | 2018 | Regression | A deep learning-based vulnerability detection system (VulDeePecker) | BLSTM NN | Static | NIST: NVD and SAR project | 10-fold CV, PR, recall, F-score F-score = 80.8% |
[79] | 2018 | Classification | A mapping between existing requirements and vulnerabilities | LR, SVM, NB | — | Data is gathered from Apache Tomcat, CVE, requirements from Bugzilla, and source code is collected from Github | PR, recall, F-score LSI > SVM |
Data mining and machine learning studies on the subject “vulnerability analysis.”
Vulnerability analysis can be categorized into three types: static vulnerability analysis, dynamic vulnerability analysis, and hybrid analysis [61, 80]. Many studies have applied the static analysis approach, which detects vulnerabilities from source code without executing software, since it is cost-effective. Few studies have performed the dynamic analysis approach, in which one must execute software and check program behavior. The hybrid analysis approach [72, 76] combines these two approaches.
As revealed in Table 3, in addition to classification and text mining, clustering techniques are also frequently seen in software vulnerability analysis studies. To detect vulnerabilities in an unknown software data repository, entropy-based density clustering [71] and complete-linkage clustering [75] were proposed. Yamaguchi et al. [73] introduced a model that represents large amounts of source code as a graph called a code property graph (CPG), a combination of the abstract syntax tree (AST), control flow graph (CFG), and program dependency graph (PDG). This model enabled the discovery of previously unknown (zero-day) vulnerabilities.
To learn the time to next vulnerability, a prediction model was proposed in the study [42]. The result could be a number that refers to days or a bin representing values in a range. The authors used regression and classification techniques for the former and latter cases, respectively.
In vulnerability studies, issue tracking systems like Bugzilla, code repositories like GitHub, and vulnerability databases such as NVD, CVE, and CWE have been utilized [79]. In addition to these datasets, some studies have used Android [65, 68, 69] or web [63, 70, 72] (PHP source code) datasets. In recent years, researchers have concentrated on deep learning for building binary classifiers [77], obtaining vulnerability patterns [78], learning long-term dependencies in sequential data [68], and learning features directly from the source code [81].
Li et al. [78] note two difficulties of vulnerability studies: demanding, intense manual labor and high false-negative rates. Thus, the widely used evaluation metrics in vulnerability analysis are false-positive rate and false-negative rate.
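For reference, both rates follow directly from the confusion matrix of a binary vulnerability classifier. The sketch below assumes label 1 means "vulnerable" and 0 means "not vulnerable"; the function names and sample labels are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = vulnerable, 0 = not vulnerable."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == 1 and guess == 1:
            tp += 1
        elif truth == 0 and guess == 1:
            fp += 1
        elif truth == 0 and guess == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN): share of clean components flagged as vulnerable."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return fp / (fp + tn)


def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP): share of vulnerable components that were missed."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return fn / (fn + tp)


# Hypothetical ground truth and classifier output for eight components.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
output = [1, 0, 1, 0, 1, 0, 0, 0]
print(false_positive_rate(truth, output), false_negative_rate(truth, output))
```

A high FNR means real vulnerabilities go undetected, while a high FPR adds to the manual review effort, which ties the two metrics to the difficulties noted above.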
During the past years, software developers have used design patterns to create complex software systems. Thus, researchers have investigated the field of design patterns in many ways [82, 83]. Fowler defines a pattern as follows:
“A pattern is an idea that has been useful in one practical context and will probably be useful in others.” [84]
Patterns display relationships and interactions between classes or objects. Well-designed object-oriented systems have various design patterns integrated into them. Design patterns can be highly useful for developers when they are used in the right manner and place. Thus, developers avoid recreating methods previously refined by others. The pattern approach was initially presented in 1994 by four authors, Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, known as the Gang of Four (GoF) [85]. According to the authors, there are three types of design patterns:
Creational patterns provide an object creation mechanism to create the necessary objects based on predetermined conditions. They allow the system to call the appropriate object and add flexibility to the system when objects are created. Some creational design patterns are factory method, abstract factory, builder, and singleton.
Structural patterns focus on the composition of classes and objects to allow the establishment of larger software groups. Some of the structural design patterns are adapter, bridge, composite, and decorator.
Behavioral patterns determine common communication patterns between objects and how multiple classes behave when performing a task. Some behavioral design patterns are command, interpreter, iterator, observer, and visitor.
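To make the three categories concrete, the following minimal sketch shows one pattern from each: singleton (creational), adapter (structural), and observer (behavioral). All class names are hypothetical and chosen only for illustration; real systems usually implement these patterns with more machinery.

```python
class Settings:
    """Creational (singleton): every call to Settings() returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


class LegacyTimer:
    """Existing class with an interface the client code does not expect."""

    def elapsed_ms(self):
        return 1500


class TimerAdapter:
    """Structural (adapter): wraps LegacyTimer and exposes seconds instead of milliseconds."""

    def __init__(self, legacy_timer):
        self._legacy_timer = legacy_timer

    def seconds(self):
        return self._legacy_timer.elapsed_ms() / 1000.0


class Subject:
    """Behavioral (observer): notifies every attached callback about an event."""

    def __init__(self):
        self._observers = []

    def attach(self, callback):
        self._observers.append(callback)

    def notify(self, event):
        for callback in self._observers:
            callback(event)


received = []
subject = Subject()
subject.attach(received.append)  # the observer is any callable taking one argument
subject.notify("file saved")
print(Settings() is Settings(), TimerAdapter(LegacyTimer()).seconds(), received)
```

Pattern mining tools look for exactly these kinds of structural and behavioral relationships between classes, whether through code metrics, graphs, or text.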
Many design pattern studies exist in the literature. Table 4 shows some design pattern mining studies related to machine learning and data mining. This table contains the aim of the study, mining task, year, and design patterns selected by the study, input data, dataset, and results of the studies.
Ref. | Year | Task | Objective | Algorithms | EL | Selected design patterns | Input data | Dataset | Evaluation metrics and results |
---|---|---|---|---|---|---|---|---|---|
[86] | 2012 | Text classification | Two-phase method: 1—text classification to 2—learning design patterns | NB, KNN, DT, SVM | — | 46 security patterns, 34 Douglass patterns, 23 GoF patterns | Documents | Security, Douglass, GoF | PR, recall, EWM PR = 0.62, recall = 0.75 |
[87] | 2013 | Regression | An approach to decide whether a candidate is a valid instance of a DP | ANN | — | Adapter, command, composite, decorator, observer, and proxy | Set of candidate classes | JHotDraw 5.1 open-source application | 10-fold CV, PR, recall |
[88] | 2014 | Graph mining | Sub-graph mining-based approach | CloseGraph | — | — | Java source code | Open-source projects: YARI, Zest, JUnit, JFreeChart, ArgoUML | No empirical comparison |
[89] | 2015 | Classification/clustering | MARPLE-DPD is developed to classify whether an instance is good or bad | SVM, DT, RF, K-means, ZeroR, OneR, NB, JRip, CLOPE | — | Classification for singleton and adapter Classification and clustering for composite, decorator, and factory method | — | 10 open-source software systems DPExample, QuickUML 2001, Lexi v0.1.1 alpha, JRefactory v2.6.24, Netbeans v1.0.x, JUnit v3.7, JHotDraw v5.1, MapperXML v1.9.7, Nutch v0.4, PMD v1.8 | 10-fold CV, ACC, F-score, AUC ACC ≥ 85% |
[90] | 2015 | Regression | A new method (SVM-PHGS) is proposed | Simple Logistic, C4.5, KNN, SVM, SVM-PHGS | — | Adapter, builder, composite, factory method, iterator, observer | Source code | P-mart repository | PR, recall, F-score, FP PR = 0.81, recall = 0.81, F-score = 0.81, FP = 0.038 |
[91] | 2016 | Classification | Design pattern recognition using ML algorithms. | LRNN, DT | — | Abstract factory, adapter patterns | Source code | Dataset with 67 OO metrics, extracted by JBuilder tool | 5-fold CV, ACC, PR, recall, F-score ACC = 100% by LRNN |
[92] | 2016 | Classification | Three aspects: design patterns, software metrics, and supervised learning methods | Layer Recurrent Neural Network (LRNN) | RF | Abstract factory, adapter, bridge, singleton, and template method | Source code | Dataset with 67 OO metrics, extracted by JBuilder tool | PR, recall, F-score F-score = 100% by LRNN and RF ACC = 100% by RF |
[93] | 2017 | Classification | 1. Creation of metrics-oriented dataset 2. Detection of software design patterns | ANN, SVM | RF | Abstract factory, adapter, bridge, composite, and Template | Source code | Metrics extracted from source codes (JHotDraw, QuickUML, and Junit) | 5-fold and 10-fold CV, PR, recall, F-score ANN, SVM, and RF yielded to 100% PR for JHotDraw |
[94] | 2017 | Classification | Detection of design motifs based on a set of directed semantic graphs | Strong graph simulation, graph matching | — | All three groups: creational, structural, behavioral | UML class diagrams | — | PR, recall High accuracy by the proposed method |
[95] | 2017 | Text categorization | Selection of more appropriate design patterns | Fuzzy c-means | Ensemble-IG | Various design patterns | Problem definitions of design patterns | DP, GoF, Douglass, Security | F-score |
[96] | 2018 | Classification | Finding design pattern and smell pairs which coexist in the code | J48 | — | Used patterns: adapter, bridge, Template, singleton | Source code | Eclipse plugin Web of Patterns The tool selected for code smell detection is iPlasma | PR, recall, F-score, PRC, ROC Singleton pattern shows no presence of bad smells |
Data mining and machine learning studies on the subject “design pattern mining.”
In design pattern mining, detecting the design pattern is a frequent study objective. To do so, studies have used machine learning algorithms [87, 89, 90, 91], ensemble learning [95], deep learning [97], graph theory [94], and text mining [86, 95].
In study [91], the training dataset consists of 67 object-oriented (OO) metrics extracted by using the JBuilder tool. The authors used LRNN and decision tree techniques for pattern detection. Alhusain et al. [87] generated training datasets from existing pattern detection tools. The ANN algorithm was selected for pattern instances. Chihada et al. [90] created training data from pattern instances using 45 OO metrics. The authors utilized SVM for classifying patterns accurately. Another metrics-oriented dataset was developed by Dwivedi et al. [93]. To evaluate the results, the authors benefited from three open-source software systems (JHotDraw, QuickUML, and JUnit) and applied three classifiers, SVM, ANN, and RF. The advantage of using random forest is that it does not require linear features and can manage high-dimensional spaces.
To evaluate methods and to find patterns, open-source software projects such as JHotDraw, Junit, and MapperXML have been generally preferred by researchers. For example, Zanoni et al. [89] developed a tool called MARPLE-DPD by combining graph matching and machine learning techniques. Then, to obtain five design patterns, instances were collected from 10 open-source software projects, as shown in Table 4.
Design patterns and code smells are related issues: a code smell is a symptom of a problem in the code, and if a software system contains code smells, its design patterns are not well constructed. Therefore, Kaur and Singh [96] used the J48 decision tree to check whether design pattern and smell pairs appear together in code. Their results showed that the singleton pattern had no presence of bad smells.
According to the studies summarized in the table, the most frequently used patterns are abstract factory and adapter. It has recently been observed that studies on ensemble learning in this field are increasing.
One of the SE tasks most often used to improve the quality of a software system is refactoring, which Martin Fowler has described as “a technique for restructuring an existing body of code, altering its internal structure without changing its external behavior” [98]. It improves readability and maintainability of the source code and decreases complexity of a software system. Some of the refactoring types are: Add Parameter, Replace Parameter, Extract method, and Inline method [99].
Code smells and refactoring are closely related to each other: code smells represent problems due to bad design and can be fixed during refactoring. The main challenge is to identify which part of the code needs refactoring.
Some data mining studies related to software refactoring are presented in Table 5. Some studies focus on historical data to predict refactoring [100] or to obtain both refactoring and software defects [101] using different data mining algorithms such as LMT, Rip, and J48. Results suggest that when refactoring increases, the number of software defects decreases, and thus refactoring has a positive effect on software quality.
Ref. | Year | Task | Objective | Algorithms | EL | Dataset | Evaluation metrics and results |
---|---|---|---|---|---|---|---|
[100] | 2007 | Regression | Stages: (1) data understanding, (2) preprocessing, (3) ML, (4) post-processing, (5) analysis of the results | J48, LMT, Rip, NNge | — | ArgoUML, Spring Framework | 10-fold CV, PR, recall, F-score PR and recall are 0.8 for ArgoUML |
[101] | 2008 | Classification | Finding the relationship between refactoring and defects | C4.5, LMT, Rip, NNge | — | ArgoUML, JBoss Cache, Liferay Portal, Spring Framework, XDoclet | PR, recall, F-score |
[102] | 2014 | Regression | Propose GA-based learning for software refactoring based on ANN | GA, ANN | — | Xerces-J, JFreeChart, GanttProject, AntApache, JHotDraw, and Rhino. | Wilcoxon test with a 99% confidence level (α = 0.01) |
[103] | 2015 | Regression | Removing defects with time series in a multi-objective approach | Multi-objective algorithm, based on NSGA-II, ARIMA | — | FindBugs, JFreeChart, Hibernate, Pixelitor, and JDI-Ford | Wilcoxon rank sum test with a 99% confidence level (α < 1%) |
[104] | 2016 | Web mining/clustering | Unsupervised learning approach to detect refactoring opportunities in service-oriented applications | PAM, K-means, COBWEB, X-Means | — | Two datasets of WSDL documents | COBWEB and K-means max. 83.33% and 0%, inter-cluster COBWEB and K-means min. 33.33% and 66.66% intra-cluster |
[105] | 2017 | Clustering | A novel algorithm (HASP) for software refactoring at the package level | Hierarchical clustering algorithm | — | Three open-source case studies | Modularization Quality and Evaluation Metric Function |
[99] | 2017 | Classification | A technique to predict refactoring at class level | PCA, SMOTE, LS-SVM, RBF | — | Seven open-source software systems from the tera-PROMISE repository | 10-fold CV, AUC, and ROC curves RBF kernel outperforms linear and polynomial kernel The mean value of AUC for LS-SVM RBF kernel is 0.96 |
[106] | 2017 | Classification | Exploring the impact of clone refactoring (CR) on the test code size | LR, KNN, NB | RF | Data collected from an open-source Java software system (ANT) | PR, recall, accuracy, F-score kNN and RF outperform NB ACC (fitting (98%), LOOCV (95%), and 10 FCV (95%)) |
[107] | 2017 | — | Finding refactoring opportunities in source code | J48, BayesNet, SVM, LR | RF | Ant, ArgoUML, jEdit, jFreeChart, Mylyn | 10-fold CV, PR, recall 86–97% PR and 71–98% recall for proposed tech |
[108] | 2018 | Classification | A learning-based approach (CREC) to extract refactored and non-refactored clone groups from repositories | C4.5, SMO, NB. | RF, Adaboost | Axis2, Eclipse.jdt.core, Elastic Search, JFreeChart, JRuby, and Lucene | PR, recall, F-score F-score = 83% in the within-project F-score = 76% in the cross-project |
[109] | 2018 | Clustering | Combination of the use of multi-objective and unsupervised learning to decrease developer’s effort | GMM, EM | — | ArgoUML, JHotDraw, GanttProject, UTest, Apache Ant, Azureus | One-way ANOVA with a 95% confidence level (α = 5%) |
Data mining and machine learning studies on the subject “refactoring.”
While automated refactoring does not always give the desired result, manual refactoring is time-consuming. Therefore, one study [109] proposed a clustering-based recommendation tool that combines multi-objective search with an unsupervised learning algorithm to reduce the number of refactoring options. In addition, the developer’s feedback further decreases the number of refactorings that must be selected.
Since many SE studies that apply data mining approaches exist in the literature, this article presents only a few of them. However, Figure 4 shows the number of papers obtained from the Scopus search engine for each year from 2010 to 2019 by using queries in the title/abstract/keywords field. We excluded publications from 2020 since the year was not yet complete. Queries included (“data mining” OR “machine learning”) with (“defect prediction” OR “defect detection” OR “bug prediction” OR “bug detection”) for defect prediction, (“effort estimation” OR “effort prediction” OR “cost estimation”) for effort estimation, (“vulnerab*” AND “software” OR “vulnerability analysis”) for vulnerability analysis, and (“software” AND “refactoring”) for refactoring. As seen in the figure, the number of studies using data mining in SE tasks, especially defect prediction and vulnerability analysis, has increased rapidly. The most stable area in the studies is design pattern mining.
Number of publications of data mining studies for SE tasks from Scopus search by their years.
Figure 5 shows the publications studied in classification, clustering, text mining, and association rule mining as a percentage of the total number of papers obtained by a Scopus query for each SE task. For example, in defect prediction, the number of studies is 339 in the field of classification, 64 in clustering, 8 in text mining, and 25 in the field of association rule mining. As can be seen from the pie charts, while clustering is a popular DM technique in refactoring, no study related to text mining is found in this field. In other SE tasks, the preferred technique is classification, and the second is clustering.
Number of publications of data mining studies for SE tasks from Scopus search by their topics.
Defect prediction studies generally compare learning algorithms in terms of whether they find defects correctly using classification algorithms. Besides this approach, in some studies, clustering algorithms were used to select features [110] or to compare supervised and unsupervised methods [27]. In the text mining area, to extract features from scripts, TF-IDF techniques were generally used [111, 112]. Although many different algorithms have been used in defect prediction, the most popular ones are NB, MLP, and RBF.
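As a reminder of what TF-IDF computes, the sketch below weights each term by its frequency in a document multiplied by the logarithm of the inverse document frequency. Several TF-IDF variants exist; this one uses the plain form tf × ln(N / df). The function name and the tokenized sample documents are assumptions for illustration and do not come from the cited studies.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    documents is a list of token lists; weight = (count / doc length) * ln(N / df),
    where df is the number of documents that contain the term.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized snippets, e.g. from bug reports or source files.
docs = [
    ["null", "pointer", "check"],
    ["null", "check"],
    ["buffer", "overflow"],
]
for row in tf_idf(docs):
    print(row)
```

Terms that occur in every document receive a weight of zero, so the resulting vectors emphasize the tokens that distinguish one file or report from the others before they are passed to a classifier.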
Figure 6 shows the number of publications by document type (conference paper, book chapter, article, book) between 2010 and 2019. Conference papers and articles are clearly the most preferred document types. The figure also shows that there is no review article about data mining studies in design pattern mining.
The number of publications in terms of document type between 2010 and 2019.
Table 6 lists popular repositories that contain various datasets, along with their descriptions, the tasks they are used for, and download links. For example, the PMART repository includes source files of Java projects, and the PROMISE repository has different datasets with software metrics such as cyclomatic complexity, design complexity, and lines of code. Since these repositories contain many datasets, no detailed information about them has been provided in this article.
Repository | Topic | Description | Web link |
---|---|---|---|
Nasa MDP | Defect Pred. | NASA’s Metrics Data Program | |
Android Git | Defect Pred. | Android version bug reports | |
PROMISE | Defect Pred. Effort Est. | It includes 20 datasets for defect prediction and cost estimation | |
Software Defect Pred. Data | Defect Pred. | It includes software metrics, # of defects, etc. Eclipse JDT: Eclipse PDE: | |
PMART | Design pattern mining | It has 22 patterns 9 Projects, 139 ins. Format: XML Manually detected and validated | |
Description of popular repositories used in studies.
Refactoring can be applied at different levels: study [105] predicted refactoring at the package level using hierarchical clustering, and another study [99] predicted refactoring at the class level using LS-SVM as the learning algorithm, SMOTE for handling class imbalance, and PCA for feature extraction.
Data mining techniques have been applied successfully in many different domains. In software engineering, to improve the quality of a product, it is highly critical to find existing deficits such as bugs, defects, code smells, and vulnerabilities in the early phases of the SDLC. Therefore, many data mining studies in the past decade have aimed to deal with such problems. The present paper aims to provide information about previous studies in the field of software engineering. This survey shows how classification, clustering, text mining, and association rule mining can be applied in five SE tasks: defect prediction, effort estimation, vulnerability analysis, design pattern mining, and refactoring. It clearly shows that classification is the most used DM technique. Therefore, new studies can focus on clustering in SE tasks.
LMT | logistic model trees |
Rip | repeated incremental pruning |
NNge | nearest neighbor generalization |
PCA | principal component analysis |
PAM | partitioning around medoids |
LS-SVM | least-squares support vector machines |
MAE | mean absolute error |
RBF | radial basis function |
RUS | random undersampling |
SMO | sequential minimal optimization |
GMM | Gaussian mixture model |
EM | expectation maximization |
LR | logistic regression |
SMB | SMOTEBoost |
RUS-bal | balanced version of random undersampling |
THM | threshold-moving |
BNC | AdaBoost.NC |
RF | random forest |
CC | correlation coefficient |
ROC | receiver operating characteristic |
BayesNet | Bayesian network |
SMOTE | synthetic minority over-sampling technique |