Thursday, October 3, 2019
Antivirus Research And Development Techniques
Antivirus software is a fast-growing product category under constant development, with each product competing against the other antivirus products in the commercial market to offer the most up-to-date detection and defense. This thesis covers a few of the techniques used by antivirus products, general background information about viruses and antivirus products, research on the overheads an antivirus product introduces on the computer it runs on, and research on signature-based detection, one of the most important and common techniques antivirus products use to detect viruses. It also covers how antivirus software is updated and how new virus signatures are added to the virus database. There is also some research on selected algorithms used by these techniques; the thesis explains how each selected algorithm works to classify a piece of code or a file as infected or uninfected. In the experimentation, a virus is scanned with three popular antivirus products, and the reports produced by the three products are compared and conclusions drawn.

Chapter 1: Introduction

Life without computers can hardly be imagined in the present lifestyle, where they play a very important role in whichever field one chooses. A computer is vulnerable to attacks that are dangerous and hard to handle. Just like humans, computers are attacked by viruses. A virus can take the form of a worm, a Trojan horse or other malware: anything that infects the computer. The common source of these viruses is the World Wide Web, where a malicious person can spread malware very easily. Researchers have found many methods and procedures to stop virus attacks and have come up with many techniques and software tools to remove viruses, which are called antivirus software.
A computer virus spreads into the computer through emails, floppy disks, the internet and many other sources. It usually spreads from one computer to another, corrupting or deleting data on the computer. Viruses mostly spread through the internet or through emails that carry hidden illicit software, which the user unknowingly downloads onto the computer. A virus can attack or damage the boot sector, system files, data files, software and also the system BIOS. Many newer viruses attack other parts of the computer as well. Viruses can spread when the computer is booted using an infected file, when an infected file is executed or installed, or when infected data or an infected file is opened. The main hardware sources are floppy disks, compact disks, USB or external hard drives, or a connection to another computer over an unsafe medium.

This rapid growth of viruses challenges antivirus software in different areas: prevention, preparation, detection, recovery and control of viruses. Nowadays there are many antivirus tools that remove viruses from the PC and help protect against future attacks. Antivirus software also raises privacy and security issues for the computers we work on, which is a major concern. Despite all these safety measures, viruses continue to grow rapidly in number, danger and reach. This thesis presents a history of viruses and the evolution of antivirus software, in which I explain how viruses came into existence, what types of viruses evolved, and how antivirus software emerged.
The scope of this thesis is mainly three selected techniques, concentrating mostly on one of the three, together with the scanning methods of antivirus products. It also gives a basic scenario of how an antivirus product adopts a framework to update its virus database, and some information about how an ordinary computer is notified to update the product so that it is ready to defend against zero-day viruses. A brief comparison of viruses by type gives the definitions and related threats of each type and explains its effects. The way antivirus software works on different types of viruses is explained, and current antivirus techniques are analyzed, showing both advantages and disadvantages.

Chapter 2 gives the general outline of the thesis: a general history of viruses, the evolution of antivirus software, a definition of a virus, the types of viruses, and the most common methods or techniques used. Chapter 3, Literature Review, presents research on and review of selected papers about antivirus software that I found interesting. In this chapter, some antivirus products, techniques and algorithms are compared according to recent developments. Chapter 4, Experimentation, compares different commercial antivirus products by their efficiency in detecting a virus; the results are based on the false positives, false negatives and hit ratios shown by each product. Chapter 5, Conclusion, concludes the thesis, summarizing the research and experimentation done on antivirus products. The Appendix holds relevant information about key words or frameworks used in this thesis that are not defined elsewhere.
Chapter 2 Overview

This chapter gives general information about viruses and antivirus software, including some basic history of viruses and of when antivirus software evolved. There are different types of viruses, classified according to how they attack. This chapter leads to a better understanding of the techniques used by antivirus products and also gives basic knowledge about different antivirus products.

2.1 History of Viruses

A computer virus is a program that copies itself to the computer without the user's permission and infects the system (Vinod et al. 2009). "Virus" is used here broadly for an infection by any of many types of malware, including worms, Trojan horses, rootkits, spyware and adware. The first work on self-reproducing computer programs was done by John von Neumann in 1949 (wiki 2010). In his work he suggested that a computer program (the term virus had not yet been coined) can reproduce itself. The first virus, Creeper, was discovered in the early 1970s. Creeper copied itself to other computers over a network and showed a message on the infected machine: I'M THE CREEPER: CATCH ME IF YOU CAN. It was harmless, but a program called Reaper was released to catch Creeper and stop it. In 1974 came Rabbit, a program that spreads and multiplies itself quickly and crashes the infected system once it reaches a certain number of copies. In the 1980s the virus named Elk Cloner infected many PCs. The Apple II computer, released in 1977, loaded its operating system from floppy disks; using this characteristic, Elk Cloner installed itself in the boot sector of the floppy disk and was loaded before the operating system. ©Brain was the first stealth IBM-compatible virus. A stealth virus hides itself from detection: when an attempt is made to read the infected boot sector, it displays the original, uninfected data. In 1987 the most dangerous virus in the news was the Vienna virus, which was the first to infect .COM files.
Whenever an infected file was run, it infected the other .COM files in the same directory. It was the first virus to be successfully neutralized, by Bernd Fix, which led to the idea of antivirus software. Many viruses followed, among them the Cascade virus, the first self-encrypting virus, and the Suriv family, memory-resident DOS file viruses. An extremely dangerous one was the Datacrime virus, which destroys FAT tables and causes loss of data. In the 1990s there were the Chameleon, Concept and CIH viruses, and in the 2000s the ILOVEYOU, Mydoom and Sasser viruses (Loebenberegr 2007).

Vinod et al. 2009 define a computer virus as a program that infects other programs by modifying them and their location such that a call to an infected program is a call to a possibly evolved, functionally similar copy of the virus. To protect against such attacks, antivirus software companies employ many different methodologies.

2.2 Virus Detectors

A virus detector scans a file or program to check whether it is malicious or benign. This research uses some technical terms and detection methods, which are defined below. The main goal of testing a file or program is to measure false positives, false negatives and the hit ratio (Vinod et al. 2009).

False positive: the scanner detects a non-infected file as a virus by error. False positives can waste time and resources.
False negative: the scanner fails to detect the virus in an infected file.
Hit ratio: the proportion of infected files that the scanner correctly detects as infected.

Detection is considered against three types of malware:

Basic
In the basic type, the malware attacks the program at the entry point, as shown in figure 2.2.1. Control is transferred to the virus payload because the entry point itself is infected.

[Figure labels: Infected Code; Main Code; Entry; Infected by virus]
Figure 2.2.1 Attacking system by basic malware.
(Vinod et al. 2009)

Polymorphic
Polymorphic viruses mutate to hide their original code: the virus consists of encrypted malware code together with a decryption unit. A new mutant is created every time the virus is executed. Figure 2.2.2 shows how the original virus code is kept encrypted in the infected file and decrypted to produce the virus code.

[Figure labels: Virus Code; Decrypted Code; Main Code; Entry; Encrypted by infected file]
Figure 2.2.2 Attacking system by polymorphic viruses. (Vinod et al. 2009)

Metamorphic
Metamorphic viruses can reprogram themselves using obfuscation techniques so that new variants are not the same as the original. This ensures that the signatures of the variants differ from that of the original.

[Figure labels: Virus A; Form A; Form B; S1; S2; S3]
Figure 2.2.3 Attacking system by metamorphic viruses. (Vinod et al. 2009)

Figure 2.2.3 shows that the original virus and its forms have different signatures, where S1, S2 and S3 are different signatures.

2.3 Detection Methods

2.3.1 Signature based detection
Here the scanner searches for signatures, which are sequences of bytes within the virus code, and reports the scanned programs that contain them as malicious. Signatures are easy to develop once the network behavior is identified. Signature based detection relies on pattern matching. Pattern matching techniques date from the time when the operating system was DOS. Viruses then were parasitic in nature and attacked host files, most commonly executable files. (Daniel, Sanok 2005)

2.3.2 Heuristic based detection
Heuristic detection is a method of scanning for viruses by evaluating patterns of behavior. It estimates the likelihood that a file or program is a virus by testing its characteristics and behavior and matching them against the antivirus product's heuristic database, which contains a number of indicators. It helps discover viruses that have no signatures or that hide their signatures.
It also helps detect metamorphic viruses. (Daniel, Sanok 2005)

2.3.3 Obfuscation Technique
Viruses use this technique to transform an original program into a virus program using transformation functions, such that the transformation is irreversible and the virus program performs comparably to the original program and retains its functions. This technique is used mainly by metamorphic and polymorphic viruses. (Daniel, Sanok 2005)

Antivirus Products
There are many antivirus products available in the commercial market. Some of the most commonly used are McAfee, G Data, Symantec, Avast, Kaspersky, Trend Micro, AVG, Bit Defender, Norton and ESET Nod32.

Chapter 3: Literature Review

3.1 Antivirus workload characterization
Research by (Derek, Mischa, David 2005) shows that an antivirus software package applies a wide range of techniques to check whether a file is infected. According to their observations, the best way to differentiate between antivirus packages is to compare the overheads each one introduces during on-access execution. Antivirus software runs in two main models: on-demand and on-access. On-demand scanning scans files specified by the user, whereas on-access scanning is a process that checks system-level and user-level operations and scans when an event occurs.

The paper discusses the behavior of four different antivirus packages running on an Intel Pentium IV with Windows XP Professional installed. Three test scenarios are considered: copying a small executable file from the CD-ROM to the hard disk, executing calc.exe, and executing wordpad.exe. All these executables run on the Windows XP Professional operating system. The antivirus packages used in the experiment were PC-Cillin, F-Prot, McAfee and Norton. The files are executed with each of these antivirus packages running.
Figure 3.1.1 shows that using these packages introduces overheads during execution, which increase the execution time.

Fig 3.1.1 Performance degradation of antivirus packages (Derek, Mischa, David 2005)

A test was then made to find out what extra instructions are executed when file system operations are performed and when a binary is loaded and executed. Both scenarios involve a small binary. Execution was found to be dominated by a few hot basic blocks in each antivirus package. A basic block is considered hot if it is visited more than fifty thousand times. To study the behavior of antivirus packages, (Derek, Mischa, David 2005) used the platform most heavily targeted by virus attacks, for which commercial antivirus software also had to exist. The simulation framework used is Virtutech Simics, configured as shown in table 3.1.1. Virtutech Simics is a simulator that includes a cycle-accurate micro-architectural model and is used to obtain cycle-accurate performance numbers.

Table 3.1.1 Virtutech Simics architectural structures (Derek, Mischa, David 2005)
Processor Model                  Intel Pentium 4 2.0A
Processor Operating Frequency    2 GHz
L1 Trace Cache                   12K entry
L1 Data Cache                    8 KB
L2 Cache                         512 KB
Main Memory                      256 MB

The goal of the model is to capture the execution of antivirus software on a system. To obtain metrics, the executed instruction stream is passed to the simulator. Simics is configured to simulate the microprocessor. The simulated host executes the operating system, loaded from a simulated hard drive. On top of the operating system the researchers installed and ran the antivirus software and carried out the test scenarios (see figure 3.1.2). The baseline configuration (without antivirus software installed) is then compared with the systems on which the four antivirus packages are installed.
[Figure labels: Copy/execute process; Antivirus Process; Operating System (Windows XP); Inst Stream; Simulate micro-architecture; L1 Inst Cache; L1 Data Cache; L2 Cache; Simulated Architecture; HOST]
Fig 3.1.2 Multi-level architectural and micro-architectural simulation environment (Derek, Mischa, and David 2005)

Table 3.1.2 summarizes the five configurations. For each experiment an image file is created and loaded as a CD-ROM in the machine. A utility containing special instructions was executed at the start and end of each collection to assist accurate profile collection.

Table 3.1.2: Five environments evaluated: Base has no antivirus software running (Derek, Mischa, David 2005)
Configuration    Anti-Virus edition                     Version
Base             (none)
NAV              Norton Anti-Virus Professional 2004    10.0.0.109
PC-Cillin        Trend Micro Internet Security          11.0.0.1253
McAfee           McAfee VirusScan Professional          8.0.20
F-Prot           F-Prot Antivirus for Windows           3.14b

Three different operations invoke antivirus scanning. First, a file is copied from the CD-ROM to the hard drive; then the operating system accessories Calculator and WordPad are run through a shortcut. The experiments found less than one percent difference in the workload parameters across profile runs. The antivirus characterization shows a gradual increase in cache activity, with the overhead smallest for F-Prot and highest for Norton. The impact on memory while running the antivirus software shows that Norton and McAfee have larger footprints than the Base case, F-Prot and PC-Cillin.

3.2 Development techniques: a framework showing malware detection using a combination of techniques
There have been several developments in the techniques used by antivirus software. A development in technique means that the new technique can detect viruses that previous techniques could not.
Antivirus software detects not only viruses but also worms, Trojan horses, spyware and other malicious code, which together constitute malware. Malware is code or a program intended to damage the computer. Malware can be filtered using antivirus software that implements detection techniques and algorithms. Several commercial antivirus programs use a common technique called signature-based matching; this technique must be updated often to store new malware signatures in the virus dictionary. As technology advances, many malware writers employ better hiding techniques; rootkits in particular became a security issue because of their strong hiding ability. Many new detection methods have been developed to detect malware, among them machine learning and data mining techniques. In this research Zolkipli, M.F.; Jantan, A., 2010 proposed a new framework to detect malware that combines two techniques, the signature-based technique and a machine learning technique. The framework has three main sections: signature-based detection, genetic algorithm based detection and a signature generator.

Zolkipli, M.F.; Jantan, A., 2010 define malware as software that, when executed, performs actions intended by an attacker without the consent of the owner. Every malware has its own characteristics, attack goal and transmission method. According to Zolkipli, M.F.; Jantan, A., 2010, a virus is malware that, when executed, tries to replicate itself into other executable code within a host. As technology advances, malware has become more sophisticated and has improved extensively since the early days.

Signature-based matching is the most common approach to detecting malware. It works by comparing file content with signatures using an approach called string scanning, which searches for pre-defined bit patterns.
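The string scan described above can be illustrated with a minimal Python sketch. It assumes a small in-memory signature dictionary; the signature names and byte patterns are invented for illustration and are not real virus signatures, and a real scanner would use a faster search than the naive one shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical signature dictionary: name -> byte pattern (made up for illustration).
SIGNATURES: Dict[str, bytes] = {
    "Example.A": bytes.fromhex("deadbeef0102"),
    "Example.B": b"\x90\x90\xeb\xfe",
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[Tuple[str, int]]:
    """Return (signature name, offset) for every signature found in data."""
    hits = []
    for name, pattern in signatures.items():
        offset = data.find(pattern)  # naive string scan; -1 means not found
        if offset != -1:
            hits.append((name, offset))
    return hits


clean = b"just some ordinary file content"
infected = b"header" + bytes.fromhex("deadbeef0102") + b"rest of file"
print(scan_bytes(clean))     # no signature present, so the list is empty
print(scan_bytes(infected))  # Example.A is found at offset 6
```

Any content whose pattern is missing from the dictionary produces no hits, however malicious it is.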
Although this technique is popular and very reliable for host-based security tools, it has some limitations that need to be solved. The problem with signature-based matching is that it fails to detect zero-day virus or malware attacks, also called newly launched malware. Before a new virus pattern can be captured and stored for future use, some number of computers must already have been infected.

Figure 3.2.1 shows an automatic malware removal and system repair framework developed by F.Hsu et al. 2006, which has three important parts: a monitor, a logger and a recovery agent. The framework solves two problems:
Determining which un-trusted program breaks the system integrity.
Removing the un-trusted program.

[Figure labels: Untrusted Process; Trusted Process; Monitor; Logger; Recovery agent; Operating System]
Figure 3.2.1: Framework for monitoring, logging and recovery by F.Hsu et al. 2006

The framework monitors un-trusted programs and logs their activity. It can defend against both known and unknown malware, because it needs no prior information about the un-trusted programs. On the user side there is no need to modify any existing programs, and the user need not be aware that a program is running inside the framework, as the framework is invisible to both known and unknown malware. A prototype of the framework was run in a Windows environment and showed that all malware changes can be detected, in contrast to the commercial tools that use the signature-based technique.

Machine learning algorithms have also been tested and applied to malware detection. To get around the limitations of the signature-based technique, one such approach used adaptive data compression. The two restrictions of the signature-based technique according to Zolkipli, M.F.; Jantan, A., 2010 are:
Not all malicious programs have bit patterns that are evidence of their malicious nature, and some patterns are not recorded in virus dictionaries.
Obfuscated malware takes many forms of bit pattern, against which the signature-based technique does not work.

The genetic algorithm (GA) has been used to address these limitations and detect zero-day malware, that is, malware on the day it is launched. The algorithm was used to develop a detection technique called IMAD, which analyzes new malware. This technique was developed to counter the restrictions of signature-based detection.

Data mining is another technique that was applied to malware detection much earlier. A standard data mining algorithm classifies each block of file content as normal or as potentially malicious. To overcome the limitations of signature-based antivirus programs, an Intelligent Malware Detection System known as IMDS was developed. This system uses Object Oriented Association, adapting the OOA_Fast_FPGrowth algorithm. A complete experiment was done on the Windows API sequences of files known as PE files. A large collection of PE files was taken from the antivirus laboratory of King Soft Corporation and used to compare several malware detection approaches. The results show that IMDS performs better than Norton and McAfee.

The proposed framework combines two techniques, the signature-based technique and the GA technique. It was designed to resolve two challenges of malware detection:
How to detect newly launched malware (Zolkipli, M.F.; Jantan, A., 2010)
How to generate a signature from an infected file (Zolkipli, M.F.; Jantan, A., 2010)

[Figure labels: S-Based Detection; GA Detection; Signature Generator]
Figure 3.2.2: Framework for malware detection technique (Zolkipli, M.F.; Jantan, A., 2010)

The main components are s-based detection, the s-based generator and GA detection (see figure 3.2.2). S-based detection acts first in defending against malware; GA detection is the second defense layer, used to detect newly launched malware.
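The layering of the three components can be sketched in Python as follows. This is only an illustration of the control flow, not the authors' implementation: `layered_scan`, the placeholder second-layer detector and the placeholder signature generator are all hypothetical names and stand-ins, and the second layer here is a trivial heuristic rather than a genetic algorithm.

```python
from typing import Callable, Dict, Optional


def layered_scan(
    data: bytes,
    signatures: Dict[str, bytes],
    second_layer: Callable[[bytes], bool],
    make_signature: Callable[[bytes], bytes],
) -> Optional[str]:
    """Return a label if data is flagged, else None.

    Layer 1: match against the known signatures.
    Layer 2: fallback detector (stands in for GA detection). On a hit, a new
    signature is generated and stored so that layer 1 catches it next time.
    """
    for name, pattern in signatures.items():
        if pattern in data:
            return name
    if second_layer(data):
        name = f"new-{len(signatures) + 1}"
        signatures[name] = make_signature(data)
        return name
    return None


# Toy stand-ins, assumptions for illustration only.
db = {"known-1": b"\x01\x02\x03"}
suspicious = lambda d: d.count(b"\xff") > 3   # placeholder, not a genetic algorithm
first_bytes = lambda d: d[:8]                 # placeholder signature generator

sample = b"\xff\xff\xff\xff\xffpayload"
first = layered_scan(sample, db, suspicious, first_bytes)   # flagged by layer 2
second = layered_scan(sample, db, suspicious, first_bytes)  # now matched by layer 1
print(first, second)
```

The second call never reaches the fallback detector, because the signature stored by the first call already matches.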
Once a new signature has been created from zero-day malware, it is used by the signature based detection technique.

Signature based detection is a fixed examination method used in every antivirus product. It is also called a static analysis method. It decides whether code is malicious by using its malware characterization. The technique is sometimes also called scan strings. In general every malware has one or more signature patterns with unique characteristics. When a program is executed, the antivirus software searches through its stream of data bytes. The antivirus database holds thousands of signatures, and the scanner goes through them, comparing each one with the code of the program being executed. A searching algorithm performs the comparison between the program code content and the signature database. Zolkipli, M.F.; Jantan, A., 2010 place this technique at the beginning of the framework because it detects well-known viruses effectively. It is used in the framework to improve the efficiency of computer operation.

GA detection is one of the most popular techniques for detecting newly launched malware. It is used to learn approaches for solving algebraic or statistical research problems. It is a machine learning technique that applies genetic programming, learning over an evolving population. The algorithm represents data as chromosomes, which are bit strings; new chromosomes are built by combining bit strings from existing chromosomes. The solution found depends on the nature of the problem. Crossover and mutation are the two basic operations in GA. The technique was introduced into the framework to address polymorphic viruses and new types of malware.
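The two basic GA operations on bit-string chromosomes can be sketched in Python as below. This is a generic textbook form (single-point crossover and per-bit mutation), assumed here for illustration; the source does not say which variants the framework uses, and the chromosome values are arbitrary.

```python
import random
from typing import Tuple


def crossover(parent_a: str, parent_b: str, point: int) -> Tuple[str, str]:
    """Single-point crossover: swap the tails of two equal-length bit strings."""
    if len(parent_a) != len(parent_b):
        raise ValueError("chromosomes must have equal length")
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]


def mutate(chromosome: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < rate else b for b in chromosome)


child_1, child_2 = crossover("11110000", "00001111", point=3)
print(child_1, child_2)  # 11101111 00010000

rng = random.Random(0)   # seeded so that runs are repeatable
print(mutate(child_1, rate=0.1, rng=rng))
```

Crossover recombines material already present in the population, while mutation introduces bit values that neither parent carried.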
With this technique, malware code that uses hiding techniques can also be detected, thanks to its ability to learn and filter virus behavior. (Zolkipli, M.F.; Jantan, A., 2010)

The s-based generator produces the string patterns used as signatures to characterize and identify viruses. Forensic experts start creating signatures once a new virus sample is found; the signatures are based on the virus's behavior. Every antivirus product creates its own signatures, and the signature records are encrypted in case more than one antivirus product is installed on the computer. As soon as a signature is created, the signature database is updated with it. Every computer user needs to update the antivirus product with the latest database in order to defend against new viruses. A signature pattern is 16 bytes long, which is more than enough to detect a 16-bit virus. (Zolkipli, M.F.; Jantan, A., 2010) The generator takes the behavior of the virus identified by GA detection. The signature pattern of the virus is generated and added to the virus database as a new signature for signature based detection. The framework was proposed to take over this task from forensic experts. It is useful for detecting new virus signatures and for improving the efficiency and performance of the computer.

3.3 Improving speed of signature scanners using BMH algorithm
This paper discusses the problem of detecting viruses with the signature scanning method, which relies on a fast pattern matching algorithm. In this technique the pattern is a virus signature, which is searched for anywhere in the file. Pattern matching is an expensive task that frequently affects performance. Many users become impatient if the pattern matching algorithm is slow and consumes a lot of time.
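For reference, the Boyer-Moore-Horspool (BMH) algorithm named in the section title can be sketched in Python as below. This is a minimal illustration rather than the paper's implementation: the function name `bmh_search` and the sample bytes are assumptions, the signature is made up, and the window comparison uses a slice instead of an explicit right-to-left loop.

```python
from typing import List


def bmh_search(text: bytes, pattern: bytes) -> List[int]:
    """Return every offset at which pattern occurs in text (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table: for each byte value, how far the window may slide.
    # Bytes absent from the pattern (ignoring its last byte) allow a full shift of m.
    shift = [m] * 256
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Slide by the table entry for the text byte under the pattern's last position.
        pos += shift[text[pos + m - 1]]
    return matches


signature = b"\xde\xad\xbe\xef"  # made-up signature, for illustration only
sample = b"AAAA" + signature + b"BBBB" + signature
print(bmh_search(sample, signature))  # [4, 12]
```

The speed-up comes from the shift table: when the byte under the end of the window does not occur in the pattern, the scanner skips a full pattern length instead of advancing one byte at a time.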
To keep scanning fast, the scanner uses a faster pattern matching algorithm: the Boyer-Moore-Horspool algorithm, which proved to be the fastest pattern matching algorithm when compared with the Boyer-Moore and Turbo Boyer-Moore algorithms.

In technical terms, a virus has three parts: a trigger, an infection mechanism and a payload. The infection mechanism, the main part, looks for victims and often avoids infecting the same victim more than once. Having found a victim, it may overwrite it or attach itself at the beginning or the end of the file. The trigger is an event that specifies when the payload is to be executed. The payload is the source of the malicious behavior, which can be corruption of the boot sector or manipulation of files.

Detecting a virus and disinfecting the infected file are the two most important tasks of the algorithms used by antivirus software. The defense system must therefore have a part that is able to detect any type of virus code. There are four basic detection techniques:
Integrity checking
Signature scanning
Activity monitoring
Heuristic method

Integrity checking technique: An integrity checker generates check codes, which can be checksums, CRCs or hashes of files, and uses them to check for viruses. The checksums are recomputed regularly and compared against the previous checksums. If the two checksums do not match, the file has been modified and is reported as infected. This technique detects the presence of a virus by detecting changes in files, so it is also capable of detecting new or unknown viruses. But it has several drawbacks. Firstly, the initial checksum calculation has to be performed on a clean, virus-free system, so the technique can never detect viruses that were already present when the checksums were first computed. Secondly, there are many false positives if the system is legitimately modified during execution.
(Sunitha Kanaujiya et al. 2010)

Signature scanning technique: This technique is used on a large scale to detect viruses. The scanner reads data from the system and applies a pattern matching algorithm against a list of known virus patterns; if the data matches one of the patterns, it is a virus. Signature scanning is effective, but the pattern database needs frequent updating, which is very easy to do. The scanner has several advantages: its scanning speed can be increased, and it can also be used to detect other types of malicious programs such as Trojan horses, worms and logic bombs. All that is needed for a new virus is its signature, which is added to the database. For this reason the technique is used against many viruses.

Activity monitoring technique: This technique monitors the behavior of programs as they execute, using other programs known as behavior monitors, which stay in main memory. A behavior monitor raises an alarm or takes action to stop a program when it attempts unusual activities such as modifying interrupt tables, partition tables or boot sectors. The database holds every expected virus behavior. The main disadvantage arises when a new virus uses an infection method that is not in the database; in that case the monitor cannot find the virus. Secondly, viruses can avoid this defense by activating earlier in the boot sequence, before the behavior monitors. And also viruses modify the monitors