 聯系方式 您當前位置：首頁 >> Matlab編程Matlab編程

日期：2019-06-13 10:11

Math 160 (De Loera) Final Project

Due date: June 13th

INSTRUCTIONS

This is the final project of the course (35 points total). No collaboration is

allowed. You are supposed to developed everything on your own. You can ask

me in class or in office hours. If you use a source outside the notes of class you

have to cite it properly.

Solutions need to be submitted via CANVAS using the uploading online capabilities.

Submit files that contains all code (with comments), and all pdf latex docs.

Do not forget to submit the data you used (specially if you reformatted things)!

Make sure we can run the code from a simple call. E.g., all data files should be

included. We will not download files for you or retype or copy paste your code.

If it does not run straight out of the box, you will receive zero points!!. Be organized

and use the notation appropriately. Show your work on every problem.

Correct answers with no support work will not receive full credit.

1. How to influence the opinion of people when on a budget. (11 points)

Apparently it is in fashion to use social networks, like facebook, to influence voters in elections and

other awful things (for example https://www.nytimes.com/2019/05/17/technology/facebook-ai-schroepfer.

html?action=click&module=Editors%20Picks&pgtype=Homepage)

In this problem, you will consider a simplified model of influence in social networks: the social

network is represented by an undirected graph G = (V, E) and for a set of nodes S V , we denote

by N(S) the set of their neighbors:

N(S) = {v ∈ V | u ∈ S, (u, v) ∈ E}.

The influence I(S) of a set of nodes is measured by I(S) = |N(S)|.

Imagine you work now for facebook and you are given the dataset available at http://www.math.

ucdavis.edu/~deloera/TEACHING/MATH160/facebookgraph.txt. This dataset is in fact a subgraph

of the real Facebook social graph. Each line in the file contains the id of two users, indicating

that these two users are friends on Facebook.

a. As the designer of a marketing campaign to influence the opinion of voters, your goal is to find

a subset S V of at most K nodes whose influence is maximal. Write a mathematical model

to solve this problem. Can you solve the problem for the data you were given using SCIP? If

not, can you write a practical method to give an approximation to the optimum? Extra bonus

if you can prove a guarantee of approximation.

b. Using any of your models/methods from part (a) write a computer function which, given the

social network described in the dataset and a budget K ∈ N

+, returns an approximately

optimal set of nodes S for the influence function I(S). The function should return both the

users to influence and the value (total amount of influence) obtained.

c. Plot the influence I(S) obtained by your function as a function of the budget K. E.g., what

happens when K = 1 and K = |V |?

d. In the definition of I(S) a node with largest degree is one of the most influential persons, i.e.,

he/she is a VIP. Can you think of a way to adapt the pagerank algorithm to identify VIP

nodes in the graph? Can you invent of a third additional way to define who are the VIPs in

this graph? What else could you use to measure influence?

e. A vertex-cover of a graph G is a set S of vertices of G such that each edge of G is incident with

at least one vertex of S. The vertex cover number τ (G) is the size of a minimum vertex cover

in G. A dominating set for a graph is a subset D of vertices such that every vertex not in D is

adjacent to at least one member of D. The domination number γ(G) is the smallest dominating

set. Are there any relationships among the influence function I(S), τ (G), and γ(G)? Explain.

Formulate a discrete models that given a graph find a vertex cover with smallest number of

vertices and a smallest dominating set. Explain the reasoning on your variables and constraints.

f. Show that if M is a matching and S is some vertex cover |S| ≥ |M|. In particular show that

max{|M| : M is a matching of G} ≤ min{|S| : S is a vertex-cover of G}.

HINT: How many vertices do you need to cover just the edges of M?

2. Clustering and pairing senators by their political ideas (12 points)

The zip-file https://www.math.ucdavis.edu/~deloera/TEACHING/MATH160/2-congress-datafiles.

zip contains two csv files with (real!) voting data. One file, I call F114, has information of how all

USA senators (in the 114th congress) voted in 15 different bills. 1 means voted YES, 0 means voted

NO, 0.5 means ABSTAINED. The other file, I call FYemen, contains votes for the 116 congress

on single issue (the legality of the US involvement on the ongoing war in Yemen). This will be used

later.

We always assume senators of the same party vote similarly, but let us analyze the votes using

mathematics and uncover some information!

Clustering is an operation in data science where we divide data up into a fixed number of clusters

while trying to ensure that the items in each cluster are as similar as possible (e.g., if we cluster by

political party we presumably get similar voting records, similar opinions). K-means is a clustering

algorithm that allows you to cluster into K clusters based on distances. It is implemented in

a. Use K-means on F114, with K = 2 to get a clustering in two groups. Obviously, do not use

the labels of the real affiliations I gave you! Choose one to be called democrats, the other to

be the republicans (do not look at the original file). Make your best mathematical guess of the

party affiliation.

You have obtained a classification of senators into two classes. Does your classification coincide

exactly with the real party affiliation? Or are there surprises? To measure this compute the

false-positive rate and false-negative rate of your classification:

false-positive rate := number of senators incorrectly labeled as democrats

number of democratic senators

false-negative rate := number of senators incorrectly labeled as republicans

number of republican senators

b. What if you cluster with K = 3? That would after all mean there are three factions in the

senate (moderates, conservatives, liberals?).

c. Next write MATLAB code to do clustering using the Eigenvalue Method I explained in class.

d. Clustering can also be model as an integer optimization problem. Write such a model and code

it in SCIP. Run it in the senator’s data.

3. More matching problems from social sciences (12 points).

E.g., how do people find a spouse? How do students get matched to medical schools? How do

workers get assigned to jobs? How do organ donors get match to patients in need? We saw in class

one can use matchings in graphs. Here we discuss this topic more and explore a new model for

agreement between pairs of people. We will use it for the senator’s data.

We say matching M is said to be maximal if there is no matching M0 with M ? M0

. A maximal

matching of the largest cardinality is a maximum matching. The size of a maximum matching is

denoted by OPTG. In the following we will explore several approaches to find a maximum matching

in a graph:

a. Connecting with the previous problem. Let M be a matching and let us denote by V (M) the

set of endpoints of edges in M:

V (M) := {u |e ∈ M, u ∈ e}

Show that when M is a maximal matching of G, V (M) is a vertex cover of G.vLet M be a

maximal matching of G, show that |V (M)| ≤ 2OPTG.

b. Next consider the greedy algorithm for finding a maximum matching: Start with an arbitrary

edge as the very first matching. Find another edge that does not have a vertex in common

with the current matching. If one exists add it to the current matching. Repeat until no more

edges can be added. What is the running time of the greedy algorithm on a graph with n

nodes and m edges?

Give an example where the greedy algorithm fails to find a maximum matching. But show

that the greedy method always yields a solution that has at least half as many edges as a true

maximum matching.

c. Propose an integer-optimization model, one you will implement in SCIP (i.e., now based on

linear equations, inequalities), to construct a maximum matching of any graph G. What if now

each edge i, j has a weight, i.e., each edge in a matching is not counted 1 time, but with wi,j

value. Describe algorithms used to find the maximum weight matching (we saw one in class).

d. Imagine now: Use the data of US senators to construct a graph. The nodes represent US

senators, edges appear with weights where they agree or disagree on 15 possible topics. Thus

weights are now integer numbers between -15 and 15. The weights are calculated by adding -1

or 1 when the person i and person j disagree or agree on a topic. For example if they agree in

only 4 of the topics the weight of arc is 7. Calculate the maximum weight matching on the

senator’s graph.

e. Suppose now you are planning for a debate tournament of USA senators. A debate is only

interesting if the pair of opponents disagree in most issues (negative weights of edges) What is

the largest number of debate matches you can set up? Do you think Hall’s marriage theorem

we saw in class relates to this

e. In the max-cut problem One wants to find a subset S of the nodes such that the number of

edges between S and the complementary subset is as large as possible. What is the max-cut of

the senator’s graph (no weights!)? Now, in the weighted Max-Cut each edge has a real number,

its weight, and the objective is to maximize the total weight of the edges between S and its

complement. What is the max cut now using the senator’s graph with weights?

f. BONUS 5 extra points: This is a harder open-ended challenge. Do NOT attempt this

problem unless you have finished the regular problems. Attempt at your own risk!!

Using both senator data files, in particular FYemen, to do the following: create a new graph

for the 116 congress. The graph keeps those senators that were members in both sessions of

congress and adds new names (e.g., Kamala Harris was not a senator in 114 congress) replacing

those (e.g., John McCain) who left the senate. So, the new graph has still the same number

of nodes and arcs, but we do not know the disagreement weights of some of the senators (e.g.

Kamala Harris vs Ted Cruz)!!

Your final task is to create a mathematical model to predict the most likely disagreement

weights (i.e., missing weights on new graph). Try to give good reasons for your method and

support it with mathematical arguments. Calculate the missing weights and use the data

available to you to make a convincing case you are right.

【上一篇】：到頭了
【下一篇】：沒有了