Atari AI.

I wanted to create an intelligent agent that could play the classic game Ms. Pac-Man. I decided to train the agent with the Q-learning algorithm, since it is a popular and well-established reinforcement learning technique.

I started by creating an ALE (Arcade Learning Environment) interface, which let me interact with the game. I enabled the display screen and sound so that I could see and hear the game while the agent was playing. Then I loaded the Ms. Pac-Man ROM and got the size of the action space, which determines the number of columns in my Q-table.
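
Because the action space fixes the width of the Q-table, one way to picture it is one row of `num_actions` values per state. The sketch below is a minimal illustration, assuming states have already been reduced to hashable keys; the `QTable` class and the key format are hypothetical, not part of the original code. Rows are filled with random values the first time a state is seen.

```python
import numpy as np


class QTable:
    """Hypothetical tabular Q-table: one row of action values per state key."""

    def __init__(self, num_actions, seed=None):
        self.num_actions = num_actions
        self.rng = np.random.default_rng(seed)
        self.rows = {}  # maps a hashable state key to a 1-D array of Q-values

    def row(self, state_key):
        # Create a row of random values in [0, 1) on the first visit to a state.
        if state_key not in self.rows:
            self.rows[state_key] = self.rng.random(self.num_actions)
        return self.rows[state_key]


# Ms. Pac-Man's minimal action set has 9 actions.
table = QTable(num_actions=9, seed=0)
print(table.row(b"example-state").shape)  # (9,)
print(len(table.rows))                    # 1
```

Creating rows lazily avoids allocating a row for every possible screen up front, which would be impossible for raw Atari frames.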

Next, I tried to load a previously trained Q-table, but since none existed yet, I had to start with a new Q-table filled with random values. I set the learning rate (alpha), discount factor (gamma), and exploration rate (epsilon) to 0.5, 0.9, and 0.5, respectively.
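
With epsilon set to 0.5, the agent picks a random action about half the time. The sketch below shows epsilon-greedy selection in isolation; `choose_action` is a hypothetical helper, and using NumPy's `default_rng` instead of the global `np.random` functions is a choice made here for reproducibility.

```python
import numpy as np


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))  # explore: uniform over all actions
    return int(np.argmax(q_row))              # exploit: highest Q-value


rng = np.random.default_rng(42)
q_row = np.array([0.1, 0.7, 0.3])
picks = [choose_action(q_row, epsilon=0.5, rng=rng) for _ in range(10_000)]
# With epsilon = 0.5, half the picks are greedy (action 1) and the rest are spread
# evenly over all three actions, so action 1 appears roughly 2/3 of the time.
print(picks.count(1) / len(picks))
```

Note that a random pick can still land on the greedy action, so the share of the best action is higher than 1 - epsilon.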

With everything in place, I was ready to start training my agent. I ran 10,000 episodes of Ms. Pac-Man, in which the agent chose actions based on the Q-table values (taking a random action some of the time to explore) and received rewards from the game. After each action, I updated the Q-table based on the observed reward and the values of the next state. I also printed the total reward for each episode so that I could track the agent's progress.
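
The update described above is the standard tabular Q-learning rule, Q(s, a) ← (1 - α)·Q(s, a) + α·(r + γ·max Q(s′, a′)), the same form used in the training loop. The minimal sketch below applies one step with made-up numbers (illustrative only, not taken from a training run). The `q_update` helper and its `terminal` flag, which drops the bootstrap term when the game has ended, are hypothetical additions for illustration.

```python
def q_update(q_sa, reward, max_next_q, alpha, gamma, terminal=False):
    """One tabular Q-learning step for a single (state, action) pair."""
    # The target is the observed reward plus the discounted best value of the
    # next state; at the end of a game there is no next state to bootstrap from.
    target = reward if terminal else reward + gamma * max_next_q
    return (1 - alpha) * q_sa + alpha * target


# Illustrative numbers: current estimate 0.2, the agent eats a dot (10 points),
# and the best Q-value in the next state is 1.0.
new_q = q_update(q_sa=0.2, reward=10, max_next_q=1.0, alpha=0.5, gamma=0.9)
print(round(new_q, 2))  # 5.55
```

With alpha = 0.5 the new estimate lands halfway between the old value (0.2) and the target (10 + 0.9 · 1.0 = 10.9).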

As the agent played, it learned from its experience and gradually improved its performance. Every 500 episodes I saved the Q-table, so that I would not lose the agent's progress if something went wrong.
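
A checkpoint only protects progress if a crash cannot leave a half-written file behind. A minimal sketch, assuming the Q-table is any picklable object: write to a temporary file in the same directory, then swap it in with `os.replace`, which Python documents as atomic on POSIX systems when both paths are on the same filesystem. The `save_checkpoint` and `load_checkpoint` helpers are hypothetical; the filename matches the one the original code loads.

```python
import os
import pickle
import tempfile


def save_checkpoint(obj, path):
    """Pickle obj to path without leaving a partially written file on failure."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
        os.replace(tmp_path, path)  # swap in the finished file in one step
    except BaseException:
        os.remove(tmp_path)  # clean up the temporary file, then re-raise
        raise


def load_checkpoint(path, default):
    """Return the pickled object at path, or default if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "q_table_Ms.PacMan.pkl")
    print(load_checkpoint(path, default={}))  # {}
    save_checkpoint({"s": [0.1, 0.2]}, path)
    print(load_checkpoint(path, default={}))  # {'s': [0.1, 0.2]}
```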

In the end, after 10,000 training episodes, my agent had become quite skilled at playing Ms. Pac-Man. It could navigate the maze, eat the dots, and avoid the ghosts with ease. I was proud of what I had accomplished and excited to see which games I could train my agent on next.

Below is the code I used:

import pickle

import numpy as np
from ale_py import ALEInterface

# Create the ALE interface
ale = ALEInterface()

# Enable the display screen to show the game screen while the AI is playing.
ale.setBool('display_screen', True)

# Enable sound
ale.setBool('sound', True)

# Load the Ms. PacMan ROM
#ale.loadROM("C:\\Users\\jimbu\\Atari\\Ms. PacMan.a26") #Windows
ale.loadROM("/Users/beusse/Atari/Ms. PacMan.a26") #Mac

# Get the minimal action set. The Q-table stores one value per action *index*;
# ale.act() expects the Action object at that index, not the index itself.
action_set = ale.getMinimalActionSet()
num_actions = len(action_set)


def get_state(ale):
    # Turn the current screen into a hashable key for the Q-table.
    # Note: the raw RGB screen (210 x 160 x 3 pixel values) cannot be used to
    # index rows of a NumPy array, so this version (a change from the original)
    # downsamples and quantizes the grayscale screen and uses its bytes as a key.
    screen = ale.getScreenGrayscale()   # shape (210, 160), dtype uint8
    small = screen[::10, ::10] // 64    # 21 x 16 grid with 4 gray levels
    return small.tobytes()


def q_values(state):
    # Return the row of Q-values for a state, filling it with random values
    # the first time the state is seen.
    if state not in q_table:
        q_table[state] = np.random.rand(num_actions)
    return q_table[state]


# Load a previously trained Q-table (a dict mapping state keys to rows of
# action values), or start with a new, empty one
try:
    with open('q_table_Ms.PacMan.pkl', 'rb') as f:
        q_table = pickle.load(f)
    print("Loading previously trained Q-table")
except (FileNotFoundError, EOFError, pickle.UnpicklingError):
    q_table = {}
    print("Q-table not found, starting with new Q-table")

# Define the learning rate, discount factor, and exploration rate (epsilon)
alpha = 0.5
gamma = 0.9
epsilon = 0.5

# A good starting point for the learning rate (alpha) is typically between 0.1 and 0.5.
# A low learning rate means the agent updates its Q-values slowly, which can make
# learning more stable but also slower. A high learning rate means the agent updates
# its Q-values quickly, which can make learning faster but also less stable.

# A good starting point for the discount factor (gamma) is typically between 0.5 and 0.9.
# A low discount factor makes the agent prioritize short-term rewards over long-term
# rewards, while a high discount factor makes it prioritize long-term rewards.

# A good starting point for the exploration rate (epsilon) is typically between 0.1 and 0.5.
# A low exploration rate means the agent mostly follows the Q-table, while a high
# exploration rate means it explores the state space more.

# Define the number of episodes to train the AI
num_episodes = 10000

# Train the AI
for episode in range(num_episodes):
    # Reset the environment
    ale.reset_game()

    # Set the initial reward to zero and get the starting state
    total_reward = 0
    state = get_state(ale)

    # Run the episode
    while not ale.game_over():
        # Choose an action with an epsilon-greedy strategy: explore with
        # probability epsilon, otherwise take the best action in the Q-table
        if np.random.rand() < epsilon:
            action = np.random.randint(num_actions)
        else:
            action = int(np.argmax(q_values(state)))

        # Take the action (mapped to an ALE Action) and observe the reward
        reward = ale.act(action_set[action])

        # Get the next state
        next_state = get_state(ale)

        # Update the Q-value for the current state and action. When the game
        # has just ended there is no next state, so the target is the reward alone.
        target = reward
        if not ale.game_over():
            target += gamma * np.max(q_values(next_state))
        q_row = q_values(state)
        q_row[action] = (1 - alpha) * q_row[action] + alpha * target

        # Update the total reward
        total_reward += reward

        # Update the current state
        state = next_state

    # Print the total reward for the episode
    print("Episode: {}, Total reward: {}".format(episode, total_reward))

    # Save the Q-table every 500 episodes (once, after the episode ends, using
    # the same filename the loader looks for)
    if episode % 500 == 0:
        with open('q_table_Ms.PacMan.pkl', 'wb') as f:
            pickle.dump(q_table, f)
