Difference Between Weights and Biases: Another way of Looking at Forward Propagation

{
  localUrl: '../page/8r4.html',
  arbitalUrl: 'https://arbital.com/p/8r4',
  rawJsonUrl: '../raw/8r4.json',
  likeableId: '4080',
  likeableType: 'page',
  myLikeValue: '0',
  likeCount: '1',
  dislikeCount: '0',
  likeScore: '1',
  individualLikes: [
    'AltoClef'
  ],
  pageId: '8r4',
  edit: '2',
  editSummary: '',
  prevEdit: '1',
  currentEdit: '2',
  wasPublished: 'true',
  type: 'wiki',
  title: 'Difference Between Weights and Biases: Another way of Looking at Forward Propagation',
  clickbait: 'My understanding of forward propagation',
  textLength: '3823',
  alias: '8r4',
  externalUrl: '',
  sortChildrenBy: 'likes',
  hasVote: 'false',
  voteType: '',
  votesAnonymous: 'false',
  editCreatorId: 'AltoClef',
  editCreatedAt: '2017-10-15 10:37:37',
  pageCreatorId: 'AltoClef',
  pageCreatedAt: '2017-10-15 09:15:12',
  seeDomainId: '0',
  editDomainId: '2835',
  submitToDomainId: '0',
  isAutosave: 'false',
  isSnapshot: 'false',
  isLiveEdit: 'true',
  isMinorEdit: 'false',
  indirectTeacher: 'false',
  todoCount: '2',
  isEditorComment: 'false',
  isApprovedComment: 'false',
  isResolved: 'false',
  snapshotText: '',
  anchorContext: '',
  anchorText: '',
  anchorOffset: '0',
  mergedInto: '',
  isDeleted: 'false',
  viewCount: '17',
  text: '## What Are Weights and Biases?\n\nConsider the following forward propagation rule:\n$$\n\\vec{y_{n}}=\\mathbf{W_n}^T \\times  \\vec{y_{n-1}} + \\vec{b_n}\n$$\nwhere $n$ is the index of the layer and $\\vec{y_n}$ is the output of the $n^{th}$ layer, expressed as an $l_n \\times 1$ vector ($l_n$ is the number of neurons in the $n^{th}$ layer). $\\mathbf{W_n}$ is an $l_{n-1} \\times l_{n}$ matrix storing the weights of every connection between layers $n-1$ and $n$, so it has to be transposed for the product to be defined. $\\vec{b_n}$ holds the biases of the $n^{th}$ layer, one per neuron, added to the weighted inputs arriving from the $(n-1)^{th}$ layer; it has the shape $l_n \\times 1$.\n\nAs the equation shows, weights and biases are both adjustable, differentiable (and therefore trainable) parameters that contribute to the final result.\n\n## Why Do We Need Both, and Why Are Biases Optional?\n\nA neural network is, in a sense, an improved version of the perceptron model: the output of each neuron (perceptron) depends linearly on its input, rather than being a plain 0/1. (This linear output is then passed through an activation function to make it non-linear, as discussed later.)\n\nThe simplest way to create such a linear relationship is to scale the input by a coefficient $w$ and output the scaled value:\n$$\nf(x)=w\\times x\n$$\n\nThis model works reasonably well: a single neuron can fit a linear function such as $f(x)=m\\times x$ exactly, and neurons arranged in layers, together with the activation functions discussed below, can fit certain non-linear relationships.\n\nHowever, a neuron without a bias lacks an important ability that even the perceptron has. Its output is forced through the origin ($f(0)=0$ for every $w$), so it cannot fit functions like $y=mx+b$ with $b\\neq0$, and the input value at which a neuron switches on or off is fixed at $x=0$, so it cannot be moved to a chosen threshold. 
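
A minimal NumPy sketch can make this limitation concrete. The data (points on the line $y = 2x + 3$) and the use of least squares via `np.linalg.lstsq` in place of gradient training are illustrative assumptions. The sketch fits the same points once with a weight only and once with a weight plus a bias, where the bias is modelled as an extra input column fixed to 1.

```python
import numpy as np

# Illustrative (assumed) data: points on a line with a non-zero intercept, y = 2x + 3
x = np.linspace(-5.0, 5.0, 21)
y = 2.0 * x + 3.0

# Model without a bias: f(x) = w * x, fitted by least squares
A_no_bias = x.reshape(-1, 1)
w_only, *_ = np.linalg.lstsq(A_no_bias, y, rcond=None)
err_no_bias = np.mean((A_no_bias @ w_only - y) ** 2)

# Model with a bias: f(x) = w * x + b, written as an extra input column fixed to 1
A_bias = np.column_stack([x, np.ones_like(x)])
w_and_b, *_ = np.linalg.lstsq(A_bias, y, rcond=None)
err_bias = np.mean((A_bias @ w_and_b - y) ** 2)

print("without bias: w =", w_only, "MSE =", err_no_bias)
print("with bias:    w, b =", w_and_b, "MSE =", err_bias)
```

Because the sample is symmetric around $x=0$, the best bias-free slope is exactly 2, yet that model still misses every point by 3 (a mean squared error of 9). The model with a bias recovers $w=2$, $b=3$ and fits the points up to floating-point error.
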
Adding more layers and neurons eases and partly hides this issue, but for the same total number of layers and neurons, a network without biases is still likely to perform worse than one with biases.\n\nIn short, biases supplement the weights: they are not strictly necessary, but they help a network fit the underlying pattern and usually make it perform better.\n\n## Another Way of Writing Forward Propagation\n\nInterestingly, the forward propagation rule\n$$\n\\vec{y_{n}}=\\mathbf{W_n}^T \\times  \\vec{y_{n-1}} + 1 \\times \\vec{b_n}\n$$\ncan also be written as a single matrix product:\n$$\n\\vec{y_{n}}=\n\\left[ \\begin{array}{c}\n\\mathbf{W_n} \\\\\n\\vec{b_n}^T\n\\end{array} \\right]^T\n\\times\n\\left[ \\begin{array}{c}\n\\vec{y_{n-1}} \\\\\n1\n\\end{array} \\right]\n$$\nthat is,\n$$\n\\vec{y_{n}} = \\mathbf{W_{new}}^T \\times \\vec{y_{new_{n-1}}}\n$$\nwhere $\\mathbf{W_{new}}$ is the $(l_{n-1}+1) \\times l_n$ matrix obtained by appending $\\vec{b_n}^T$ as an extra row beneath $\\mathbf{W_n}$, and $\\vec{y_{new_{n-1}}}$ is $\\vec{y_{n-1}}$ with a constant $1$ appended. Rewriting the equation this way folds the biases into the weight matrix, which makes the gradient update very easy to write.\n\n## How Do We Update Them?\n\nAfter the rewrite, weights and biases are updated together by a single gradient-descent step:\n$$\n\\mathbf{W_{new}} \\leftarrow \\mathbf{W_{new}} - \\eta \\frac{\\partial Error}{\\partial \\mathbf{W_{new}}}\n$$\nwhere $\\eta$ is the learning rate, a small positive step size.\n\n## The Activation Function\n\nThere is one more component yet to be mentioned: the activation function. It takes the output of a neuron (the weighted sum plus the bias) as its input and returns the value used as the final output of the neuron:\n$$\n\\vec{y_{n}} = Activation\\left(\\mathbf{W_{new}}^T \\times \\vec{y_{new_{n-1}}}\\right)\n$$\nThere are many types of activation functions, but the commonly used ones share one property: they are *non-linear*.\n\nThat is essentially what they are designed for. Passing each output through a non-linear function introduces non-linearity into the model; without it, any stack of layers would collapse into a single linear (affine) map. 
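
The augmented form, the update rule and the activation can be checked together in a short NumPy sketch. The layer sizes, the random values, the choice of `np.tanh` as the activation, the squared-error loss, the target vector and the learning rate `eta` are all assumptions made for illustration, not part of the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layer sizes: l_{n-1} = 3 inputs feeding l_n = 2 neurons
l_prev, l_n = 3, 2
W = rng.normal(size=(l_prev, l_n))      # W_n has shape l_{n-1} x l_n
b = rng.normal(size=(l_n, 1))           # b_n has shape l_n x 1
y_prev = rng.normal(size=(l_prev, 1))   # output of layer n-1

# Standard form: y_n = W_n^T y_{n-1} + b_n
z_standard = W.T @ y_prev + b

# Augmented form: stack b_n^T under W_n and append a constant 1 to the input
W_new = np.vstack([W, b.T])                 # shape (l_{n-1} + 1) x l_n
y_new_prev = np.vstack([y_prev, [[1.0]]])   # shape (l_{n-1} + 1) x 1
z_augmented = W_new.T @ y_new_prev

# Activation applied to the layer output (tanh is an assumed example)
y_n = np.tanh(z_augmented)

# One gradient-descent step on W_new for an assumed squared error
# E = 0.5 * ||y_n - target||^2 against a made-up target
target = np.array([[0.5], [-0.5]])
eta = 0.1                                    # assumed learning rate
delta = (y_n - target) * (1.0 - y_n ** 2)    # dE/dz, using the tanh derivative
grad_W_new = y_new_prev @ delta.T            # dE/dW_new, same shape as W_new
W_new = W_new - eta * grad_W_new

# The last row of W_new is the updated bias; the other rows are the updated weights
W_updated, b_updated = W_new[:-1], W_new[-1:].T
```

Because the appended input is always 1, the gradient for the last row of `W_new` equals the gradient for the bias itself, which is why a single update rule covers weights and biases alike.
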
\n\nFor problems that are not linearly separable, such as the XOR problem, giving the network the ability to draw non-linear decision boundaries is what makes correct classification possible.\n\nSome activation functions serve a second purpose: they squash arbitrarily large inputs into a bounded range, such as $(-1, 1)$ for $\\tanh$ or $(0, 1)$ for the sigmoid, which keeps the values passed on to later layers small and well scaled. \n\n2017/10/15',
  metaText: '',
  isTextLoaded: 'true',
  isSubscribedToDiscussion: 'false',
  isSubscribedToUser: 'false',
  isSubscribedAsMaintainer: 'false',
  discussionSubscriberCount: '1',
  maintainerCount: '1',
  userSubscriberCount: '0',
  lastVisit: '',
  hasDraft: 'false',
  votes: [],
  voteSummary: 'null',
  muVoteSummary: '0',
  voteScaling: '0',
  currentUserVote: '-2',
  voteCount: '0',
  lockedVoteType: '',
  maxEditEver: '0',
  redLinkCount: '0',
  lockedBy: '',
  lockedUntil: '',
  nextPageId: '',
  prevPageId: '',
  usedAsMastery: 'false',
  proposalEditNum: '0',
  permissions: {
    edit: {
      has: 'false',
      reason: 'You don\'t have domain permission to edit this page'
    },
    proposeEdit: {
      has: 'true',
      reason: ''
    },
    delete: {
      has: 'false',
      reason: 'You don\'t have domain permission to delete this page'
    },
    comment: {
      has: 'false',
      reason: 'You can\'t comment in this domain because you are not a member'
    },
    proposeComment: {
      has: 'true',
      reason: ''
    }
  },
  summaries: {},
  creatorIds: [
    'AltoClef'
  ],
  childIds: [],
  parentIds: [],
  commentIds: [],
  questionIds: [],
  tagIds: [],
  relatedIds: [],
  markIds: [],
  explanations: [],
  learnMore: [],
  requirements: [],
  subjects: [],
  lenses: [],
  lensParentId: '',
  pathPages: [],
  learnMoreTaughtMap: {},
  learnMoreCoveredMap: {},
  learnMoreRequiredMap: {},
  editHistory: {},
  domainSubmissions: {},
  answers: [],
  answerCount: '0',
  commentCount: '0',
  newCommentCount: '0',
  linkedMarkCount: '0',
  changeLogs: [
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22834',
      pageId: '8r4',
      userId: 'AltoClef',
      edit: '2',
      type: 'newEdit',
      createdAt: '2017-10-15 10:37:37',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22833',
      pageId: '8r4',
      userId: 'AltoClef',
      edit: '1',
      type: 'newEdit',
      createdAt: '2017-10-15 09:15:12',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    }
  ],
  feedSubmissions: [],
  searchStrings: {},
  hasChildren: 'false',
  hasParents: 'false',
  redAliases: {},
  improvementTagIds: [],
  nonMetaTagIds: [],
  todos: [],
  slowDownMap: 'null',
  speedUpMap: 'null',
  arcPageIds: 'null',
  contentRequests: {}
}